Investigation into Self-Monitoring and Self-Healing (SMASH) Software Processes, 10-R9628


Principal Investigators
Lynne Randolph
Meredith Moczygemba
Adam Clauss

Inclusive Dates:  04/01/06 – Current

Background - During the last few years, requests for proposals (RFPs) received by the Institute's Intelligent Transportation System (ITS) Department have contained numerous requirements related to system recovery and redundancy. These requirements have included the ability to roll over processes to different machines, recover databases, and support system clustering. Additionally, many RFPs request burn-in periods for software, followed by maintenance contracts. As the ITS systems being developed grow more complex, the need for more intelligent monitoring of their processes is apparent. While we have developed tools for monitoring these processes externally, methods of internal monitoring have not been advanced.

Approach - The intent of this research program is to determine which components of a software system are most prone to problems. From experience, the team is aware of issues that can occur with sockets, threads, and timers, but it will work to identify additional problematic components by interviewing a diverse group of developers. Further investigation will then determine which problems occur frequently and what can be done to repair a component when a problem is encountered. The program will design a framework that gives a software process the ability to monitor the identified components. When a problem occurs, the process will initiate a repair on the affected component. The framework will also allow custom components to be monitored. After the design is complete, a library of tools will be developed that contains the monitored components and a process monitor.
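The monitor-and-repair cycle described above can be pictured with a small sketch. It is written in Python purely for illustration (the project's own library targets C#/.NET), and every name in it (`MonitoredComponent`, `ProcessMonitor`, `FlakyConnection`, `is_healthy`, `repair`, `check_once`) is hypothetical rather than taken from the SMASH design. It assumes that each component can report its own health and attempt its own repair.

```python
from abc import ABC, abstractmethod
from typing import List


class MonitoredComponent(ABC):
    """Hypothetical interface: anything the process monitor can check and repair."""

    def __init__(self, name: str) -> None:
        self.name = name

    @abstractmethod
    def is_healthy(self) -> bool:
        """Return True if the component is currently working."""

    @abstractmethod
    def repair(self) -> None:
        """Try to bring the component back to a working state."""


class ProcessMonitor:
    """Hypothetical in-process monitor that checks registered components."""

    def __init__(self) -> None:
        self._components: List[MonitoredComponent] = []

    def register(self, component: MonitoredComponent) -> None:
        self._components.append(component)

    def check_once(self) -> List[str]:
        """Check every component once and repair the unhealthy ones.

        Returns the names of the components on which a repair was initiated.
        """
        repaired = []
        for component in self._components:
            if not component.is_healthy():
                component.repair()
                repaired.append(component.name)
        return repaired


class FlakyConnection(MonitoredComponent):
    """Custom component standing in for something like a socket (assumption)."""

    def __init__(self, name: str) -> None:
        super().__init__(name)
        self.connected = True

    def is_healthy(self) -> bool:
        return self.connected

    def repair(self) -> None:
        # A real repair might close and reopen the underlying connection.
        self.connected = True
```

In this sketch a custom component only has to subclass `MonitoredComponent` and be passed to `register`; the monitor itself does not need to know what kind of component it is checking.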

Accomplishments - The component investigation phase of the program is complete. Although many developers reported issues with sockets and threads, no other component was mentioned as frequently failing. Because the system will allow custom components to be created, the base framework will contain threads, sockets, and timers. Some investigation has been done on repair methods, but the decision was made to defer this work to the development portion of each component. The initial framework design has been completed in a language-independent form. Because the initial library will be developed in .NET, a C#-specific design was also generated. The language-independent design will be updated once a proof of design has been accomplished. The team is currently implementing the library of tools, including a monitorable socket, thread, and timer, and the process monitor class. For practicality, the team is making the external interface compatible with the external monitoring work done on a previous internal research project.
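As a rough illustration of what a monitorable thread might involve, the sketch below wraps a worker thread so that it can be checked and restarted. It is written in Python rather than the project's C#, and the class name `MonitoredThread`, its methods, and the `restarts` counter are all assumptions made for this example. It also assumes that "healthy" simply means the underlying thread is still alive, which is only one of several possible health criteria.

```python
import threading
from typing import Callable, Optional


class MonitoredThread:
    """Hypothetical wrapper: a worker thread that can be checked and restarted."""

    def __init__(self, name: str,
                 target: Callable[[threading.Event], None]) -> None:
        self.name = name
        self._target = target              # worker function; receives a stop event
        self._stop = threading.Event()
        self._thread: Optional[threading.Thread] = None
        self.restarts = 0                  # number of repairs performed so far

    def start(self) -> None:
        """Start (or restart) the worker on a fresh thread."""
        self._stop.clear()
        self._thread = threading.Thread(
            target=self._target, args=(self._stop,),
            name=self.name, daemon=True)
        self._thread.start()

    def is_healthy(self) -> bool:
        """Assumed health criterion: the worker thread is still running."""
        return self._thread is not None and self._thread.is_alive()

    def repair(self) -> None:
        """Assumed repair strategy: start the worker again on a new thread."""
        self.restarts += 1
        self.start()

    def stop(self) -> None:
        """Ask the worker to finish and wait briefly for it to exit."""
        self._stop.set()
        if self._thread is not None:
            self._thread.join(timeout=1.0)
```

A worker passed to `MonitoredThread` is expected to return when the stop event is set, for example `lambda stop: stop.wait()`. Note that this sketch does not distinguish a deliberately stopped thread from one that died unexpectedly; a fuller design would likely track that separately.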
