Investigation into Process Management Technologies to Improve
System Monitoring and Repair, 10-9554

Principal Investigators
Roger L. Strain

Associate Investigators
John S. Brisco
Steven W. Dellenback, Ph.D., PMP

Inclusive Dates:  07/01/05 - Current

Background - Software systems are growing larger and more complex, and are increasingly characterized by numerous independent processes running simultaneously. With this trend comes greater potential for problems or errors in some part of the integrated system. Such problems often cascade, causing errors in other processes. To help manage these processes, programs in SwRI's Intelligent Transportation Systems (ITS) Department have developed a master process. This master process can start, stop, and report the current status of the other processes comprising the system. However, the tools it provides are fairly simplistic. It reports only whether an error exists, without providing any details about the error itself, and the only way to correct an identified error is to restart the affected process. As SwRI deploys complex software systems in more locations and clients depend on the Institute for maintenance and support, it is becoming more important to monitor the behavior of those systems in greater detail and to effect repairs remotely.

Approach - This research program aims to gain a finer level of control over the many processes that make up a complex software system. This involves researching which status values would be most useful for processes to report and which control tools would be most useful for addressing problems as they arise. For instance, consider two processes that must communicate with each other to perform their tasks. If the communication link closes in such a way that one process sees the link as broken but the other does not, the first process can report the lost communication to the master process. With sufficient intelligence, the master process could instruct the second process to restart itself and reestablish the communication link. Because future processes will have different features, the protocol by which processes report status and receive commands will be defined in a generic way. This will allow each process to define and report whatever values are most relevant to it. A proof-of-concept prototype will be developed to demonstrate the usefulness of the system to clients who may not otherwise understand the need for this technology.
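
The broken-link scenario described above can be illustrated with a minimal Python sketch. It is not the project's design, which had not yet been defined at the time of this report. All names here (StatusReport, Command, MasterProcess, the "link:<peer>" key convention, and the process names "collector" and "publisher") are hypothetical and chosen only for illustration. The sketch assumes that status is reported as free-form key/value pairs, so each process can define whatever values are relevant to it.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set, Tuple


@dataclass
class StatusReport:
    """Generic status message: a process name plus free-form key/value pairs."""
    process: str
    values: Dict[str, str] = field(default_factory=dict)


@dataclass
class Command:
    """Generic command sent from the master to a single process."""
    target: str
    action: str


class MasterProcess:
    """Hypothetical master that tracks reports and decides on repairs."""

    def __init__(self, links: List[Tuple[str, str]]) -> None:
        # Record which processes are expected to talk to each other.
        self.peers: Dict[str, Set[str]] = {}
        for a, b in links:
            self.peers.setdefault(a, set()).add(b)
            self.peers.setdefault(b, set()).add(a)
        # Most recent report received from each process.
        self.latest: Dict[str, StatusReport] = {}

    def handle_report(self, report: StatusReport) -> List[Command]:
        """Store a report and return any repair commands it triggers."""
        self.latest[report.process] = report
        commands: List[Command] = []
        for key, value in report.values.items():
            # Assumed convention: the key "link:<peer>" carries "up" or "down".
            if not key.startswith("link:") or value != "down":
                continue
            peer = key[len("link:"):]
            if peer not in self.peers.get(report.process, set()):
                continue  # Not a known link; ignore it.
            peer_report = self.latest.get(peer)
            peer_view = None
            if peer_report is not None:
                peer_view = peer_report.values.get("link:" + report.process)
            # The peer has not noticed the break, so ask it to restart.
            if peer_view != "down":
                commands.append(Command(target=peer, action="restart"))
        return commands


master = MasterProcess(links=[("collector", "publisher")])
master.handle_report(StatusReport("publisher", {"link:collector": "up"}))
commands = master.handle_report(
    StatusReport("collector", {"link:publisher": "down", "queue_depth": "12"})
)
print(commands)
```

In this sketch the master ignores values it does not recognize, such as "queue_depth", which is one way a generic protocol could let new processes report extra values without requiring changes to the master.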

Accomplishments - The first phase of the project is currently underway. Input is being sought from SwRI staff, both within and outside the ITS realm, regarding the features that would be most useful to developers and maintenance personnel. The input gathered in this phase will shape the direction of the later phases, including design and prototyping.