Investigation into Process Management Technologies to Improve System Monitoring and Repair, 10-R9554


Principal Investigators
Roger L. Strain
John S. Brisco
Steven W. Dellenback, Ph.D., PMP

Inclusive Dates:  07/01/05 – 07/01/06

Background - As software systems grow larger and more complex, with numerous independent processes running simultaneously, the potential for problems and errors somewhere in the system increases. This also raises the risk that a problem in one process will cascade and begin causing errors in other processes. As the Institute deploys complex software systems in more locations, including other states, it is becoming increasingly important to be able to monitor the behavior of those systems and to effect repairs remotely.

Approach - This research program aims to gain finer control over the many processes that make up a software system. The effort involves determining which status values are most useful for processes to report, and which control tools they should make available to help address problems as they arise. Because future processes will have different features, the protocol that processes use to report status and expose controls will be defined generically rather than tied to any particular process. Additionally, a proof-of-concept prototype is being developed to demonstrate the usefulness of the system to clients who may not otherwise recognize the need for this technology.
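One way to read "defined in a generic way" is that each process describes itself: it reports its status values by name and advertises the names of the controls it offers, so a monitor can display and invoke them without process-specific code. The sketch below illustrates that idea under stated assumptions. The `ProcessDescriptor` class, its JSON message layout, and the example process `serial_reader` are all hypothetical and are not the project's actual protocol.

```python
import json
from dataclasses import dataclass, field
from typing import Any, Callable, Dict


@dataclass
class ProcessDescriptor:
    """Hypothetical self-description that a monitored process publishes.

    The monitor learns status fields and control names from the message
    itself, so it needs no process-specific code.
    """
    name: str
    status: Dict[str, Any] = field(default_factory=dict)
    controls: Dict[str, Callable[[], str]] = field(default_factory=dict)

    def to_message(self) -> str:
        # Serialize current status values plus the names (not the code) of
        # the controls this process offers.
        return json.dumps({
            "process": self.name,
            "status": self.status,
            "controls": sorted(self.controls),
        })

    def invoke(self, control: str) -> str:
        # Run a named control on request from the monitor.
        if control not in self.controls:
            raise KeyError(f"{self.name} has no control named {control!r}")
        return self.controls[control]()


# Example: a hypothetical serial-port reader process.
reader = ProcessDescriptor(
    name="serial_reader",
    status={"state": "running", "messages_per_sec": 10},
    controls={"reset_port": lambda: "port reset"},
)
message = reader.to_message()
print(message)
print(reader.invoke("reset_port"))
```

Because the message carries only names and values, a new kind of process can join the system by publishing its own descriptor, with no change to the monitor.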

Accomplishments - After soliciting input from other developers at SwRI, the project team developed a more detailed concept of the advanced control systems originally envisioned for the project. These concepts were distilled into requirements, then protocols, and eventually a prototype that demonstrated the ideas behind the effort. The team also identified several additional areas of interest and began initial investigations, most notably into automated system responses. A recurring idea in discussions of this project was a system that could recognize and repair well-known problems on its own. To support this approach, a module was added to the conceptual design that would monitor status reports from processes and compare each incoming status against a predefined list of trigger signals. If a new status value matched one of these triggers, the module would invoke a predefined response, such as restarting a process or resetting a communication channel. Results from this project are already being used in other internal research projects as well as in client efforts. As software products continue to improve and evolve, the project team hopes to see the technology developed under this effort find its way into deployed systems.
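The trigger-and-response module described above can be sketched as a small rule engine: each incoming status report is checked against a list of triggers, and every trigger that matches causes its predefined response to run. This is a minimal illustration, not the project's implementation. The `Trigger` and `AutoResponder` names, the `telemetry` process, its status fields, and the 30-second threshold are hypothetical, and the responses here only record what they would do instead of actually restarting processes or resetting channels.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class Trigger:
    """A hypothetical trigger signal: a known-problem pattern in a status report."""
    process: str                        # process whose reports are watched
    key: str                            # status field to examine
    matches: Callable[[Any], bool]      # True when the value signals the problem
    response: str                       # name of the predefined response


class AutoResponder:
    """Compares incoming status reports against triggers and runs responses."""

    def __init__(self, triggers: List[Trigger],
                 responses: Dict[str, Callable[[str], None]]) -> None:
        # Reject triggers that point at responses nobody defined.
        missing = {t.response for t in triggers} - set(responses)
        if missing:
            raise ValueError(f"triggers reference undefined responses: {missing}")
        self.triggers = list(triggers)
        self.responses = dict(responses)

    def on_status(self, process: str, status: Dict[str, Any]) -> List[str]:
        """Handle one status report; return the names of responses invoked."""
        fired = []
        for t in self.triggers:
            if t.process == process and t.key in status and t.matches(status[t.key]):
                self.responses[t.response](process)
                fired.append(t.response)
        return fired


# Example wiring; the process name, fields, and thresholds are made up.
actions: List[str] = []
responder = AutoResponder(
    triggers=[
        Trigger("telemetry", "state", lambda v: v == "crashed", "restart_process"),
        Trigger("telemetry", "seconds_since_last_msg", lambda v: v > 30, "reset_channel"),
    ],
    responses={
        "restart_process": lambda p: actions.append(f"restart {p}"),
        "reset_channel": lambda p: actions.append(f"reset channel for {p}"),
    },
)

fired = responder.on_status("telemetry", {"state": "running", "seconds_since_last_msg": 45})
print(fired)    # ['reset_channel']
print(actions)  # ['reset channel for telemetry']
```

Keeping triggers and responses as data, rather than hard-coding them, matches the idea of a predefined list that operators could extend as new well-known problems are identified.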
