Application of Data Mining Techniques to Health Information Exchange Failure Prediction, 10-R8112


Principal Investigator
Ted Wilmes

Inclusive Dates: 11/02/09 – 03/02/10

Background - As the United States moves further along the path to electronic medical records, new methods of exchanging and storing these records have emerged. Organizations of varying sizes now connect to one another through health information exchanges (HIEs). This research focused on predicting failures in an HIE. Here, a failure means that the HIE enters a state in which it is either completely inaccessible or key portions are not working, which in turn disrupts normal operations. Such disruptions can leave medical personnel unable to access critical health information from the HIE when they need it. The result could be a minor delay or, when timing is critical, a reduction in the quality of care a patient receives.

Approach - The objective of this project was to develop a prototype HIE failure prediction package using existing open source tools. The package consists of two parts: a system monitoring component and a data mining component. In a real-world deployment, such a tool could act as an early warning system, alerting administrative personnel to issues so they can be fixed before they develop into a serious failure.
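
The early warning role described above can be pictured as a loop that passes each monitoring snapshot to a trained model and records an alert whenever the model predicts trouble. The sketch below is a minimal Python illustration under stated assumptions: the `Sample` fields, the `early_warning` function, and the thresholds in `threshold_model` are hypothetical stand-ins, not metrics or models from this project. In practice the threshold rule would be replaced by a model produced by the data mining step.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass
class Sample:
    """One monitoring snapshot of the HIE host (fields are hypothetical)."""
    timestamp: float     # seconds since monitoring started
    cpu_percent: float
    heap_used_mb: float
    response_ms: float


def early_warning(samples: Iterable[Sample],
                  predict_failure: Callable[[Sample], bool]) -> List[float]:
    """Return the timestamps at which the model predicted an impending failure."""
    alerts: List[float] = []
    for sample in samples:
        if predict_failure(sample):
            # A real system would notify administrators here.
            alerts.append(sample.timestamp)
    return alerts


def threshold_model(sample: Sample) -> bool:
    """Hypothetical stand-in for a trained model: a simple threshold rule."""
    return sample.heap_used_mb > 900 or sample.response_ms > 2000


# Illustrative snapshots only; not data from the project.
samples = [
    Sample(timestamp=0.0, cpu_percent=20.0, heap_used_mb=300.0, response_ms=150.0),
    Sample(timestamp=60.0, cpu_percent=45.0, heap_used_mb=700.0, response_ms=400.0),
    Sample(timestamp=120.0, cpu_percent=80.0, heap_used_mb=950.0, response_ms=2500.0),
]
alerts = early_warning(samples, threshold_model)
print(alerts)  # [120.0]
```

Keeping the model behind a plain callable means the monitoring loop does not need to change when a different candidate model is swapped in.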

First, an open source HIE was set up. Scripts were written to place realistic user loads on the HIE software and eventually drive it into failure. Infrastructure monitoring software ran while these load scripts executed, and the monitoring data was stored in a database for later analysis. After the scripts had run, several data mining tools were used to analyze the test data and build candidate failure prediction models. Models were built to predict both continuous values (time to failure) and binary values (whether the current system state is failed or not failed). These models were evaluated against load test data they had not been exposed to during training. This evaluation produced precision, recall, and other measures of model quality that were used to compare the candidate models.
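
The two kinds of prediction target and the held-out evaluation can be sketched as follows. This is a minimal Python illustration, assuming each load-test run yields timestamped monitoring samples and a known failure time; `label_run`, `split_runs`, `precision_recall`, and all example values are hypothetical. Time to failure is the gap between a sample and the observed failure, the binary label marks samples taken at or after the failure, and whole runs are held out so the model never sees any sample from a test run during training.

```python
from typing import List, Sequence, Tuple


def label_run(timestamps: Sequence[float],
              failure_time: float) -> Tuple[List[float], List[int]]:
    """Derive both prediction targets for one load-test run.

    Continuous target: seconds remaining until the observed failure (0 once failed).
    Binary target: 1 if the system is in the failed state at that sample, else 0.
    """
    time_to_failure = [max(failure_time - t, 0.0) for t in timestamps]
    failed = [1 if t >= failure_time else 0 for t in timestamps]
    return time_to_failure, failed


def split_runs(runs: Sequence, n_test: int) -> Tuple[list, list]:
    """Hold out whole runs so no test sample is seen during training."""
    if not 0 < n_test < len(runs):
        raise ValueError("n_test must leave at least one run on each side")
    return list(runs[:-n_test]), list(runs[-n_test:])


def precision_recall(actual: Sequence[int],
                     predicted: Sequence[int]) -> Tuple[float, float]:
    """Precision and recall for the 'failed' (1) class."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == 1 and p == 1)
    fp = sum(1 for a, p in pairs if a == 0 and p == 1)
    fn = sum(1 for a, p in pairs if a == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Illustrative run: samples every 60 s, failure observed at t = 180 s.
ttf, failed = label_run([0.0, 60.0, 120.0, 180.0, 240.0], failure_time=180.0)
print(ttf)     # [180.0, 120.0, 60.0, 0.0, 0.0]
print(failed)  # [0, 0, 0, 1, 1]

train, test = split_runs(["run1", "run2", "run3", "run4"], n_test=1)
print(train, test)  # ['run1', 'run2', 'run3'] ['run4']

# Hypothetical model output on the held-out run.
predicted = [0, 0, 1, 1, 1]
print(precision_recall(failed, predicted))  # (0.6666666666666666, 1.0)
```

In this example the model raises one false alarm before the failure, which lowers precision to 2/3 while recall stays at 1.0 because every failed sample was caught.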

Accomplishments - This project produced two major findings: data mining techniques can be successfully applied to HIE failure prediction, and classification-based algorithms produced more accurate results than regression methods. The regression methods did, however, become more accurate as the system neared the failure event. This suggests a combined approach: use classification to make a broad failure/no-failure call, then refine the time-to-failure estimate with regression. Current software monitoring technologies do not offer customers any failure prediction component. This research has demonstrated that data mining techniques are suited to this type of work and have a place alongside current standard solutions.
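
The combined approach suggested above can be sketched as a two-stage predictor: a classifier first makes the broad failure/no-failure call, and only when it flags risk does a regression model estimate the time remaining. Everything in this minimal Python illustration is an assumption: a single hypothetical feature (heap usage in MB), a hand-written least-squares line (`fit_line`) standing in for the project's regression methods, a threshold rule standing in for its classifiers, and made-up training points. Fitting the regression only on samples from the final stretch before failure reflects the observation that regression was more accurate close to the failure event.

```python
from typing import Callable, Optional, Sequence, Tuple


def fit_line(xs: Sequence[float], ys: Sequence[float]) -> Tuple[float, float]:
    """Ordinary least-squares fit of y = a + b * x for a single feature."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def two_stage_predict(feature: float,
                      classify: Callable[[float], bool],
                      model: Tuple[float, float]) -> Optional[float]:
    """Stage 1: broad failure / no-failure call. Stage 2: refine time to failure.

    Returns None when the classifier predicts no failure; otherwise the
    regression estimate of seconds to failure, clamped at zero.
    """
    if not classify(feature):
        return None
    intercept, slope = model
    return max(intercept + slope * feature, 0.0)


# Hypothetical training points from the final stretch before failure:
# heap usage (MB) versus observed seconds remaining until failure.
near_failure_heap = [700.0, 800.0, 900.0, 1000.0]
near_failure_ttf = [300.0, 200.0, 100.0, 0.0]
line = fit_line(near_failure_heap, near_failure_ttf)  # (1000.0, -1.0)


def classifier(heap_mb: float) -> bool:
    """Hypothetical stand-in for a trained classifier."""
    return heap_mb > 750.0


print(two_stage_predict(600.0, classifier, line))  # None
print(two_stage_predict(850.0, classifier, line))  # 150.0
```

With these illustrative values, a reading of 600 MB produces no prediction because the classifier sees no failure risk, while 850 MB yields an estimate of 150 seconds to failure.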
