Investigation into a Principal Components Methodology for Predicting Disease in Case-Based Decision Support, 10-R9757

Principal Investigators
Micah A. Spears
David A. Tong

Inclusive Dates:  10/01/07 – 04/01/09

Background - Medical diagnosis is the process by which a medical provider forms a differential diagnosis from information gathered in conversation with the patient, a physical examination, and laboratory test results. Because of inherent time constraints and the growing availability of electronic medical record (EMR) data, health care providers are increasingly employing clinical decision support systems (DSS) to assist them in the diagnostic process. Research has shown leading diagnostic systems to be adequate medical reference systems (i.e., electronic manuals); in practice, however, these systems fail to provide an informed differential diagnosis in a timely manner. Modern differential diagnostic systems diagnose disease inaccurately because they cannot efficiently process "noisy" data. During the diagnostic process, the information shared between diagnostic indicators is a source of statistical "noise" (i.e., covariance) that can significantly reduce the accuracy of a regression model and of the resulting differential diagnosis.

Approach - SwRI researchers developed a novel medical software algorithm that automates the interpretation, simplification, and noise reduction of the diagnostic data routinely analyzed by a clinician, so that an accurate differential diagnosis can be produced with minimal human intervention. A custom software prototype, Túxn ("tie-kee"), was developed to provide a testing and demonstration vehicle for this approach. To determine the accuracy of the Centers for Disease Control and Prevention (CDC) dataset used in this research, two clinicians from The University of Texas Health Science Center at San Antonio performed an independent clinical review of a statistically significant sample.

The specific research objectives were to:

  • Develop a relational knowledge base capable of analyzing a large set of diagnostic records (2+ million).
  • Develop a principal components algorithm to automate the removal of predictive "noise."
  • Develop a multivariate regression algorithm to compute the diagnosis odds.
  • Develop an explanatory model to communicate the rationale for a predicted diagnosis.
  • Evaluate the diagnostic efficacy of the proposed diagnostic algorithm against two known algorithms.
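
As a rough illustration of the second and third objectives, the sketch below projects correlated diagnostic indicators onto principal components and then fits a logistic regression on the component scores to obtain diagnosis odds. It is a minimal sketch under stated assumptions, not the project's algorithm: the records and labels are synthetic, every function name is hypothetical, and power iteration with deflation is used only to keep the example dependency-free. The project itself was built with Java and R; Python is used here purely for illustration.

```python
import math

# SYNTHETIC data (an assumption, not from the project): three indicators per
# record; the first two are deliberately near-duplicates (high covariance).
records = [[1.0, 1.1, 0.2], [2.0, 1.9, 0.1], [3.0, 3.2, 0.4], [4.0, 3.8, 0.3],
           [5.0, 5.1, 0.9], [6.0, 6.2, 0.7], [7.0, 6.9, 1.1], [8.0, 8.1, 0.8]]
labels = [0, 0, 0, 1, 0, 1, 1, 1]  # 1 = hypothetical diagnosis present

def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(rows):
    """Sample covariance matrix of already-centered rows."""
    n, d = len(rows), len(rows[0])
    return [[sum(r[i] * r[j] for r in rows) / (n - 1) for j in range(d)]
            for i in range(d)]

def mat_vec(m, v):
    return [sum(m[i][j] * v[j] for j in range(len(v))) for i in range(len(m))]

def top_component(m, iters=500):
    """Dominant eigenpair of a symmetric matrix via power iteration."""
    v = [1.0 / (i + 1) for i in range(len(m))]  # asymmetric starting vector
    for _ in range(iters):
        w = mat_vec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    eigval = sum(a * b for a, b in zip(v, mat_vec(m, v)))
    return eigval, v

def principal_components(cov, k):
    """First k eigenpairs, removing each found component by deflation."""
    work = [row[:] for row in cov]
    comps = []
    for _ in range(k):
        lam, v = top_component(work)
        comps.append((lam, v))
        for i in range(len(work)):
            for j in range(len(work)):
                work[i][j] -= lam * v[i] * v[j]
    return comps

def project(rows, comps):
    """Scores of each centered row on each component."""
    return [[sum(a * b for a, b in zip(r, v)) for _, v in comps] for r in rows]

def sigmoid(z):
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Logistic regression fitted by batch gradient descent."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        errs = [sigmoid(b + sum(wj * xj for wj, xj in zip(w, x))) - t
                for x, t in zip(X, y)]
        w = [wj - lr * sum(e * x[j] for e, x in zip(errs, X)) / n
             for j, wj in enumerate(w)]
        b -= lr * sum(errs) / n
    return w, b

centered = center(records)
cov = covariance(centered)
components = principal_components(cov, 2)
scores = project(centered, components)
# Covariance between the two score columns; near zero after projection.
score_cov = sum(s[0] * s[1] for s in scores) / (len(scores) - 1)
weights, bias = fit_logistic(scores, labels)
probs = [sigmoid(bias + sum(w * s for w, s in zip(weights, row)))
         for row in scores]
odds = [p / (1.0 - p) for p in probs]

print("covariance of indicators 1 and 2:", round(cov[0][1], 3))
print("covariance of component scores 1 and 2:", score_cov)
print("diagnosis odds per record:", [round(o, 3) for o in odds])
```

The point of the sketch is the contrast between the two printed covariances: the raw indicators share a large amount of information, while the component scores passed to the regression are uncorrelated with each other.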

Accomplishments - The project has met all of its research objectives. The researchers successfully prototyped a rich, visual DSS using the Java® programming language, the R statistical package, and the MySQL® database management system. The software prototype assists a medical provider in rapidly generating an evidence-based differential diagnosis, identifying a medical sub-specialty for referral, and qualifying a patient for hospital admission.

Based on the clinical review results, the average percentage of supported "gold standard" diagnoses in the 80,000-record CDC dataset used for this research is approximately 75 percent (range of 55 to 95 percent). The novel differential diagnosis algorithm achieved an average diagnostic accuracy of 65.8 percent, resulting in an error-corrected accuracy of 87.6 percent. The evaluation results strongly support the application of this linear subspace reduction algorithm to new and existing clinical DSSs to help reduce diagnostic error, improve patient care, and minimize misdiagnosis costs in an ambulatory setting. As a direct result of this project, SwRI has the capability to demonstrate medical diagnostic technology and pursue external projects in clinical decision support, telemedicine, medical training, and biosurveillance.
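
The summary does not state how the error-corrected figure was derived. One possible reading, offered here only as an assumption, is that the observed accuracy is divided by the fraction of dataset diagnoses that the clinical review supported. The short check below works through that arithmetic with the figures quoted above; the variable names are hypothetical.

```python
# Figures quoted in the summary.
observed_accuracy = 0.658    # average diagnostic accuracy
reported_corrected = 0.876   # error-corrected accuracy as reported
approx_supported = 0.75      # "approximately 75 percent" of diagnoses supported

# ASSUMPTION (not stated in the source):
# corrected accuracy = observed accuracy / supported fraction.
corrected_from_approx = observed_accuracy / approx_supported

# Supported fraction that would reproduce the reported figure under the
# same assumption.
implied_supported = observed_accuracy / reported_corrected

print(f"corrected accuracy using 0.75: {corrected_from_approx:.3f}")
print(f"implied supported fraction: {implied_supported:.4f}")
```

Under this reading, the rounded 75 percent gives a value slightly above the reported 87.6 percent, and the reported figure corresponds to a supported fraction just over 75 percent, which is consistent with the word "approximately" in the text.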

Figure 1. Diagnostic Process


Figure 2. Software Prototype - Differential Diagnosis
