2011 IR&D Annual Report

Using Extreme Value Theory to Eliminate Binary Thresholds in Anomaly Detection, 10-R8209

Principal Investigator
Sandra G. Dykes

Inclusive Dates:  12/22/10 – 07/06/11

Background — Anomaly detection (AD) algorithms are central to machine learning and data mining, with applications in areas such as malware detection, image processing, fraud detection and fault analysis. Although AD is potentially powerful, current methods are prone to high false positive rates, are sensitive to training data, perform inconsistently across data sets and provide little insight into their results. One root cause of these problems is the use of binary thresholds. A binary threshold creates a discontinuity in the detection function, regardless of whether the detection method uses a statistical, classification or clustering algorithm. Consider two approximately equal data points, x1 and x2, with x1 slightly above the threshold and x2 slightly below it. Despite this small difference, x1 would be classified as an anomaly and x2 would not. A binary threshold therefore couples the false positive and false negative error rates: shifting the threshold reduces one at the expense of the other.
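The discontinuity described above can be illustrated with a minimal Python sketch. The threshold value, the two data points and the smooth score (a standard normal CDF) are all assumptions chosen for illustration; the smooth score here is not the EVT-AD score, only a stand-in for any continuous alternative.

```python
import math

def binary_detector(x, threshold):
    """Hard decision: 1 (anomaly) if x exceeds the threshold, else 0."""
    return 1 if x > threshold else 0

def continuous_score(x, mean, std):
    """Smooth score in (0, 1): normal CDF of the z-score (illustrative only)."""
    z = (x - mean) / std
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

threshold = 3.0        # hypothetical cut-off
x1, x2 = 3.01, 2.99    # nearly equal observations on either side of it

hard = (binary_detector(x1, threshold), binary_detector(x2, threshold))
soft = (continuous_score(x1, 0.0, 1.0), continuous_score(x2, 0.0, 1.0))

print("binary decisions: ", hard)   # the two points receive opposite labels
print("continuous scores:", soft)   # the two points receive nearly equal scores
```

The binary detector assigns opposite labels to x1 and x2, while the continuous score changes only slightly between them.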

Approach — SwRI researchers introduced a new approach to anomaly detection, called extreme value theory-anomaly detection (EVT-AD), that is built on two novel concepts:

  • Applying extreme value theory to replace binary thresholds with continuous scores.

  • Constructing mathematical functions of behavior patterns from natural-language descriptions of the model.

The SwRI approach of applying formal extreme value theorems is unique. It enables a more accurate model of the tail of the distribution, where anomalies reside. The second innovation is a method for constructing mathematical functions that represent behavior patterns. Behavior functions are constructed by mapping descriptive natural-language terms onto mathematical variables and operators, where the variables are the extreme value scores. This approach provides insight into why an entity was detected. Such insight is critical because it supplies contextual and supplemental information about significant underlying events.
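The report does not state which extreme value theorem or estimator EVT-AD uses, so the following Python sketch is only one plausible reading. It assumes a peaks-over-threshold model, in which excesses over a high level are approximated by a generalized Pareto distribution (the Pickands-Balkema-de Haan result), fitted here by the method of moments. The function names, the 5% tail fraction and the simulated training data are all hypothetical. The level `u` only marks where the tail fit begins; it is not a decision threshold.

```python
import bisect
import math
import random
import statistics

def fit_gpd_moments(excesses):
    """Method-of-moments estimates (shape xi, scale sigma) of a generalized Pareto tail."""
    m = statistics.mean(excesses)
    v = statistics.variance(excesses)
    ratio = m * m / v
    return 0.5 * (1.0 - ratio), 0.5 * m * (ratio + 1.0)

def gpd_survival(y, xi, sigma):
    """P(excess > y) under a generalized Pareto distribution."""
    if y <= 0.0:
        return 1.0
    if abs(xi) < 1e-9:               # shape near zero: exponential tail
        return math.exp(-y / sigma)
    base = 1.0 + xi * y / sigma
    if base <= 0.0:                  # past the finite upper end point (xi < 0)
        return 0.0
    return base ** (-1.0 / xi)

def make_tail_scorer(training, tail_fraction=0.05):
    """Return score(x) = 1 - P(X > x), using a fitted tail above the level u."""
    data = sorted(training)
    n = len(data)
    k = max(2, int(n * tail_fraction))   # number of points treated as the tail
    u = data[n - k - 1]                  # start of the tail fit (assumed choice)
    xi, sigma = fit_gpd_moments([x - u for x in data[n - k:]])

    def score(x):
        if x <= u:                       # body of the distribution: empirical estimate
            p_exceed = (n - bisect.bisect_right(data, x)) / n
        else:                            # tail: generalized Pareto extrapolation
            p_exceed = (k / n) * gpd_survival(x - u, xi, sigma)
        return 1.0 - p_exceed

    return score

random.seed(0)
training = [random.expovariate(1.0) for _ in range(5000)]   # simulated feature values
score = make_tail_scorer(training)
for x in (2.0, 5.0, 5.01, 8.0):
    print(x, round(score(x), 6))
```

Under these assumptions, nearby observations such as 5.0 and 5.01 receive nearly identical scores, and the fitted tail lets the score keep distinguishing values beyond the largest training observation.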

SwRI researchers developed EVT-AD for insider threat and credit card fraud detection; however, it can be applied to a wide range of problems. Insider threat and fraud detection are particularly difficult because there may be no distinctive indicators. For insiders, the majority of activities may be legitimate and normal; that is, no single event serves as an indicator. Instead, it is the pattern of activities that is unusual. Similarly, for credit card accounts, a single transaction may carry no indicator of fraud. Fraud is detected by looking for deviations from historical patterns or from the average behavior of a peer group.
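To illustrate how a pattern of activities, rather than any single event, might be scored, the Python sketch below builds a simple behavior function from a textual description. The report does not specify the mapping from terms to operators, so the vocabulary here (product for "and", probabilistic sum for "or"), the activity names and the score values are all hypothetical. The input scores stand in for per-activity extreme value scores in [0, 1].

```python
# Hypothetical vocabulary: each connective maps to an operator on scores in [0, 1].
OPERATORS = {
    "and": lambda a, b: a * b,            # all behaviors must be unusual together
    "or": lambda a, b: a + b - a * b,     # any one unusual behavior suffices
}

def behavior_function(description):
    """Build a function of scores from a flat description, evaluated left to right."""
    tokens = description.split()
    names, ops = tokens[0::2], tokens[1::2]

    def evaluate(scores):
        value = scores[names[0]]
        for op, name in zip(ops, names[1:]):
            value = OPERATORS[op](value, scores[name])
        # Also return the contributing scores as a rough account of "why"
        return value, {name: scores[name] for name in names}

    return evaluate

pattern = behavior_function("after_hours_logins and large_downloads and new_hosts")

# User A: no single activity is the most extreme, but all are elevated together.
user_a = {"after_hours_logins": 0.90, "large_downloads": 0.92, "new_hosts": 0.88}
# User B: one very unusual activity, the rest ordinary.
user_b = {"after_hours_logins": 0.99, "large_downloads": 0.10, "new_hosts": 0.05}

score_a, why_a = pattern(user_a)
score_b, why_b = pattern(user_b)
print(round(score_a, 3), why_a)
print(round(score_b, 3), why_b)
```

In this sketch, user A ranks well above user B even though B has the single most extreme activity, and the returned per-activity scores indicate which behaviors drove the result.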

Accomplishments — Phase I of this project developed the basics of EVT-AD, implemented a prototype in software, and used simulation studies to evaluate its detection performance against a standard statistical anomaly detection method used in commercial products. Experiments in Phase I focused on insider threat models. In Phase II, we applied EVT-AD to credit card fraud detection, with feature data generated to match parameters of known credit card usage. Experimental results showed that EVT-AD far outperformed the binary-threshold approach for both insider threat and fraud detection, providing the same detection rate with substantially fewer false positive errors. These results suggest that EVT-AD makes a significant contribution to the field of anomaly detection.

07/05/12