Advanced science.  Applied technology.


Feature Identification using Spectral Hierarchies for Iterative Non-Targeted Analysis Grouping, 10-R6167

Principal Investigators
Keith Pickens
Kristin Favela
Inclusive Dates 
04/01/21 to 04/01/22


Non-Targeted Analysis (NTA) is the assay of all chemicals detected by an instrument without a predefined list of target chemicals. Hardware advances have enabled high-resolution mass spectrometric (MS) detection. This increases confidence in molecular identifications but also creates new challenges for efficient data management and processing. In batch NTA processing of two-dimensional gas chromatography (GCxGC) data, processing is difficult or fails for approximately 40% of features, for two main reasons. First, mass spectral complexity causes deconvolution errors for low-level features, leading to missing peaks or time-consuming manual re-integration. Second, the samples are chromatographically complex enough that data exploration produces a significant rate of divergent identifications. The resulting data are too complex to reduce in a reasonable timeframe.


The team overcomes this complexity by combining iterative processing of high-resolution GCxGC-MS data with machine learning (ML). This approach detects low-level compounds that traditional peak-finding algorithms otherwise miss. Peak finding and deconvolution struggle with complex, high-resolution MS data, so rather than relying on them alone, we leverage information that emerges from the batch of samples as a whole. First, ML automatically ranks spectral signals by quality. Next, a representative signal is selected according to its quality at each chromatographic retention time in each sample. The high-resolution data are then exploited to identify the mass spectral fingerprint of each high-quality molecular feature. In a second iteration of processing, this fingerprint is used to search for specific ion signatures and extract quantitative information across the batch of samples.
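The two-iteration workflow described above can be sketched in Python. This is a minimal, hypothetical illustration, not Highlight's implementation. The following are all assumptions made for the example: the `Signal` layout, the logistic `quality` function (a stand-in for a trained ML quality model, with made-up weights), the retention-time bin widths, the 5 ppm mass tolerance, and the use of the top three ions as the fingerprint. The first iteration keeps the best-quality signal per retention-time bin per sample and derives an ion fingerprint from the best signal in each bin. The second iteration searches raw centroid data in every sample for those ions.

```python
import math
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Signal:
    """A deconvoluted spectral signal from one sample (hypothetical layout)."""
    sample: str
    rt1: float      # first-dimension retention time, s
    rt2: float      # second-dimension retention time, s
    spectrum: dict  # accurate m/z -> intensity
    snr: float      # signal-to-noise ratio


# Hypothetical stand-in for a trained ML quality model: logistic score on two features.
WEIGHTS = {"log_snr": 1.5, "n_ions": 0.1}
BIAS = -3.0


def quality(sig):
    """Return a quality score in (0, 1); higher means a cleaner spectrum."""
    z = (BIAS + WEIGHTS["log_snr"] * math.log10(max(sig.snr, 1e-9))
         + WEIGHTS["n_ions"] * len(sig.spectrum))
    return 1.0 / (1.0 + math.exp(-z))


def rt_bin(rt1, rt2, w1=6.0, w2=0.1):
    """Map a GCxGC retention-time pair to a coarse grid cell (assumed widths)."""
    return (round(rt1 / w1), round(rt2 / w2))


def select_representatives(signals):
    """Iteration 1: keep the highest-quality signal per (sample, retention-time bin)."""
    best = {}
    for s in signals:
        key = (s.sample, rt_bin(s.rt1, s.rt2))
        if key not in best or quality(s) > quality(best[key]):
            best[key] = s
    return best


def fingerprints(reps, min_quality=0.5, top_k=3):
    """Take the best signal in each bin across the batch; its top-k ions form the fingerprint."""
    by_bin = defaultdict(list)
    for (_, b), s in reps.items():
        by_bin[b].append(s)
    fps = {}
    for b, sigs in by_bin.items():
        top = max(sigs, key=quality)
        if quality(top) >= min_quality:
            ions = sorted(top.spectrum, key=top.spectrum.get, reverse=True)[:top_k]
            fps[b] = (top.rt1, top.rt2, ions)
    return fps


def extract(raw, fps, ppm=5.0, win1=6.0, win2=0.1):
    """Iteration 2: sum raw intensity at fingerprint ions in every sample.

    raw maps sample -> list of (rt1, rt2, mz, intensity) centroids.
    """
    table = {}
    for sample, points in raw.items():
        for b, (rt1, rt2, ions) in fps.items():
            total = 0.0
            for p1, p2, mz, inten in points:
                in_window = abs(p1 - rt1) <= win1 / 2 and abs(p2 - rt2) <= win2 / 2
                if in_window and any(abs(mz - ion) <= ion * ppm * 1e-6 for ion in ions):
                    total += inten
            table[(sample, b)] = total
    return table


# Usage: sample B has no deconvoluted signal, but its raw data holds a low-level response.
sig_a = Signal("A", 600.0, 1.20,
               {91.0542: 1000.0, 65.0386: 300.0, 92.0576: 80.0, 39.0229: 50.0}, snr=200.0)
noisy_a = Signal("A", 601.0, 1.21, {91.06: 40.0}, snr=5.0)
fps = fingerprints(select_representatives([sig_a, noisy_a]))
raw = {
    "A": [(600.0, 1.20, 91.0542, 1000.0), (600.0, 1.20, 65.0386, 300.0)],
    "B": [(601.0, 1.21, 91.0543, 12.0), (601.0, 1.21, 65.0387, 4.0),
          (601.0, 1.21, 91.2, 100.0)],  # 91.2 is outside the 5 ppm tolerance
}
table = extract(raw, fps)
print(table[("B", rt_bin(600.0, 1.20))])  # 16.0
```

In this toy batch, the noisy duplicate in sample A loses to the cleaner signal in the same bin. Sample B contributed no deconoluted signal, yet it still receives a response from its raw data. That is the kind of low-level capture the second iteration is meant to provide. A real system would use a trained model and instrument-specific tolerances in place of these placeholders.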


The team developed Highlight, a second-generation stand-alone software tool for automated processing of high-resolution GCxGC-MS data in NTA applications. Highlight enables end-to-end, high-throughput analysis by incorporating signal quality review, pattern analysis, and low-level compound capture. It brings limits of detection back down to hardware limits while increasing analysis throughput fifty-fold. By automating tedious, time-consuming data processing tasks, the software allows analytical chemists to rapidly screen the chemical world around and within us.


Figure 1: Comparison of Highlight™ with manual curation and available software