Chemicals are ubiquitous in our modern world. They are in our food, clothes, tools, furniture, toys, cosmetics and medicines. Despite the usefulness of these products, many of the compounds they contain could have negative impacts on human health and the environment.
Toxic chemical exposures are associated with as much as 6% of the world’s disease burden — including chronic diseases and cancers as well as neurological and developmental disorders — and 8% of deaths. And these numbers could be growing.
Some chemicals occur naturally, such as table salt or water, while others are synthetic or man-made compounds used in everything from lubricants and cleaners to medicines and perfumes. Most of the chemicals we encounter every day are benign, while others — such as arsenic, lead and asbestos — can be toxic in high enough doses. The effects of most are unknown.
Trying to understand the chemical cocktail encountered today has led to the emerging field of exposomics, which seeks to understand how exposures from our environment, diet, lifestyle and other sources interact with our unique characteristics and affect our well-being. Heredity plays a part, but it is not the complete story. The largest difference is associated with varying exposures to an emerging array of chemicals in our world.
Traditionally, targeted analyses addressed only a small number of chemicals with known effects on health. To identify these, chemists analyze substances such as medicines, consumer products or environmental samples, looking for how much X, Y or Z they contain.
Using chemical analysis data and machine learning tools, SwRI scientists developed a software tool to help identify unknown chemical components found in everyday products, represented by colored points and peaks in this mass spectrometry data.
Recent advances in analytical technology have raised a different and more fundamental question: What is everything in this sample? The answer is frequently an astonishing array of chemicals, where hundreds or even thousands of different chemicals can occur in a single sample. All of these compounds — some known, some unidentified, some benign and some harmful — are part of the chemical world we live in.
Untargeted Techniques
For over 10 years, Southwest Research Institute has combined its expertise in analytical chemistry, machine learning and data science to innovate and advance the field of untargeted analysis. A targeted approach focuses on a predefined list of known compounds, leaving much of the chemical universe unexplored. In contrast, untargeted analysis can detect and identify unknown and unexpected chemicals within complex samples.
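The contrast between the two approaches can be illustrated with a small sketch. The compound names, the sample contents and the target list below are hypothetical and chosen only for illustration; they are not SwRI data or results.

```python
# Hypothetical detections from one sample (illustrative names only).
detected = [
    "limonene",
    "diethyl phthalate",
    "unidentified compound A",
    "bisphenol A",
]

# Hypothetical predefined list used by a targeted method.
target_list = {"bisphenol A", "diethyl phthalate", "triclosan"}


def targeted_report(detected, target_list):
    """Report only the detected compounds that are on the predefined list."""
    return [c for c in detected if c in target_list]


def untargeted_report(detected):
    """Report everything detected, whether or not it was expected."""
    return list(detected)


def missed_by_targeting(detected, target_list):
    """Compounds present in the sample that a targeted method would not report."""
    return [c for c in detected if c not in target_list]


print(targeted_report(detected, target_list))
print(missed_by_targeting(detected, target_list))
```

In this toy case the targeted report contains two compounds, while the untargeted report also surfaces the two that were never on the list, including one with no identification yet.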
DETAIL
Since 2009, the EPA’s ExpoCast “Exposure Forecasting” project has developed data, tools and evaluation approaches to generate rapid and scientifically defensible exposure predictions for the full universe of existing and proposed commercial chemicals.
With project work ranging from deep sea to deep space, SwRI brings wide-ranging experience and expertise to solve challenging problems. In this case, SwRI chemists collaborated with computer scientists to take a deep dive into the complex world of chemicals, striving to develop tools to efficiently identify and characterize thousands of compounds present in a wide array of samples, from everyday consumer products to environmental sources.
Today’s powerful, high-resolution chromatography and mass spectrometry instruments are capable of interrogating thousands of chemicals in a single sample. For instance, comprehensive two-dimensional gas chromatography, commonly known as GCxGC, separates a sample using two chromatographic columns with different properties, run in tandem. Having two dimensions of separation means that GCxGC provides greater separation capacity than conventional one-dimensional GC, resolving complex mixtures and revealing minor components that would otherwise be obscured by major constituents.
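The gain in separation capacity can be sketched with a common rule of thumb: in the ideal case, where the two columns separate on fully independent properties, the number of peaks a two-dimensional system can resolve is roughly the product of what each column resolves alone. The column capacities below are assumed values for illustration, not instrument specifications, and real systems fall short of this ideal.

```python
def peak_capacity_2d(n_first: int, n_second: int) -> int:
    """Idealized peak capacity of a two-dimensional separation.

    Assumes the two dimensions are fully independent (orthogonal),
    which real GCxGC systems only approximate.
    """
    return n_first * n_second


# Assumed, illustrative capacities for each column.
first_dimension = 500   # peaks resolvable on the first column alone
second_dimension = 20   # peaks resolvable on the short second column

one_d = first_dimension
two_d = peak_capacity_2d(first_dimension, second_dimension)

print(f"1D capacity: {one_d}, idealized 2D capacity: {two_d}")
```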
SwRI chemists prepare samples from a range of consumer products for automated analysis with high-resolution chromatography and mass spectrometry equipment. Today’s precision technology identifies and quantifies the fingerprint of potentially hundreds of chemicals in a single sample.
Understanding complex data is a highly specialized and time-consuming endeavor. Before high-throughput processing solutions existed, SwRI's work in untargeted analysis began with chemists manually reviewing the analytical data from hundreds of samples, chemical by chemical. This tedious, laborious effort ultimately paid off. SwRI staff collaborated with the Environmental Protection Agency (EPA) to publish a pair of award-winning papers in the journal Environmental Science and Technology that discussed the detection and identification of chemicals in consumer products.
These highly cited papers paved the way for continuing research to understand the composition of consumer products and the risks environmental exposure may pose to human health. Crucially, over 80% of the chemicals found were not listed on product ingredient lists, highlighting significant gaps in consumer product transparency.
SwRI chemists and computer scientists collaborated to develop Highlight, a software tool that uses machine learning algorithms to process untargeted chemical analyses in 2.6% of the time needed to process them manually. That means a chemist can accurately assess the compounds in a sample in about five minutes rather than three hours.
Data produced in this effort were fed into the EPA’s ExpoCast program, a publicly available tool used to understand and model environmental exposure. Estimating exposure is critical to prioritize and assess chemicals based on the risk they pose to public health and the environment.
DETAIL
Cheminformatics uses computational tools to manage, analyze and visualize chemical data. It integrates chemistry with computer science to efficiently explore, understand, predict and design chemical compounds.
These groundbreaking early studies exposed two critical limitations inherent in untargeted chemical analysis: throughput and identification confidence. The time needed to remove low-quality signal artifacts limits how many full investigations are feasible. Reducing the manual review time for analytical data would allow more samples to be processed.
The SwRI team tackled these problems by creating an artificial intelligence (AI) platform capable of autonomously performing the tedious data review and identification process. Using internal research funding and a gold-standard, extensively labeled dataset, SwRI’s Artificial Intelligence for Mass Spectrometry (AIMS) group created its first successful program, called Floodlight™.
This machine-learning-driven tool automates the signal quality review of mass spectrometry data in a high-throughput manner. The “secret sauce” in this neural-network-based solution is the copious amounts of processed data SwRI had available to “teach” the tool.
This novel software tool efficiently discovers the vast numbers of chemical components — previously known and unknown — present in the food, air, drugs and products in use every day. This machine-learning-based tool integrates algorithms with analytical chemistry software to provide deep analysis of data from a wide range of analytical instruments. The groundbreaking cheminformatics platform received an R&D 100 award, recognizing it as one of the 100 top technology developments in 2021.
SwRI then combined and expanded Floodlight with a companion program, Searchlight™, to create Highlight™, the comprehensive program in use today. While Searchlight and Floodlight were designed for low-resolution mass spectrometry data, Highlight can process both low- and high-resolution data, including data from other types of analytical equipment. This capability supports higher-accuracy measurements and better identification of chemical structures. Low-resolution data report masses rounded to the nearest whole number, so different compounds can appear identical. The high-resolution data that Highlight handles allow researchers to decipher the precise molecular formula of an unknown chemical.
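A textbook example shows why resolution matters for identifying a molecular formula. Carbon monoxide, nitrogen and ethylene all have a whole-number mass of 28, but their exact masses differ in the second decimal place. The sketch below uses standard monoisotopic atomic masses; the function names and matching tolerances are assumptions made for illustration and do not describe how Highlight works internally.

```python
# Monoisotopic masses of the most abundant isotopes, in daltons.
MONOISOTOPIC = {"C": 12.000000, "H": 1.007825, "N": 14.003074, "O": 15.994915}

# Three small molecules that share the same whole-number mass.
CANDIDATES = {
    "CO": {"C": 1, "O": 1},
    "N2": {"N": 2},
    "C2H4": {"C": 2, "H": 4},
}


def exact_mass(formula):
    """Sum the monoisotopic masses of every atom in the formula."""
    return sum(MONOISOTOPIC[element] * count for element, count in formula.items())


def nominal_mass(formula):
    """Whole-number mass, as a low-resolution instrument would report it."""
    return sum(round(MONOISOTOPIC[element]) * count for element, count in formula.items())


def match_formula(measured_mass, tolerance_da):
    """Return candidate formulas whose exact mass lies within the tolerance."""
    return [
        name
        for name, formula in CANDIDATES.items()
        if abs(exact_mass(formula) - measured_mass) <= tolerance_da
    ]


# A low-resolution reading of 28 cannot tell the three apart.
print(match_formula(28.0, tolerance_da=0.5))

# A high-resolution reading of 28.0313 matches only one candidate.
print(match_formula(28.0313, tolerance_da=0.001))
```

With the wide tolerance all three candidates match; with the narrow one only ethylene remains.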
Highlight is a powerful platform capable of high-throughput processing of complex mass spectrometry data, demonstrating the ability to thoroughly interrogate consumer products in just 2.6% of the time needed to process the same samples manually. That means the automated Highlight tool allows chemists to analyze samples in a day instead of the weeks or months required for manual processing. The innovative process also accurately identifies chemicals present in various substances. With this functionality, Highlight effectively opens the door to understanding vast amounts of data previously too complex and extensive to completely interrogate, and its high throughput frees up resources to screen many more samples. For more data about Highlight, see Automating Untargeted Analysis.
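The 2.6% figure can be checked against the other time estimates quoted in this article with simple arithmetic. The sketch below assumes the three-hour manual review time cited for a single sample and treats the 2.6% fraction as constant across workloads, which is a simplification.

```python
TIME_FRACTION = 0.026  # automated time as a fraction of manual time, per the article


def automated_minutes(manual_hours, fraction=TIME_FRACTION):
    """Automated processing time, in minutes, for a given manual review time."""
    return manual_hours * 60 * fraction


def manual_days_equivalent(automated_days, fraction=TIME_FRACTION):
    """Manual effort, in days, that corresponds to a given automated run time."""
    return automated_days / fraction


print(round(automated_minutes(3), 2))        # three hours of manual review
print(round(manual_days_equivalent(1), 1))   # one day of automated processing
```

Three hours of manual review works out to a little under five minutes, and one day of automated processing corresponds to roughly 38 days of manual work, which is consistent with the "weeks or months" estimate.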
The team then unleashed Highlight to retroactively process over four years’ worth of data in a single batch. This provides a better understanding of the chemicals a product is most likely to emit, which makes them of particular interest from an environmental exposure standpoint. With powerful programs like Highlight coming online in the field of untargeted analysis, scientists are keenly interested in this type of retroactive analysis to uncover previously undetectable patterns across huge datasets.
Broad Benefits
This work over the past 10 years on consumer product analysis has not only expanded the boundaries of chemical knowledge but also provided tools and methods that benefit both the scientific community and society at large. One critical insight — only possible from the accumulation of all these untargeted datasets — was finding chemical patterns linked to categories of consumer products. This provides a way to broadly link consumer habits with exposures and, ultimately, health outcomes.
DETAIL
Gas and liquid chromatography/mass spectrometry equipment identifies and quantifies substances in a sample. Once compounds are separated by chromatography, the mass spectrometer defines a unique “fingerprint” for each compound.
SwRI’s AIMS group is already making strides in this direction, conducting bioanalytical studies critical to the ongoing advancement of environmental exposure research. These efforts include understanding what environmental chemicals are present in human serum, as well as modeling metabolic pathways. Ultimately, this understanding could allow virtual epidemiology, predicting an individual’s exposure based on a survey of products used and other environmental factors.
Highlight addresses the challenges associated with complex data analysis, providing advanced features in chemical data analyses for high-throughput screening, untargeted and targeted analyses, pattern matching and signal interpretation. Moving forward, the SwRI team is committed to exploring new frontiers in untargeted analysis that will contribute to a deeper understanding of the chemical world around us, with far-reaching implications for public health and safety.
ABOUT THE AUTHORS
Dr. Kristin Favela is a staff scientist in the Chemistry and Chemical Engineering Division, conducting research in forensics, environmental chemistry, homeland security and bioanalytical chemistry. She has developed and validated numerous mass spectrometry methodologies. Michael Hartnett is a lead computer scientist in the Intelligent Systems Division with a background in machine learning, data management and informatics. He has developed over a dozen data-driven decision-support tools across domains including energy, materials, medicine and traffic management.
Questions about this story or Chemical Analysis Services? Contact Dr. Kristin Favela at +1 210 522 4209.
Acknowledgements: This work would not have been possible without our many collaborations, including the EPA and other nonprofit and commercial collaborators. We acknowledge SwRI’s AIMS team members who have made critical contributions to this scientific body of work, including Joe Brewer (retired), Hamed Edrisi (retired), Abe Garza, John Gomez, Chris Gonzales, Chris Gourley, Prativa Hartnett, Jake Janssen, Robert Martinez, Christina Menn, Qingchu Peng, Keith Pickens (retired), Shraddha Quarderer, Lorraine Scheller, Heath Spidle, David Vickers (retired), William Watson, Steve Westbrook (retired), Bill Williamson and Alice Yau.