Interpreting chromatographic data in arson cases poses challenges to analysts due to the complex chemical compositions of ignitable liquids (ILs) and the interferences from fire debris matrices. A recent joint study between the Department of Forensic Science of the College of Criminal Justice at Sam Houston State University (Huntsville, Texas) and the Department of Criminal Justice of the School of Social Sciences at Ming Chuan University (Taipei, Taiwan) applied transfer learning to a convolutional neural network (CNN), GoogLeNet, fine-tuning the image classification model to create intelligent classification systems that discriminate samples containing gasoline residues from burned substrates. All ground truth (reference) samples were analyzed by headspace solid-phase microextraction (HS-SPME) coupled with a gas chromatograph and mass spectrometer (GC–MS). LCGC International spoke to Jorn (Chi Chung) Yu and Ting-Yu Huang, the authors of the paper resulting from this work, about the study.
Your paper (1) presents a study using gas chromatography and mass spectrometry (GC–MS) to detect the use of flammable materials in arson cases. What challenges does the nature of samples collected from fire scenes pose to fire debris analysis?
The field of fire debris analysis has been constantly evolving, making it a challenging task for forensic analysts. Currently, the gold standard for analyzing extracts from fire debris samples for the identification and classification of ILs is gas chromatography and mass spectrometry (GC–MS) following the ASTM E1618 standard. The interpretation of GC–MS data involves comparing peak patterns, including the retention time and abundance in the chromatograms, and identifying the specific mass spectra of the major chemical components representing the target ILs (2). In the instrumental analysis and data interpretation procedures, many factors can present challenges to analysts, including the complex chemical composition of ILs and interferences from fire debris matrices.
What inspired the addition of artificial intelligence, specifically a convolutional neural network (CNN), GoogLeNet, for image analysis?
Due to the complex nature of fire debris samples, as addressed above, manual interpretation of GC–MS data can be time-consuming and relies heavily on an analyst’s experience. The applications of deep learning, an artificial intelligence (AI)-based methodology, have been growing rapidly in different fields of science. The convolutional neural network (CNN) in deep learning utilizes mathematical algorithms to train a model for pattern recognition and classification. Previous works have demonstrated the broad applications and superior capabilities of CNNs for image classification tasks in agriculture and medical diagnosis (3,4). This inspired us to investigate the feasibility of using a CNN to facilitate the data interpretation of ILs and fire debris samples.
Our idea was to convert the total ion chromatograms (TICs), in which each point is the summed abundance of all mass spectral peaks in the same scan, to two-dimensional images that could represent the characteristic features of the target analytes in ILs. Subsequently, the images were used to train an existing CNN, GoogLeNet, to recognize those chemical profiles in the transformed images. GoogLeNet (5) was developed by Google specifically for image classification tasks. The Inception module enables GoogLeNet to make efficient use of its architecture. Therefore, high classification performance can be achieved without long training times, a large number of trainable parameters, or high computational cost. Those advantages made GoogLeNet a good candidate for testing our hypothesis.
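To make the data transformation concrete, here is a minimal Python sketch of how a TIC and a heatmap could be derived from raw scans. It is not the authors' code: the scan layout (a list of `(m/z, abundance)` pairs per scan), the function names, and the m/z range of 40 to 300 are all assumptions chosen for illustration.

```python
from typing import List, Tuple

# Assumed data layout: one scan is a list of (m/z, abundance) pairs.
Scan = List[Tuple[float, float]]


def total_ion_chromatogram(scans: List[Scan]) -> List[float]:
    """Return one value per scan: the summed abundance of all its peaks."""
    return [sum(abundance for _, abundance in scan) for scan in scans]


def heatmap(scans: List[Scan], mz_min: int = 40, mz_max: int = 300) -> List[List[float]]:
    """Build a 2-D grid: rows are scans (retention time), columns are unit m/z bins."""
    n_bins = mz_max - mz_min + 1
    grid = [[0.0] * n_bins for _ in scans]
    for row, scan in zip(grid, scans):
        for mz, abundance in scan:
            col = int(mz + 0.5) - mz_min  # round to the nearest nominal mass
            if 0 <= col < n_bins:         # peaks outside the range are ignored
                row[col] += abundance
    return grid


def normalize(grid: List[List[float]]) -> List[List[float]]:
    """Scale the grid to 0..1 so it can be rendered as pixel intensities."""
    peak = max((value for row in grid for value in row), default=0.0)
    if peak == 0.0:
        return [row[:] for row in grid]
    return [[value / peak for value in row] for row in grid]


# Hypothetical two-scan example (values are made up).
example_scans = [
    [(43.0, 100.0), (57.1, 50.0)],
    [(91.0, 200.0), (105.0, 80.0), (350.0, 5.0)],
]
print(total_ion_chromatogram(example_scans))  # [150.0, 285.0]
```

The normalized grid can then be rendered with any plotting or imaging library and saved as an image for the CNN; that rendering step is left out here because the source does not describe how it was done.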
Briefly state your findings in this study.
In our work, intelligent classification models were developed through image transformation and transfer learning with GoogLeNet to classify GC–MS data. The work aimed to detect the presence of gasoline residues for forensic purposes. Three types of representations of GC–MS data, including TIC, heatmap, and extracted ion heatmap, were used to fine-tune the deep CNN, producing three AI models. Our experimental results showed that the three AI models achieved 100 ± 0% accuracy in discriminating neat gasoline samples from different weights of burned synthetic carpets. The AI models were further challenged with simulated fire debris samples, prepared by spiking different concentrations of gasoline residues collected from various sources onto burned carpets. Among all the models, the extracted ion heatmap model obtained the highest accuracy, 95.9 ± 0.4%, in detecting those trace amounts of gasoline residues when matrix interferences were present. Overall, intelligent classification models developed based on our proposed methodology offered promising performance in gasoline detection.
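Accuracies reported in the form "value ± spread" are commonly a mean and standard deviation over repeated training or test runs. The interview does not state how the reported figures were computed, so the following Python sketch only shows that common convention; the run accuracies in it are hypothetical and are not the study's data.

```python
from statistics import mean, stdev
from typing import List, Tuple


def summarize_accuracy(accuracies: List[float]) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of per-run accuracies in percent."""
    return mean(accuracies), stdev(accuracies)


# Hypothetical accuracies (%) from three repeated runs; not taken from the paper.
runs = [94.0, 96.0, 95.0]
avg, spread = summarize_accuracy(runs)
print(f"{avg:.1f} ± {spread:.1f}%")  # 95.0 ± 1.0%
```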
Do your findings correlate with what you had hypothesized?
We proposed visualizing GC–MS data as a heatmap to facilitate transfer learning and develop classification models to detect gasoline residues in fire debris. The high classification accuracy derived from our experiments indicated the successful application of the intelligent framework in forensic fire debris analysis. Specifically, both heatmap and TIC provided characteristic features of gasoline chemical profiles for transfer learning with GoogLeNet. The learning capability, efficiency, and performance of the classification algorithm were not impacted by the limited size (fewer than 400 data samples) and diversity (different sources, concentrations, and weights) of the data set in our tests. All those findings offered supporting evidence that corroborated our hypotheses.
Was there anything particularly unexpected that stands out from your perspective?
The first thing that captured our attention from the experimental results was the good generalization capability of the deep CNN. Because the neat gasoline samples were collected from different retailers and had varied concentrations, the pattern of the chromatograms might differ due to varying additives in the liquids and signal abundance recorded by the detector. However, GoogLeNet was able to reduce the dimensionality of the data and retain the essential features that were relevant to the target analytes in gasoline, offering improved training efficiency and performance. In our study, the AI models could correctly detect neat gasoline residues from five automotive fuel stations, even though the concentration was only 0.4 μg gasoline in a 20-mL headspace vial.
We were also happy to discover that extracting the major ions in the mass spectra of common ILs when generating heatmaps helped GoogLeNet achieve outstanding performance in detecting simulated fire debris samples that contained gasoline residues. This outcome indicated that the extracted ion procedure was beneficial for reducing the interferences from sample matrices, which is a common issue encountered in all types of forensic analysis. Compared with machine learning algorithms such as k-nearest neighbors, discriminant analysis, support vector machines, and Naive Bayes, our proposed intelligent framework provided higher performance in detecting those challenging samples.
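The extracted ion idea can be sketched in a few lines of Python: keep only a chosen set of nominal masses when building the heatmap, so that peaks contributed by the burned substrate at other masses never reach the image. The ion list below (m/z 43, 57, 71, 85 for alkanes and 91, 105, 119 for alkylbenzenes) is a commonly used illustrative selection and an assumption here, not the list used in the study; the scan layout and function name are likewise hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Scan = List[Tuple[float, float]]  # assumed layout: (m/z, abundance) pairs

# Illustrative ions only; the interview does not list the ions actually used.
ILLUSTRATIVE_IONS: Dict[str, Tuple[int, ...]] = {
    "alkane": (43, 57, 71, 85),
    "alkylbenzene": (91, 105, 119),
}


def extracted_ion_heatmap(
    scans: List[Scan], ions: Iterable[int]
) -> Tuple[List[int], List[List[float]]]:
    """Return (column ions, grid) with one row per scan and one column per kept ion."""
    columns = sorted(set(ions))
    index = {ion: i for i, ion in enumerate(columns)}
    grid = [[0.0] * len(columns) for _ in scans]
    for row, scan in zip(grid, scans):
        for mz, abundance in scan:
            col = index.get(int(mz + 0.5))  # nearest nominal mass
            if col is not None:             # all other masses are discarded
                row[col] += abundance
    return columns, grid


# Hypothetical scans: the large peaks at m/z 73 and 207 stand in for matrix signal.
scans = [
    [(57.0, 10.0), (73.0, 500.0)],
    [(91.1, 20.0), (207.0, 300.0)],
]
all_ions = [ion for group in ILLUSTRATIVE_IONS.values() for ion in group]
columns, grid = extracted_ion_heatmap(scans, all_ions)
print(columns)  # [43, 57, 71, 85, 91, 105, 119]
print(grid[0])  # [0.0, 10.0, 0.0, 0.0, 0.0, 0.0, 0.0]
```

In this toy example the two large non-target peaks vanish from the grid while the small target-ion signals remain, which is the effect the answer above attributes to the extracted ion procedure.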
The transfer learning method also offered a highly efficient and cost-effective training option. The development of our AI models did not require a large amount of labeled data. It did not require powerful computational resources either, and even a personal laptop running on a CPU could do the job. Lastly, the prediction results were produced accurately and rapidly, which could accelerate the interpretation procedures. We think that combining the intelligent framework with any analytical technique that features rapid sampling or analysis can streamline the examination of questioned samples. In our manuscript, the use of headspace solid-phase microextraction (HS-SPME) analysis coupled with the intelligent framework is an example of this idea.
Were there any limitations or challenges you encountered in your work?
Though deep CNNs have been widely used in many fields, one of the well-known drawbacks of the network is its degraded performance in predicting “untrained class data” (6), meaning data of types that differ from those used in the training process. We also encountered this issue when testing the performance of our AI models. In our work, the simulated fire debris samples were considered untrained class types since only pure gasoline samples were used to train GoogLeNet. We found that the superior performance of the AI models in detecting neat gasoline samples decreased when the models were challenged with simulated fire debris samples. Nevertheless, the extraction of ions in heatmaps, as described previously, reduced the impact of this effect to some extent. The extracted ion heatmap model could still achieve a low detection limit of 1.6 μg gasoline in a 20-mL headspace vial.
What best practices can you recommend in this type of analysis for both instrument parameters and data analysis?
The performance of the deep CNN relies on the data it is trained on. Therefore, the quality and reproducibility of the analytical signals need to be ensured since they affect the quality of the transformed images that act as the ground truths (that is, labeled data) from which the deep CNN learns to recognize features. Poor quality data might result in inaccurate prediction results and poor classification performance. Furthermore, the training progress of the model should be monitored using various performance measures, such as accuracy, precision, recall, or F1-score, to identify whether overfitting occurs. The model’s performance might be improved by optimizing the hyperparameters, including the maximum number of epochs, learning rate, and mini-batch size. To evaluate the model’s performance, a different data set that is new and unseen to the model should be collected.
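The performance measures named above follow directly from the counts of true and false positives and negatives. The Python sketch below uses the standard definitions; the label strings and the example predictions are hypothetical.

```python
from typing import Dict, Sequence


def classification_metrics(
    y_true: Sequence[str], y_pred: Sequence[str], positive: str = "gasoline"
) -> Dict[str, float]:
    """Compute accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)

    accuracy = (tp + tn) / len(pairs) if pairs else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (
        2 * precision * recall / (precision + recall)
        if (precision + recall)
        else 0.0
    )
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


# Hypothetical labels for five held-out samples.
truth = ["gasoline", "gasoline", "gasoline", "substrate", "substrate"]
predicted = ["gasoline", "gasoline", "substrate", "substrate", "gasoline"]
print(classification_metrics(truth, predicted))
```

Computing the same metrics on both the training data and a held-out set after each epoch is one simple way to watch for overfitting: a training score that keeps rising while the held-out score stalls or falls is the usual warning sign.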
Are there any typical accelerants that are easier or harder to detect?
The American Society for Testing and Materials (ASTM) has standardized the identification of IL residues from fire debris samples by GC–MS. Most IL samples collected from fire scenes, such as gasoline, diesel, and kerosene, can be categorized into eight classes based on the classification schemes in the standard (7). The National Center for Forensic Science at the University of Central Florida hosts a collection of IL samples in both a physical database and a digital database (GC–MS data) in support of fire debris analysis (8). Extra care must be taken when the IL samples have matrix interferences or are weathered or degraded by micromaterials. As mentioned previously, those factors might complicate or alter the pattern of the chromatograms, posing challenges to manual data interpretation. The same issue might occur in the AI models if those data are untrained class types, meaning they are not in the training distribution. The pattern change in the transformed images might make the ILs harder to detect and lead to inaccurate prediction results.
Can you please summarize the feedback that you have received from others regarding this work?
Since our study utilized image representations as the input to train the deep CNN, some researchers have questioned whether the image transformation procedure was too complex and challenging to perform. This procedure should not be an issue, given that we have written code that automatically and rapidly converts the GC–MS data into TICs and heatmaps in batch processing. Additionally, since GoogLeNet could efficiently extract relevant information from the transformed images, we did not need to implement any data pretreatment approaches, which reduced manual labor and simplified the proposed workflow. Regarding the type of image representations used for transfer learning with GoogLeNet, others suggested that the extracted mass chromatograms for major ions characteristic of each compound type in ILs could be an option for visualizing GC–MS data. Those profiles are distinctive of specific classes of hydrocarbons, and therefore, we think they might be able to minimize interferences from extraneous matrices.
What are the next steps in this research and are you planning to be involved in improving this technology?
We will continue to work on improving the AI models to enhance the accuracy and robustness of the detection of untrained class data. We hope the AI models achieve a lower detection limit for simulated fire debris samples. We would also like to develop a fully AI-powered system that involves automated instrumental and data analysis to streamline the forensic analytical workflow with minimum human intervention. Such a system could be employed for real-time analysis of unknown samples in laboratory and field settings. Moreover, based on the experimental results obtained from our work, we expect that the intelligent framework can assist forensic analysis with reliability and efficiency. Therefore, we will also explore new applications of the intelligent framework to other types of forensic physical evidence, such as drugs and biological samples. Ultimately, foundation models should be developed and trained on big chromatographic and spectrometric data. This task will require considerable resources to complete in the future.
What are your thoughts on AI in general, and machine learning in particular for data analysis in chromatography and spectrometry?
Traditionally, most analytical chemistry tasks are resolved by qualitative and quantitative analysis. An intelligent system based on chemical information is often required for classification and source-tracing tasks. AI and machine learning algorithms are advanced tools capable of high-level pattern recognition and identification, automatically extracting relevant and essential features from spectra; they can also be trained to help identify and quantify specific compounds in complex mixtures. The algorithms can also automate the process of peak detection and integration, which minimizes the need to manipulate data manually. They can also find the relationships between variables in data, leading to revolutionary new insights. AI can potentially correlate chromatographic and spectrometric data with chemical processes and activities in chromatography and spectrometry. In my broader research, AI and machine learning demonstrate considerable potential for accurate and insightful data processing and analysis. Integrating AI and machine learning into existing analytical schemes and establishing a standardized protocol for data processing across different systems look promising.
References
1. Huang, T. Y.; Yu, J. C. C. Assessment of Artificial Intelligence to Detect Gasoline in Fire Debris Using HS-SPME-GC/MS and Transfer Learning. J. Forensic Sci. 2024, 69 (4), 1222–1234. DOI:10.1111/1556-4029.15550
2. American Society for Testing and Materials. ASTM Method E1618–19 Standard Test Method for Ignitable Liquid Residues in Extracts from Fire Debris Samples by Gas Chromatography-Mass Spectrometry. West Conshohocken, PA: ASTM International, 2022.
3. Joseph, D. S.; Pawar, P. M.; Pramanik, R. Intelligent Plant Disease Diagnosis Using Convolutional Neural Network: A Review. Multimedia Tools and Applications 2022, 82 (14), 21415–21481. DOI:10.1007/s11042-022-14004-6
4. Abdou, M. A. Literature Review: Efficient Deep Neural Networks Techniques for Medical Image Analysis. Neural Computing and Applications 2022, 34 (8), 5791–5812. DOI:10.1007/s00521-022-06960-9
5. Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR). 2015, 1–9. DOI:10.1109/cvpr.2015.7298594
6. Lee, Y.; Chae, H. Identification of Untrained Class Data using Neuron Clusters. Neural Computing and Applications 2023, 35 (15), 10801–10819. DOI:10.1007/s00521-023-08265-x
7. Baerncopf, J.; Hutches, K. A Review of Modern Challenges in Fire Debris Analysis. Forensic Sci. Int. 2014, 244, e12–e20. DOI:10.1016/j.forsciint.2014.08.006
8. Ignitable Liquids Reference Collection/Substrate databases. National Center for Forensic Science, University of Central Florida website. https://ncfs.ucf.edu/ilrc-2 (accessed 2024-10-03).