Using Machine Learning to Aid in the Detection of Water Pollutants

A machine-learning tool to enhance the detection of small organic pollutants in water has been developed and tested.

In a study aimed at tackling the global water quality crisis, researchers from Ghent University, the University of Zagreb, the University of São Paulo, and Dow Benelux have developed a machine-learning algorithm to improve the identification of small organic pollutants in water (1). This approach offers a cost-effective, eco-friendly solution to a problem that is particularly acute in developing countries, where access to safe drinking water is often compromised by persistent organic pollutants (POPs) (2,3). The research was published in the journal Analytical Chemistry.

Pouring fresh water into a glass © stokkete - stock.adobe.com

Organic pollutants from industrial, agricultural, and domestic activities pose significant challenges because of their persistence, toxicity, and potential for bioaccumulation. Current detection methods, such as high-performance liquid chromatography coupled with high-resolution mass spectrometry (HPLC–HRMS), have limitations (1). HRMS can establish a compound's elemental composition, but a single composition can correspond to a vast number of candidate structures. This makes structural identification difficult, especially when authentic standards are unavailable for comparison.

To address these challenges, the research team developed a machine-learning algorithm to support the structural elucidation of small organic molecules. The algorithm targets molecules that contain only carbon, hydrogen, and oxygen and have molecular masses below 500 Da. It draws on retention data from two reversed-phase stationary phases with differing selectivities: octadecylsilica (C18) and pentafluorophenylsilica (F5). For each phase, it compares experimental with predicted retention. Candidate structures whose predicted retention does not match the experimental data are discarded, which improves the accuracy of pollutant identification.
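The filtering idea can be sketched as follows. This is a minimal illustration, not the authors' implementation. It makes the following assumptions:

- Each candidate structure already has a predicted retention index for both phases.
- A candidate is kept only if it agrees with the measured value on both phases, within the ±1.5σ tolerance reported in the study.
- σ is approximated by each model's reported RMSE. This is an assumption, not something the article states.

The `Candidate` class, the `filter_candidates` function, and all numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    """Hypothetical candidate structure with model-predicted retention indices."""
    name: str
    predicted_ri_c18: float
    predicted_ri_f5: float


def filter_candidates(candidates, measured_ri_c18, measured_ri_f5,
                      sigma_c18, sigma_f5, k=1.5):
    """Keep only candidates whose predicted RI agrees with the measured RI
    within +/- k*sigma on BOTH stationary phases (intersection of two filters)."""
    kept = []
    for cand in candidates:
        ok_c18 = abs(cand.predicted_ri_c18 - measured_ri_c18) <= k * sigma_c18
        ok_f5 = abs(cand.predicted_ri_f5 - measured_ri_f5) <= k * sigma_f5
        if ok_c18 and ok_f5:
            kept.append(cand)
    return kept


# Illustrative, made-up numbers: three isomers sharing one elemental composition.
candidates = [
    Candidate("isomer A", 520.0, 610.0),
    Candidate("isomer B", 540.0, 700.0),  # fails on F5: off by 85 > 1.5 * 44
    Candidate("isomer C", 650.0, 600.0),  # fails on C18: off by 120 > 1.5 * 36
]
# Assumption: sigma approximated by the reported RMSEs (36 for C18, 44 for F5).
kept = filter_candidates(candidates, measured_ri_c18=530.0, measured_ri_f5=615.0,
                         sigma_c18=36.0, sigma_f5=44.0)
removed_fraction = 1 - len(kept) / len(candidates)
print([c.name for c in kept], f"removed {removed_fraction:.0%}")
```

Requiring agreement on both phases is what makes two columns with different selectivities useful. An isomer that happens to match on C18 can still be rejected on F5.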

An important element of the study is the conversion of retention times into retention indices (RI). This makes the algorithm applicable and transferable across different HPLC–HRMS systems. A retention index expresses a compound's retention relative to reference compounds rather than as an absolute time, so it is less affected by differences in instrumentation and operating conditions. Using retention data and molecular descriptors, the algorithm predicted retention indices with high accuracy.
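The article does not say how the retention indices are defined. One common approach, sketched below as an assumption, works in two steps:

- Fixed index values are assigned to a series of reference compounds.
- An analyte is placed by linear interpolation between the two references that elute on either side of it.

The function name `retention_index` and the reference values are hypothetical.

```python
import bisect


def retention_index(t_r, ref_times, ref_indices):
    """Convert a retention time to a retention index by linear interpolation.

    ref_times: retention times of the reference compounds, sorted ascending.
    ref_indices: the index values assigned to those reference compounds.
    """
    if not ref_times[0] <= t_r <= ref_times[-1]:
        raise ValueError("retention time outside the reference range")
    # Locate the pair of reference compounds that brackets t_r.
    i = bisect.bisect_right(ref_times, t_r) - 1
    if i == len(ref_times) - 1:  # t_r coincides with the last reference
        return float(ref_indices[-1])
    t0, t1 = ref_times[i], ref_times[i + 1]
    ri0, ri1 = ref_indices[i], ref_indices[i + 1]
    return ri0 + (ri1 - ri0) * (t_r - t0) / (t1 - t0)


ref_times = [2.0, 4.0, 7.0, 9.0]    # hypothetical reference retention times (min)
ref_indices = [100, 200, 300, 400]  # hypothetical index values assigned to them
print(retention_index(5.5, ref_times, ref_indices))  # 250.0
```

Only the analyte's position relative to the references matters. A change in flow rate that stretches or shifts all retention times alike therefore leaves the index unchanged, which is the property that makes indices transferable between systems.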

The study used a data set of 100 training compounds and 16 external test compounds to develop two multiple linear regression (MLR) models, MLR-C18 and MLR-F5. The models employ the 16 most influential molecular descriptors, selected from a pool of 5666 screened descriptors. For the C18 stationary phase, the model performed well, with a coefficient of determination (R²) of 0.97, a root mean square error (RMSE) of 36, and a mean absolute error (MAE) of 26; the error values are in retention index units. The F5 model was slightly less precise but still performed well, with an R² of 0.96, an RMSE of 44, and an MAE of 34. The intersection-based filtration method keeps only candidates consistent with the experimental data on both stationary phases, allowing a margin of error of ±1.5σ. It eliminated over 70% of the impossible structures for a given elemental composition. This substantial reduction in candidate structures makes pollutant identification faster and more accurate.
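The model-building and evaluation step can be illustrated with ordinary least squares. This sketch is written in plain Python so it stays self-contained. It fits an MLR model through the normal equations and computes R², RMSE, and MAE using their standard textbook definitions.

It does not reproduce the descriptor screening that reduced 5666 candidate descriptors to 16. The article also does not state exactly how the study computed R². The helper names (`fit_mlr`, `predict`, `regression_metrics`) and the toy data are hypothetical.

```python
import math


def fit_mlr(X, y):
    """Ordinary least-squares fit of y = b0 + b1*x1 + ... + bp*xp.

    Solves the normal equations (A^T A) b = A^T y by Gauss-Jordan
    elimination with partial pivoting; A is X with a leading column of ones.
    """
    A = [[1.0] + [float(v) for v in row] for row in X]
    p = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(p)] for i in range(p)]
    aty = [sum(r[i] * yi for r, yi in zip(A, y)) for i in range(p)]
    M = [row + [rhs] for row, rhs in zip(ata, aty)]  # augmented matrix
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot_row = max(range(col, p), key=lambda i: abs(M[i][col]))
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        if abs(pivot) < 1e-12:
            raise ValueError("descriptor matrix is rank-deficient")
        M[col] = [v / pivot for v in M[col]]
        # Eliminate this column from every other row.
        for i in range(p):
            if i != col:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[col])]
    return [M[i][p] for i in range(p)]  # [b0, b1, ..., bp]


def predict(coefs, X):
    """Apply fitted coefficients (intercept first) to descriptor rows."""
    return [coefs[0] + sum(c * x for c, x in zip(coefs[1:], row)) for row in X]


def regression_metrics(y_true, y_pred):
    """R^2 as 1 - SS_res/SS_tot, plus RMSE and MAE of the residuals."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    mean_y = sum(y_true) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return {
        "R2": 1.0 - ss_res / ss_tot,
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(r) for r in residuals) / n,
    }


# Toy data generated from y = 2 + 3*x1 - x2 (two made-up "descriptors").
X = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 3]]
y = [2.0, 5.0, 1.0, 4.0, 5.0]
coefs = fit_mlr(X, y)
print([round(c, 6) for c in coefs])  # [2.0, 3.0, -1.0]
print(regression_metrics([1.0, 2.0, 3.0], [1.0, 2.0, 4.0]))
```

With real data, each row of `X` would hold the selected descriptor values for one training compound, and `y` would hold that compound's experimental retention index on one phase. The same routine would then be run once per stationary phase, giving an MLR-C18 and an MLR-F5 model.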

The practical application of this model was demonstrated through the analysis of a drinking water sample, showcasing the tool's potential in real-world scenarios (1). The tool makes the structural identification of unknown organic micropollutants faster and more accurate. It therefore offers promising support in addressing a critical environmental challenge and could help pave the way for more sustainable monitoring and management of water resources. As global demand for clean water continues to grow, such technologies are vital for ensuring safe and reliable water supplies, particularly in regions where traditional methods are too costly or ineffective.

References

(1) Kajtazi, A.; Kajtazi, M.; Barbetta, M. F. S.; et al. Prediction of Retention Indices in LC-HRMS for Enhanced Structural Identification of Organic Micropollutants in Water: Selectivity-Based Filtration. Anal. Chem. 2025, 97 (1), 65–74. DOI: 10.1021/acs.analchem.4c01784

(2) Stockholm Convention on Persistent Organic Pollutants (POPs). https://chm.pops.int/default.aspx (accessed 2025-01-15).

(3) U.S. Environmental Protection Agency. Persistent Organic Pollutants: A Global Issue, A Global Response. 2009.
