Predicting Retention Indices in LC–HRMS to Improve Water Quality Analysis

Ardiana Kajtazi discusses her research identifying organic micropollutants in water using liquid chromatography–high-resolution mass spectrometry (LC–HRMS). She highlights the standardized filtration approach her team has developed based on intersection principles, utilizing retention indices from two reversed-phase liquid chromatography (RPLC) columns.

Organic micropollutants are a group of chemical substances that can be persistent, bioaccumulative, or toxic, making them critical targets for water quality monitoring. Ardiana Kajtazi, a postdoctoral researcher at the Toxicological Center, University of Antwerp, in Antwerp, Belgium, spoke with LCGC International about her research identifying organic micropollutants in water using liquid chromatography–high-resolution mass spectrometry (LC–HRMS). She discusses the standardized filtration approach her team has developed based on intersection principles, utilizing retention indices from two reversed-phase liquid chromatography (RPLC) columns.

What micropollutants are you focusing on and why?

Our research focuses on a wide variety of organic micropollutants made up primarily of carbon, hydrogen, and oxygen (CxHyOz molecules) with molecular weights below 500 Da, such as parabens, phthalates, and bisphenols. These contaminants are particularly important because many are persistent, bioaccumulative, or can have endocrine-disrupting effects, making them key targets for water quality monitoring and risk assessment. Right now, our approach already covers a broad range of environmentally significant contaminants, but there is always room to expand. Future work should bring in compounds containing heteroatoms such as halogens or nitrogen, which are commonly found in pharmaceuticals, pesticides, flame retardants, and various industrial chemicals.

How do you currently approach the structural identification of micropollutants in water samples, and what limitations have you encountered?

The approach we take for identifying micropollutants in water depends on the type of analysis, whether it is targeted analysis (TA), suspect screening (SS), or non-targeted screening (NTS). In TA, if we have reference standards, identification is straightforward; we match retention times and tandem mass spectrometry (MS/MS) fragmentation patterns to known compounds, which gives us high confidence in the results. With SS, we are working with predefined databases of potential contaminants, relying on exact mass, isotope patterns, and predictive retention models. However, since we often do not have reference standards for the tentatively identified compounds, there is always some level of uncertainty in their identification. NTS is the most open-ended approach and lets us detect completely unknown compounds without any prior assumptions. Here, we rely on high-resolution mass spectrometry (HRMS) combined with advanced computational tools for structure elucidation. But this comes with its own set of challenges. The amount of data generated is large, and distinguishing real signals of compounds of interest from background noise from the matrix is not always straightforward. Many emerging contaminants do not have reference mass spectra, which increases the risk of false positives, and retention times can vary between instruments, making reproducibility more difficult. On top of that, when multiple structures share the same molecular formula, ranking the right candidate is not always easy. That being said, recent advances are making a big difference. Machine learning-based retention modeling, in silico fragmentation software, and open-source data processing tools are helping refine structural identification. However, there is still work to be done. Standardizing NTS workflows and improving validation strategies will be crucial for making these approaches more reproducible and reliable in the long run.
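To illustrate the exact-mass step mentioned for suspect screening, and why it cannot separate structures that share a molecular formula, here is a minimal Python sketch. It is not taken from the interview or the cited paper: the suspect list, the 5 ppm tolerance, the measured mass, and the function names are all illustrative assumptions, and the sketch compares neutral monoisotopic masses rather than the ions an instrument actually records.

```python
# Monoisotopic masses (Da) of the most abundant isotopes of C, H, and O.
MONOISOTOPIC = {"C": 12.0, "H": 1.00782503, "O": 15.99491462}


def monoisotopic_mass(c: int, h: int, o: int) -> float:
    """Neutral monoisotopic mass of a CxHyOz molecule."""
    return c * MONOISOTOPIC["C"] + h * MONOISOTOPIC["H"] + o * MONOISOTOPIC["O"]


def ppm_error(measured: float, theoretical: float) -> float:
    """Signed mass error in parts per million."""
    return (measured - theoretical) / theoretical * 1e6


# Hypothetical suspect list: name -> (C, H, O) atom counts.
SUSPECTS = {
    "methylparaben": (8, 8, 3),
    "vanillin": (8, 8, 3),       # isomer of methylparaben
    "ethylparaben": (9, 10, 3),
    "bisphenol A": (15, 16, 2),
}


def match_suspects(measured_mass, suspects, tolerance_ppm=5.0):
    """Return suspects whose theoretical mass lies within the ppm tolerance."""
    return [
        name
        for name, (c, h, o) in suspects.items()
        if abs(ppm_error(measured_mass, monoisotopic_mass(c, h, o))) <= tolerance_ppm
    ]


# Hypothetical measured neutral mass (Da), chosen for illustration only.
matches = match_suspects(152.0475, SUSPECTS)
print(matches)
```

Both methylparaben and vanillin are returned, because they share the formula C8H8O3. Exact mass alone cannot rank them, which is where retention-based filtering becomes useful.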

What is your perspective on using machine learning algorithms to improve the identification of pollutants in water?

Machine learning (ML) is becoming an increasingly used tool in pollutant identification, offering the ability to automate and refine structural elucidation workflows in ways that were previously unattainable. By integrating predictive retention models, structure-based filtration, and large-scale data analysis, ML can significantly enhance the accuracy and efficiency of NTS. One of its greatest advantages, in my opinion, is the ability to reduce reliance on extensive reference libraries by predicting retention behavior and structural properties, which is especially useful for emerging contaminants that lack experimental data. In our study, we implemented a standardized, intersection-based filtration approach using retention indices from two reversed-phase liquid chromatography columns (C18 and F5), making it the first method to utilize selectivity-based filtration in liquid chromatography (LC)–HRMS across different stationary phases (1). This strategy has already demonstrated a substantial reduction in false positives, but realistically speaking, there is still much room for advancement. The field is evolving rapidly, with an increasing number of predictive models being developed (many cited in our paper), and networks such as NORMAN contributing to improved standardization (2). Looking ahead, ML will play an even bigger role in automating pollutant identification, making workflows faster, more scalable, and more reproducible.

What was the benefit of using multiple stationary phases in your predictive model?

As mentioned earlier, our study introduces a standardized approach to pollutant identification by integrating retention prediction with an intersection-based filtration strategy (1). A key aspect of this approach is the use of two complementary reversed-phase liquid chromatography (RPLC) columns, C18 (octadecylsilica) and F5 (pentafluorophenylsilica), which significantly enhances the selectivity and accuracy of structural identification. The reason this works so well is that C18 primarily interacts through hydrophobic interactions, while F5 introduces additional polar and π-π interactions, leading to subtly different elution patterns. This dual-phase approach demonstrated several benefits. With the differences in retention behavior across both columns, we could eliminate a large portion of incorrect structural candidates in our external validation set, reducing false positives by over 70%. Our method is based on two separate predictive retention models, one for each stationary phase, which work independently to assess the plausibility of a given structure. Each model can either accept or reject a candidate structure based on its predicted retention index. The key advantage comes from the intersection-based filtration, where only structures that are accepted by both models are retained as likely candidates. This selective overlap reduces the number of erroneous structures, ensuring a much higher confidence in the identification process. It will be very interesting and exciting to see how this approach evolves. There is a lot of potential for implementing additional RPLC columns with diverse selectivity profiles, which could further improve filtration efficiency and make this strategy even more robust.
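The intersection principle described above can be sketched in a few lines of Python. This is a simplified illustration, not the published implementation: the acceptance rule (predicted retention index within a fixed tolerance of the measured one), the tolerance values, the candidate names, and all numbers are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Candidate:
    name: str
    predicted_ri_c18: float  # RI predicted by a hypothetical C18 model
    predicted_ri_f5: float   # RI predicted by a hypothetical F5 model


def accepts(predicted_ri: float, measured_ri: float, tolerance: float) -> bool:
    """Assumed rule: accept if the prediction is within the tolerance window."""
    return abs(predicted_ri - measured_ri) <= tolerance


def intersection_filter(candidates, measured_c18, measured_f5, tol_c18, tol_f5):
    """Keep only candidates accepted independently by both column models."""
    kept_c18 = {c for c in candidates if accepts(c.predicted_ri_c18, measured_c18, tol_c18)}
    kept_f5 = {c for c in candidates if accepts(c.predicted_ri_f5, measured_f5, tol_f5)}
    both = kept_c18 & kept_f5  # set intersection of the two accepted sets
    return [c for c in candidates if c in both]  # preserve input order


# Made-up candidates sharing one molecular formula (illustrative values only).
candidates = [
    Candidate("isomer_A", 500.0, 520.0),
    Candidate("isomer_B", 505.0, 600.0),  # plausible on C18, rejected on F5
    Candidate("isomer_C", 700.0, 515.0),  # plausible on F5, rejected on C18
    Candidate("isomer_D", 495.0, 525.0),
]

survivors = intersection_filter(
    candidates, measured_c18=500.0, measured_f5=520.0, tol_c18=20.0, tol_f5=20.0
)
print([c.name for c in survivors])
```

In this toy case each column alone would keep three of the four candidates, while the intersection keeps only isomer_A and isomer_D. Adding a third column with different selectivity would simply add another set to the intersection.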

What are your thoughts on incorporating retention data and molecular descriptors into predictive models?

The predictive model we used in our study is based on a fundamental relationship between a dependent variable (retention data) and independent variables (molecular descriptors). Simply put, retention data represents how long a compound interacts with the chromatographic system, while molecular descriptors are numerical values that capture key structural and physicochemical properties of molecules, such as hydrophobicity, polarity, and molecular weight. While this approach is already proving to be effective, the models can be further improved, especially in refining molecular descriptors to better reflect real chromatographic conditions. One of the biggest challenges is that many descriptors are computed under generalized conditions that do not fully capture how compounds behave in an actual LC system. Factors such as mobile phase composition, solvent effects, and column chemistry all play a significant role in retention, but these aspects are often missing from currently used descriptors. In my opinion, developing new descriptors that incorporate solvent-solubility effects, ionization states, and stationary phase interactions will be essential for improving model accuracy and applicability across different chromatographic setups.
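As a rough illustration of the dependent/independent variable relationship described above, the sketch below fits retention index against a single molecular descriptor by ordinary least squares. It is deliberately reduced to one descriptor for brevity; models of this kind normally combine many descriptors, and the interview does not specify the regression technique used. The descriptor (a hydrophobicity value) and every number are invented for the example.

```python
def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need at least two paired observations")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("descriptor values must not all be identical")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical training data: one descriptor value per reference standard
# (independent variable) and its measured retention index (dependent variable).
hydrophobicity = [1.0, 2.0, 3.0, 4.0]
retention_index = [312.0, 402.0, 503.0, 593.0]

slope, intercept = fit_line(hydrophobicity, retention_index)


def predict_ri(descriptor_value: float) -> float:
    """Predict the retention index of a new structure from its descriptor."""
    return slope * descriptor_value + intercept


print(round(predict_ri(2.5), 1))
```

A descriptor that encoded mobile phase or stationary phase effects would enter the same way, as an additional independent variable in a multivariate fit.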

In your view, how important is it to filter erroneous structural possibilities when identifying unknown compounds?

When dealing with non-targeted screening, one of the biggest challenges is that we are trying to identify compounds in a vast, mostly unknown chemical space. There are millions of registered chemicals, but only a small fraction has been studied in detail, and even fewer have reference mass spectra available. When analyzing environmental samples, especially something as complex as wastewater, thousands of possible structures can be generated for a single molecular formula. Without proper filtration strategies, it becomes nearly impossible to determine which structures are real and which are misleading. This is especially relevant when looking at transformation products formed during wastewater treatment. Many pollutants do not just degrade, but also react with disinfectants, oxidants, and microbial processes, forming entirely new compounds. Some of these byproducts may be more toxic or persistent than the original pollutants. If we do not filter out erroneous structural candidates effectively, we might completely misinterpret which chemicals are present, leading to false conclusions about environmental risks and treatment efficiency.

How do you ensure the transferability of your methods across diverse HPLC–HRMS systems when analyzing pollutants?

Method transferability is always a challenge because retention times can vary between instruments due to differences in column conditions, mobile phase composition, and system parameters. In our case, we ensure transferability by using retention indices (RI) instead of raw retention times, an approach that provides a more standardized way to compare retention behavior across different setups. We developed our model using generic method conditions, meaning that, in principle, any user applying the same method should obtain comparable results. The use of RI ensures that even if slight variations occur, retention data remain reliable across different systems. We also tested this in our study, and in the majority of cases, the relative standard deviation (RSD%) was below 1%, demonstrating the robustness of this approach. These findings are detailed in our supplementary information for those interested (1).
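To show why an index transfers better than a raw retention time, here is a minimal Python sketch. The interview does not describe how the retention indices are calculated, so the scheme used here (linear interpolation between two bracketing reference standards with assigned index values) is an assumption, as are the standards, the index scale, and all retention times.

```python
from bisect import bisect_right
from statistics import mean, stdev


def retention_index(rt, standards):
    """Interpolate an RI from (retention_time, assigned_ri) pairs sorted by time.

    Assumed scheme: linear interpolation between the two bracketing standards.
    """
    times = [t for t, _ in standards]
    if len(times) < 2 or not times[0] <= rt <= times[-1]:
        raise ValueError("analyte must elute between two reference standards")
    i = min(bisect_right(times, rt), len(times) - 1)
    (t_lo, ri_lo), (t_hi, ri_hi) = standards[i - 1], standards[i]
    return ri_lo + (ri_hi - ri_lo) * (rt - t_lo) / (t_hi - t_lo)


def rsd_percent(values):
    """Relative standard deviation: 100 * sample standard deviation / mean."""
    return 100.0 * stdev(values) / mean(values)


# Two hypothetical systems running the same method: system B is slower and
# has a larger dead time, so all raw retention times (minutes) are shifted.
system_a = [(2.0, 100.0), (4.0, 200.0), (8.0, 300.0)]
system_b = [(2.5, 100.0), (4.7, 200.0), (9.1, 300.0)]

ri_a = retention_index(5.0, system_a)  # analyte elutes at 5.0 min on A
ri_b = retention_index(5.8, system_b)  # same analyte elutes at 5.8 min on B
print(round(ri_a, 1), round(ri_b, 1))

# RSD% over hypothetical replicate RI measurements of one compound.
print(round(rsd_percent([224.0, 225.0, 226.0]), 2))
```

The raw retention times differ by 0.8 min between the two hypothetical systems, yet both map to the same index because the reference standards shift along with the analyte.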

How do you see this research being applied in future non-targeted screening approaches, and can it be extended to other biological matrices?

This model was developed strictly using reference standards and initially applied as a proof of concept to drinking water samples. The core idea was to refine the final stages of the identification process by narrowing down the many possible structures that share the same elemental composition. In theory, this approach can be applied to any matrix, as long as the compounds are detectable by HRMS and their elemental composition is known. Expanding to other matrices, such as biological fluids, would require adapting to matrix effects and potential interferences, but the core principle remains the same.

In fact, as part of my postdoctoral research at the Toxicological Center (UAntwerp), under the supervision of Professor A. Covaci, we are currently working on an exposomics project where an NTS approach combined with TA will be used to assess chemical exposure in pregnant women. This longitudinal study involves analyzing blood and urine samples across different trimesters and correlating chemical exposures with birth outcomes. Here, we will also implement this model for compounds that fit within its scope, specifically CxHyOz-based contaminants, to assist structural identification in complex biological matrices.

References

(1) Kajtazi, A.; Kajtazi, M.; Barbetta, M. F. S.; et al. Prediction of Retention Indices in LC-HRMS for Enhanced Structural Identification of Organic Micropollutants in Water: Selectivity-Based Filtration. Anal. Chem. 2025, 97 (1), 65–74. DOI: 10.1021/acs.analchem.4c01784

(2) NORMAN. https://www.norman-network.net/ (accessed 2025-02-20).

Image courtesy of interviewee

Ardiana Kajtazi is a postdoctoral researcher at the Toxicological Center, University of Antwerp (Antwerp, Belgium), supervised by Professor Adrian Covaci. Her current research focuses on target analysis and suspect/non-target screening approaches for chemical exposure assessment in the human exposome. She obtained a PhD in chemistry from Ghent University (Ghent, Belgium), supervised by Professor F. Lynen, where she worked on retention predictive modeling for the analysis of unknown micropollutants in water and wastewater samples, as part of the Marie Skłodowska-Curie ITN project InnovEOX. Her academic background also includes an MRes in pharmaceutical analysis from Nottingham Trent University (Nottingham, UK) and a BSc in environmental and public health from the University of Rijeka (Rijeka, Croatia).