Improved Machine Learning Workflow for Predicting Oligonucleotide Separation

A research group has developed an improved workflow for constructing machine learning (ML) models in oligonucleotide separation.

A team of scientists from the Department of Engineering and Chemical Sciences and the Department of Mathematics and Computer Science at Karlstad University, Sweden, has developed an improved workflow for constructing machine learning (ML) models to predict retention times and peak widths in oligonucleotide separation. Their work was published in the Journal of Chromatography A (1).

Illustration of IT roadmap modern technology and innovative processes, networking and big data: © Johannes - stock.adobe.com

Oligonucleotides are short nucleic acid molecules used in therapeutic applications; they present a unique challenge in chromatography because of their complex structures. Any analytical method must be capable of separating, quantifying, and characterizing oligonucleotides and their potential impurities, which can arise from the multistep manufacturing process (2). The goal of this research was to create an ML-driven system that could accurately predict retention times and peak widths from large datasets, removing the need for time-consuming manual analysis.

Using a combination of ML techniques, the researchers built a systematic workflow capable of handling extensive datasets. They analyzed oligonucleotide forms, ranging from native to fully phosphorothioated structures, using three different gradient slopes. These oligonucleotides were separated on a C18 chromatographic system using tributylaminium ion-pair reagents. The study generated retention time data for approximately 900 sequences per gradient.

To process the large amount of data, the team implemented a semi-automated, rule-based approach for determining retention times, decomposing peaks, and assessing peak width, signal-to-noise ratio, and skewness. The workflow fitted elution profiles with probability density functions (PDFs), using an F-test to select the PDF. Coeluting peaks were handled by fitting multiple Gaussian PDFs.
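As a rough illustration of the idea, the sketch below models an elution profile as a sum of Gaussian peaks and computes an F statistic for choosing between a simpler and a more complex fit. It is not the authors' code: the function names, the assumption that the two candidate models are nested least-squares fits, and the numbers in the example are all hypothetical, and the fitting step itself is omitted.

```python
import math


def gaussian_sum(t, components):
    """Evaluate a sum of Gaussian peaks at time t.

    components: list of (area, center, sigma) tuples, one per peak.
    """
    total = 0.0
    for area, center, sigma in components:
        z = (t - center) / sigma
        total += area / (sigma * math.sqrt(2.0 * math.pi)) * math.exp(-0.5 * z * z)
    return total


def residual_sum_of_squares(times, signal, components):
    """Sum of squared differences between the measured signal and the model."""
    return sum((y - gaussian_sum(t, components)) ** 2 for t, y in zip(times, signal))


def f_statistic(rss_simple, rss_complex, p_simple, p_complex, n_points):
    """F statistic for two nested least-squares fits (assumed setup).

    rss_*: residual sums of squares; p_*: number of fitted parameters.
    """
    numerator = (rss_simple - rss_complex) / (p_complex - p_simple)
    denominator = rss_complex / (n_points - p_complex)
    return numerator / denominator


# Hypothetical numbers: a one-peak fit (3 parameters) versus a two-peak fit
# (6 parameters) to a chromatogram segment of 106 points.
f_value = f_statistic(rss_simple=10.0, rss_complex=2.0,
                      p_simple=3, p_complex=6, n_points=106)
print(round(f_value, 1))  # 133.3
```

A large F value suggests that the extra peak reduces the residuals by more than chance would explain. In practice the value would be compared against an F distribution with (p_complex - p_simple, n_points - p_complex) degrees of freedom.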

The encoded sequence data were modeled using several ML algorithms, including support vector regression (SVR), gradient boosting (GB), random forest (RF), and decision tree (DT). The results indicated that GB and SVR were the most effective models for predicting retention times. RF and DT models performed well in terms of speed but showed limited generalization capabilities.
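The article does not describe how the sequences were encoded. Purely as an illustration of what turning a sequence into numerical model input can look like, the sketch below uses a hypothetical notation in which `*` marks a phosphorothioate linkage, together with a hypothetical feature set (base counts, length, and fraction of phosphorothioated linkages). Neither is taken from the study.

```python
def encode_sequence(seq):
    """Turn a sequence such as 'A*C*GT' into a fixed-length feature list.

    Hypothetical notation: '*' after a base marks a phosphorothioate (PS)
    linkage to the next base; no mark means a native phosphodiester (P=O)
    linkage. Returns [count_A, count_C, count_G, count_T, length, PS fraction].
    """
    bases = [ch for ch in seq if ch in "ACGT"]
    ps_links = seq.count("*")
    counts = [bases.count(b) for b in "ACGT"]
    total_links = max(len(bases) - 1, 1)  # avoid division by zero for one base
    return counts + [len(bases), ps_links / total_links]


print(encode_sequence("A*C*G*T"))  # [1, 1, 1, 1, 4, 1.0]
print(encode_sequence("ACGT"))     # [1, 1, 1, 1, 4, 0.0]
```

Feature lists of this kind, paired with measured retention times, are the sort of input that regressors such as SVR or GB are trained on. A composition-only encoding like this one discards base order, so a real workflow would likely need a richer representation.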

The ML models encountered larger prediction errors for shallower gradient slopes and lower predictability for native phosphodiester (P=O) sequences. The authors suggested that signal intensity and sequence heterogeneity contributed to these errors. Future improvements in signal-to-noise ratios, such as incorporating mass spectrometry in selected ion monitoring mode, could enhance predictability.

By using these ML models, scientists can predict chromatograms for various gradient slopes, allowing impurity peak resolution, including the resolution between critical solutes, to be simulated across different experimental conditions. This could lead to more efficient drug development processes, especially in the production of therapeutic oligonucleotides. The ability to anticipate peak behaviors before running actual experiments could significantly reduce costs and improve analytical accuracy in pharmaceutical research. The researchers acknowledge, however, that the models have limitations. Caution is advised when interpreting predicted separation performance, particularly resolution, because the main challenge lies in accurately predicting peak width (1).
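The sensitivity of resolution to peak width can be seen from the standard chromatographic definition, Rs = 2(t2 - t1)/(w1 + w2), where t is retention time and w is the baseline peak width. The short sketch below applies that textbook formula to hypothetical numbers. It is not drawn from the study's data or code.

```python
def resolution(t1, w1, t2, w2):
    """Chromatographic resolution from retention times and baseline peak widths."""
    return 2.0 * abs(t2 - t1) / (w1 + w2)


# Hypothetical predicted peaks (times and widths in minutes).
rs_nominal = resolution(t1=10.0, w1=0.40, t2=10.6, w2=0.40)

# Same retention times, but with both widths 20% broader than predicted.
rs_broader = resolution(t1=10.0, w1=0.48, t2=10.6, w2=0.48)

print(round(rs_nominal, 2))  # 1.5
print(round(rs_broader, 2))  # 1.25
```

In this example, a 20% error in width alone moves the pair from the commonly cited baseline-separation threshold of 1.5 to 1.25, even though the retention times are predicted exactly.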

References

(1) Samuelsson, J.; Enmark, M.; Szabados, G.; et al. Improved Workflow for Constructing Machine Learning Models: Predicting Retention Times and Peak Widths in Oligonucleotide Separation. J. Chromatogr. A 2025, 1747, 465746. DOI: 10.1016/j.chroma.2025.465746

(2) Fornstedt, T.; Enmark, M. Separation of Therapeutic Oligonucleotides Using Ion-pair Reversed-phase Chromatography Based on Fundamental Separation Science. J. Chromatogr. Open 2023, 3, 100079. DOI: 10.1016/j.jcoa.2023.100079
