Improved Machine Learning Workflow for Predicting Oligonucleotide Separation

March 14, 2025

A research group has developed an improved workflow for constructing machine learning (ML) models that predict retention behavior in oligonucleotide separation.

A team of scientists from the Department of Engineering and Chemical Sciences and the Department of Mathematics and Computer Science at Karlstad University, Sweden, has developed an improved workflow for constructing machine learning (ML) models to predict retention times and peak widths in oligonucleotide separation. Their work was published in the Journal of Chromatography A (1).

Oligonucleotides are short nucleic acid molecules used in therapeutic applications; they present a unique challenge in chromatography because of their complex structures. Any analytical method must be capable of separating, quantifying, and characterizing oligonucleotides and their potential impurities, which can arise from the multistep manufacturing process (2). The goal of this research was to create an ML-driven system, trained on large datasets, that could accurately predict retention times and peak widths, reducing the need for time-consuming manual analysis.

Using a combination of ML techniques, the researchers built a systematic workflow capable of handling extensive datasets. They analyzed oligonucleotides ranging from native (phosphodiester) to fully phosphorothioated forms under three different gradient slopes. The oligonucleotides were separated by ion-pair reversed-phase chromatography on a C18 stationary phase using tributylaminium ion-pair reagents. The study generated retention time data for approximately 900 sequences per gradient.

To process this large amount of data, the team implemented a semi-automated, rule-based approach to determine retention times, decompose peaks and assess their widths, and evaluate signal-to-noise ratios and peak skewness. The workflow also fitted elution profiles with probability density functions (PDFs), using an F-test to select the most appropriate PDF. Coeluting peaks were handled by fitting a sum of multiple Gaussian PDFs.
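To make these peak descriptors concrete, here is a minimal sketch based on statistical moments. This is a simpler stand-in for the study's rule-based and PDF-fitting workflow, not a reproduction of it. The first moment gives a retention-time estimate, the square root of the second central moment gives a width (σ), and the normalized third moment gives the skewness. The second function computes an extra-sum-of-squares F statistic for comparing a simpler fit with a more complex nested one (for example, one Gaussian against two); we assume this form of F-test for illustration only, since the article does not specify which form was used. Both function names are hypothetical.

```python
import math


def peak_moments(times, signal):
    """Return (first moment, sigma, skewness) of a sampled elution peak.

    Assumes a baseline-corrected profile on an equally spaced time grid,
    so the constant time step cancels out of every ratio below.
    """
    area = sum(signal)
    if area <= 0:
        raise ValueError("peak area must be positive")
    # First moment: intensity-weighted mean time (a retention-time estimate).
    m1 = sum(t * y for t, y in zip(times, signal)) / area
    # Second central moment: peak variance; its square root is sigma.
    m2 = sum((t - m1) ** 2 * y for t, y in zip(times, signal)) / area
    # Third central moment; normalized by sigma**3 it is the skewness
    # (zero for a symmetric peak, positive for a tailing peak).
    m3 = sum((t - m1) ** 3 * y for t, y in zip(times, signal)) / area
    sigma = math.sqrt(m2)
    return m1, sigma, m3 / sigma ** 3


def extra_sum_of_squares_f(rss_simple, p_simple, rss_complex, p_complex, n_points):
    """F statistic comparing two nested least-squares fits.

    rss_*: residual sums of squares; p_*: number of fitted parameters
    (e.g. 3 for one Gaussian, 6 for two); n_points: data points fitted.
    """
    df_num = p_complex - p_simple
    df_den = n_points - p_complex
    return ((rss_simple - rss_complex) / df_num) / (rss_complex / df_den)
```

A large F value favors the more complex model. A p-value could be obtained from the F distribution, for example with `scipy.stats.f.sf(F, dfn, dfd)` using `dfn = p_complex - p_simple` and `dfd = n_points - p_complex`. Moments are sensitive to baseline noise in the peak tails, which is one reason fitted PDFs are often preferred for real data.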

The encoded sequence data were modeled using multiple ML algorithms, including support vector regression (SVR), gradient boosting (GB), random forest (RF), and decision tree (DT). GB and SVR were the most accurate models for retention-time prediction. RF and DT models were computationally fast but showed limited generalization.
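The article does not say how the sequences were encoded. The sketch below assumes a simple composition-based encoding to show what "encoded sequence data" can mean in practice: each oligonucleotide becomes a fixed-length numeric vector that a regression model can consume. The `*` notation for a phosphorothioate (PS) linkage, the DNA alphabet, and the function names are assumptions for illustration, not the authors' scheme.

```python
from collections import Counter

BASES = "ACGT"  # assumed DNA alphabet; other chemistries would need more symbols


def encode_oligo(seq):
    """Encode a sequence such as 'A*C*GT' as a fixed-length feature vector.

    Hypothetical notation: '*' marks a phosphorothioate (PS) linkage between
    two bases; an unmarked junction is a native phosphodiester (P=O) linkage.
    Features: counts of A, C, G, T, then PS count, P=O count, and length.
    """
    bases = [ch for ch in seq.upper() if ch != "*"]
    unknown = set(bases) - set(BASES)
    if unknown:
        raise ValueError(f"unexpected symbols: {sorted(unknown)}")
    n_linkages = max(len(bases) - 1, 0)
    n_ps = seq.count("*")
    if n_ps > n_linkages:
        raise ValueError("more '*' markers than internucleotide linkages")
    counts = Counter(bases)  # missing bases count as 0
    return [counts[b] for b in BASES] + [n_ps, n_linkages - n_ps, len(bases)]


def encode_all(sequences):
    """Build a feature matrix (one row per sequence) for a regression model."""
    return [encode_oligo(s) for s in sequences]
```

The resulting matrix, together with measured retention times, could be passed to scikit-learn regressors such as `GradientBoostingRegressor`, `SVR`, `RandomForestRegressor`, or `DecisionTreeRegressor` through their `fit(X, y)` method. Composition features ignore base order; a position-wise one-hot encoding would retain it at the cost of many more features.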

The ML models showed larger prediction errors for shallower gradient slopes and lower predictability for native phosphodiester (P=O) sequences. The authors suggested that signal intensity and sequence heterogeneity contributed to these errors. Improving the signal-to-noise ratio, for example by using mass spectrometry detection in selected ion monitoring (SIM) mode, could enhance predictability.

By using these ML models, scientists can predict chromatograms for various gradient slopes, allowing impurity peak resolution, including the resolution between critical solutes, to be simulated across different experimental conditions. This could lead to more efficient drug development, especially in the production of therapeutic oligonucleotides. Anticipating peak behavior before running experiments could reduce costs and improve analytical accuracy in pharmaceutical research. The researchers acknowledge that the models have limitations: caution is advised when interpreting predicted separation performance, particularly resolution, because the main challenge lies in accurately predicting peak width (1).
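Why peak width matters so much can be illustrated with the standard resolution definition, Rs = 2(tR2 - tR1)/(w1 + w2), where w is the baseline peak width (4σ for a Gaussian peak). The sketch below uses hypothetical values rather than data from the study. Because Rs is inversely proportional to the summed widths, widths predicted 20% too narrow inflate Rs by a factor of 1/0.8 = 1.25, even when retention times are predicted perfectly.

```python
def resolution(t1, t2, w1, w2):
    """Chromatographic resolution from retention times and baseline widths."""
    if w1 <= 0 or w2 <= 0:
        raise ValueError("peak widths must be positive")
    return 2.0 * abs(t2 - t1) / (w1 + w2)


def resolution_from_sigma(t1, t2, sigma1, sigma2):
    """Same, for Gaussian peaks, using baseline width w = 4 * sigma."""
    return resolution(t1, t2, 4.0 * sigma1, 4.0 * sigma2)


if __name__ == "__main__":
    # Hypothetical predicted values (minutes), not data from the study.
    rs_true = resolution(10.0, 10.3, 0.20, 0.20)
    # Same retention times, but both widths under-predicted by 20 %.
    rs_biased = resolution(10.0, 10.3, 0.16, 0.16)
    print(f"Rs = {rs_true:.2f}; with 20% narrower widths, Rs = {rs_biased:.2f}")
```

In this example a pair that is just baseline-resolved (Rs of about 1.5) would appear comfortably separated (Rs of about 1.9), which is the kind of overconfidence the authors warn about.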

References

(1) Samuelsson, J.; Enmark, M.; Szabados, G.; et al. Improved Workflow for Constructing Machine Learning Models: Predicting Retention Times and Peak Widths in Oligonucleotide Separation. J. Chromatogr. A 2025, 1747, 465746. DOI: 10.1016/j.chroma.2025.465746

(2) Fornstedt, T.; Enmark, M. Separation of Therapeutic Oligonucleotides Using Ion-pair Reversed-phase Chromatography Based on Fundamental Separation Science. J. Chromatogr. Open 2023, 3, 100079. DOI: 10.1016/j.jcoa.2023.100079
