Deep Learning Framework for Peak Detection at the Intact Level of Therapeutic Proteins

Fact checked by Caroline Hroncich

A recent study conducted by the Department of Chemical Engineering at the Indian Institute of Technology (Delhi, India) used liquid chromatography-mass spectrometry (LC–MS) to distinguish the hetero-variants (glycoforms) of a monoclonal antibody (mAb), revealing discernible peaks at the intact level.

Although automated peak detection is available in commercial software, visual inspection and manual adjustment are often necessary to achieve optimal true positive rates. A recent study conducted by the Department of Chemical Engineering at the Indian Institute of Technology (Delhi, India) used liquid chromatography-mass spectrometry (LC–MS) to distinguish the hetero-variants (glycoforms) of a monoclonal antibody (mAb), revealing discernible peaks at the intact level. LCGC International spoke to Anurag Rathore, corresponding author of the article, about his department’s findings.

Your paper (1) presents a study conducted by you and your coauthors where a machine learning (ML)-based approach for peak detection is used to facilitate a head-to-head intact-level comparison of commercially licensed biosimilars and innovator products. What are the benefits of using ML to do this, as opposed to other approaches?
Unlike traditional peak detection methods, which require pre-existing information about the sample, such as baseline distortions, phasing errors, and t1 noise, ML-based techniques need minimal prior knowledge. This reduces the dependency on manual adjustments and expert input, making the process more automated and streamlined. In addition, ML-based methods are less sensitive to noise, which makes them particularly advantageous for data with poor signal-to-noise ratios, where they provide reliable peak detection without extensive manual intervention.

Why is the method of peak detection important?
In the analysis of therapeutic proteins (mAbs), the method of peak detection is important for the following reasons:

  1. It allows for precise identification and quantification of different molecular species present in the sample, thereby ensuring that the product meets necessary quality standards.
  2. By identifying peaks accurately, manufacturers can monitor and optimize the drug development process. This helps in identifying any deviations or inconsistencies in the manufacturing process, enabling timely adjustments.
  3. It underpins intact mass analysis, which is important for verifying that the therapeutic protein is correctly assembled and has the expected molecular mass.
  4. For structural biology, peak detection aids in analyzing the structural components of proteins and other biomolecules. This information is crucial for understanding the function and interaction of these molecules within biological systems.

Have you found that there are certain chromatography or spectrometry techniques that are optimized by using ML-based approaches?
Yes, certain chromatography and spectrometry techniques can be optimized using ML-based approaches, because they produce complex, high-dimensional data that can be challenging to process using traditional methods. A common example is high-resolution liquid chromatography-mass spectrometry (LC–MS). Studies have shown that ML techniques such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) far outperform other techniques, achieving higher true positive rates in peak detection.

Was ML your best option in carrying out your peak detection analysis? Were other artificial intelligence (AI) approaches considered?

Conventional algorithms for peak detection, such as partial least squares-discriminant analysis (PLS-DA) and locally weighted regression (LWR), were applied to our problem. They were less accurate at detecting multiple peaks and carried a heavy computational load. Artificial neural networks were also deployed for the same peak detection task, but their inability to extract relationships from spectral data led to inaccurate detections. The approach we developed with convolutional neural networks surpassed both the conventional algorithms and the ML approach based on artificial neural networks in terms of accuracy, computational efficiency, and operational efficiency.

Briefly state your findings in this study.

In the initial phase, hetero-variants (glycoforms) of a mAb were distinguished using LC–MS, revealing discernible peaks at the intact level. To comprehensively identify each peak in the intact-level analysis, a deep learning approach utilizing CNNs was employed. Conventional peak identification software detected only five peaks with a 0.5 threshold. Under the same conditions, the CNN model identified seven main peaks, along with many overlapping peaks within the main peak, indicating superior detection capability. At the 0.5 threshold, the CNN model had a true positive rate of 0.9 and a probability AUC value of 0.9949. The results were also compared with conventional peak detection algorithms such as PLS-DA and LWR, and the CNN model outperformed both, with higher computational efficiency.
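
For readers less familiar with the two metrics quoted in this answer, the following is a minimal Python sketch of how a true positive rate at a fixed threshold and an AUC can be computed from per-candidate peak probabilities. The function names and the toy labels and scores are assumptions made for illustration; they are not data or code from the study.

```python
def true_positive_rate(labels, scores, threshold=0.5):
    """Fraction of true peaks whose predicted probability meets the threshold."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    if not positives:
        raise ValueError("need at least one true peak")
    return sum(s >= threshold for s in positives) / len(positives)


def roc_auc(labels, scores):
    """AUC as the chance that a random true peak outscores a random non-peak.

    Ties between a true peak and a non-peak count as half a win.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one true peak and one non-peak")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example (made-up values): 1 = true peak, 0 = not a peak.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.95, 0.80, 0.60, 0.40, 0.55, 0.30, 0.20, 0.10]

print(true_positive_rate(labels, scores))  # 0.75
print(roc_auc(labels, scores))             # 0.9375
```

The true positive rate depends on the chosen threshold, whereas the AUC summarizes ranking quality across all thresholds, which is why the two numbers are reported together.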

Do your findings correlate with what you had hypothesized?
Yes. As hypothesized, machine learning, specifically CNNs, improved peak detection accuracy and true positive rates compared to conventional methods. Conventional peak identification software detected only five peaks with a 0.5 threshold, whereas under the same conditions the CNN model identified seven main peaks, along with many overlapping peaks within the main peak, indicating superior detection capability.

Was there anything particularly unexpected that stands out from your perspective?

What stands out as unexpected is the ability of the CNN model to accurately detect multiple overlapping peaks without being affected by noise.

Were there any limitations or challenges you encountered in your work?

Some of the limitations of the present study include:

  1. The model may not generalize well to other datasets or different experimental conditions without further validation.
  2. Despite the high accuracy, expert validation and interpretation of detected peaks are still required to confirm the findings.
  3. CNNs can be complex and difficult to interpret, making it challenging to understand the rationale behind specific peak identifications.

What best practices can you recommend in this type of analysis for both instrument parameters and data analysis?

The best practices we recommend for effective data analysis are:

  1. Focus on tuning hyperparameters by experimenting with different filter sizes, learning rates, and batch sizes to obtain the best results.
  2. Adjust the layers of the CNN architecture to obtain an optimal trade-off between computational speed and accuracy.
  3. Implement dropout layers and regularization techniques, which are necessary to avoid overfitting.
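
The hyperparameter tuning described in the first recommendation can be sketched as a simple grid search. This is only an illustration in Python: `grid_search`, the configuration keys, the candidate values, and the `toy_evaluate` stand-in are all hypothetical, and in practice `evaluate` would train the CNN with the given settings and return a validation score.

```python
from itertools import product


def grid_search(evaluate, filter_sizes, learning_rates, batch_sizes):
    """Try every combination and return (best_config, best_score).

    `evaluate` is a caller-supplied function that takes a config dict and
    returns a validation score, where higher is better.
    """
    best_config, best_score = None, float("-inf")
    for fs, lr, bs in product(filter_sizes, learning_rates, batch_sizes):
        config = {"filter_size": fs, "learning_rate": lr, "batch_size": bs}
        score = evaluate(config)
        if score > best_score:
            best_config, best_score = config, score
    return best_config, best_score


def toy_evaluate(config):
    """Placeholder score (not a real model): peaks at 5 / 1e-3 / 32."""
    return (
        -abs(config["filter_size"] - 5)
        - abs(config["batch_size"] - 32) / 32
        - abs(config["learning_rate"] - 1e-3) * 100
    )


best_config, best_score = grid_search(
    toy_evaluate,
    filter_sizes=[3, 5, 7],
    learning_rates=[1e-2, 1e-3, 1e-4],
    batch_sizes=[16, 32, 64],
)
print(best_config)  # {'filter_size': 5, 'learning_rate': 0.001, 'batch_size': 32}
```

An exhaustive grid is the simplest option; when each training run is expensive, a random or staged search over the same ranges may be a more practical choice.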

In the case of instrument parameters, one should focus on:

  1. Efficient separation and intact mass estimation of the mAb with a capable LC system and column. For our study, we used an Agilent 1260 Infinity Bio-inert Quaternary LC system.
  2. Calibrating the positive ion mode of the MS before analysis.
  3. Setting the capillary gas temperature and voltage, as well as the fragmentor voltage, at optimum levels.
  4. Applying precise algorithms in the available software, such as Agilent MassHunter Qualitative Analysis and BioConfirm, to deconvolute the MS chromatogram.

What are the next steps in this research and are you planning to be involved in improving this technology?

We can explore combining CNNs with other AI techniques, such as classifiers, to further enhance detection capability and robustness. We can also develop strategies to reduce the computational load, such as parallel processing or splitting datasets into smaller regions, to make the approach more efficient and scalable.
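
One way to picture the idea of splitting datasets into smaller regions is the Python sketch below, which cuts a one-dimensional signal into overlapping windows that could then be processed independently. The function name, parameters, and overlap scheme are assumptions for illustration and do not describe the authors' implementation.

```python
def split_into_regions(signal, region_size, overlap=0):
    """Return (start_index, chunk) pairs that together cover `signal`.

    Consecutive regions share `overlap` points, so a peak narrower than
    the overlap that straddles a boundary still appears whole in one region.
    """
    if region_size <= 0:
        raise ValueError("region_size must be positive")
    if not 0 <= overlap < region_size:
        raise ValueError("overlap must satisfy 0 <= overlap < region_size")
    step = region_size - overlap
    regions = []
    start = 0
    while start < len(signal):
        regions.append((start, signal[start:start + region_size]))
        if start + region_size >= len(signal):
            break  # the last region already reaches the end of the signal
        start += step
    return regions


# Toy signal of 10 points, split into regions of 4 with 1 point of overlap.
for start, chunk in split_into_regions(list(range(10)), region_size=4, overlap=1):
    print(start, chunk)
# 0 [0, 1, 2, 3]
# 3 [3, 4, 5, 6]
# 6 [6, 7, 8, 9]
```

Keeping the start index alongside each chunk lets detections from separate regions be mapped back to positions in the full signal, for example after the regions have been handed to parallel workers.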

What are your thoughts on AI and ML for data analysis in chromatography and spectrometry?
AI and machine learning can significantly improve the reliability and depth of analytical results in chromatography and spectrometry by enhancing accuracy, efficiency, and scalability. ML algorithms can readily handle the complex, high-dimensional datasets produced by chromatography and spectrometry techniques by extracting meaningful features and patterns that might be missed by conventional methods.

Reference

1. Nikita, S.; Bhattacharya, S.; Manocha, K.; Rathore, A. S. Deep Learning Framework for Peak Detection at the Intact Level of Therapeutic Proteins. J. Sep. Sci. 2024, 47 (11), 139888. DOI: 10.1002/jssc.202400051

Anurag S. Rathore is a professor in the Department of Chemical Engineering at the Indian Institute of Technology in Delhi, India.
