A paper published in Nature Communications proposes a suite of batch effect removal neural networks (BERNN) designed to remove batch effects in large liquid chromatography-mass spectrometry (LC-MS) experiments, with the goal of maximizing sample classification performance between conditions.
The paper notes that, although LC-MS is a powerful method for profiling complex biological samples, batch effects commonly arise because confounding factors are omnipresent. These factors can be biological in nature (such as age or gender) or non-biological, and it is the non-biological factors that produce batch effects. Non-biological factors are practically unavoidable in large-scale studies because of limited instrument availability and the timeline of sample collection. Ideally, batch effects would be removed from the final biological quantification values, but it can be difficult to remove them completely without degrading the biological signal. Because batch effects can significantly affect the interpretability of results, correcting them is crucial for the reproducibility of omics research. Current methods, however, are not optimal for removing batch effects without also compressing the genuine biological variation under study.
The authors of this multi-affiliated paper, representing laboratories in the United States, Canada, the Netherlands, France, and the United Kingdom, take a different approach from most other batch correction methods: rather than relying on a single model, they acknowledge that not all problems require the same solution and propose several. Their aim is to let researchers easily try multiple methods in parallel and then pick the one best suited to their dataset and scientific questions.
Within this suite of models, the authors present the first use of Variational Autoencoders (VAE), Domain Adversarial Neural Networks (DANN), and Domain Inverse Triplet Loss (invTriplet) for batch correction in LC-MS. Furthermore, unlike other batch correction methods, they do not recommend using the corrected output of the autoencoder for biomarker discovery through downstream analysis (for example, differential analysis). Instead, they show how SHapley Additive exPlanations (SHAP) can be used for biomarker discovery. SHAP is a game-theoretic approach to explaining the output of any machine learning model; it connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (2).
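To make the game-theoretic idea behind SHAP more concrete, here is a minimal sketch that computes exact Shapley values by enumerating every coalition of features for a tiny model. It assumes one common convention, that an "absent" feature is replaced by a baseline value. The function name `shapley_values`, the toy model, and the numbers are hypothetical illustrations, not the paper's code or the `shap` library's API. Exact enumeration grows exponentially with the number of features, which is why the SHAP library relies on approximation methods for realistic models.

```python
from itertools import combinations
from math import factorial
from typing import Callable, List, Sequence


def shapley_values(f: Callable[[Sequence[float]], float],
                   x: Sequence[float],
                   baseline: Sequence[float]) -> List[float]:
    """Exact Shapley values of model f at input x (hypothetical helper).

    Features outside a coalition take their baseline value (an assumed convention).
    """
    n = len(x)

    def value(coalition: frozenset) -> float:
        # Coalition members keep their observed value; the rest use the baseline.
        z = [x[i] if i in coalition else baseline[i] for i in range(n)]
        return f(z)

    phi = [0.0] * n
    for i in range(n):
        others = [j for j in range(n) if j != i]
        for size in range(n):
            # Classic Shapley weight |S|! (n - |S| - 1)! / n!
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Weighted marginal contribution of feature i to coalition S.
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


# Hypothetical model score: one linear term plus one interaction term.
phi = shapley_values(lambda z: 2 * z[0] + 3 * z[1] * z[2],
                     x=[1.0, 1.0, 1.0], baseline=[0.0, 0.0, 0.0])
print([round(p, 6) for p in phi])  # [2.0, 1.5, 1.5]
```

Feature 0 receives its full linear contribution of 2, while the interaction worth 3 is split evenly between features 1 and 2. The values also sum to the change in model output between the baseline and `x` (5 here), the "efficiency" property that lets Shapley values be read as per-feature credit and makes them usable for ranking features such as candidate biomarkers.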
A comparison of batch effect correction methods across the five diverse datasets presented in the paper (Alzheimer's disease, adenocarcinoma, aging mice, a benchmark set, and mixed tissues) showed that BERNN models consistently achieved the strongest sample classification performance. However, the model that produced the greatest classification improvement was not always the best at removing batch effects.
The paper also found that overcorrecting batch effects removed some essential biological variability. These findings highlight the importance of balancing batch effect removal against the preservation of valuable biological diversity in large-scale LC-MS experiments.
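The overcorrection trade-off can be illustrated with a deliberately simple toy that is not BERNN: per-batch mean-centering of a single feature, scaled by a hypothetical strength parameter `alpha`. The data are invented for illustration: cases truly sit at 10 and controls at 8 (a true difference of 2), batch 2 carries an assumed technical offset of +3, and batch 2 also contains more cases than batch 1. Because batch composition is partly confounded with condition, the batch means absorb some biology, so full centering removes part of the real case-control difference along with the technical offset.

```python
from statistics import mean

# Hypothetical single-feature data: (batch, condition, intensity).
# Assumed truth: case = 10, control = 8; batch 2 adds a technical offset of +3.
SAMPLES = [
    (1, "case", 10.0), (1, "control", 8.0), (1, "control", 8.0), (1, "control", 8.0),
    (2, "case", 13.0), (2, "case", 13.0), (2, "case", 13.0), (2, "control", 11.0),
]


def center_batches(samples, alpha):
    """Shift each batch toward the grand mean by a fraction alpha of its offset.

    alpha = 0 leaves the data unchanged; alpha = 1 is full per-batch mean-centering.
    """
    grand = mean(v for _, _, v in samples)
    batch_means = {b: mean(v for bb, _, v in samples if bb == b) for b, _, _ in samples}
    return [(b, c, v - alpha * (batch_means[b] - grand)) for b, c, v in samples]


def case_control_difference(samples):
    """Mean(case) minus mean(control): a crude stand-in for the biological signal."""
    case = mean(v for _, c, v in samples if c == "case")
    control = mean(v for _, c, v in samples if c == "control")
    return case - control


for alpha in (0.0, 0.5, 0.75, 1.0):
    diff = case_control_difference(center_batches(SAMPLES, alpha))
    print(f"alpha={alpha:.2f}  case-control difference={diff:.2f}")
# alpha=0.00  case-control difference=3.50
# alpha=0.50  case-control difference=2.50
# alpha=0.75  case-control difference=2.00
# alpha=1.00  case-control difference=1.50
```

With these assumed numbers, the uncorrected difference is inflated by the batch offset (3.5), partial correction at `alpha = 0.75` recovers the assumed true value of 2, and full centering compresses it to 1.5. In real data the true effect is unknown, so no single correction strength can be assumed correct for every dataset.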
The authors consider their contribution to researchers facing batch effect problems to be threefold. First, they demonstrate the effectiveness of models that, to their knowledge, had never before been applied to batch effect correction in LC-MS experiments. Second, they show that different problems call for different models, so trying several is necessary. Finally, they show that removing part of the batch effect can improve classification on a given dataset, but removing too much of it can reduce classification performance.
References