A paper published in Nature Communications proposes a suite of batch effect removal neural networks (BERNN) for correcting batch effects in large liquid chromatography-mass spectrometry (LC-MS) experiments, with the goal of maximizing sample classification performance between conditions.
The paper states that, while LC-MS is a powerful method for profiling complex biological samples, its results are subject to ubiquitous confounding factors, which can be biological (such as age or gender) or non-biological (such as batch effects). Non-biological factors are practically unavoidable in large-scale studies because of limits on instrument availability and the timeline of sample collection. Batch effects can significantly impair the interpretability of results, so correcting them is crucial for the reproducibility of omics research. Ideally, they would be removed entirely from the final biological quantification values, but it is difficult to do so without degrading the biological signal. Current methods are not optimal for removing batch effects without also compressing the genuine biological variation under study.
The authors, who represent laboratories in the United States, Canada, the Netherlands, France, and the United Kingdom, take an approach to countering batch effects that differs from most others in that it does not rely on a single method. Instead, they acknowledge that not all problems require the same solution and propose several models for addressing batch effects. Their aim is to let researchers easily try multiple methods side by side and then pick the best approach for their dataset and scientific question.
Within this suite of models, the authors present the first use of Variational Autoencoders (VAE), Domain Adversarial Neural Networks (DANN), and Domain Inverse Triplet Loss (invTriplet) for batch correction in LC-MS. Furthermore, in contrast to other batch correction methods, they do not recommend using the corrected output of the autoencoder for biomarker discovery through downstream analysis (for example, differential analysis). Rather, they demonstrate how SHapley Additive exPlanations (SHAP) can be used for biomarker discovery. SHAP is a game-theoretic approach to explaining the output of any machine learning model; it connects optimal credit allocation with local explanations using the classic Shapley values from game theory and their related extensions (2).
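To make the idea of credit allocation concrete, the following is a minimal sketch of exact Shapley values computed by enumerating feature coalitions. It is not the authors' pipeline and it does not use the SHAP library, which relies on faster approximations. The toy model, the sample, and the all-zero baseline are hypothetical values chosen only for illustration; in a biomarker setting the features would stand for measured intensities.

```python
from itertools import combinations
from math import factorial


def shapley_values(value_fn, n_features):
    """Exact Shapley values for a coalition value function.

    value_fn takes a frozenset of feature indices and returns the model
    output when only those features are "present".
    """
    phi = [0.0] * n_features
    for i in range(n_features):
        others = [j for j in range(n_features) if j != i]
        for size in range(len(others) + 1):
            # Shapley weight for coalitions of this size
            weight = (
                factorial(size)
                * factorial(n_features - size - 1)
                / factorial(n_features)
            )
            for subset in combinations(others, size):
                coalition = frozenset(subset)
                # Marginal contribution of feature i to this coalition
                gain = value_fn(coalition | {i}) - value_fn(coalition)
                phi[i] += weight * gain
    return phi


# Hypothetical toy model with one interaction term (assumption, not from the paper)
def toy_model(x):
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


sample = [1.0, 3.0, 4.0]    # hypothetical sample to explain
baseline = [0.0, 0.0, 0.0]  # hypothetical reference values for "absent" features


def coalition_value(coalition):
    # Features outside the coalition are replaced by their baseline value
    x = [sample[j] if j in coalition else baseline[j] for j in range(len(sample))]
    return toy_model(x)


phi = shapley_values(coalition_value, len(sample))
print([round(v, 6) for v in phi])
```

The attributions sum to the difference between the model output for the sample and for the baseline, and the interaction term is split evenly between the two features involved. Exact enumeration grows exponentially with the number of features, so it is only practical for small examples like this one.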
A comparison of batch effect correction methods across the five diverse datasets presented in the paper (Alzheimer’s, adenocarcinoma, aging mice, benchmark, and mixed tissues) demonstrated that BERNN models consistently showed the strongest sample classification performance. However, the model producing the greatest classification improvements did not always perform best in terms of batch effect removal.
The paper also found that overcorrecting batch effects resulted in the loss of some essential biological variability. These findings highlight the importance of balancing batch effect removal against the preservation of valuable biological diversity in large-scale LC-MS experiments.
The authors believe that their contribution to researchers facing batch effect problems is threefold. First, they demonstrated the effectiveness of models that, to their knowledge, had never been applied to correct batch effects in LC-MS experiments. Second, they showed the necessity of trying different models to solve different problems. Finally, they showed that, to obtain the best classification on a given dataset, removing part of the batch effect can improve the results, but removing too much of it might come at the cost of diminished classification performance.
References