Our attention is typically focused on the chromatography in the laboratory. Yet ultimately, the liquid chromatograph (LC) or gas chromatograph (GC) only produces a signal, the chromatogram, from which information must still be distilled. In this series, we discuss different aspects of data analysis and learn how to extract different pieces of information from our chromatograms. We start with simple, yet important, pieces of information, which we will tie together in future articles. In this installment, we establish why peak integration still poses challenges and, at the same time, see in action some of the computational techniques that we will learn to use ourselves in future installments.
Any raw signal, such as those encountered in analytical separation science, comprises several components or frequencies. Frequency here refers to the rate at which the signal changes over time. A signal with many sudden changes within a limited time is of high frequency, whereas a slowly and gradually changing signal is of low frequency.
For chromatography, a signal can be roughly categorized as follows: (i) a high-frequency component that contains the noise; (ii) a low-frequency component that captures the baseline drift; and (iii) a medium-frequency component that usually includes the chromatographic peaks of interest. This is illustrated in Figure 1.
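To make the three components tangible, the following minimal sketch splits a synthetic chromatogram into low-, medium-, and high-frequency parts using two moving averages. This is only an illustration under stated assumptions: the function names, window widths, and the synthetic signal are hypothetical choices and not the method used to produce Figure 1. A moving average is also a crude filter, and the wide window partly absorbs the peak, which is exactly the kind of distortion that proper baseline correction tries to avoid.

```python
import math
import random

def moving_average(values, half_width):
    """Centered moving average; the window shrinks near the edges."""
    n = len(values)
    out = []
    for i in range(n):
        lo = max(0, i - half_width)
        hi = min(n, i + half_width + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out

def split_by_frequency(signal, narrow=3, wide=150):
    """Return (low, mid, high) components that sum back to the signal."""
    smooth = moving_average(signal, narrow)  # noise largely suppressed
    low = moving_average(signal, wide)       # slow drift (crude baseline estimate)
    high = [s - m for s, m in zip(signal, smooth)]  # what narrow smoothing removed
    mid = [m - b for m, b in zip(smooth, low)]      # peaks on a flattened baseline
    return low, mid, high

# Synthetic chromatogram (assumed values): linear drift + one Gaussian peak + noise
random.seed(0)
t = [i * 0.01 for i in range(1000)]  # 10 min sampled every 0.01 min
drift = [0.02 * ti for ti in t]
peak = [math.exp(-0.5 * ((ti - 5.0) / 0.1) ** 2) for ti in t]
noise = [random.gauss(0.0, 0.01) for _ in t]
raw = [d + p + e for d, p, e in zip(drift, peak, noise)]

low, mid, high = split_by_frequency(raw)
```

By construction, the three components add up to the raw signal again, so nothing is lost; the choice of window widths only decides how the signal is divided among them.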
The integration of a peak is thus the result of several steps of data pre-processing in which the peaks (that is, the medium-frequency component) are first isolated. In future articles, we will learn in more detail how to perform these individual steps. For now, it suffices to understand that this isolation process poses a challenge in itself with significant risks of altering the actual area of a peak. For instance, errors may be introduced by excessive smoothing of the noise or by wrongly recognizing sections of (co-eluted) peaks as baseline. This may seem less relevant for relatively simple separations, where detection possibilities allow for clear chromatograms to be obtained. However, it is critical for complex separations, especially when trace concentrations are expected or the number of peaks is too large for manual curation of each individual analyte.
Assuming that signal pre-processing has been conducted successfully, we can now focus our attention on the actual integration of a peak. Mathematically, this is the integration of the signal from the start point of the peak to its end point, which corresponds to the zeroth moment discussed in the previous installment (1). This is a relatively simple exercise for prominent, isolated peaks. However, it quickly becomes difficult for co-eluted signals.
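For an isolated, baseline-corrected peak, this integration can be sketched numerically with the trapezoidal rule. The example below is a minimal illustration; the function name `peak_area`, the synthetic Gaussian peak, and the chosen start and end points are assumptions made here for demonstration only.

```python
import math

def peak_area(times, signal, start, end):
    """Trapezoidal area of the signal between start and end."""
    area = 0.0
    for i in range(1, len(times)):
        if times[i - 1] >= start and times[i] <= end:
            width = times[i] - times[i - 1]
            area += 0.5 * (signal[i] + signal[i - 1]) * width
    return area

# Synthetic, baseline-free Gaussian peak (assumed values)
times = [i * 0.01 for i in range(1001)]  # 0 to 10 min
sigma, height = 0.1, 2.0
signal = [height * math.exp(-0.5 * ((t - 5.0) / sigma) ** 2) for t in times]

area = peak_area(times, signal, 4.0, 6.0)

# Analytical area of a Gaussian: height * sigma * sqrt(2 * pi)
expected = height * sigma * math.sqrt(2.0 * math.pi)
```

For a well-sampled, isolated peak, the numerical and analytical areas agree closely. The difficulty discussed next arises only when the start and end points of a peak are no longer unambiguous.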
Figure 2 shows an example, with peaks 2 and 3 partially co-eluted. The challenge now becomes to establish the true areas of these two peaks. At this stage, it is important to acknowledge that we are dealing with a chromatographic-resolution problem for which we now seek a computational solution. In essence, we must compensate computationally for the lack of resolution. Different strategies are possible, and some of them are, for this reason, also known in the literature as resolution-enhancement methods.
We will learn about these different strategies in future articles. In this article, we focus on the impact of the choice of strategy. Figure 2a shows a strategy based on local-maxima peak detection. This approach is probably best known to chromatographers. Here, the two co-eluted peaks are split at their saddle point, the minimum between the two signals. A vertical line (shown in pink) is drawn between the saddle point and the baseline. Everything to the left side is then considered to be the area of peak #2, and all of the area to the right would belong to peak #3.
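The saddle-point split described above can be sketched in a few lines. This is a minimal illustration on synthetic, noise-free data, not the implementation behind Figure 2a: the helper names, the height threshold, and the two Gaussian peaks are all assumptions, and real data would need noise-tolerant peak detection.

```python
import math

def trapezoid(times, signal, i_start, i_end):
    """Trapezoidal area between two sample indices."""
    return sum(
        0.5 * (signal[i] + signal[i + 1]) * (times[i + 1] - times[i])
        for i in range(i_start, i_end)
    )

def local_maxima(signal, min_height):
    """Indices of samples that exceed both neighbours and a height threshold."""
    return [
        i for i in range(1, len(signal) - 1)
        if signal[i] > min_height
        and signal[i] > signal[i - 1]
        and signal[i] > signal[i + 1]
    ]

def perpendicular_drop(times, signal, i_start, i_end, min_height=0.05):
    """Split two co-eluted peaks at the valley between their apexes."""
    apexes = [i for i in local_maxima(signal, min_height) if i_start < i < i_end]
    if len(apexes) != 2:
        raise ValueError("expected exactly two apexes in the window")
    first, second = apexes
    # Saddle point: lowest sample between the two apexes
    saddle = min(range(first, second + 1), key=lambda i: signal[i])
    left = trapezoid(times, signal, i_start, saddle)
    right = trapezoid(times, signal, saddle, i_end)
    return saddle, left, right

def gaussian(t, height, center, sigma):
    return height * math.exp(-0.5 * ((t - center) / sigma) ** 2)

# Two partially co-eluted synthetic peaks (assumed values)
times = [i * 0.01 for i in range(1001)]
signal = [gaussian(t, 1.0, 5.0, 0.10) + gaussian(t, 0.6, 5.45, 0.15) for t in times]
area_a = 1.0 * 0.10 * math.sqrt(2.0 * math.pi)  # true area of the first peak
area_b = 0.6 * 0.15 * math.sqrt(2.0 * math.pi)  # true area of the second peak

saddle, left, right = perpendicular_drop(times, signal, 400, 700)
```

In this synthetic case, the total area is preserved, but the front of the broader second peak falls to the left of the vertical line. The split therefore assigns too much area to the first peak and too little to the second.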
Another strategy involves computationally deconvoluting the two peaks by fitting mathematical distribution functions through each (such as a Gaussian) and then integrating these individually. This is shown in Figures 2b and 2c, with the blue lines depicting the individual fitted peaks. The purple line depicts the sum of the blue lines, and the blue dots represent the original data. In panel b, Gaussians are fitted to the peaks, and in panel c, a modified Pearson VII function is used that can capture the tailing of chromatographic peaks.
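A minimal sketch of the Gaussian variant of this strategy is given below, assuming NumPy and SciPy are available and using `scipy.optimize.curve_fit` for the nonlinear least-squares fit. The synthetic data, noise level, and initial guesses are assumptions for illustration; this is not the procedure used to generate Figure 2, and the modified Pearson VII function of panel c is not implemented here.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(t, height, center, sigma):
    return height * np.exp(-0.5 * ((t - center) / sigma) ** 2)

def two_gaussians(t, h1, c1, s1, h2, c2, s2):
    """Sum of two Gaussian peaks; this is the model fitted to the data."""
    return gaussian(t, h1, c1, s1) + gaussian(t, h2, c2, s2)

def gaussian_area(height, sigma):
    """Analytical area of a Gaussian peak."""
    return height * abs(sigma) * np.sqrt(2.0 * np.pi)

# Synthetic co-eluted peaks with a little noise (assumed values)
rng = np.random.default_rng(0)
t = np.linspace(4.0, 7.0, 601)
true_params = (1.0, 5.0, 0.10, 0.6, 5.35, 0.15)
signal = two_gaussians(t, *true_params) + rng.normal(0.0, 0.005, t.size)

# Rough initial guesses, as one might read them off the chromatogram
initial_guess = (0.9, 4.95, 0.12, 0.5, 5.40, 0.12)
fitted, _ = curve_fit(two_gaussians, t, signal, p0=initial_guess)

# Integrate each fitted peak individually
area_1 = gaussian_area(fitted[0], fitted[2])
area_2 = gaussian_area(fitted[3], fitted[5])
```

Because the synthetic peaks here are truly Gaussian, the fit recovers both areas well. With real, tailing peaks, a Gaussian model would be systematically wrong, which is why the choice of fitted function matters.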
Table I shows the resulting peak areas for each of the four peaks using the different strategies. The first two approaches suggest that peak #2 is smaller than peak #3, whereas the third approach suggests that the two peaks are equal in area. We can also see in Figure 2c that the two fitted peaks are now similar in width, which is more consistent with what we would expect from a chromatographic perspective. The numbers obtained with the third approach agree with the composition of the created sample.
The choice of integration strategy thus significantly impacts the resulting number. The advantage of the first strategy is its simplicity and robustness. Splitting the peaks may not make much sense from a chromatographic perspective, but it will reliably produce consistent numbers.
Deconvolution methods may provide more accurate numbers, but they hinge on several factors, such as the severity of co-elution, as well as on parameters, such as the choice of function that is fitted through the peaks. For truly complex separations, it is impossible to manually inspect the results for each individual signal, and thus, improved, robust strategies are of high interest. Multi-channel detectors, such as mass spectrometers, alleviate this problem somewhat, but they also introduce new challenges with respect to signal consistency (for example, varying ionization efficiencies across the peak as the degree of co-elution changes).
It is not surprising that groups around the world have devoted significant attention to alternatives, including multivariate methods and machine learning (for example, 3,4).
(1) Pirok, B. W. J. Resolving Separation Issues with Computational Methods: What Is the Retention Time, Exactly? LCGC Int. 2024, 1 (4), 24–25. DOI: 10.56530/lcgc.int.lg4875i7
(2) Pirok, B. W. J.; Westerhuis, J. A. Challenges in Obtaining Relevant Information from One- and Two-Dimensional LC Experiments. LCGC North Am. 2020, 38 (6), 8–14. DOI: 10.56530/lcgc.na.jk4782s5
(3) Risum, A. B.; Bro, R. Using Deep Learning to Evaluate Peaks in Chromatographic Data. Talanta 2019, 204, 255–260. DOI: 10.1016/j.talanta.2019.05.053
(4) Satwekar, A.; Panda, A.; Nandula, P.; Sripada, S.; Govindaraj, R.; Rossi, M. Digital by Design Approach to Develop a Universal Deep Learning AI Architecture for Automatic Chromatographic Peak Integration. Biotechnol. Bioeng. 2023, 120 (7), 1822–1843. DOI: 10.1002/bit.28406
Bob W. J. Pirok is an assistant professor of analytical chemistry at the Van ‘t Hoff Institute for Molecular Sciences (HIMS) at the University of Amsterdam. Direct correspondence to: B.W.J.Pirok@uva.nl