In the laboratory, our attention is typically focused on the chromatography itself. Yet ultimately, the liquid chromatograph (LC) or gas chromatograph (GC) only produces a signal, the chromatogram, from which the information must still be distilled. In this series, we will discuss different aspects of data analysis and learn how to extract different bits of information from our chromatograms. We start with simple, yet important, bits of information, which we will tie together in future articles. In this installment, we establish why peak integration still poses challenges and, at the same time, see in action some of the computational techniques that we will learn to use ourselves in future installments.
Any raw signal, such as those encountered in analytical separation science, consists of several components, or frequencies. Frequency here refers to the rate at which the signal changes over time. Many sudden changes within a short period correspond to a high frequency, whereas a slow, gradual change corresponds to a low frequency.
For chromatography, a signal can be roughly categorized as follows: (i) a high-frequency component that contains the noise; (ii) a low-frequency component that captures the baseline drift; and (iii) a medium-frequency component that usually includes the chromatographic peaks of interest. This is illustrated in Figure 1.
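To make the three components concrete, the sketch below builds a hypothetical chromatogram as the sum of a linear drift, two Gaussian peaks, and random noise, and then applies a short moving-average filter. All numbers (time axis, peak heights and widths, noise level, the 11-point window) are illustrative assumptions, and a moving average is only the simplest possible low-pass filter, not a recommended pre-processing method. Because the window is short compared with the peak width but long compared with the point-to-point noise, it mainly suppresses component (i).

```python
import numpy as np

def gaussian(t, height, center, sigma):
    """Gaussian peak: stands in for the medium-frequency component."""
    return height * np.exp(-0.5 * ((t - center) / sigma) ** 2)

def moving_average(y, window):
    """Centred moving average (odd window): a crude low-pass filter."""
    kernel = np.ones(window) / window
    return np.convolve(y, kernel, mode="same")

rng = np.random.default_rng(seed=0)
t = np.linspace(0.0, 20.0, 2001)               # 0.01 min per point (hypothetical)

baseline = 0.5 + 0.05 * t                      # (ii) low frequency: slow drift
peaks = gaussian(t, 10.0, 6.0, 0.3) + gaussian(t, 6.0, 12.0, 0.4)  # (iii) medium frequency
noise = rng.normal(0.0, 0.1, t.size)           # (i) high frequency: random noise
signal = baseline + peaks + noise

# An 11-point window is short relative to the peaks (sigma = 30 to 40 points)
# but long relative to the noise, so it mostly removes component (i).
smoothed = moving_average(signal, 11)
interior = slice(50, -50)                      # skip zero-padding effects at the edges
residual_error = np.std((smoothed - (baseline + peaks))[interior])

print(f"noise std: {np.std(noise):.3f}, error after smoothing: {residual_error:.3f}")
```

Separating the drift (ii) from the peaks (iii) is harder, because a window long enough to follow only the drift also averages in the peaks; that is one reason baseline correction gets its own dedicated methods.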
Integrating a peak is thus the end result of several data pre-processing steps in which the peaks (that is, the medium-frequency component) are first isolated. In future articles, we will learn in more detail how to perform these individual steps. For now, it suffices to understand that this isolation is a challenge in itself and carries a significant risk of altering the actual area of a peak. For instance, errors may be introduced by excessive smoothing of the noise or by wrongly treating sections of (co-eluted) peaks as baseline. This may seem less relevant for relatively simple separations, where the detection conditions yield clean chromatograms. However, it is critical for complex separations, especially when trace concentrations are expected or when the number of peaks is too large to curate each individual analyte manually.
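As a rough illustration of how much area can be lost when parts of a peak are mistaken for baseline, the sketch below takes a clean, hypothetical Gaussian peak, places the peak borders at ±kσ, and draws a straight baseline between the signal values at those borders. The scenario and the choice of k values are assumptions for illustration only; real baseline errors are rarely this symmetric.

```python
import numpy as np

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

t = np.linspace(0.0, 10.0, 10001)
sigma = 0.2
peak = np.exp(-0.5 * ((t - 5.0) / sigma) ** 2)   # clean, baseline-free peak
true_area = trapezoid(peak, t)

# Hypothetical pre-processing error: everything outside +/- k*sigma is treated
# as baseline, and a straight baseline is drawn between the two border values.
recovered = {}
for k in (3.0, 2.0, 1.5):
    start, end = 5.0 - k * sigma, 5.0 + k * sigma
    inside = (t >= start) & (t <= end)
    y_start, y_end = np.interp([start, end], t, peak)
    line = y_start + (y_end - y_start) * (t[inside] - start) / (end - start)
    recovered[k] = trapezoid(peak[inside] - line, t[inside]) / true_area

print(recovered)   # roughly 0.97, 0.74 and 0.48 of the true area for k = 3, 2 and 1.5
```

Even borders at ±2σ, which visually look close to the base of the peak, discard about a quarter of the area in this idealized case.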
Assuming that signal pre-processing has been conducted successfully, we can now focus our attention on the actual integration of a peak. Mathematically, this is the integral of the peak signal from its start point to its end point, analogous to the zeroth (0th) moment discussed in the previous installment (1). This is a relatively simple exercise for prominent, isolated peaks. However, it quickly becomes difficult for co-eluted signals.
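For an isolated, baseline-corrected peak, the area is the zeroth moment M0, the integral of y(t) from the peak start t_start to the peak end t_end. A minimal sketch, assuming the peak borders are already known and using the trapezoidal rule (one common numerical choice, not necessarily the one used in any particular data system), checked against the exact area of a Gaussian, height × σ × √(2π). The function name `zeroth_moment` and all peak parameters are hypothetical.

```python
import numpy as np

def zeroth_moment(t, y, t_start, t_end):
    """Peak area as the zeroth moment: integral of y(t) from t_start to t_end.

    Uses the trapezoidal rule and assumes baseline and noise are already removed.
    """
    inside = (t >= t_start) & (t <= t_end)
    ti, yi = t[inside], y[inside]
    return float(np.sum(0.5 * (yi[1:] + yi[:-1]) * np.diff(ti)))

t = np.linspace(0.0, 10.0, 2001)                   # 0.005 min per point (hypothetical)
height, center, sigma = 2.0, 4.0, 0.1
y = height * np.exp(-0.5 * ((t - center) / sigma) ** 2)

area = zeroth_moment(t, y, center - 5 * sigma, center + 5 * sigma)
analytical = height * sigma * np.sqrt(2.0 * np.pi)  # exact area of a Gaussian

print(f"numerical: {area:.5f}, analytical: {analytical:.5f}")
```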
Figure 2 shows an example in which peaks 2 and 3 are partially co-eluted. The challenge now is to establish the true areas of these two peaks. At this stage, it is important to acknowledge that we are dealing with a chromatographic resolution problem for which we now seek a computational solution. In essence, we must compensate computationally for the lack of resolution. Several strategies are possible, and for this reason some of them are also known in the literature as resolution-enhancement methods.
We will learn about these different strategies in future articles. In this article, we focus on the impact of the choice of strategy. Figure 2a shows a strategy based on local-maxima peak detection, which is probably the approach best known to chromatographers. Here, the two co-eluted peaks are split at their saddle point, the minimum between the two signals. A vertical line (shown in pink) is drawn from the saddle point down to the baseline, which is why this approach is commonly called a perpendicular drop. All of the area to the left of the line is then assigned to peak #2, and all of the area to the right to peak #3.
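The perpendicular-drop split can be sketched in a few lines: locate the two most intense local maxima, take the lowest point between them as the saddle, and integrate each side separately. This is a minimal sketch, assuming a baseline-corrected signal that contains only the co-eluted pair; the function name `perpendicular_drop` and the test peaks are hypothetical, and `scipy.signal.find_peaks` is used simply as a convenient local-maximum finder.

```python
import numpy as np
from scipy.signal import find_peaks

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

def perpendicular_drop(t, y):
    """Split two co-eluted peaks at the saddle point between their maxima.

    Returns (left_area, right_area, saddle_time). Assumes y is baseline-corrected.
    """
    maxima, _ = find_peaks(y)
    if maxima.size < 2:
        raise ValueError("need at least two local maxima")
    # The two most intense maxima, in elution order.
    first, second = np.sort(maxima[np.argsort(y[maxima])[-2:]])
    saddle = first + int(np.argmin(y[first:second + 1]))
    left = trapezoid(y[:saddle + 1], t[:saddle + 1])
    right = trapezoid(y[saddle:], t[saddle:])
    return left, right, float(t[saddle])

# Two identical, hypothetical Gaussian peaks separated by 3 sigma.
t = np.linspace(0.0, 10.0, 10001)
sigma = 0.2
y = np.exp(-0.5 * ((t - 5.0) / sigma) ** 2) + np.exp(-0.5 * ((t - 5.6) / sigma) ** 2)
left, right, t_saddle = perpendicular_drop(t, y)
print(f"left: {left:.4f}, right: {right:.4f}, saddle at t = {t_saddle:.3f}")
```

For two identical symmetric peaks the split is fair by symmetry; the method becomes biased once the peaks differ in size or shape, because each peak's overlapping flank is handed to its neighbour.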
Another strategy is to deconvolute the two peaks computationally by fitting a mathematical distribution function (such as a Gaussian) to each peak and then integrating the fitted functions individually. This is shown in Figures 2b and 2c, where the blue lines depict the individual fitted peaks, the purple line depicts the sum of the blue lines, and the blue dots represent the original data. In panel b, Gaussians are fitted to the peaks; in panel c, a modified Pearson VII function, which can capture the tailing of chromatographic peaks, is used.
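A sketch of the Gaussian variant (panel b) using nonlinear least squares via `scipy.optimize.curve_fit`: the model is the sum of two Gaussians, and each peak's area follows analytically from its fitted height and width. The data, noise level, and initial guesses below are hypothetical, and the modified Pearson VII function of panel c is not reproduced here because its exact form is not given in the text. In practice, the initial guesses might come from a local-maximum search such as the one used for the perpendicular drop.

```python
import numpy as np
from scipy.optimize import curve_fit

def gaussian(t, height, center, sigma):
    return height * np.exp(-0.5 * ((t - center) / sigma) ** 2)

def two_gaussians(t, h1, c1, s1, h2, c2, s2):
    """Model for two co-eluted peaks: the sum of two Gaussians."""
    return gaussian(t, h1, c1, s1) + gaussian(t, h2, c2, s2)

def gaussian_area(height, sigma):
    """Exact integral of a Gaussian over the whole time axis."""
    return height * abs(sigma) * np.sqrt(2.0 * np.pi)

# Hypothetical overlapped pair with a small amount of noise.
rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 1001)
y = two_gaussians(t, 1.0, 4.8, 0.25, 0.6, 5.4, 0.25) + rng.normal(0.0, 0.01, t.size)

p0 = [0.9, 4.7, 0.2, 0.5, 5.5, 0.2]          # initial guesses (assumed)
params, _ = curve_fit(two_gaussians, t, y, p0=p0)
h1, c1, s1, h2, c2, s2 = params
area_1, area_2 = gaussian_area(h1, s1), gaussian_area(h2, s2)

print(f"fitted areas: {area_1:.3f} and {area_2:.3f}")
```

The result is only as good as the assumed peak shape: fitting symmetric Gaussians to tailing peaks forces the fit to redistribute the tail between the two components, which is exactly why a function that can describe tailing matters.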
Table I shows the resulting areas of all four peaks for each strategy. The first two approaches suggest that peak #2 is smaller than peak #3, whereas the third approach suggests that the two peaks are equal in area. Figure 2c also shows that the two fitted peaks are now similar in width, which is more consistent with what we would expect from a chromatographic perspective. The areas obtained with the third approach agree with the known composition of the prepared sample.
The choice of integration strategy thus has a significant impact on the resulting peak areas. The advantage of the first strategy is its simplicity and robustness: splitting the peaks at the saddle point may not make much sense from a chromatographic perspective, but it will reliably produce consistent numbers.
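Consistent is not the same as accurate. The hypothetical example below uses two tailing peaks of identical shape and identical (unit) area, modelled here as exponentially modified Gaussians with `scipy.stats.exponnorm`. This tailing model is a stand-in chosen for illustration, not the Pearson VII function or the data behind Table I. A perpendicular drop assigns most of the first peak's tail to the second peak, so the first area comes out noticeably smaller than the second even though both are truly equal, qualitatively mirroring the pattern described above.

```python
import numpy as np
from scipy.signal import find_peaks
from scipy.stats import exponnorm

def trapezoid(y, x):
    """Trapezoidal-rule integral of samples y over x."""
    return float(np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x)))

# Hypothetical data: two tailing peaks with identical shape and unit area each,
# modelled as exponentially modified Gaussians (sigma = 1, tau = K * sigma = 3).
t = np.linspace(0.0, 60.0, 60001)
peak_a = exponnorm.pdf(t, 3.0, loc=10.0, scale=1.0)
peak_b = exponnorm.pdf(t, 3.0, loc=16.0, scale=1.0)
y = peak_a + peak_b

# Perpendicular drop at the valley between the two highest maxima.
maxima, _ = find_peaks(y)
first, second = np.sort(maxima[np.argsort(y[maxima])[-2:]])
saddle = first + int(np.argmin(y[first:second + 1]))
area_a = trapezoid(y[:saddle + 1], t[:saddle + 1])
area_b = trapezoid(y[saddle:], t[saddle:])

print(f"true areas: 1.000 and 1.000; perpendicular drop: {area_a:.3f} and {area_b:.3f}")
```

Rerunning the split on the same signal always gives the same two numbers, which is the robustness referred to above; the bias comes from the method's assumption that nothing of either peak crosses the vertical line.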
Deconvolution methods may provide more accurate numbers, but they hinge on several factors, such as the severity of co-elution, and on choices such as the function that is fitted to the peaks. For real complex separations, it is impossible to inspect the results for each individual signal manually, so more robust strategies are of high interest. Multi-channel detectors, such as mass spectrometers, alleviate this problem somewhat, but they also introduce new challenges with respect to signal consistency (for example, ionization efficiencies that vary across the peak as the degree of co-elution changes).
It is therefore not surprising that groups around the world have devoted significant attention to alternatives, including multivariate methods and machine learning (3,4).
(1) Pirok, B. W. J. Resolving Separation Issues with Computational Methods: What Is the Retention Time, Exactly? LCGC Int. 2024, 1 (4), 24–25. DOI: 10.56530/lcgc.int.lg4875i7
(2) Pirok, B. W. J.; Westerhuis, J. A. Challenges in Obtaining Relevant Information from One- and Two-Dimensional LC Experiments. LCGC North Am. 2020, 38 (6), 8–14. DOI: 10.56530/lcgc.na.jk4782s5
(3) Risum, A. B.; Bro, R. Using Deep Learning to Evaluate Peaks in Chromatographic Data. Talanta 2019, 204, 255–260. DOI: 10.1016/j.talanta.2019.05.053
(4) Satwekar, A.; Panda, A.; Nandula, P.; Sripada, S.; Govindaraj, R.; Rossi, M. Digital by Design Approach to Develop a Universal Deep Learning AI Architecture for Automatic Chromatographic Peak Integration. Biotechnol. Bioeng. 2023, 120 (7), 1822–1843. DOI: 10.1002/bit.28406
Bob W. J. Pirok is an assistant professor of analytical chemistry at the Van ’t Hoff Institute for Molecular Sciences (HIMS) at the University of Amsterdam. Direct correspondence to: B.W.J.Pirok@uva.nl