"Inside the Laboratory" is a joint series with LCGC and Spectroscopy, profiling analytical scientists and their research groups at universities all over the world. This series spotlights the current chromatographic and spectroscopic research their groups are conducting, and the importance of their research in analytical chemistry and specific industries. In this edition of “Inside the Laboratory,” Jillian Goldfarb of Cornell University discusses her laboratory’s work with using gas chromatography–mass spectrometry (GC–MS) to characterize compounds present in biofuels.
Although gas chromatography–mass spectrometry (GC–MS) has made enormous strides over the past couple of decades, limitations remain, particularly in its software, which is often inefficient and lacks the ability to compare data across different platforms and methods from diverse research groups (1). As a result, researchers are developing tools and exploring improved ways to automate and standardize the processing of GC–MS data.
Jillian Goldfarb is one of the researchers tackling this issue. Goldfarb is an Associate Professor of Chemical and Biomolecular Engineering at Cornell University in Ithaca, New York. She received her BS in Chemical Engineering from Northeastern University in 2004 and her Ph.D. in Chemical Engineering from Brown University in 2008 (2). Recently, she and her team developed a new open-source Python tool to automate the processing of GC–MS data, with the goal of addressing the limitations of proprietary solutions (1).
In this edition of “Inside the Laboratory,” LCGC International sat down with Goldfarb to discuss her group’s newly developed tool and its key features, and how it is being applied to analyze bio-oil compositions.
Can you provide a brief snapshot of your laboratory group and the projects that you and your team are currently working on?
The Goldfarb Lab at Cornell University is an interdisciplinary playground where talented engineers, scientists, policy scholars, and archaeologists are united by carbon. We develop new ways to transform organic wastes (like those dining hall mishaps and local agricultural residues) into sustainable biofuels and materials to treat contaminated water and store energy. The same laboratory techniques that we use to monitor reaction progress and fuel quality can be adapted to identify the organic residues in ancient ceramics and understand past food use.
What inspired you to develop the “gcms_data_analysis” tool, and what challenges in biofuel and organic mixture analysis were you addressing?
Over the past decade, we’ve seen an explosion in the number of biofuel papers published. However, making comparisons across many of these papers is difficult because each group seems to use its own method to analyze bio-oil composition. What we often see is that papers report an oil’s compounds and their corresponding chromatogram area. Area is directly correlated with concentration (higher area = higher concentration), so one can easily compare samples within a given paper by area. However, we each have our own GC–MS instrument, column, and method (temperature program, gas flow rate, injection volume, operating conditions, and so on), and each of these variables changes how the instrument responds to a given concentration. This makes it impossible to compare samples that report chromatogram area only across different papers and laboratory groups.
There are two reasons for this. First, calibrating a GC–MS instrument for the hundreds of potential compounds present in a biofuel is labor-intensive (we know, we’ve done it!). Second, the proprietary software from each instrument manufacturer can be clunky and really doesn’t link up with established databases to enable matching of “near neighbors” on a chromatogram. Our initial goal was to make a tool for our laboratory to rapidly process data based on calibration curves we built. When we talked about this method with colleagues at conferences in the United States and Europe, they all asked for our code because they were struggling with the same issues. Even once they had a calibration, it was difficult to use it to quantify compounds for which they had not calibrated, which our use of the Tanimoto similarity index, built into the code, addresses.
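To make the calibration step concrete, here is a minimal sketch (hypothetical values, not the laboratory’s actual code) of how a linear calibration curve converts a reported peak area into a concentration. Because the slope and intercept are specific to one instrument, column, and method, raw areas are not comparable across laboratories:

```python
# Minimal sketch: converting a chromatogram peak area to a concentration
# with a linear calibration (concentration = slope * area + intercept).
# The slope/intercept values below are hypothetical; every instrument,
# column, and method combination yields its own calibration.
phenol_cal = {"slope": 2.1e-7, "intercept": 0.0}  # hypothetical calibration

def area_to_concentration(area: float, cal: dict) -> float:
    """Convert a peak area to a concentration (e.g., mg/mL)."""
    return cal["slope"] * area + cal["intercept"]

# The same true concentration can produce very different areas on two
# instruments; only the calibrated concentrations are comparable.
print(area_to_concentration(4.8e6, phenol_cal))  # ~1.0 (hypothetical units)
```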
Can you explain how your tool integrates PubChemPy and the published fragmentation algorithm to enhance compound identification and functional group analysis?
In GC–MS analysis of bio-oils, the challenge is to automate the quantification and classification of the large number of identified compounds. For each identified compound, the tool queries the PubChem database using the pubchempy application programming interface (API) for its molecular structure. The retrieved structure is then fed to the fragmentation algorithm published by Simon Müller to count the different functional groups in the molecule. This allows us to improve the accuracy of semi-calibrations, in which available calibration curves (unfortunately, some are still needed) are applied to uncalibrated compounds, but only if the calibrated and uncalibrated molecules are similar enough in terms of Tanimoto similarity. The tool then classifies bio-oils based on their content of specific functional groups.
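As a rough illustration of the lookup-and-compare step, the sketch below queries PubChem through the pubchempy API and computes a Tanimoto similarity. Note that RDKit Morgan fingerprints are used here as a stand-in for the similarity calculation, and Simon Müller’s fragmentation algorithm for functional-group counting is not reproduced; this is not the published tool’s actual code:

```python
import pubchempy as pcp
from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem

def smiles_from_name(name: str) -> str:
    """Query the PubChem database for a compound name; return its SMILES."""
    hits = pcp.get_compounds(name, "name")
    return hits[0].canonical_smiles

def tanimoto(smiles_a: str, smiles_b: str) -> float:
    """Tanimoto similarity between two molecules via Morgan fingerprints."""
    fps = [
        AllChem.GetMorganFingerprintAsBitVect(Chem.MolFromSmiles(s), 2, nBits=2048)
        for s in (smiles_a, smiles_b)
    ]
    return DataStructs.TanimotoSimilarity(fps[0], fps[1])

# For example: how similar is an uncalibrated compound (guaiacol, a common
# bio-oil phenolic) to a calibrated reference compound (phenol)?
print(tanimoto(smiles_from_name("guaiacol"), smiles_from_name("phenol")))
```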
Why did you select open-source Python as your software development tool?
There’s the practical reason that the lead developer, Matteo Pecchi (a postdoc in the laboratory), was leaning into Python because of its open-source, flexible nature. As he led the project, he selected the software!
More broadly, we wanted something we could share with the entire scientific community without cost. This speaks to our core values as a laboratory team. We are motivated by the United Nations Sustainable Development Goals and are committed to bringing transparency and equity to STEM. We need to freely and globally share knowledge to advance and deploy sustainable biofuels before irreversible climate damage occurs. There are incredible researchers doing groundbreaking work around the world who don’t have the resources we have at Cornell University. We wanted our work to be accessible to all of our colleagues, and we hope that they build and improve upon it in the future.
How does the tool improve the comparability of GC–MS data across different research groups, and why is this crucial for advancing bio-oil processes?
If we can move away from reporting chromatogram areas and into actual concentrations of compounds produced in a bio-oil, the possibilities are limitless. We could better leverage big data tools to optimize processes and catalysts that produce desired biofuel compositions. It could inform the development and selection of energy crops based on product profiles. We could close mass balances to predict product yields and compositions based on biomass and processing conditions to improve the accuracy of process models. This would de-risk investment in biorefineries.
Your tool incorporates calibrations and semi-calibrations using Tanimoto and molecular weight similarities. Can you discuss how these approaches enhance data accuracy and reliability?
Calibration is really the gold standard, especially when we want to understand complex mixtures whose compositions change with even small changes in reaction conditions. However, there are some bio-oils where we detect a dozen components and others with hundreds of compounds present, making full calibration essentially impossible (at least for a small academic lab). In this case, the Tanimoto approach gives us confidence, beyond the GC–MS proprietary software’s NIST library match, in the ensemble of molecules present in a bio-oil sample. The Tanimoto approach is a widely used similarity metric, particularly in cheminformatics, that allows us to make a semiquantitative estimate of how much of these matched compounds is present in the sample. Although we can’t have as much confidence in the quantitative values reported as for a calibrated compound, we rigorously quantified the error through a series of validation, verification, and uncertainty experiments. When the Tanimoto similarity index is above 0.7, the percent errors are less than one order of magnitude. Using such an approximation is preferable to what many groups do when a compound is not identified by the NIST library (essentially ignoring the data). Not reporting unidentified compounds results in a large data loss that could skew process design decisions, particularly if a consistent class of compounds (for example, aldehydes or phenols) goes unidentified.
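To show that decision rule in code form, here is a minimal sketch (with hypothetical calibrations and similarity values, not the tool’s actual implementation) in which an uncalibrated compound borrows the calibration curve of its most similar calibrated compound only when the Tanimoto similarity index clears the 0.7 threshold:

```python
# Minimal sketch of the semi-calibration decision rule described above:
# an uncalibrated compound borrows the calibration of its most similar
# calibrated compound, but only if the Tanimoto similarity is >= 0.7.
# Calibrations and similarity values here are hypothetical.
calibrated = {
    "phenol":   {"slope": 2.1e-7, "intercept": 0.0},
    "furfural": {"slope": 3.4e-7, "intercept": 0.0},
}

def semi_calibrate(area: float, similarities: dict, threshold: float = 0.7):
    """similarities maps each calibrated compound to its Tanimoto
    similarity with the unknown; returns a concentration or None."""
    best_name, best_sim = max(similarities.items(), key=lambda kv: kv[1])
    if best_sim < threshold:
        return None  # still report the compound, but flag it as unquantified
    cal = calibrated[best_name]
    return cal["slope"] * area + cal["intercept"]

# guaiacol is structurally close to phenol, so its area is semi-quantified
# using phenol's calibration curve.
print(semi_calibrate(4.8e6, {"phenol": 0.78, "furfural": 0.22}))
```

Compounds falling below the threshold are returned as unquantified rather than dropped, consistent with the point above about avoiding data loss from unreported compounds.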
How do you and your group stay updated with advancements in analytical chemistry techniques and technologies?
As chemical engineers, we need to be able to leverage analytical chemistry tools to assess our process designs. To stay up to date on advancements in the field, we read. A lot. I don’t mean that we skim review articles or have artificial intelligence (AI) summarize papers for us. We get lost in the table of contents of a journal and follow references from one paper through another to explore topics and developments. We read magazines like the American Chemical Society’s Chemical & Engineering News to learn about advances in different fields and how they’re leveraging new analytical capabilities. By having a handle on what’s going on within and across areas of research, we are more agile and creative in our research.
Can you discuss a recent innovation or development in chromatography that you find particularly impactful or exciting?
The shift towards more sustainable chromatography methods, which often turn out to be more economical and reliable than traditional methods, is really exciting to me. From supercritical fluid chromatography (SFC) enabling more efficient separations through temperature and pressure gradients to solid-phase extraction (SPE) reducing solvent use and ion suppression, there’s a real shift towards implementing the principles of Green Chemistry and Green Engineering in chromatography. One that really stands out (and is always on my wish list) is the compact or mini high performance liquid chromatography (HPLC) instrument. In the early 2000s, there was a question of sacrificing performance (matrix effects for complex mixtures were amplified with single-column systems) for portability, but today’s instruments take advantage of all the new columns on the market, embedded or coupled detectors, and greater operating pressure and temperature ranges. The injection loops need only nanoliters of sample, so the instrument can use three orders of magnitude less solvent than our current standard HPLC instrument yet takes up less bench space than a laptop. The portable nature of a piece of equipment that can measure (at least some) contaminants in the parts-per-billion (ppb) range in the field could enable real-time decision-making during environmental disasters. Being able to put it in a glovebox could save considerable time and enhance experimental workflows.