LCGC International spoke with Shawn Anderson, Associate Vice President of Digital Lab Innovations at Agilent Technologies; Marco Kleine, Head of the Informatics Department at Shimadzu Europa GmbH; Trish Meek, Senior Director, Connected Science, Waters Corporation and Todor Petrov, Senior Director, QA/QC, Waters Corporation; and Crystal Welch, Product Marketing Manager at Thermo Fisher Scientific about the latest trends in data handling.
What is currently the biggest problem in data management for chromatographers?
MARCO Kleine: One of the biggest problems in data management for chromatographers is the huge volume of data generated during analyses. Chromatography techniques, such as liquid chromatography (LC), high-performance liquid chromatography (HPLC), and gas chromatography (GC), produce large amounts of data that need to be organised, stored, analysed and in some cases transported through a network. This can be a time-consuming and error-prone task. Additionally, the lack of standardized data formats and the compatibility issues between different chromatography data systems (CDSs) can make data management even more complicated.
SHAWN Anderson: Thoughtful inclusion of the chromatography results in a larger data set, to allow for insights into purity and yield improvements. After all, separation and detection combined are only one step (often the last) in what is usually a process to produce a molecule. There are many other steps as well, and correlating the purity and yield results with other factors in this process can drive true innovation. As a prerequisite for this, many chromatographers are yearning for more widespread adoption of vendor-neutral standards for data. The Findability, Accessibility, Interoperability, and Reusability (FAIR) data principles are useful for guiding this journey (1).
CRYSTAL Welch: The largest problem continues to be reducing the time and effort spent to manage data. It is still common that files are spread across multiple storage locations, with the effort to compile information together being manual and time-consuming. People want their chromatography systems to work like their phone software, with a more stable platform and easy-to-use applications, with all data secured into a central location so they can use, view, and download it onto their next tablet or cell phone without having to transfer it from device to device.
TODOR Petrov: If we look at the chromatography data as something that has a lifecycle, there are different challenges in the different phases the data goes through. For example, once a multitude of chromatograms are acquired and quantitative results are calculated, the first challenge an analyst faces is with screening the data to determine which data sets are in line with the expectations and which are outliers. In today’s technology, machine learning can be utilized for anomaly detection to make the data review process more efficient by focusing on the exceptions.
Once the data passes the first review gate, the next challenge may often be with data sharing for collaboration purposes. Companies have large networks of partners that generate chromatography data that the sponsors need to review as well. The growth of contract services demands efficient solutions for data sharing with minimum delays. In today’s technology, cloud-based solutions offer the best mechanisms to achieve that.
Once the chromatography data has been reviewed and has served its primary purpose, it needs to be made available for extracting analytical insights across other processes the sample in question has been subjected to. The data format standardization is the main challenge in this phase.
The data gets archived eventually and while the amounts of it accumulated over time can be challenging to manage, a major challenge is the expectation that data sets can be resurrected at any time in the software application that has produced them originally. This implies data format compatibility that goes back decades or having to maintain dated application instances.
TRISH Meek: Throughout each of the steps in the lifecycle that Todor described, laboratories need to be able to share laboratory data and methods with their internal and external colleagues, show auditors that they are following regulatory guidance and Good Laboratory Practice (GLP), and use their data to make decisions about whether water is safe to drink or if a product can be released.
While organizations often rely on systems like electronic lab notebooks (ELNs) and laboratory information management systems (LIMS) to aggregate and share final results, like peak concentrations and amounts, across the enterprise, these systems do not include all of the chromatography data, so results are often evaluated without the context of how that data was acquired. As we work with laboratories, their biggest challenge is getting the complete picture of their data.
What is the future of data handling solutions for chromatographers?
Anderson: We believe that we are seeing the limits of the current LIMS-oriented model, and we are likely to see an advancement in insight generation that is distinct and separate from the LIMS wheelhouse of sample management and test execution. There are numerous innovations around this that are becoming popular. One is data format standardization in a vendor-neutral way, likely based on the Allotrope Simple Model (ASM). This provides a common input language for organizations to develop and maintain their own data lakes. Another is hybrid cloud/on-premises storage, which balances redundancy and backup security with low-latency, real-time access. This hybrid model can also allow for more powerful (and cheaper) data processing operations in the cloud while keeping control and stepwise analyses on premises and close to the instrument and end user.
Kleine: The future of data handling solutions for chromatographers is likely to involve advances in automation, cloud-based storage, data analytics and standardisation.
In terms of automation, the increasing volume of data generated during a measurement means automation will play a key role in data handling. AI-driven algorithms can automate data processing and analysis, reducing the amount of work and minimizing (human) errors.
Cloud-based technologies will enable chromatographers to store and access their data remotely from everywhere. Cloud-based solutions also enable data sharing and collaboration with other researchers.
Advanced data analytics techniques, such as machine learning and artificial intelligence, will help to extract more detailed information from chromatographic data.
Additionally, standardisation will become important. Efforts have already been undertaken to establish standardized data formats and protocols for chromatographic data to ensure integration and compatibility between different instruments and software platforms.
Welch: Solutions in this space are looking to take the hard work out of data analysis and management—whether that is by enabling software to process data holistically and offer things like consensus reporting for multi-omics, reduce manual processes with automation, or leverage new AI tools with a goal of getting closer to the truth.
Petrov: Many organizations are moving or have moved their IT infrastructure to the cloud, including data handling solutions like CDS. There are multiple reasons for the increasing interest in software as a service (SaaS) solutions for chromatography data. The primary reasons are to simplify the management of the applications and to make the data accessible to the organization. SaaS solutions provide benefits such as secure worldwide access, up-to-date application and infrastructure security, scalable IT infrastructure, economies of scale, competitive operational costs, and lower initial costs compared to non-subscription deployments on premises.
Meek: In addition to the infrastructure changes, techniques such as machine learning will become critical to data acquisition, processing, and analytics. There are many opportunities to improve on traditional data processing algorithms and support review by exception by deploying artificial intelligence.
What one recent development in “Big Data” is most important for chromatographers from a practical perspective?
Anderson: It is difficult to not answer “Generative AI” for this question. An obvious use case might be to train a model on chromatographic methods for categories of molecules and then ask the AI to generate ideal yet broadly applicable separation methods. Another area that is intriguing (but not as fashionable as AI) is using Big Data for real-time decision-making. One example is using chromatographic data from bioreactor sampling to trigger changes in media composition or temperature settings. Another example is setting limits for hardware metrics such as pump cycles to automatically trigger preventative maintenance scheduling.
Kleine: For a long time, chromatographers have relied on manual data analysis methods, which can be time-consuming and lead to errors. With the latest development in (big) data analytics, chromatographers now have access to powerful tools, like databases, that can support and automate data analysis. These data analytics tools utilise machine learning algorithms, pattern recognition techniques, and statistical analysis methods to analyse large volumes of chromatographic data quickly and accurately. They can help in identifying peaks, quantifying compounds, detecting outliers, and optimising experimental conditions.
Welch: Big Data can mean different things to different people, but one practical example would be utilizing trending over time to inform on when to perform maintenance, replace instrumentation, or just manage practical utilization of instrumentation better. Tools like schedulers, control charting, or predictive modelling can help plan for events and keep the whole lab moving forward.
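The control charting mentioned above can be illustrated with a minimal sketch. It assumes a Shewhart-style chart with limits at the baseline mean plus or minus three standard deviations; the function names and the pump pressure readings are hypothetical, chosen only for illustration, and a real laboratory would set limits according to its own procedures.

```python
from statistics import mean, stdev


def control_limits(baseline, k=3.0):
    """Return (lower, center, upper) limits from in-control baseline readings."""
    center = mean(baseline)
    spread = stdev(baseline)  # sample standard deviation of the baseline
    return center - k * spread, center, center + k * spread


def out_of_control(readings, limits):
    """Return the indices of readings that fall outside the control limits."""
    lower, _, upper = limits
    return [i for i, value in enumerate(readings) if value < lower or value > upper]


# Hypothetical pump pressure readings (bar) from a period of normal operation.
baseline_pressure = [150.2, 149.8, 150.1, 150.0, 149.9, 150.3, 149.7, 150.0]
limits = control_limits(baseline_pressure)

# Hypothetical new readings; the third one drifts well above the baseline.
new_pressure = [150.1, 149.9, 151.2, 150.0]
flagged = out_of_control(new_pressure, limits)
print(flagged)  # indices of readings that may warrant a maintenance check
```

In this sketch a flagged index would only prompt a closer look or a maintenance request; it does not decide anything on its own.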
Petrov: The term “Big Data” is typically used to describe large, unstructured data—think random text, images, and videos—where searching for an item of interest is not trivial and pattern recognition and training models are utilized instead. The chromatography data is structured for the most part, except for the chromatograms themselves, and therein lies the opportunity for using machine learning algorithms originally developed for Big Data. Detecting anomalies using such algorithms can substantially increase the efficiency of traditional methods for comparing chromatograms.
If we extend the scope beyond chromatography and consider the data lakes storing data from multiple phases a substance goes through during its development or manufacturing process, unstructured data is how that can be described. From that standpoint, anomaly detection algorithms can be beneficial, as well as another type of machine learning algorithms, known as classifiers. The classifiers identify clusters of similar data, and once clusters are associated with outcomes, the algorithms can predict an outcome for a set of data exhibiting similarities to a known cluster.
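The idea of associating groups of similar data with outcomes and then predicting the outcome for a new data set can be sketched with a nearest-centroid classifier. This is one simple technique chosen for illustration, not necessarily what any vendor uses. The features (purity and yield percentages), the batch records, and the "pass"/"fail" labels are all hypothetical.

```python
from math import dist
from statistics import mean


def centroids(samples):
    """Map each outcome label to the mean feature vector of its samples."""
    grouped = {}
    for features, outcome in samples:
        grouped.setdefault(outcome, []).append(features)
    return {
        outcome: tuple(mean(column) for column in zip(*rows))
        for outcome, rows in grouped.items()
    }


def predict(features, centers):
    """Return the outcome whose centroid is closest (Euclidean distance)."""
    return min(centers, key=lambda outcome: dist(features, centers[outcome]))


# Hypothetical history: (purity %, yield %) per batch with its known outcome.
history = [
    ((99.1, 92.0), "pass"),
    ((98.8, 90.5), "pass"),
    ((99.3, 93.1), "pass"),
    ((95.2, 81.0), "fail"),
    ((94.7, 79.4), "fail"),
]
centers = centroids(history)

# Predict the outcome for a new, unlabelled batch.
prediction = predict((98.9, 91.0), centers)
print(prediction)
```

Real data lakes would hold many more features on different scales, so features would normally be standardized before distances are compared.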
What obstacles do you think stand in the way of chromatographers adopting new data solutions?
Anderson: Primarily the pain and time investment to change. Data will need to be transformed and migrated into these newer paradigms and this will often be a lower priority than the many day-to-day laboratory business demands. A large contributor to this daunting effort is (re)validation, which is required in regulated environments. In non-regulated environments it is also becoming more commonplace because these organizations also recognize the value of truly FAIR data.
Kleine: There are five main obstacles today:
Fear of change: Users may be accustomed to their existing data management and analysis methods and may be hesitant to adopt new solutions. They may be comfortable with manual processes or may have concerns about the reliability and accuracy of automated data solutions.
Costs: Implementing new data solutions often requires investment in hardware, software, and training. Chromatographers may be concerned about the upfront costs and ongoing expenses associated with adopting new technologies, especially if they are working with limited budgets.
Compatibility: Decision-makers may face challenges in integrating new data solutions with their existing instruments, software, and laboratory infrastructure. Compatibility issues can make the transition to new solutions difficult and time-consuming.
Data security: Chromatographers work with sensitive data and may have concerns about data security when adopting new data solutions. They must be sure that their data will be protected, especially when using cloud-based solutions.
Training: Adopting new data solutions requires additional training for chromatographers and laboratory staff. It will take time to acquire the necessary skills and knowledge to effectively use and leverage the new tools and technologies.
Welch: There is always a lag seen between new technology and adoption due to it not fitting exactly into the prior solution footprint. For example, moving software to cloud-hosted took a change in everything from architecture to validation approaches. But the only way to move forward is to challenge whether we keep procedures for familiarity or functionality.
Petrov: I see two major obstacles standing in the way of adopting new chromatography solutions. One is the accessibility of such solutions in terms of deployment difficulties associated with software upgrades and validation. Solutions delivered as SaaS will help lower that barrier. Another obstacle is the willingness to accept that automated decision-making can displace the human factor in industries with critical outcomes, such as the life sciences. If you think about it, humans are trusted with certain decisions because they have been trained appropriately and have proven that they can make such decisions as discerning good from bad chromatograms. Algorithms can be trained too, and they can prove in subsequent tests that they can make such decisions. The real difference is that once properly trained, algorithms can do that day in and day out with much higher efficiency and a lower failure rate than humans.
Meek: There is an additional challenge: adopting new technologies can be difficult in a regulated environment. Regulators have shown, however, that they are supportive of using technology to eliminate manual processes such as manual integration to ensure consistent and reliable results. AI does pose a particular challenge given the natural drift that can occur in models, which is why, at least for the time being, a human-in-the-loop approach that leans on the expertise of chromatographers provides the best balance.
What was the biggest accomplishment or news in 2023-2024 for Data Handling?
Anderson: Some might mention the growth in popularity of ASM or the availability of generative AI tools; however, we don’t think this area has seen the biggest accomplishment yet. Perhaps the coming months of 2024 will surprise us all.
Kleine: It is becoming easier and easier to store and handle large amounts of data. Improved computing power and network connections make this possible. Measurement results no longer need to be stored locally, making the storage space for data scalable. The large amounts of data are therefore also available over a longer period. With the help of large amounts of data, an AI can support the user in chromatography in the evaluation and interpretation of measurements.
Welch: The biggest thing in the last year must be AI. Who hasn’t read something about ChatGPT? But the foundation for AI is not really in the algorithms or the user interface, but in how AI uses large banks of data. So, data architecture, classification, cataloguing, and the design of data tagging and master lists are really where the fundamental shifts are coming. Without stable structures, AI cannot utilize the available information in a productive way.
Meek: While not “data handling” specifically, I think everyone would agree that, since its launch in November 2022, ChatGPT has dominated technology news. While generative AI may have been the focus of the media, any AI-based technology is only as good as the quality and volume of data that informs it. I think the biggest accomplishment over the past two years is the work companies are doing to build data lakes that enable them to use data science to look across research and development and from the lab to the production floor.
Petrov: Organizations in the pharmaceutical space have been able to use AI to develop novel therapeutics in drug discovery and development. Using AI to generate extremely complex molecules and then test their binding capabilities in the virtual space is a ground-breaking advancement to speed up drug discovery like never before. Over time, we expect to see this technology deployed across the product lifecycle through manufacturing.
(1) Wilkinson, M. D.; Dumontier, M.; Aalbersberg, I. J.; et al. The FAIR Guiding Principles for Scientific Data Management and Stewardship. Sci. Data 2016, 3 (1). DOI: 10.1038/sdata.2016.18