Special Issues
Fred E. Regnier
Proteins are the workhorses of cells, obviously requiring a high level of complexity; but how complex? Originally it was thought there would be a close relationship between the ~20,000 protein-coding genes in the human genome and the number of expressed proteins. Wrong! Through a variety of new methods including mass spectrometry (MS) sequencing it is now predicted there could be 250,000 to 1 million proteins in the human proteome (1).
But what does this have to do with chromatography? Liquid chromatography (LC) has played a pivotal role in discovering, identifying, and quantifying the components in living systems for more than a century. The question being explored here is whether that is likely to continue or if LC will become a historical footnote as the MS community suggests.
First, what is a proteoform? We know that during protein synthesis a protein-coding gene provides the blueprint for a family of closely related structural isoforms arising from small, regulated variations in their synthesis involving alternative splicing (2) and more than 200 types of post-translational modification (PTM) (3). This process can lead to a proteoform family of 100 members (4), many of which differ in biological function. The human genome gets more “bang” per protein-coding gene in this way. Smith and Kelleher proposed the name “proteoform” for these structural isoforms in 2013 (5).
An important issue is how these high levels of proteoform complexity were predicted. The idea arose from the identification of splice variant sites and large numbers of PTMs in peptides derived from trypsin digests, often supported by top-down sequence analysis of intact proteins by MS (6). The use of gas-phase ions to identify sites and types of modifications in the primary structure of a protein is of great value, but it must be accompanied by structure, function, and interaction partner (7,8) analysis of proteoforms in vivo. This combined analysis is needed because life occurs in an aqueous world.
There is the impression that the discovery, isolation, and characterization of proteins is highly evolved. Actually, fewer than 100,000 human proteins have probably been isolated and characterized. If the number of proteoforms predicted is accurate, less than half have been isolated and characterized. Protein isolation is inefficient. A breakthrough in separation technology is needed.
Protein peak capacities are no more than a few hundred in most forms of LC; suggesting peaks from a 1-million-component mixture could potentially bear 1000 proteins. Multidimensional separation methods are an obvious approach, but comprehensive structure analysis of a 200 × 200 fraction set to find proteoforms would be formidable. That has always been a problem. Obviously particle size, theoretical plates, and peak capacity tweaks will not solve this problem either. Moreover, structure selectivity of ion-exchange chromatography, hydrophobic interaction chromatography, reversed-phase LC, and immobilized metal affinity chromatography is poor. Proteins of completely different structure are coeluted.
Probing deeper, there is hope for this seemingly intractable problem. The fact that proteoforms arise from a single gene means they are cognates with multiple, identical structural features. A stationary phase that could recognize these shared features would make it possible to capture a proteoform family; theoretically reducing 1-million-component mixtures to fewer than 100 components in a single step. This possibility is of enormous significance. Species of no interest would be rejected while selected proteins would likely be structurally related with the exception of a few nonspecifically bound (NSB) proteins. In this scenario, the poor structure-specific selectivity of current LC columns would be an asset, fractionating family members based on other structural features. Moreover, top-down MS would identify structural differences and NSB proteins.
The big question is how to obtain such a magical, structure-selective stationary phase. Surprisingly, they already exist; an immobilized polyclonal antibody (pAb) interrogates multiple features (epitopes) of a protein, making it highly probable that features common to all proteoforms in a family would be recognized and selected. Family-specific monoclonal antibodies (mAbs) do the same, but only recognize a single shared epitope.
Production of a pAb targeting common proteome family epitopes can be achieved by using any member of an existing family as an immunogen. Thousands of pAbs are already available.
Proteins that have never been isolated present a larger problem. There is no family member to use as an immunogen. The new field of antibody-based proteomics (9–11) addresses this problem by using protein fragment libraries to obtain immunogens. The rationale is that the DNA sequence of a protein coding gene predicts 6–15 amino acid fragments of a protein family that when synthesized and attached to a large immunogen will sometimes produce antibodies that recognize common epitopes of the family.
Based on the need for fractionation in determining the structure and function of so many proteins, the future of LC in the life sciences seems bright, but with some enjoinments. Clearly, affinity selector acquisition and use is a major opportunity. The application of a family-selective phase in the first fractionation step would allow rejection of untargeted proteins while directing those of interest into higher-order fractionation steps. Fortunately, engineering and production of the requisite antibodies for implementing this approach to protein analysis is receiving increasing attention (9–11). Finally, new ways must be found to use affinity selectors in protein fractionation that circumvent covalent immobilization. The necessity to covalently bind ~20,000 different affinity selectors to achieve the goals noted above is inconceivable.
References
Fred E. Regnier and JinHee Kim are with Novilytic at the Kurz Purdue Technology Center (KPTC) in West Lafayette, Indiana.
Best of the Week: Food Analysis, Chemical Migration in Plastic Bottles, STEM Researcher of the Year
December 20th 2024Top articles published this week include the launch of our “From Lab to Table” content series, a Q&A interview about using liquid chromatography–high-resolution mass spectrometry (LC–HRMS) to assess chemical hazards in plastic bottles, and a piece recognizing Brett Paull for being named Tasmanian STEM Researcher of the Year.
Using LC-MS/MS to Measure Testosterone in Dried Blood Spots
December 19th 2024Testosterone measurements are typically performed using serum or plasma, but this presents several logistical challenges, especially for sample collection, storage, and transport. In a recently published article, Yehudah Gruenstein of the University of Miami explored key insights gained from dried blood spot assay validation for testosterone measurement.
Determination of Pharmaceuticals by Capillary HPLC-MS/MS (Dec 2024)
December 19th 2024This application note demonstrates the use of a compact portable capillary liquid chromatograph, the Axcend Focus LC, coupled to an Agilent Ultivo triple quadrupole mass spectrometer for quantitative analysis of pharmaceutical drugs in model aqueous samples.