LCGC North America
A multilaboratory collaborative study organized by the Human Proteome Organization demonstrated that participating laboratories had difficulty in identifying components of a simple protein mixture.
The previous installment of this column (1) surveyed the challenges in obtaining high-quality results in bottom-up proteomics, the sources of variability in proteomics experiments, and the difficulty in comparing results obtained from different laboratories using different sample preparation procedures, different instrument platforms, and different bioinformatic software. Five organizations were identified that have programs in place for standardizing proteomics workflows: the Association of Biomolecular Resource Facilities (ABRF), the Biological Reference Material Initiative (BRMI), Clinical Proteomic Technology Assessment for Cancer (CPTAC), the Fixing Proteomics Campaign, and the Human Proteome Organization (HUPO). At the time of writing, the HUPO Test Sample Working Group had completed a collaborative study on protein identification, but the results were not published until after the column had gone to press (2). This installment of "Directions in Discovery" reviews the results of that study, as they clearly reveal the sources of variability in bottom-up proteomics and point to the road ahead in standardizing proteomics workflows.
Tim Wehr
The HUPO Test Sample
The HUPO sample consisted of 20 human proteins in the mass range of 32–110 kDa. To create the sample, candidate sequences were selected from the open reading frame collection and the mammalian gene collection, expressed in E. coli, and purified using preparative sodium dodecyl sulfate–polyacrylamide gel electrophoresis (SDS-PAGE) or 2D high performance liquid chromatography (HPLC) (anion-exchange and reversed-phase chromatography). Purity of the proteins was determined to be 95% or greater by 1D SDS-PAGE. Quality and stability of the test sample were confirmed by mass spectrometry (MS) analysis. Each of the 20 proteins was selected to contain at least one unique tryptic peptide of 1250 ±5 Da, each with a different amino acid sequence. This feature was designed to test for peptide undersampling arising from the data-dependent acquisition methods used by most bottom-up LC–MS protocols.
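The mass-window design described above can be illustrated with a short in silico digest. The following Python sketch is not the Working Group's selection procedure; it is a minimal illustration that assumes the common trypsin rule (cleavage after K or R, except before P), no missed cleavages, unmodified residues, and neutral monoisotopic masses. The function names and the toy sequence are hypothetical, and the sketch does not check uniqueness against a protein database.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # one water is added per peptide for the free termini


def tryptic_digest(sequence):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    peptides, start = [], 0
    for i in range(len(sequence) - 1):
        if sequence[i] in "KR" and sequence[i + 1] != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    peptides.append(sequence[start:])  # C-terminal peptide
    return peptides


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def peptides_in_window(sequence, target=1250.0, tol=5.0):
    """Tryptic peptides whose mass lies within target +/- tol (Da)."""
    return [p for p in tryptic_digest(sequence)
            if abs(peptide_mass(p) - target) <= tol]


if __name__ == "__main__":
    # Hypothetical toy sequence, not one of the 20 study proteins.
    toy = "GGK" + "A" * 14 + "LK" + "GGR"
    for pep in peptides_in_window(toy):
        print(pep, round(peptide_mass(pep), 3))
```

Because search engines may calculate molecular weight differently (for example, monoisotopic versus average masses), a peptide near the edge of the window could fall inside it under one convention and outside it under another.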
Sample Distribution to Collaborators
The 20-component test sample was distributed to 27 laboratories selected for their expertise in proteomics techniques. Of these, 24 were academic or industrial research laboratories or core facilities, while three were instrument vendors. Sample recipients were instructed to identify all 20 proteins and all 22 unique peptides with mass 1250 ±5 Da and to report results to the lead investigator of the Test Sample Working Group. Participants were allowed to use procedures and instrumentation they routinely employed in their laboratories so that effectiveness of different workflows could be assessed. To minimize variability in data matching and reporting, participants were requested to use the same version of the NCBI nonredundant human protein database.
Initial Study Results
In the initial reports returned to the Test Sample Working Group, only seven of the 27 participating laboratories identified all 20 proteins. The remaining 20 laboratories experienced a variety of problems. The first group (seven laboratories) reported naming errors in the protein identifications. The second group (six laboratories) reported naming errors, false positives, and redundant identifications. The remaining group of seven laboratories experienced several problems. These included trypsinization problems, undersampling, incomplete matching of MS spectra due to acrylamide alkylation, database search errors, and use of overly stringent search criteria.
Results for the peptide sequences were even more problematic: only one of the 27 laboratories reported detection of all 22 peptides. Six of the 22 peptides contained cysteine residues, which are modified in the reduction and alkylation steps performed before trypsin digestion. Only three additional laboratories reported detection of any of the cysteine-containing peptides. Several laboratories incorrectly reported 1250-Da peptides that arose from contaminating proteins or from missed trypsin cleavages.
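One reason cysteine-containing peptides are easy to miss is that the modification changes the peptide mass, so a search that assumes the wrong modification will not match the spectrum. The Python sketch below illustrates the arithmetic only. It assumes two well-known cysteine mass shifts, carbamidomethylation (+57.02146 Da, from iodoacetamide) and propionamide formation (+71.03711 Da, the acrylamide adduct), and it assumes every cysteine in a peptide carries the same modification. The study summary here does not state which alkylating reagent each laboratory used, and the function names, tolerance, and example peptide are hypothetical.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056

# Mass added per cysteine under each hypothesis (Da).
CYS_SHIFTS = {
    "unmodified": 0.0,
    "carbamidomethyl": 57.02146,  # iodoacetamide alkylation
    "propionamide": 71.03711,     # acrylamide adduct, e.g. from gels
}


def peptide_mass(peptide, cys_mod="unmodified"):
    """Neutral monoisotopic mass with one uniform cysteine modification."""
    base = sum(MONO[aa] for aa in peptide) + WATER
    return base + peptide.count("C") * CYS_SHIFTS[cys_mod]


def matching_cys_mods(peptide, observed_mass, tol=0.02):
    """Names of the cysteine hypotheses consistent with an observed mass."""
    return [name for name in CYS_SHIFTS
            if abs(peptide_mass(peptide, name) - observed_mass) <= tol]


if __name__ == "__main__":
    peptide = "ACDK"  # hypothetical cysteine-containing peptide
    observed = peptide_mass(peptide, "propionamide")
    # A search that allows only carbamidomethyl cysteine would miss this mass.
    print(matching_cys_mods(peptide, observed))
```

The two modifications differ by about 14.016 Da per cysteine, far more than a typical precursor mass tolerance, which is consistent with the incomplete spectral matching that some laboratories attributed to acrylamide alkylation.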
Transfer of Data to Tranche and PRIDE
To facilitate centralized analysis of study data, participants were asked to submit their results to Tranche. Tranche, in use since 2006, is a free, open-source file-sharing tool that enables collections of computers to share data sets easily and can handle very large data sets. Tranche is structured as a peer-to-server-to-peer distributed network. For the HUPO study, submitted information included raw MS data, methodologies, peak lists, peptide statistics, and protein identifications. After submission to Tranche, a copy of all data was transferred to PRIDE. PRIDE (PRoteomics IDEntifications) is a centralized, standards-compliant public data repository for proteomics data. It was designed to provide the proteomics community with a public repository for protein and peptide identifications together with supporting evidence for the identifications.
Figure 1: Number of tandem mass spectra assigned to tryptic peptides. Comparison of protein abundance from the centralized analysis of raw data collected from the participating laboratories (a) before and (b) following removal of individual laboratory contaminants. Adapted from reference 2.
Centralized Analysis of Study Data
Following submission to Tranche, the centralized data were analyzed collectively to assign probabilities to identifications and to determine the total number of assigned tandem MS spectra, the number of distinct peptides, and the amino acid coverage. Inspection of the raw data revealed that the majority of participating laboratories had generated data of satisfactory quality to identify all 20 proteins and most of the 22 1250-Da peptides. Centralized data analysis provided several additional insights:
Figure 2: Peptide heat map representation for each of the 20 proteins from the centralized analysis of raw data from participating laboratories, showing frequency of observation of a given peptide and its position in the protein sequence. Red tones: redundant tryptic peptides excluding 1250-Da peptides; purple tones: redundant 1250-Da peptides. Adapted from reference 2.
Implications for the Proteomics Community
This study demonstrated that, even with a simple mixture of 20 proteins, the majority of the participating laboratories had difficulty in correctly identifying the components. Centralized analysis of the data revealed that these laboratories had generated tandem MS data of sufficient quality to identify all of the proteins and most of the 1250-Da peptides. It also identified database problems as a major source of error. Due to the construction of the database, the search engines employed by participants were unable to differentiate between multiple identifiers for the same protein, and manual curation of MS data was needed for correct reporting. The Working Group noted that search engines employed different algorithms for calculation of molecular weight and recommended that a common method be adopted. The study organizers provided additional recommendations based upon the results of the study:
Conclusion
The HUPO Test Sample Working Group study is distinct from other collaborative studies of protein identification (3). First, the component proteins each contained a peptide of similar size to test the ability of the mass spectrometer to reproducibly sample precursor ions. Second, participants received feedback from the working group on technical problems encountered in the initial analysis, along with recommendations for improvement. Third, the working group performed centralized analysis of the combined data sets, which permitted discrimination of factors related to data generation from those related to data analysis. Four key outcomes of this study are important for the proteomics community. First, it demonstrates that a variety of instruments and workflows can generate tandem MS data of sufficient quality for protein identification. Second, operator training and expertise are critical for successful proteomics experiments. Third, environmental contamination can compromise data quality, particularly for gel-based workflows; good laboratory practice, including analysis of controls and blanks, is necessary. Fourth, variations in database construction and curation must be addressed to allow proteomics researchers to obtain consistent results.
The simple equimolar 20-protein mixture used in the HUPO study hardly represents the complexity of a typical proteomics sample, which can contain hundreds of thousands of analytes covering several orders of magnitude in abundance. However, it did serve to illuminate factors that compromise data quality and to provide guidelines for improving performance in proteomics studies.
"Directions in Discovery" editor Tim Wehr is staff scientist at Bio-Rad Laboratories, Hercules, California. Direct correspondence about this column to "Directions in Discovery," LCGC, Woodbridge Corporate Plaza, 485 Route 1 South, Building F, First Floor, Iselin, NJ 08830, e-mail lcgcedit@lcgcmag.com.
References
(1) T. Wehr, LCGC 27 (7), 558–562 (2009).
(2) A.W. Bell, E.W. Deutsch, C.E. Au, R.E. Kearney, R. Beavis, S. Sechi, T. Nilsson, J.J.M. Bergeron, and the HUPO Test Sample Working Group, Nature Methods 6, 423–429 (2009).
(3) R. Aebersold, Nature Methods 6, 411–412 (2009).