ProSave: an application for restoring quantitative data to manipulated subsets of protein lists
Source Code for Biology and Medicine volume 13, Article number: 3 (2018)
In proteomics studies, liquid chromatography tandem mass spectrometry (LC-MS/MS) data are quantified by spectral counts or by some measure of ion abundance. Downstream comparative analysis of protein content (e.g. Venn diagrams and network analysis) typically does not include these quantitative data, and critical information is often lost. To avoid losing spectral count data in comparative proteomic analyses, a tool that can rapidly retrieve this information is needed.
We developed ProSave, a free and user-friendly Java-based program that retrieves spectral count data from a curated list of proteins in a large proteomics dataset. ProSave allows for the management of LC-MS/MS datasets and rapidly retrieves spectral count information for a desired list of proteins.
ProSave is open source and freely available at https://github.com/MahajanLab/ProSave. The user manual, implementation notes, and description of methodology and examples are available on the site.
Shotgun proteomic analysis is frequently used in translational biomedical research [1,2,3,4,5]. Mass spectrometry-based experiments generate large amounts of data, and the complexity and volume of these data are increasing with time. One promising application of shotgun proteomics is the molecular characterization of diseased tissue samples to identify biomarkers or drug targets. We have applied this method to numerous vitreoretinal diseases where there are few therapeutic options [7, 8]. Liquid biopsies (e.g. vitreous or aqueous humor) can be taken at the time of surgery (Fig. 1a) [8,9,10]. These liquid biopsies can then be processed and analyzed using liquid chromatography-tandem mass spectrometry (LC-MS/MS) to evaluate protein content (Fig. 1b–c). Advanced algorithms can match protein IDs to the thousands of peptide mass spectra obtained during the experiment (Fig. 1d) [12,13,14,15]. The resulting quantitative data are typically represented in terms of spectral counts or ion abundance (Fig. 1e). Downstream analysis, organization, and meaningful interpretation of these LC-MS/MS data remain a challenge for researchers. Identified proteins can be further categorized using Venn diagrams, gene ontology (GO) categorization, clustering analysis, molecular pathway representation, and protein interaction network analysis (Fig. 1f) [1, 16, 17]. However, these analyses frequently make use of only the protein ID lists, and the quantitative data (e.g. label-free spectral counts) are often ignored (Fig. 1g). This can create issues for investigators attempting to make meaningful interpretations of these results, especially if they are unfamiliar with shell scripting or lack access to expensive bioinformatics suites (e.g. Ingenuity or Partek). To overcome this barrier, we created ProSave, a Java-based application that restores quantitative data to manipulated lists of protein IDs from larger shotgun proteomics datasets (Fig. 1h–i).
ProSave is different from other currently-available bioinformatic tools: it is free, open-source, and user-friendly (as opposed to R/Bioconductor).
ProSave was developed using Java and was successfully tested on Microsoft Windows 10 and Mac OS Sierra ver. 10.12.6. It was written to preserve quantitative protein data (e.g. spectral counts, protein intensity) that are otherwise lost when protein ID lists are compared between tissue samples during proteomic analysis. Such comparisons exclude all numerical protein data and focus solely on the protein IDs derived from the liquid biopsies. ProSave solves this problem by processing the original protein data before they are manipulated by downstream comparative analysis, such as Venn diagrams, gene ontology (GO), or network analysis, and then restoring the quantitative information to the resulting protein lists. ProSave is also useful beyond proteomics research: it was designed to work with any large-scale gene or protein expression analysis. Further, ProSave works with protein expression data from a variety of methods, including data obtained through data-dependent and data-independent acquisition (DDA and DIA) as well as labeled methods like iTRAQ (isobaric tag for relative and absolute quantification) and SILAC (stable isotope labeling with amino acids in cell culture).
ProSave is free, open-source software available at https://github.com/MahajanLab/ProSave/. Java class files can also be extracted from the ProSave.jar file for modification. The ProSaveGUI class creates the ProSave object and sets some graphical user interface (GUI) parameters (Fig. 2a). The ProSave class builds the GUI framework, specifies all layout and interface parameters (Fig. 2b), and contains an internal class for event handling (Fig. 2f). The Protein class handles the different types or amounts of data relating to each individual protein (Fig. 2c). The ReadProteinData class processes the original data file by inserting the data into a nested HashMap structure (Fig. 2d). The ReadProtein class (Fig. 2e) uses the hashing structure for rapid data lookup.
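To make the nested HashMap idea concrete, the following is a minimal sketch and not ProSave's actual source. It assumes the outer map is keyed by protein ID and the inner map by column (sample) name; the class name NestedProteinMap, its methods, and the example values are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of a nested HashMap; not taken from the ProSave code base.
final class NestedProteinMap {
    // Outer key: protein ID. Inner key: column (sample) name. Value: quantitative measurement.
    private final Map<String, Map<String, Double>> data = new HashMap<>();

    // Stores one measurement, creating the inner map on first use of a protein ID.
    void put(String proteinId, String column, double value) {
        data.computeIfAbsent(proteinId, k -> new HashMap<>()).put(column, value);
    }

    // Returns the stored value, or null if the protein ID or column is unknown.
    Double get(String proteinId, String column) {
        Map<String, Double> row = data.get(proteinId);
        return row == null ? null : row.get(column);
    }

    public static void main(String[] args) {
        NestedProteinMap map = new NestedProteinMap();
        map.put("P06733", "Periphery", 120.0); // made-up spectral counts
        map.put("P06733", "Fovea", 95.0);
        System.out.println(map.get("P06733", "Fovea"));
        System.out.println(map.get("Q00000", "Fovea")); // unknown ID yields null
    }
}
```

Both lookups are hash-based, so retrieving a value takes roughly constant time on average regardless of how many proteins the dataset contains.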
ProSave has been designed to be applied as a tool for any large-scale gene or protein expression investigation. Below are steps on how to use ProSave on any compatible data set:
Step 1: Download ProSave.jar from https://github.com/MahajanLab/ProSave/ and run ProSave by opening the downloaded file (Fig. 3a). Install Java first if it is not already installed.
Step 2: Create a .txt file containing the original data. To do this from Excel, go to File>Export>Change File Type>Text>Save. Once ProSave opens, click ‘Choose File’ to add the .txt file of the original data. For proper function, ensure that all columns have one-word names and that the text begins on the first row of the .txt file (Fig. 3b).
Step 3: Enter a list of protein IDs in the textbox labeled ‘Enter protein IDs’, then click ‘Continue’ (Fig. 3c).
Step 4: Click the button labeled with the name of the data column corresponding to the tissue for comparison.
Step 5: Get restored data from the text box labeled ‘Restored protein-data pairs’ (Fig. 3d).
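For readers who prefer to see the logic behind Steps 2 to 5, the sketch below approximates it in plain Java. It is not ProSave's implementation. It assumes a tab-delimited export whose first row holds one-word column names and whose first column holds the protein ID; the class name RestoreSketch, the methods parse and restore, the "NA" placeholder, and the example data are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of the restore workflow; names and file layout are assumptions.
final class RestoreSketch {
    // Parses tab-delimited text into proteinId -> (columnName -> cell value).
    static Map<String, Map<String, String>> parse(String text) {
        String[] lines = text.split("\\R");
        String[] header = lines[0].trim().split("\t");
        Map<String, Map<String, String>> table = new HashMap<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].trim().isEmpty()) continue; // skip blank lines
            String[] cells = lines[i].split("\t");
            Map<String, String> row = new HashMap<>();
            for (int c = 1; c < header.length && c < cells.length; c++) {
                row.put(header[c], cells[c].trim());
            }
            table.put(cells[0].trim(), row);
        }
        return table;
    }

    // Pairs each requested ID with its value in the chosen column ("NA" if absent).
    static List<String> restore(Map<String, Map<String, String>> table,
                                List<String> ids, String column) {
        List<String> pairs = new ArrayList<>();
        for (String rawId : ids) {
            String id = rawId.trim();
            Map<String, String> row = table.get(id);
            String value = (row == null) ? null : row.get(column);
            pairs.add(id + "\t" + (value == null ? "NA" : value));
        }
        return pairs;
    }

    public static void main(String[] args) {
        String original = "ID\tPeriphery\tFovea\nP06733\t120\t95\nP08670\t60\t72\n"; // made-up data
        Map<String, Map<String, String>> table = parse(original);
        for (String pair : restore(table, Arrays.asList("P08670", "Q00000"), "Fovea")) {
            System.out.println(pair);
        }
    }
}
```

Reporting a placeholder for IDs that are missing from the original file, rather than dropping them silently, keeps the output aligned with the list the user entered.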
We tested ProSave on a comparative proteomics dataset of anatomical regions of the human retina: the peripheral retina, juxta-macular, and foveomacular regions. LC-MS/MS was performed on retinal punch biopsies using an LTQ Velos, and data were acquired using the DDA method as previously described [18, 19]. We identified 1,779 ± 51 individual proteins in the peripheral retina, 1,999 ± 46 individual proteins in the juxta-macular region, and 1,974 ± 92 individual proteins in the foveomacular region. Data were organized and analyzed using comparative analyses (e.g. Venn diagrams, differential protein expression, pathway representation, etc.). Protein ID lists from each tissue sample were compared using Venn diagrams to identify shared and unique proteins among the different regions of the retina. This analysis identified 1,354 proteins shared among the three retinal regions. After this comparison, however, only protein IDs remained, and the protein expression levels were not available for interpretation. Using ProSave, spectral count data were restored to this list of 1,354 proteins, and we were able to ascertain the most abundant proteins shared among the three groups: alpha- and gamma-enolase, tubulin, pyruvate kinase, creatine kinase b-type, vimentin, glyceraldehyde-3-phosphate dehydrogenase, and histone H2B (types 1-D and G). A similar approach was used to gather information on the most abundant proteins unique to each anatomical region.
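The comparison described above can be illustrated with a small, hedged Java sketch: a Venn-style intersection of ID sets, followed by restoring counts and ranking by abundance. This is not the analysis code used in the study. The class SharedProteinRanking, its methods, and the toy IDs and counts are hypothetical, and ranking by the counts of a single sample is an assumption made for brevity.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: intersect ID lists, then rank shared IDs by restored counts.
final class SharedProteinRanking {
    // Returns the IDs present in every set (the center of a Venn diagram).
    static Set<String> shared(List<Set<String>> idSets) {
        Set<String> result = new LinkedHashSet<>(idSets.get(0));
        for (Set<String> ids : idSets.subList(1, idSets.size())) {
            result.retainAll(ids);
        }
        return result;
    }

    // Re-attaches counts to the given IDs and sorts them from most to least abundant.
    static List<Map.Entry<String, Integer>> rankByCount(Set<String> ids,
                                                        Map<String, Integer> counts) {
        List<Map.Entry<String, Integer>> ranked = new ArrayList<>();
        for (String id : ids) {
            Integer count = counts.get(id);
            if (count != null) {
                ranked.add(new AbstractMap.SimpleEntry<>(id, count));
            }
        }
        ranked.sort(Map.Entry.<String, Integer>comparingByValue().reversed());
        return ranked;
    }
}
```

A call such as rankByCount(shared(regionSets), countsForOneRegion) would then list the shared proteins with the most abundant first, where regionSets and countsForOneRegion stand for data supplied by the user.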
Without protein abundance data, insights into significant similarities or differences in retinal tissue protein expression are ambiguous. To avoid such data loss, one could attempt the tedious and time-consuming task of interrogating the original dataset to restore quantitative data for each protein of interest. ProSave accomplishes the same task in a matter of seconds rather than hours or days. We applied ProSave to our shared and unique protein lists to restore spectral count data. This showed us which proteins were most and least abundant, increasing our understanding of the targeted tissues.
In conclusion, ProSave is a free and user-friendly tool to restore quantitative data to manipulated subsets of protein IDs during analysis of proteomic data. It speeds up the workflow for proteomic bioinformatics and makes for meaningful interpretation of comparative data. We anticipate that ProSave will be a useful tool to simplify processing and analysis of translational proteomics data. Such a program could even be applied to other gene/protein expression platforms where comparative analyses make use of only gene/protein IDs (e.g. RNA-seq, microarrays, ELISA).
Availability and requirements
Project name: ProSave
Project home page: https://github.com/MahajanLab/ProSave
Operating system(s): Platform independent
Programming language: Java
Other requirements: None
Any restrictions to use by non-academics: None
GUI: Graphical user interface
iTRAQ: Isobaric tag for relative and absolute quantification
LC-MS/MS: Liquid chromatography-tandem mass spectrometry
SILAC: Stable isotope labeling with amino acids in cell culture
Mahajan VB, Skeie JM. Translational vitreous proteomics. Proteomics Clin Appl. 2014;8(3–4):204–8.
Duarte TT, Spencer CT. Personalized proteomics: the future of precision medicine. Proteomes. 2016;4(4):29.
Skeie JM, Roybal CN, Mahajan VB. Proteomic insight into the molecular function of the vitreous. PLoS One. 2015;10(5):e0127567.
Skeie JM, Mahajan VB. Proteomic landscape of the human choroid-retinal pigment epithelial complex. JAMA Ophthalmol. 2014;132(11):1271–81.
Skeie JM, Mahajan VB. Proteomic interactions in the mouse vitreous-retina complex. PLoS One. 2013;8(11):e82140.
Velez G, Tang PH, Cabral T, Cho GY, Machlab DA, Tsang SH, Bassuk AG, Mahajan VB. Personalized proteomics for precision health: identifying biomarkers of vitreoretinal disease. Trans Vis Sci Tech. 2018;7(5):12.
Velez G, Bassuk AG, Colgan D, Tsang SH, Mahajan VB. Therapeutic drug repositioning using personalized proteomics of liquid biopsies. JCI Insight. 2017;2(24):e97818.
Velez G, Roybal CN, Colgan D, Tsang SH, Bassuk AG, Mahajan VB. Precision medicine: personalized proteomics for the diagnosis and treatment of idiopathic inflammatory disease. JAMA Ophthalmol. 2016;134(4):444–8.
Velez G, Roybal CN, Binkley E, Bassuk AG, Tsang SH, Mahajan VB. Proteomic analysis of elevated intraocular pressure with retinal detachment. Am J Ophthalmol Case Rep. 2017;5:107–10.
Skeie JM, Brown EN, Martinez HD, Russell SR, Birkholz ES, Folk JC, Boldt HC, Gehrs KM, Stone EM, Wright ME, et al. Proteomic analysis of vitreous biopsy techniques. Retina. 2012;32(10):2141–9.
Skeie JM, Tsang SH, Zande RV, Fickbohm MM, Shah SS, Vallone JG, Mahajan VB. A biorepository for ophthalmic surgical specimens. Proteomics Clin Appl. 2014;8(3–4):209–17.
Geer LY, Markey SP, Kowalak JA, Wagner L, Xu M, Maynard DM, Yang X, Shi W, Bryant SH. Open mass spectrometry search algorithm. J Proteome Res. 2004;3(5):958–64.
Bjornson RD, Carriero NJ, Colangelo C, Shifman M, Cheung KH, Miller PL, Williams K. X!!Tandem, an improved method for running X!Tandem in parallel on collections of commodity computers. J Proteome Res. 2008;7(1):293–9.
Yen CY, Meyer-Arendt K, Eichelberger B, Sun S, Houel S, Old WM, Knight R, Ahn NG, Hunter LE, Resing KA. A simulated MS/MS library for spectrum-to-spectrum searching in large scale identification of proteins. Mol Cell Proteomics. 2009;8(4):857–69.
Perkins DN, Pappin DJ, Creasy DM, Cottrell JS. Probability-based protein identification by searching sequence databases using mass spectrometry data. Electrophoresis. 1999;20(18):3551–67.
Thomas PD, Campbell MJ, Kejariwal A, Mi H, Karlak B, Daverman R, Diemer K, Muruganujan A, Narechania A. PANTHER: a library of protein families and subfamilies indexed by function. Genome Res. 2003;13(9):2129–41.
Szklarczyk D, Morris JH, Cook H, Kuhn M, Wyder S, Simonovic M, Santos A, Doncheva NT, Roth A, Bork P, et al. The STRING database in 2017: quality-controlled protein-protein association networks, made broadly accessible. Nucleic Acids Res. 2017;45(D1):D362–8.
Velez G, Machlab DA, Tang PH, Sun Y, Tsang SH, Bassuk AG, Mahajan VB. Proteomic analysis of the human retina reveals region-specific susceptibilities to metabolic- and oxidative stress-related diseases. PLoS One. 2018;13(2):e0193250.
Cabral T, Toral MA, Velez G, DiCarlo JE, Gore AM, Mahajan M, Tsang SH, Bassuk AG, Mahajan VB. Dissection of human retina and RPE-choroid for proteomic analysis. J Vis Exp. 2017;(129). https://doi.org/10.3791/56203.
VBM and AGB are supported by NIH grants [R01EY026682, R01EY024665, R01EY025225, R01EY024698, R21AG050437, and P30EY026877]. VBM is also supported by the Doris Duke Charitable Foundation Grant #2013103 and Research to Prevent Blindness (RPB), New York, NY. GV is supported by NIH grants [F30EYE027986 and T32GM007337].
Availability of data and materials
The datasets generated during and/or analyzed during the current study are available from the corresponding author on reasonable request.
The authors declare that they have no competing interests.
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Cite this article
Machlab, D.A., Velez, G., Bassuk, A.G. et al. ProSave: an application for restoring quantitative data to manipulated subsets of protein lists. Source Code Biol Med 13, 3 (2018). https://doi.org/10.1186/s13029-018-0070-0