Although genomewide RNA expression analysis has become a routine tool in biomedical research, extracting biological insight from such information remains a major challenge. Here, we describe a powerful analytical method called Gene Set Enrichment Analysis (GSEA) for interpreting gene expression data. The method derives its power by focusing on gene sets, that is, groups of genes that share common biological function, chromosomal location, or regulation. We demonstrate how GSEA yields insights into several cancer-related data sets, including leukemia and lung cancer. Notably, where single-gene analysis finds little similarity between two independent studies of patient survival in lung cancer, GSEA reveals many biological pathways in common. The GSEA method is embodied in a freely available software package, together with an initial database of 1,325 biologically defined gene sets.
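The abstract above does not spell out how enrichment is scored, so the following is only a minimal sketch. It assumes the weighted running-sum (Kolmogorov-Smirnov-like) statistic commonly associated with GSEA. The function name `enrichment_score`, its parameters, and the toy gene names are hypothetical and are not taken from the GSEA software package.

```python
def enrichment_score(ranked_genes, scores, gene_set, p=1.0):
    """Maximum-deviation running-sum score for one gene set (illustrative sketch).

    ranked_genes: gene names sorted by descending correlation with the phenotype
    scores:       ranking metric for each gene, in the same order
    gene_set:     set of gene names to test
    p:            weight exponent (assumption: p=0 gives an unweighted walk)
    """
    members = [gene in gene_set for gene in ranked_genes]
    n_hits = sum(members)
    n_miss = len(ranked_genes) - n_hits
    if n_hits == 0 or n_miss == 0:
        raise ValueError("gene set must cover some, but not all, ranked genes")

    # Each hit steps up in proportion to |score|^p; hit steps sum to 1.
    hit_weights = [abs(s) ** p if m else 0.0 for s, m in zip(scores, members)]
    total_hit_weight = sum(hit_weights)
    if total_hit_weight == 0:
        raise ValueError("all member scores are zero; weights are undefined")

    # Each miss steps down by a constant; miss steps also sum to 1.
    miss_step = 1.0 / n_miss

    running = 0.0
    best = 0.0  # signed value of the largest deviation from zero seen so far
    for weight, is_member in zip(hit_weights, members):
        if is_member:
            running += weight / total_hit_weight
        else:
            running -= miss_step
        if abs(running) > abs(best):
            best = running
    return best


# Toy data (hypothetical): six genes ranked from most to least correlated.
genes = ["A", "B", "C", "D", "E", "F"]
metric = [3.0, 2.0, 1.0, -1.0, -2.0, -3.0]
top_score = enrichment_score(genes, metric, {"A", "B"})
bottom_score = enrichment_score(genes, metric, {"E", "F"})
```

A set concentrated at the top of the ranking yields a score near +1, and one concentrated at the bottom yields a score near -1. Assessing significance (for example by permuting phenotype labels) and normalizing across gene sets are separate steps not shown here.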
We have generated a molecular taxonomy of lung carcinoma, the leading cause of cancer death in the United States and worldwide. Using oligonucleotide microarrays, we analyzed mRNA expression levels corresponding to 12,600 transcript sequences in 186 lung tumor samples, including 139 adenocarcinomas resected from the lung. Hierarchical and probabilistic clustering of expression data defined distinct subclasses of lung adenocarcinoma. Among these were tumors with high relative expression of neuroendocrine genes and of type II pneumocyte genes, respectively. Retrospective analysis revealed a less favorable outcome for the adenocarcinomas with neuroendocrine gene expression. The diagnostic potential of expression profiling is emphasized by its ability to discriminate primary lung adenocarcinomas from metastases of extra-pulmonary origin. These results suggest that integration of expression profile data with clinical parameters could aid in diagnosis of lung cancer patients.
Summary: Somatic mutations have been extensively characterized in breast cancer, but the effects of these genetic alterations on the proteomic landscape remain poorly understood. We describe quantitative mass spectrometry-based proteomic and phosphoproteomic analyses of 105 genomically annotated breast cancers, of which 77 provided high-quality data. Integrated analyses allowed insights into the somatic cancer genome, including the consequences of chromosomal loss, such as the 5q deletion characteristic of basal-like breast cancer. The 5q trans effects were interrogated against the Library of Integrated Network-based Cellular Signatures, thereby connecting CETN3 and SKP1 loss to elevated expression of EGFR, and SKP1 loss also to increased SRC. Global proteomic data confirmed a stromal-enriched group in addition to basal and luminal clusters, and pathway analysis of the phosphoproteome identified a G protein-coupled receptor cluster that was not readily identified at the mRNA level. Besides ERBB2, other amplicon-associated, highly phosphorylated kinases were identified, including CDK12, PAK1, PTK2, RIPK2 and TLK2. We demonstrate that proteogenomic analysis of breast cancer elucidates functional consequences of somatic mutations, narrows candidate nominations for driver genes within large deletions and amplified regions, and identifies therapeutic targets.
Better biomarkers are urgently needed to improve diagnosis, guide molecularly targeted therapy and monitor activity and therapeutic response across a wide spectrum of disease. Proteomics methods based on mass spectrometry hold special promise for the discovery of novel biomarkers that might form the foundation for new clinical blood tests, but to date their contribution to the diagnostic armamentarium has been disappointing. This is due in part to the lack of a coherent pipeline connecting marker discovery with well-established methods for validation. Advances in methods and technology now enable construction of a comprehensive biomarker pipeline from six essential process components: candidate discovery, qualification, verification, research assay optimization, biomarker validation and commercialization. Better understanding of the overall process of biomarker discovery and validation and of the challenges and strategies inherent in each phase should improve experimental study design, in turn increasing the efficiency of biomarker development and facilitating the delivery and deployment of novel clinical tests.
Summary: To provide a detailed analysis of the molecular components and underlying mechanisms associated with ovarian cancer, we performed a comprehensive mass spectrometry-based proteomic characterization of 174 ovarian tumors previously analyzed by The Cancer Genome Atlas (TCGA), of which 169 were high-grade serous carcinomas (HGSC). Integrating our proteomic measurements with the genomic data yielded a number of insights into disease, such as how different copy-number alterations influence the proteome, the proteins associated with chromosomal instability, the sets of signaling pathways that diverse genome rearrangements converge on, and the ones most associated with short overall survival. Specific protein acetylations associated with homologous recombination deficiency suggest a potential means for stratifying patients for therapy. In addition to providing a valuable resource, these findings provide a view of how the somatic genome drives the cancer proteome and of associations between protein and post-translational modification levels and clinical outcomes in HGSC.
Highlights
• Systematic identification of colon cancer-associated proteins and phosphosites
• Proteomics-supported neoantigens and cancer/testis antigens in 78% of the tumors
• Rb phosphorylation is an oncogenic driver and a putative target in colon cancer
• Glycolysis inhibition may render MSI tumors more sensitive to checkpoint blockade
We report a mass spectrometry-based method for the integrated analysis of protein expression, phosphorylation, ubiquitination and acetylation by serial enrichments of different post-translational modifications (SEPTM) from the same biological sample. This technology enabled quantitative analysis of nearly 8,000 proteins and more than 20,000 phosphorylation, 15,000 ubiquitination and 3,000 acetylation sites per experiment, generating a holistic view of cellular signal transduction pathways as exemplified by analysis of bortezomib-treated human leukemia cells.
Here we present an optimized workflow for global proteome and phosphoproteome analysis of tissues or cell lines that uses isobaric tags (TMT (tandem mass tags)-10) for multiplexed analysis and relative quantification, and provides 3× higher throughput than iTRAQ (isobaric tags for absolute and relative quantification)-4-based methods with high intra- and inter-laboratory reproducibility. The workflow was systematically characterized and benchmarked across three independent laboratories using two distinct breast cancer subtypes from patient-derived xenograft models to enable assessment of proteome and phosphoproteome depth and quantitative reproducibility. Each plex consisted of ten samples, each being 300 μg of peptide derived from <50 mg of wet-weight tissue. Of the 10,000 proteins quantified per sample, we could distinguish 7,700 human proteins derived from tumor cells and 3,100 mouse proteins derived from the surrounding stroma and blood. The maximum deviation across replicates and laboratories was <7%, and the inter-laboratory correlation for TMT ratio-based comparison of the two breast cancer subtypes was r > 0.88. The maximum deviation for the phosphoproteome coverage was <24% across laboratories, with an average of >37,000 quantified phosphosites per sample and differential quantification correlations of r > 0.72. The full procedure, including sample processing and data generation, can be completed within 10 d for ten tissue samples, and 100 samples can be analyzed in ~4 months using a single LC-MS/MS instrument. The high quality, depth, and reproducibility of the data obtained both within and across laboratories should enable new biological insights to be obtained from mass spectrometry-based proteomics analyses of cells and tissues together with proteogenomic data integration.