2016
DOI: 10.1038/nmeth.3902

Recognizing millions of consistently unidentified spectra across hundreds of shotgun proteomics datasets

Abstract: Mass spectrometry (MS) is the main technology used in proteomics approaches. However, on average 75% of spectra analysed in an MS experiment remain unidentified. We propose to use spectrum clustering at a large scale to shed light on these unidentified spectra. The PRoteomics IDEntifications database (PRIDE) Archive is one of the largest public MS proteomics data repositories worldwide. By clustering all tandem MS spectra publicly available in PRIDE Archive, coming from hundreds of datasets, we were able to cons…

Citations: cited by 151 publications (212 citation statements)
References: 40 publications (58 reference statements)
“…In addition, only the peptides that were longer than 9 Aa were screened out for SAP detection (14). Next, we mapped those identified peptides to protein sequences, only the matched peptides that contained the same variant Aa as that of in protein sequences were considered as SAP peptides preliminarily.…”
Section: Methods (mentioning)
confidence: 99%
“…Although these studies have facilitated the exploration of SAPs to certain extent, they do not focus on the variant spectra. With the rapid development of spectral library searching methods in peptide identification, a growing number of studies have paid attention to the peptides/proteins as well as the corresponding spectra, for example, the National Institute of Science of Technology (NIST), which provides a widely used resource for spectral library searching, contains over 719 338 mass spectra (3 May 2016) (13), the European Bioinformatics Institute (EBI)- PRoteomics IDEntifications (PRIDE) database also provides updated spectral library recently (14). However, the spectral library from Peptide Atlas has not been updated yet since 2013 (15).…”
Section: Introduction (mentioning)
confidence: 99%
“…on a variety of proteins. But many additional, unknown modifications likely lurk in acquired spectra that await identification, which are the subject of ongoing developments such as using cascade search, open search, or spectral clustering approaches [62, 63]. In addition to classical studies of phosphorylation and ubiquitination, improved methods to isolate and identify PTMs have fueled investigations into many different kinds of modifications including glycosylation, acetylation, sumoylation, and oxidative modifications that are now known to play critical and indispensable roles in the regulations of core aspects of cardiac physiology.…”
Section: Examples and Frontiers in Cardiovascular Applications (mentioning)
confidence: 99%
“…It is estimated that up to 75–85 % of mass spectra generated in a proteomics experiment can remain unidentified by current data analysis workflows [62, 86], thus leaving room for continuous growth through better bioinformatics in the near future. Currently the unidentified “junk” spectra are mostly siloed or discarded, thus they constitute a major untapped source of biomedical big data.…”
Section: Outlooks: Emergence of Proteomics Big Data (mentioning)
confidence: 99%