dc.contributor.author
Omasits, Ulrich
dc.contributor.author
Varadarajan, Adithi R.
dc.contributor.author
Schmid, Michael
dc.contributor.author
Goetze, Sandra
dc.contributor.author
Melidis, Damianos
dc.contributor.author
Bourqui, Marc
dc.contributor.author
Nikolayeva, Olga
dc.contributor.author
Québatte, Maxime
dc.contributor.author
Patrignani, Andrea
dc.contributor.author
Dehio, Christoph
dc.contributor.author
Frey, Juerg E.
dc.contributor.author
Robinson, Mark D.
dc.contributor.author
Wollscheid, Bernd
dc.contributor.author
Ahrens, Christian H.
dc.date.accessioned
2020-06-18T05:35:31Z
dc.date.available
2018-02-16T08:36:28Z
dc.date.available
2018-02-16T08:36:01Z
dc.date.available
2018-01-03T02:39:33Z
dc.date.available
2018-01-17T14:27:39Z
dc.date.available
2017-12-28T03:42:17Z
dc.date.available
2018-01-17T14:28:40Z
dc.date.available
2018-02-01T22:57:33Z
dc.date.available
2018-03-08T07:29:06Z
dc.date.available
2018-11-01T09:45:26Z
dc.date.available
2020-06-18T05:35:31Z
dc.date.issued
2017
dc.identifier.issn
1088-9051
dc.identifier.issn
1549-5469
dc.identifier.other
10.1101/gr.218255.116
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/242247
dc.identifier.doi
10.3929/ethz-b-000224984
dc.description.abstract
Accurate annotation of all protein-coding sequences (CDSs) is an essential prerequisite to fully exploit the rapidly growing repertoire of completely sequenced prokaryotic genomes. However, large discrepancies among the number of CDSs annotated by different resources, missed functional short open reading frames (sORFs), and overprediction of spurious ORFs represent serious limitations. Our strategy toward accurate and complete genome annotation consolidates CDSs from multiple reference annotation resources, ab initio gene prediction algorithms and in silico ORFs (a modified six-frame translation considering alternative start codons) in an integrated proteogenomics database (iPtgxDB) that covers the entire protein-coding potential of a prokaryotic genome. By extending the PeptideClassifier concept of unambiguous peptides for prokaryotes, close to 95% of the identifiable peptides imply one distinct protein, largely simplifying downstream analysis. Searching a comprehensive Bartonella henselae proteomics data set against such an iPtgxDB allowed us to unambiguously identify novel ORFs uniquely predicted by each resource, including lipoproteins, differentially expressed and membrane-localized proteins, novel start sites and wrongly annotated pseudogenes. Most novelties were confirmed by targeted, parallel reaction monitoring mass spectrometry, including unique ORFs and single amino acid variations (SAAVs) identified in a re-sequenced laboratory strain that are not present in its reference genome. We demonstrate the general applicability of our strategy for genomes with varying GC content and distinct taxonomic origin. We release iPtgxDBs for B. henselae, Bradyrhizobium diazoefficiens and Escherichia coli and the software to generate both proteogenomics search databases and integrated annotation files that can be viewed in a genome browser for any prokaryote.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Cold Spring Harbor Laboratory Press
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc/4.0/
dc.title
An integrative strategy to identify the entire protein coding potential of prokaryotic genomes by proteogenomics
en_US
dc.type
Journal Article
dc.rights.license
Creative Commons Attribution-NonCommercial 4.0 International
dc.date.published
2017-11-15
ethz.journal.title
Genome Research
ethz.journal.volume
27
en_US
ethz.journal.issue
12
en_US
ethz.journal.abbreviated
Genome res.
ethz.pages.start
2083
en_US
ethz.pages.end
2095
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.identifier.wos
ethz.identifier.scopus
ethz.publication.place
Cold Spring Harbor, NY
en_US
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00003 - Schulleitung und Dienste::00022 - Bereich VP Forschung / Domain VP Research::02207 - Functional Genomics Center Zurich / Functional Genomics Center Zurich
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02070 - Dep. Gesundheitswiss. und Technologie / Dep. of Health Sciences and Technology::02072 - Proteomics Plattform D-HEST
en_US
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00003 - Schulleitung und Dienste::00022 - Bereich VP Forschung / Domain VP Research::02207 - Functional Genomics Center Zurich / Functional Genomics Center Zurich
ethz.leitzahl.certified
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02070 - Dep. Gesundheitswiss. und Technologie / Dep. of Health Sciences and Technology::02072 - Proteomics Plattform D-HEST
en_US
ethz.date.deposited
2017-12-28T03:42:28Z
ethz.source
WOS
ethz.source
SCOPUS
ethz.source
FORM
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2018-02-20T10:21:14Z
ethz.rosetta.lastUpdated
2020-06-18T05:35:43Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/230700
dc.identifier.olduri
http://hdl.handle.net/20.500.11850/238147
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=An%20integrative%20strategy%20to%20identify%20the%20entire%20protein%20coding%20potential%20of%20prokaryotic%20genomes%20by%20proteogenomics&rft.jtitle=Genome%20Research&rft.date=2017&rft.volume=27&rft.issue=12&rft.spage=2083&rft.epage=2095&rft.issn=1088-9051&rft.eissn=1549-5469&rft.au=Omasits,%20Ulrich&rft.au=Varadarajan,%20Adithi%20R.&rft.au=Schmid,%20Michael&rft.au=Goetze,%20Sandra&rft.au=Melidis,%20Damianos&rft.genre=article&rft_id=info:doi/10.1101/gr.218255.116