Species abundance information improves sequence taxonomy classification accuracy
dc.contributor.author
Kaehler, Benjamin D.
dc.contributor.author
Bokulich, Nicholas
dc.contributor.author
McDonald, Daniel
dc.contributor.author
Knight, Rob
dc.contributor.author
Caporaso, J. Gregory
dc.contributor.author
Huttley, Gavin A.
dc.date.accessioned
2020-08-17T11:33:15Z
dc.date.available
2020-08-12T10:07:05Z
dc.date.available
2020-08-17T11:33:15Z
dc.date.issued
2019-10
dc.identifier.issn
2041-1723
dc.identifier.other
10.1038/s41467-019-12669-6
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/431166
dc.identifier.doi
10.3929/ethz-b-000431166
dc.description.abstract
Popular naive Bayes taxonomic classifiers for amplicon sequences assume that all species in the reference database are equally likely to be observed. We demonstrate that classification accuracy degrades linearly with the degree to which that assumption is violated, and in practice it is always violated. By incorporating environment-specific taxonomic abundance information, we demonstrate a significant increase in the species-level classification accuracy across common sample types. At the species level, overall average error rates decline from 25% to 14%, which is favourably comparable to the error rates that existing classifiers achieve at the genus level (16%). Our findings indicate that for most practical purposes, the assumption that reference species are equally likely to be observed is untenable. q2-clawback provides a straightforward alternative for samples from common environments.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Nature
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
dc.title
Species abundance information improves sequence taxonomy classification accuracy
en_US
dc.type
Journal Article
dc.rights.license
Creative Commons Attribution 4.0 International
dc.date.published
2019-10-11
ethz.journal.title
Nature Communications
ethz.journal.volume
10
en_US
ethz.journal.issue
1
en_US
ethz.journal.abbreviated
Nat Commun
ethz.pages.start
4643
en_US
ethz.size
10 p.
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.publication.place
London
ethz.publication.status
published
en_US
ethz.leitzahl
ETH Zürich::00002 - ETH Zürich::00012 - Lehre und Forschung::00007 - Departemente::02070 - Dep. Gesundheitswiss. und Technologie / Dep. of Health Sciences and Technology::02701 - Inst.f. Lebensmittelwiss.,Ernährung,Ges. / Institute of Food, Nutrition, and Health::09714 - Bokulich, Nicholas / Bokulich, Nicholas
en_US
ethz.relation.isNewVersionOf
10.3929/ethz-b-000431207
ethz.date.deposited
2020-08-12T10:07:13Z
ethz.source
BATCH
ethz.eth
no
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2020-08-17T11:33:27Z
ethz.rosetta.lastUpdated
2022-03-29T02:55:55Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true