Show simple item record

dc.contributor.author
Yu, Lijia
dc.contributor.author
Tanwar, Deepak K.
dc.contributor.author
Penha, Emanuel D.S.
dc.contributor.author
Wolf, Yuri I.
dc.contributor.author
Koonin, Eugene V.
dc.contributor.author
Basu, Malay K.
dc.date.accessioned
2019-03-04T11:50:06Z
dc.date.available
2019-03-03T03:31:15Z
dc.date.available
2019-03-04T11:50:06Z
dc.date.issued
2019-02-26
dc.identifier.issn
0027-8424
dc.identifier.issn
1091-6490
dc.identifier.other
10.1073/pnas.1814684116
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/328688
dc.identifier.doi
10.3929/ethz-b-000328688
dc.description.abstract
From an abstract, informational perspective, protein domains appear analogous to words in natural languages in which the rules of word association are dictated by linguistic rules, or grammar. Such rules exist for protein domains as well, because only a small fraction of all possible domain combinations is viable in evolution. We employ a popular linguistic technique, n-gram analysis, to probe the “proteome grammar”—that is, the rules of association of domains that generate various domain architectures of proteins. Comparison of the complexity measures of “protein languages” in major branches of life shows that the relative entropy difference (information gain) between the observed domain architectures and random domain combinations is highly conserved in evolution and is close to being a universal constant, at ∼1.2 bits. Substantial deviations from this constant are observed in only two major groups of organisms: a subset of Archaea that appears to be cells simplified to the limit, and animals that display extreme complexity. We also identify the n-grams that represent signatures of the major branches of cellular life. The results of this analysis bolster the analogy between genomes and natural language and show that a “quasi-universal grammar” underlies the evolution of domain architectures in all divisions of cellular life. The nearly universal value of information gain by the domain architectures could reflect the minimum complexity of signal processing that is required to maintain a functioning cell.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
National Academy of Sciences
en_US
dc.rights.uri
http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject
n-gram
en_US
dc.subject
bigram
en_US
dc.subject
protein domain
en_US
dc.subject
language
en_US
dc.subject
domain architecture
en_US
dc.title
Grammar of protein domain architectures
en_US
dc.type
Journal Article
dc.rights.license
Creative Commons Attribution-NonCommercial-NoDerivatives 4.0 International
dc.date.published
2019-02-07
ethz.journal.title
Proceedings of the National Academy of Sciences of the United States of America
ethz.journal.volume
116
en_US
ethz.journal.issue
9
en_US
ethz.journal.abbreviated
PNAS
ethz.pages.start
3636
en_US
ethz.pages.end
3645
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.identifier.wos
ethz.identifier.scopus
ethz.publication.place
Washington, DC
en_US
ethz.publication.status
published
en_US
ethz.date.deposited
2019-03-03T03:31:17Z
ethz.source
SCOPUS
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2019-03-04T11:50:16Z
ethz.rosetta.lastUpdated
2019-03-04T11:50:16Z
ethz.rosetta.exportRequired
true
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Grammar%20of%20protein%20domain%20architectures&rft.jtitle=Proceedings%20of%20the%20National%20Academy%20of%20Sciences%20of%20the%20United%20States%20of%20America&rft.date=2019-02-26&rft.volume=116&rft.issue=9&rft.spage=3636&rft.epage=3645&rft.issn=0027-8424&1091-6490&rft.au=Yu,%20Lijia&Tanwar,%20Deepak%20K.&Penha,%20Emanuel%20D.S.&Wolf,%20Yuri%20I.&Koonin,%20Eugene%20V.&rft.genre=article&