Proteome sequence features carry signatures of the environmental niche of prokaryotes


Date

2011-01

Publication Type

Journal Article

ETH Bibliography

yes

Abstract

Background: Prokaryotic environmental adaptations occur at different levels within cells to ensure the preservation of genome integrity, proper protein folding and function, and membrane fluidity. Although specific compositions and structures of cellular components suited to various extreme conditions have been postulated, a systematic study describing such adaptations has not yet been performed. We therefore explored whether the environmental niche of a prokaryote could be deduced from the sequence of its proteome. We also aimed to identify the precise differences between proteome sequences of prokaryotes from different environments.

Results: We analyzed the proteomes of 192 prokaryotes from different habitats and collected detailed information about the optimal growth conditions of each microorganism. We selected 42 physico-chemical properties of amino acids and computed their values for each proteome. On this single set of features, we applied two fundamentally different machine learning methods, Support Vector Machines and Random Forests, and successfully distinguished bacteria from archaea, halophiles from non-halophiles, and mesophiles, thermophiles and mesothermophiles from one another. Finally, we performed feature selection using Random Forests.

Conclusions: To our knowledge, this is the first time that three different classification cases of proteome adaptation (domain of life, halophilicity and thermophilicity) have been successfully performed with the same set of 42 features. The characteristic features of a specific adaptation constitute a signature that may help in understanding the mechanisms of adaptation to extreme environments.
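As an illustration of what a proteome-level physico-chemical feature might look like, the sketch below averages a per-residue property over all residues of a proteome. This is an assumption made for illustration only: the record does not list the 42 properties or say how they were computed. The Kyte-Doolittle hydropathy scale is used as one example property, and the function names `composition`, `proteome_feature` and `charged_fraction` are hypothetical.

```python
from collections import Counter

# Kyte-Doolittle hydropathy index per amino acid. Used here only as an
# example property; the record does not name the 42 properties it used.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(proteins):
    """Relative frequency of each standard amino acid over all proteins.

    Non-standard symbols (e.g. X, *) are ignored.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in KYTE_DOOLITTLE)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no standard amino acids found")
    return {aa: counts[aa] / total for aa in KYTE_DOOLITTLE}


def proteome_feature(proteins, scale):
    """Composition-weighted mean of a per-residue property (hypothetical helper)."""
    freqs = composition(proteins)
    return sum(freqs[aa] * scale[aa] for aa in scale)


def charged_fraction(proteins):
    """Fraction of residues that are D, E, K or R (hypothetical helper)."""
    freqs = composition(proteins)
    return sum(freqs[aa] for aa in "DEKR")


if __name__ == "__main__":
    toy_proteome = ["MKVLAAGIDE", "MSTRRKEDLL"]  # made-up sequences
    print(proteome_feature(toy_proteome, KYTE_DOOLITTLE))
    print(charged_fraction(toy_proteome))
```

One such value per property would give a fixed-length feature vector for each organism, which is the kind of input that Support Vector Machines and Random Forests accept.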

Publication status

published

Volume

11

Pages / Article No.

26

Publisher

BioMed Central

Subject

Support Vector Machine; Feature Selection; Random Forest; Optimal Growth Temperature; Environmental Niche

Organisational unit

03749 - Sbalzarini, Ivo F. (former)
02721 - Inst. f. Biogeochemie u. Schadstoffdyn. / Inst. Biogeochem. and Pollutant Dynamics
