Systematic investigation of synthetic operon designs enables prediction and control of expression levels of multiple proteins


Date

2022-06-10

Publication Type

Working Paper

ETH Bibliography

yes

Abstract

Controlling the expression levels of multiple recombinant proteins for optimal performance is crucial for synthetic biosystems but remains difficult given the large number of DNA-encoded factors that influence the process of gene expression from transcription to translation. In bacterial hosts, biosystems can be economically encoded as operons, but the sequence requirements for exact tuning of expression levels in an operon remain unclear. Here, we demonstrate the extent and predictability of protein-level variation using diverse arrangements of twelve genes to generate 88 synthetic operons with up to seven genes at varying inducer concentrations. The resulting 2772 protein expression measurements allowed the training of a sequence-based machine learning model that explains 83% of the variation in the data with a mean absolute error of 9% relative to reference constructs, making it a useful tool for protein expression prediction. Feature importance analysis indicates that operon length, gene position and gene junction structure are of major importance for protein expression.

Publication status

published

Publisher

Cold Spring Harbor Laboratory

Edition / version

v1

Subject

Synthetic biology; Biotechnology; Synthetic operons; Combinatorial DNA assembly; Machine learning; Protein expression

Organisational unit

03602 - Panke, Sven / Panke, Sven

Funding

289326 - Standarization and orthogonalization of the gene expression flow for robust engineering of NTN (new-to-nature) biological properties. (EC)
