Simon Höllerer
Search Results
Publications 1 - 4 of 4
- Data-driven protease engineering by DNA-recording and epistasis-aware machine learning
Item type: Journal Article
Nature Communications
Huber, Lukas; Kucera, Tim; Höllerer, Simon; et al. (2025)
Protein engineering has recently seen tremendous transformation due to machine learning (ML) tools that predict structure from sequence at unprecedented precision. Predicting catalytic activity, however, remains challenging, restricting our capabilities to design protein sequences with desired catalytic function in silico. This predicament is mainly rooted in a lack of experimental methods capable of recording sequence-activity data in quantities sufficient for data-intensive ML techniques, and the inefficiency of searches in the enormous sequence spaces inherent to proteins. Herein, we address both limitations in the context of engineering proteases with tailored substrate specificity. We introduce a DNA recorder for deep specificity profiling of proteases in Escherichia coli as we demonstrate testing 29,716 candidate proteases against up to 134 substrates in parallel. The resulting sequence-activity data on approximately 600,000 protease-substrate pairs does not only reveal key sequence determinants governing protease specificity, but allows to build a data-efficient deep learning model that accurately predicts protease sequences with desired on- and off-target activities. Moreover, we present epistasis-aware training set design as a generalizable strategy to streamline searches within enormous sequence spaces, which strongly increases model accuracy at given experimental efforts and is thus likely to have implications for protein engineering far beyond proteases.
- Large-Scale Sequence-Function Mapping of Bacterial Genetic Elements using DNA-based Phenotypic Recording
Item type: Doctoral Thesis
Höllerer, Simon (2023)
Synthetic biology involves the engineering and redesigning of organisms to give them new, purposeful abilities with the goal of addressing current challenges in medicine, agriculture, and manufacturing.
Therefore, synthetic biologists aim to understand organisms to such depth that their biological functions become predictable and tunable. This requires large, high-quality datasets linking genetic elements to their quantitative functions. However, generating such datasets can be difficult due to the limitations of current experimental methods (Chapter 1). To address this, in this thesis, I present a novel method termed uASPIre (ultradeep Acquisition of Sequence-Phenotype Interrelations) that enables high-throughput recording of sequence-function datasets on a large scale. This method capitalizes on the ability of a site-specific recombinase to record functional information in DNA. Combining this recording with modern short-read next-generation sequencing (NGS) techniques enables the readout of both sequence and quantitative function simultaneously at hitherto unmatched scale (Chapter 2). An area of research that requires large sequence-function datasets, is the longstanding endeavor to understand and predict bacterial gene expression. One of its key determinants and the rate limiting step is the process of translation initiation, which mainly depends on the sequence of the ribosomal binding site (RBS) as part of the 5’-untranslated region (5’-UTR). Since the RBS has the biggest impact on translation initiation, we sought to get a better understanding of the translational process by studying thousands of RBSs and use machine learning to be able to predict their quantitative function. Therefore, we applied uASPIre to dynamically record the quantitative function of over 300,000 RBS variants in a single experiment in the bacterium Escherichia coli. We then used the resulting data to train a deep learning model which was able to predict RBS translation initiation rates with high accuracy (Chapter 2). In addition to the 5’-UTR, the coding sequence (CDS) of the downstream gene has a major effect on translation initiation. 
Although these two mRNA parts, 5’-UTR and CDS, have been extensively studied in the past, their complex interaction is still unknown. Due to a lack of experimental methods capable of measuring both sequence and function at large scale, only few combinations of 5’-UTRs and CDSs have been studied, which led to contradictory conclusions about the impact of various sequence motifs on bacterial translation. To resolve this, we expanded uASPIre’s capabilities and systematically characterized the translation rates of over 1.2 million 5’ UTR-CDS pairs (Chapter 3). With the resulting big data, we could provide detailed quantification of the impact of mRNA sequence motifs on translation initiation. We conclusively showed that the effect of the CDS can almost exclusively be explained by mRNA secondary structures and not by tRNA abundance in the cell. Moreover, we obtained clear experimental evidence for a base-pairing interaction between the base directly downstream of the anticodon in the initiator tRNA and the base directly upstream of the start codon (Chapter 3). Many genetic elements are longer than several hundred base pairs, which makes them difficult to be characterized at large scale using current short-read Illumina sequencing techniques. To overcome this, we combined uASPIre with a barcoding strategy and long-read SMRT-sequencing. This requires a computational pipeline that processes raw NGS and SMRT-seq data, links barcodes to their corresponding variants, and generates a quantitative output for each characterized variant. Therefore, I have developed computational means that were required to extend the uASPIre method to characterize long genetic elements while still capitalizing on the massive throughput of Illumina sequencing. Several conceptual steps were needed, which I built in silico and tested in an internal collaboration. 
To further enable other researchers to use and benefit from uASPIre, in this thesis, I provide a step-by-step experimental protocol including NGS data analysis for both short-read Illumina and long-read SMRT-sequencing data. As an illustrative example, variants of the Tobacco Etch Virus protease and the L-rhamnose inducible promoter were characterized (Chapter 4). Due to the irreversible recording of Bxb1 expression in DNA, uASPIre is a very sensitive method capable of resolving even small changes in gene expression. This requires zero (or low) basal expression of Bxb1 without induction. Many genetic elements, however, are constitutively active, and can therefore not be characterized using uASPIre. To overcome this, I therefore expanded the usability of uASPIre by adding a second recombinase that allows the user to switch on the recording function when needed. As a proof-of-concept, I showcase this novel double-recombinase system by characterizing a set of constitutive bacterial promoters (Chapter 5). Collectively, this thesis encompasses an overview of methods for high-throughput sequence-function mapping, the development of uASPIre, its application to different genetic elements and a step-by-step protocol including scripts for data analysis. Overall, uASPIre is a widely applicable and practical method for recording high-throughput sequence-function datasets on a massive scale and has the potential to greatly advance the field of synthetic biology. This will further strengthen our understanding of biological functions and ultimately strengthen our ability to redesign organisms and harness the power of nature to solve current and future challenges.
- From sequence to function and back – High-throughput sequence-function mapping in synthetic biology
Item type: Review Article
Current Opinion in Systems Biology
Höllerer, Simon; Desczyk, Charlotte; Muro, Ricardo Farrera; et al. (2024)
How does genetic sequence give rise to biological function? Answering this question is key to our understanding of life and the construction of synthetic biosystems that fight disease, resource scarcity and climate change. Unfortunately, the virtually infinite number of possible sequences and limitations in their functional characterization limit our current understanding of sequence-function relationships. To overcome this dilemma, several high-throughput methods to experimentally link sequences to corresponding functional properties have been developed recently. While all of these share the goal to collect sequence-function data at large scale, they differ significantly in their technical approach, functional readout and application scope. Herein, we highlight recent developments in the aspiring field of high-throughput sequence-function mapping providing a critical assessment of their potential in synthetic biology.
- Ultradeep characterisation of translational sequence determinants refutes rare-codon hypothesis and unveils quadruplet base pairing of initiator tRNA and transcript
Item type: Journal Article
Nucleic Acids Research
Höllerer, Simon; Jeschek, Markus (2023)
Translation is a key determinant of gene expression and an important biotechnological engineering target. In bacteria, 5'-untranslated region (5'-UTR) and coding sequence (CDS) are well-known mRNA parts controlling translation and thus cellular protein levels. However, the complex interaction of 5'-UTR and CDS has so far only been studied for few sequences leading to non-generalisable and partly contradictory conclusions. Herein, we systematically assess the dynamic translation from over 1.2 million 5'-UTR-CDS pairs in Escherichia coli to investigate their collective effect using a new method for ultradeep sequence-function mapping. This allows us to disentangle and precisely quantify effects of various sequence determinants of translation. We find that 5'-UTR and CDS individually account for 53% and 20% of variance in translation, respectively, and show conclusively that, contrary to a common hypothesis, tRNA abundance does not explain expression changes between CDSs with different synonymous codons. Moreover, the obtained large-scale data provide clear experimental evidence for a base-pairing interaction between initiator tRNA and mRNA beyond the anticodon-codon interaction, an effect that is often masked for individual sequences and therefore inaccessible to low-throughput approaches. Our study highlights the indispensability of ultradeep sequence-function mapping to accurately determine the contribution of parts and phenomena involved in gene regulation.