Machine learning methods for efficient antibody discovery, engineering and optimization
OPEN ACCESS
Date
2025
Publication Type
Doctoral Thesis
ETH Bibliography
yes
Abstract
Antibodies have emerged as one of the most important biopharmaceuticals with
transformative outcomes in the treatment of various diseases including cancer,
autoimmune disorders, and infectious diseases. Despite their success, the
discovery, engineering and optimization of therapeutic antibodies remain limited
by experimental bottlenecks along the entire development pipeline that
substantially increase the cost of bringing an antibody therapeutic to
patients. Traditional in vivo discovery campaigns rely on in vivo maturation
to generate high-affinity antibodies that, unlike those obtained with in vitro
methods, possess favorable developability properties. However, in vivo
discovery relies heavily on animal immunization and experimental screening of
B cells, and developability optimization is constrained by low-throughput
experimental assays; both are therefore costly and labor-intensive.
Computational advances, such as machine learning (ML), have the potential to
transform this field but are equally constrained by limited data availability.
In this thesis, we address key challenges in antibody discovery, affinity
engineering and developability optimization through three complementary
studies. First, we generated a unique dataset of single-cell transcriptomes and
antibody repertoires from immunized mice labeled for antigen specificity. We
investigated predictive patterns in transcriptomes and antibody amino acid
sequences and demonstrated that gene expression-based ML models
outperform sequence-based approaches in predicting antigen specificity within
an antigen cohort. This work highlights the potential of single-cell gene
expression patterns for in vivo antibody discovery.
Second, we developed a workflow for ML-guided affinity engineering of an
antigen-specific antibody variant. Using antibody repertoires from immunized
mice, we developed a computational workflow to select a set of antigen-binding
variants. The amino acid sequences of these variants and their experimentally
measured affinities were used to train ML regression models, which accurately
predicted continuous affinity values. This approach enabled the ML-guided
design of eight synthetic antibody variants, of which seven exhibited the
desired affinities when experimentally validated. These findings highlight the
feasibility of leveraging small datasets (<50) for precise affinity
engineering, reducing the reliance on extensive experimental screening.
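As a rough illustration of regression on a very small sequence-affinity
dataset, the sketch below predicts affinity with a k-nearest-neighbour
regressor over Hamming distance and evaluates it by leave-one-out
cross-validation. The abstract does not specify the regression models used in
the thesis, so the nearest-neighbour regressor is a stand-in chosen for
brevity, and the sequences, affinity values, and function names are
hypothetical.

```python
from statistics import mean


def hamming(a: str, b: str) -> int:
    """Number of mismatched positions between two equal-length sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def knn_predict(query, train, k=3):
    """Predict affinity as the mean over the k nearest training sequences."""
    nearest = sorted(train, key=lambda item: hamming(query, item[0]))[:k]
    return mean(value for _, value in nearest)


def leave_one_out_mae(data, k=3):
    """Mean absolute error when each sequence is predicted from all others."""
    errors = []
    for i, (seq, value) in enumerate(data):
        train = data[:i] + data[i + 1:]
        errors.append(abs(knn_predict(seq, train, k) - value))
    return mean(errors)


# Hypothetical toy data: (sequence, affinity on an arbitrary continuous scale).
TOY_DATA = [
    ("ARDYYGS", 8.1),
    ("ARDYYGT", 8.0),
    ("ARDFYGS", 7.6),
    ("ARDFYGT", 7.5),
    ("ARNYYGS", 8.4),
    ("ARNFYGS", 7.9),
]

if __name__ == "__main__":
    print("leave-one-out MAE:", round(leave_one_out_mae(TOY_DATA), 3))
```

With only a few dozen measurements, leave-one-out evaluation uses every data
point for testing exactly once, which is one common way to estimate accuracy
when a separate held-out set is not affordable.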
Finally, we introduced a modular framework for antibody developability
optimization based on Retrieval-Augmented Generation (RAG). This method
combines a retriever and a generator to optimize antibody sequences for
developability parameters, such as solubility. The framework enables flexible
control over the optimization, with the aim of preserving antigen-binding
functionality. Because it relies on a generalizable database, we envision this
approach to be applicable across different optimization campaigns, offering a
transparent and interpretable route to improved antibody developability.
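The retriever-plus-generator idea can be sketched in a few lines. This is a toy
under stated assumptions and not the thesis's framework: the retriever ranks
database sequences by identity to the query, the generator is a rule-based
stand-in that copies single residues from retrieved sequences, mean
Kyte-Doolittle hydropathy serves as a crude proxy for solubility (lower is
treated as better), and the `frozen` positions are a hypothetical way to
protect residues assumed to matter for antigen binding. All sequences and
function names are illustrative.

```python
# Kyte-Doolittle hydropathy scale (higher = more hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5, "Q": -3.5,
    "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5, "L": 3.8, "K": -3.9,
    "M": 1.9, "F": 2.8, "P": -1.6, "S": -0.8, "T": -0.7, "W": -0.9,
    "Y": -1.3, "V": 4.2,
}


def hydropathy(seq: str) -> float:
    """Mean hydropathy; used here as a crude, assumed solubility proxy."""
    return sum(KD[aa] for aa in seq) / len(seq)


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length sequences."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def retrieve(query, database, k=2):
    """Retriever: the k same-length database sequences most similar to query."""
    pool = [s for s in database if len(s) == len(query) and s != query]
    return sorted(pool, key=lambda s: identity(query, s), reverse=True)[:k]


def generate(query, retrieved, frozen):
    """Generator stand-in: single-residue swaps copied from retrieved
    sequences, skipping positions listed in `frozen`."""
    candidates = set()
    for ref in retrieved:
        for i, (q, r) in enumerate(zip(query, ref)):
            if i not in frozen and q != r:
                candidates.add(query[:i] + r + query[i + 1:])
    return sorted(candidates)


def optimize(query, database, frozen, k=2):
    """Return the candidate (or the query itself) with the lowest hydropathy."""
    candidates = generate(query, retrieve(query, database, k), frozen)
    return min(candidates + [query], key=hydropathy)


if __name__ == "__main__":
    db = ["AVKIGS", "DVLIGS", "AVLIGT"]  # hypothetical database
    print(optimize("AVLIGS", db, frozen={0}, k=3))
```

Because every proposed mutation is copied from a retrieved sequence, each
change can be traced back to a concrete database entry, which is one way such
a design can remain inspectable.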
Together, these studies present advances along the entire therapeutic antibody
development pipeline by introducing novel methodologies for antigen-specificity
prediction, affinity engineering, and developability optimization. By enhancing
traditional experimental approaches with computational methodologies, such as
ML, this thesis provides a foundation for accelerating therapeutic antibody
development while minimizing experimental efforts.
Publication status
published
Contributors
Examiner : Reddy, Sai
Examiner : Singh, Rohit
Examiner : Rodríguez Martínez, María
Publisher
ETH Zurich
Subject
Antibody engineering; Machine learning
Organisational unit
03952 - Reddy, Sai / Reddy, Sai
Funding
197941 - Single-cell profiling of antibody repertoires and transcriptomes from B cells to determine the relationship with antigen-specificity and aging (SNF)
Related publications and datasets
References: 10.1016/j.cels.2024.11.005