Machine learning methods for efficient antibody discovery, engineering and optimization


Date

2025

Publication Type

Doctoral Thesis

ETH Bibliography

yes

Abstract

Antibodies have emerged as one of the most important classes of biopharmaceuticals, with transformative outcomes in the treatment of various diseases including cancer, autoimmune disorders, and infectious diseases. Despite their success, the discovery, engineering and optimization of therapeutic antibodies remain limited by experimental bottlenecks along the entire development pipeline, which substantially increase the cost of bringing an antibody therapeutic to patients. Traditional in vivo discovery campaigns generate high-affinity antibodies through in vivo maturation, which, in contrast to antibodies obtained by in vitro methods, possess favorable developability properties. However, in vivo discovery relies heavily on animal immunization and experimental screening of B cells, and developability optimization is constrained by low-throughput experimental assays; both are therefore costly and labor-intensive. Computational advancements, such as machine learning (ML), have the potential to transform this field, but are likewise constrained by limited data availability. In this thesis, we address key challenges in antibody discovery, affinity engineering and developability optimization through three complementary studies. First, we generated a unique dataset of single-cell transcriptomes and antibody repertoires from immunized mice, labeled for antigen specificity. We investigated predictive patterns in transcriptomes and antibody amino acid sequences and demonstrated that gene expression-based ML models outperform sequence-based approaches in predicting antigen specificity within an antigen cohort. This work highlights the potential of single-cell gene expression patterns for in vivo antibody discovery. Second, we developed a workflow for ML-guided affinity engineering of an antigen-specific antibody variant. Using antibody repertoires from immunized mice, we developed a computational workflow to select a set of antigen-binding variants. Their amino acid sequences and experimentally measured affinities were used to train ML regression models, which accurately predicted continuous affinity values. This approach enabled the ML-guided design of eight synthetic antibody variants, of which seven exhibited the desired affinities when experimentally validated. These findings highlight the feasibility of leveraging small datasets (<50) for precise affinity engineering, reducing the reliance on extensive experimental screening. Finally, we introduced a modular framework for antibody developability optimization based on Retrieval Augmented Generation (RAG). This method combines a retriever and a generator to optimize antibody sequences for developability parameters, such as solubility. The framework enables flexible control over the optimization, with the aim of preserving antigen-binding functionality. Because it relies on a generalizable database, we envision this approach to be applicable across different optimization campaigns, offering a transparent and interpretable route to improved antibody developability. Together, these studies present advances along the entire therapeutic antibody development pipeline by introducing novel methodologies for antigen-specificity prediction, affinity engineering, and developability optimization. By enhancing traditional experimental approaches with computational methodologies, such as ML, this thesis provides a foundation for accelerating therapeutic antibody development while minimizing experimental effort.

Publication status

published

Contributors

Examiner : Reddy, Sai
Examiner : Singh, Rohit
Examiner : Rodríguez Martínez, María

Publisher

ETH Zurich

Subject

Antibody engineering; Machine learning

Organisational unit

03952 - Reddy, Sai / Reddy, Sai

Funding

197941 - Single-cell profiling of antibody repertoires and transcriptomes from B cells to determine the relationship with antigen-specificity and aging (SNF)
