Language Models in Molecular Discovery


Date

2024

Publication Type

Book Chapter

ETH Bibliography

yes

Abstract

The success of language models, especially transformer-based architectures, has trickled into other scientific domains, giving rise to the concept of “scientific language models” that operate on small molecules, proteins, or polymers. In chemistry, language models contribute to accelerating the molecule discovery cycle as evidenced by promising recent findings in early-stage drug discovery. In this chapter, we review the role of language models in molecular discovery, underlining their strengths and examining their weaknesses in de novo drug design, property prediction, and reaction chemistry. We highlight valuable open-source software assets to lower the entry barrier to the field of scientific language modeling. Furthermore, as a solution to some of the weaknesses we identify, we outline a vision for future molecular design that integrates a chatbot interface with available computational chemistry tools through techniques such as retrieval-augmented generation (RAG). Our contribution serves as a valuable resource for researchers, chemists, and AI enthusiasts interested in understanding how language models can and will be used to accelerate chemical discovery.

Publication status

published

Book title

Drug Development Supported by Informatics

Pages / Article No.

121-141

Publisher

Springer
