Journal: Digital Discovery
Publisher: Royal Society of Chemistry
13 results
Search Results
Publications 1 - 10 of 13
- Chemical representation learning for toxicity prediction
  Item type: Journal Article
  Digital Discovery
  Born, Jannis; Markert, Greta; Janakarajan, Nikita; et al. (2023)
  Undesired toxicity is a major hindrance to drug discovery and largely responsible for high attrition rates in early stages. This calls for new, reliable, and interpretable molecular property prediction models that help prioritize compounds and thus reduce the high costs for development and the risk to humans, animals, and the environment. Here, we propose an interpretable chemical language model that combines attention with multiscale convolutions and relies on data augmentation. We first benchmark various molecular representations (e.g., fingerprints, different flavors of SMILES and SELFIES, as well as graph and graph kernel methods) revealing that SMILES coupled with augmentation overall yields the best performance. Despite its simplicity, our model is then shown to outperform existing approaches across a wide range of molecular property prediction tasks, including but not limited to toxicity. Moreover, the attention weights of the model allow for easy interpretation and show enrichment of known toxicophores even without explicit supervision. To introduce a notion of model reliability, we propose and combine two simple methods for uncertainty estimation (Monte-Carlo dropout and test-time-augmentation). These methods not only identify samples with high prediction uncertainty, but also allow formation of implicit model ensembles that improve accuracy. Last, we validate our model on a large-scale proprietary toxicity dataset and find that it outperforms previous work while giving similar insights into revealing cytotoxic substructures.
- A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing
  Item type: Journal Article
  Digital Discovery
  Winter, Benedikt; Winter, Clemens; Schilling, Johannes; et al. (2022)
  The knowledge of mixtures’ phase equilibria is crucial in nature and technical chemistry. Phase equilibria calculations of mixtures require activity coefficients. However, experimental data on activity coefficients are often limited due to the high cost of experiments. For an accurate and efficient prediction of activity coefficients, machine learning approaches have been recently developed. However, current machine learning approaches still extrapolate poorly for activity coefficients of unknown molecules. In this work, we introduce a SMILES-to-properties-transformer (SPT), a natural language processing network, to predict binary limiting activity coefficients from SMILES codes. To overcome the limitations of available experimental data, we initially train our network on a large dataset of synthetic data sampled from COSMO-RS (10 million data points) and then fine-tune the model on experimental data (20 870 data points). This training strategy enables the SPT to accurately predict limiting activity coefficients even for unknown molecules, cutting the mean prediction error in half compared to state-of-the-art models for activity coefficient predictions such as COSMO-RS and UNIFAC (Dortmund), and improving on recent machine learning approaches.
- Capsule-based automated synthesis for the efficient assembly of PROTAC like molecules
  Item type: Journal Article
  Digital Discovery
  Bordi, Samuele; Jiang, Tuo; Konopka, Anna; et al. (2023)
  In recent years, the therapeutically beneficial degradation of proteins using PROteolysis Targeting Chimeras (PROTACs) has become an increasingly popular approach in drug discovery. However, the preparation of these larger than average, heavily functionalised molecules can be synthetically challenging and time-consuming, and experience in making and handling the final PROTACs and their precursors is not yet widespread. To overcome these challenges, an existing capsule-based automated synthesis console has been adapted and employed for the automated synthesis of PROTAC-like molecules. Reagent capsules containing a partial PROTAC reagent plus the reagents required for conjugation of the partial PROTAC to the target protein binder, as well as the materials for product isolation, were prepared in order to accelerate the process and simplify PROTAC synthesis. The use of these capsules, in combination with the automated synthesis console, has enabled the safe, automated preparation of a range of different PROTAC-like molecules bearing different linker and E3 ligase functionalities.
- Correction: A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing
  Item type: Other Journal Item
  Digital Discovery
  Winter, Benedikt; Winter, Clemens; Schilling, Johannes; et al. (2023)
  Correction for 'A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing' by Benedikt Winter et al., Digital Discovery, 2022, https://doi.org/10.1039/d2dd00058j.
- Atomate2: modular workflows for materials science
  Item type: Journal Article
  Digital Discovery
  Ganose, Alex M.; Sahasrabuddhe, Hrushikesh; Asta, Mark; et al. (2025)
  High-throughput density functional theory (DFT) calculations have become a vital element of computational materials science, enabling materials screening, property database generation, and training of "universal" machine learning models. While several software frameworks have emerged to support these computational efforts, new developments such as machine learned force fields have increased demands for more flexible and programmable workflow solutions. This manuscript introduces atomate2, a comprehensive evolution of our original atomate framework, designed to address existing limitations in computational materials research infrastructure. Key features include the support for multiple electronic structure packages and interoperability between them, along with generalizable workflows that can be written in an abstract form irrespective of the DFT package or machine learning force field used within them. Our hope is that atomate2's improved usability and extensibility can reduce technical barriers for high-throughput research workflows and facilitate the rapid adoption of emerging methods in computational material science.
- Repurposing quantum chemical descriptor datasets for on-the-fly generation of informative reaction representations: application to hydrogen atom transfer reactions
  Item type: Journal Article
  Digital Discovery
  Alfonso-Ramos, Javier E.; Neeser, Rebecca M.; Stuyver, Thijs (2024)
  In this work, we explore how existing datasets of quantum chemical properties can be repurposed to build data-efficient downstream machine learning models, with a particular focus on predicting the activation energy of hydrogen atom transfer (HAT) reactions. Starting from a valence bond (VB) analysis of a generic HAT process, a set of informative descriptors is identified. Next, a surrogate neural network model is constructed to predict an informative representation, based on the identified VB descriptors, with the help of a publicly available dataset of (pre-computed) quantum chemical properties of organic radicals. We demonstrate that coupling the resulting on-the-fly informative representation to a secondary machine-learning model for activation energy prediction outperforms various predictive model architectures starting from conventional machine-learning inputs by a wide margin, at no additional computational cost. By basing their final predictions on physically meaningful descriptors, our models enable the extraction of chemical insights, providing an additional benefit. Finally, because of the extreme data efficiency of our descriptor-augmented models, we are able to fine-tune and apply them to small datasets across various reaction conditions, settings and application domains, ranging from regular (liquid phase) synthesis, over metabolism and drug design, to atmospheric chemistry.
- Understanding the language of molecules: predicting pure component parameters for the PC-SAFT equation of state from SMILES
  Item type: Journal Article
  Digital Discovery
  Winter, Benedikt; Rehner, Philipp; Esper, Timm; et al. (2025)
  A major bottleneck in developing sustainable processes and materials is a lack of property data. Recently, machine learning approaches have vastly improved previous methods for predicting molecular properties. However, these machine learning models are often not able to handle thermodynamic constraints adequately. In this work, we present a machine learning model based on natural language processing to predict pure-component parameters for the perturbed-chain statistical associating fluid theory (PC-SAFT) equation of state. The model is based on our previously proposed SMILES-to-Properties-Transformer (SPT). By incorporating PC-SAFT into the neural network architecture, the machine learning model is trained directly on experimental vapor pressure and liquid density data. Combining established physical modeling approaches with state-of-the-art machine learning methods enables high-accuracy predictions across a wide range of pressures and temperatures, while keeping the thermodynamic consistency of an equation of state like PC-SAFT. SPT_PC-SAFT demonstrates exceptional prediction accuracy even for complex molecules with various functional groups, outperforming traditional group contribution methods by a factor of four in the mean average percentage deviation. Moreover, SPT_PC-SAFT captures the behavior of stereoisomers without any special consideration. To facilitate the application of our model, we provide predicted PC-SAFT parameters of 13 279 components, making PC-SAFT accessible to all researchers.
- Developments and applications of the OPTIMADE API for materials discovery, design, and data exchange
  Item type: Journal Article
  Digital Discovery
  Evans, Matthew L.; Bergsma, Johan; Merkys, Andrius; et al. (2024)
  The Open Databases Integration for Materials Design (OPTIMADE) application programming interface (API) empowers users with holistic access to a growing federation of databases, enhancing the accessibility and discoverability of materials and chemical data. Since the first release of the OPTIMADE specification (v1.0), the API has undergone significant development, leading to the v1.2 release, and has underpinned multiple scientific studies. In this work, we highlight the latest features of the API format, accompanying software tools, and provide an update on the implementation of OPTIMADE in contributing materials databases. We end by providing several use cases that demonstrate the utility of the OPTIMADE API in materials research that continue to drive its ongoing development.
- Correction: A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing
  Item type: Other Journal Item
  Digital Discovery
  Winter, Benedikt; Winter, Clemens; Schilling, Johannes; et al. (2024)
  Correction for 'A smile is all you need: predicting limiting activity coefficients from SMILES with natural language processing' by Benedikt Winter et al., Digital Discovery, 2022, 1, 859-869, https://doi.org/10.1039/D2DD00058J.
- Quantum chemical data generation as fill-in for reliability enhancement of machine-learning reaction and retrosynthesis planning
  Item type: Journal Article
  Digital Discovery
  Toniato, Alessandra; Unsleber, Jan Patrick; Vaucher, Alain C.; et al. (2023)
  Data-driven synthesis planning has seen remarkable successes in recent years by virtue of modern approaches of artificial intelligence that efficiently exploit vast databases with experimental data on chemical reactions. However, this success story is intimately connected to the availability of existing experimental data. It may well occur in retrosynthetic and synthesis design tasks that predictions in individual steps of a reaction cascade are affected by large uncertainties. In such cases, it will, in general, not be easily possible to provide missing data from autonomously conducted experiments on demand. However, first-principles calculations can, in principle, provide missing data to enhance the confidence of an individual prediction or for model retraining. Here, we demonstrate the feasibility of such an ansatz and examine resource requirements for conducting autonomous first-principles calculations on demand.