Journal: Journal of Chemical Information and Modeling

Abbreviation

J. Chem. Inf. Model.

Publisher

American Chemical Society

Journal Volumes

ISSN

1549-9596
0095-2338
1520-5142

Description

Search Results

Publications 1 - 10 of 68
  • Riniker, Sereina (2017)
    Journal of Chemical Information and Modeling
  • Riniker, Sereina; Fechner, Nikolas; Landrum, Gregory A. (2013)
    Journal of Chemical Information and Modeling
  • Braun, Jessica; Katzberger, Paul; Landrum, Gregory; et al. (2025)
    Journal of Chemical Information and Modeling
    Understanding the conformational ensemble of molecules in different environments is at the core of many research efforts. In conformer generation and geometry optimization, the complexity of the conformer space arises from the underlying torsion-angle distributions, which, in the case of force fields and some in silico conformer generators like ETKDG, are derived from accumulated torsion profiles for a predefined set of torsion motifs (termed "torsion motif torsional-angle distributions", tmTADs). Comparative studies of conformer generation and global optimization algorithms often neglect that tmTADs are sensitive to the environment they are extracted from, leading to comparisons of conformational ensembles and minimum-energy conformations from, e.g., crystal versus vacuum environments. Here, we present a large-scale comparative study of tmTADs across different environments, namely crystal, vacuum, water, and hexane, where the ensembles in the noncrystal environments are accessed through a computational workflow using the OpenFF-2.0.0 force field in combination with the graph neural network-based implicit solvent (GNNIS) approach. Our results show that the effects in the different environments, such as solvent–solute interactions in water and hexane, and packing effects in the crystal, produce strikingly distinct torsion distributions for most of the selected torsion motifs. In addition to qualitative and quantitative comparison of the extracted tmTADs, we also provide an automated fitting procedure that allows rapid parametrization of the distributions. These newly found parameters can be employed in a solvent-specific conformer generation procedure in the future.
  • Esposito, Carmen; Landrum, Gregory A.; Schneider, Nadine; et al. (2021)
    Journal of Chemical Information and Modeling
    Machine learning classifiers trained on class imbalanced data are prone to overpredict the majority class. This leads to a larger misclassification rate for the minority class, which in many real-world applications is the class of interest. For binary data, the classification threshold is set by default to 0.5, which is often not ideal for imbalanced data. Adjusting the decision threshold is a good strategy to deal with the class imbalance problem. In this work, we present two different automated procedures for the selection of the optimal decision threshold for imbalanced classification. A major advantage of our procedures is that they do not require retraining of the machine learning models or resampling of the training data. The first approach is specific for random forest (RF), while the second approach, named GHOST, can be potentially applied to any machine learning classifier. We tested these procedures on 138 public drug discovery data sets containing structure–activity data for a variety of pharmaceutical targets. We show that both thresholding methods significantly improve the performance of RF. We tested the use of GHOST with four different classifiers in combination with two molecular descriptors, and we found that most classifiers benefit from threshold optimization. GHOST also outperformed other strategies, including random undersampling and conformal prediction. Finally, we show that our thresholding procedures can be effectively applied to real-world drug discovery projects, where the imbalance and characteristics of the data vary greatly between the training and test sets.
  • Wahab, Alexandra; Pfuderer, Lara; Paenurk, Eno; et al. (2022)
    Journal of Chemical Information and Modeling
    Chemical databases are an essential tool for data-driven investigation of structure–property relationships and for the design of novel functional compounds. We introduce the first phase of the COMPAS Project, a COMputational database of Polycyclic Aromatic Systems. In this phase, we developed two data sets containing the optimized ground-state structures and a selection of molecular properties of ∼34k and ∼9k cata-condensed polybenzenoid hydrocarbons (at the GFN2-xTB and B3LYP-D3BJ/def2-SVP levels, respectively) and placed them in the public domain. Herein, we describe the process of the data set generation, detail the information available within the data sets, and show the fundamental features of the generated data. We analyze the correlation between the two types of computations as well as the structure–property relationships of the calculated species. The data and insights gained from them can inform rational design of novel functional aromatic molecules for use in, e.g., organic electronics, and can provide a basis for additional data-driven machine- and deep-learning studies in chemistry.
  • Witek, Jagna; Keller, Bettina G.; Blatter, Markus; et al. (2017)
    Journal of Chemical Information and Modeling
  • Armacost, Kira A.; Riniker, Sereina; Cournia, Zoe (2020)
    Journal of Chemical Information and Modeling
  • Türtscher, Paul Lorenz; Reiher, Markus (2023)
    Journal of Chemical Information and Modeling
    While the field of first-principles explorations into chemical reaction space has been continuously growing, the development of strategies for analyzing resulting chemical reaction networks (CRNs) is lagging behind. A CRN consists of compounds linked by reactions. Analyzing how these compounds are transformed into one another based on kinetic modeling is a nontrivial task. Here, we present the graph-optimization-driven algorithm and program Pathfinder to allow for such an analysis of a CRN. The CRN for this work has been obtained with our open-source Chemoton reaction network exploration software. Chemoton probes reactive combinations of compounds for elementary steps and sorts them into reactions. Encoding these reactions of the CRN as a graph consisting of compound and reaction vertices and adding information about activation barriers as well as required reagents to the edges of the graph yields a complete graph-theoretical representation of the CRN. Since the probabilities of the formation of compounds depend on the starting conditions, the consumption of any compound during a reaction must be accounted for to reflect the availability of reagents. To account for this, we introduce compound costs to reflect compound availability. Simultaneously, the determined compound costs rank the compounds in the CRN in terms of their probability to be formed. This ranking then allows us to probe easily accessible compounds in the CRN first for further explorations into yet unexplored terrain. We first illustrate the working principle on an abstract small CRN. Afterward, Pathfinder is demonstrated on the example of the disproportionation of iodine with water and the comproportionation of iodic acid and hydrogen iodide. Both processes are analyzed within the same CRN, which we construct with our autonomous first-principles CRN exploration software Chemoton [Unsleber, J. P.; J. Chem. Theory Comput. 2022, 18, 5393−5409] guided by Pathfinder.
  • Gamboa Carballo, Juan José; Ferino-Pérez, Anthuan; Rana, Vijay Kumar; et al. (2020)
    Journal of Chemical Information and Modeling
  • Unsleber, Jan Patrick (2023)
    Journal of Chemical Information and Modeling
    Autonomously exploring chemical reaction networks with first-principles methods can generate vast data. Especially autonomous explorations without tight constraints risk getting trapped in regions of reaction networks that are not of interest. In many cases, these regions of the networks are only exited once fully searched. Consequently, the required human time for analysis and computer time for data generation can make these investigations unfeasible. Here, we show how simple reaction templates can facilitate the transfer of chemical knowledge from expert input or existing data into new explorations. This process significantly accelerates reaction network explorations and improves cost-effectiveness. We discuss the definition of the reaction templates and their generation based on molecular graphs. The resulting simple filtering mechanism for autonomous reaction network investigations is exemplified with a polymerization reaction.
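
The Esposito et al. (2021) abstract above describes adjusting the decision threshold of a trained classifier instead of retraining or resampling. The following is a minimal sketch of that general idea only, not the GHOST procedure or the random-forest-specific method from the paper. It assumes predicted probabilities for the minority class are already available, and it assumes Cohen's kappa as the selection metric; the function names `cohen_kappa` and `select_threshold` and the toy data are hypothetical.

```python
def cohen_kappa(y_true, y_pred):
    """Cohen's kappa for binary labels (0/1), computed from the confusion matrix."""
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    observed = (tp + tn) / n
    # Agreement expected by chance, from the marginal class frequencies.
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / (n * n)
    if expected == 1.0:
        return 0.0  # degenerate case: only one class present in both vectors
    return (observed - expected) / (1.0 - expected)


def select_threshold(y_true, probs, candidates):
    """Return the candidate threshold with the highest kappa, and that kappa.

    No model is retrained: only the cutoff applied to the existing
    predicted probabilities changes.
    """
    best_t, best_k = None, float("-inf")
    for t in candidates:
        preds = [1 if p >= t else 0 for p in probs]
        k = cohen_kappa(y_true, preds)
        if k > best_k:
            best_t, best_k = t, k
    return best_t, best_k


# Toy, made-up imbalanced data: 8 inactive (0) and 2 active (1) compounds.
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
probs = [0.05, 0.10, 0.10, 0.15, 0.20, 0.20, 0.25, 0.30, 0.35, 0.45]
candidates = [i / 20 for i in range(1, 11)]  # 0.05, 0.10, ..., 0.50

best_t, best_k = select_threshold(y_true, probs, candidates)
default_k = cohen_kappa(y_true, [1 if p >= 0.5 else 0 for p in probs])
print(best_t, best_k, default_k)
```

On this toy set the default cutoff of 0.5 assigns every compound to the majority class, while a lower cutoff recovers both minority-class members. In practice the threshold would be chosen on training or validation data rather than on the data used for evaluation.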