- Journal Article
Generation and prioritization of new molecules are the most central parts of the drug design process. Matched molecular series analysis (MMSA) has recently been proposed as a formal approach that captures both of these key elements of design. To better understand the power of MMSA and its specific limitations, we here evaluate its performance as an ADME property prediction tool. We use four large and diverse in-house data sets: logD, microsomal clearance, CYP2C9 inhibition, and CYP3A4 inhibition. MMSA follows the concept of parallel structure–activity relationship (SAR): if two identical substituent series on different scaffolds show similar property profiles, SAR from one series can be transferred to the other. We test four different similarity metrics to identify pairs of molecular series between which information can be transferred. We find that the best prediction performance is achieved by a combination of centered root-mean-square deviation (cRMSD) and a network score approach previously published by Keefer et al. However, cRMSD alone strikes the best balance between accuracy and the number of predictions that can be made. We identify statistical metrics that allow estimating when MMSA predictions will work, similar to the well-known applicability domain concept in machine learning. MMSA achieves a prediction accuracy that is comparable to that of a standard machine-learning model and of matched molecular pair analysis. In contrast to machine learning, however, it is very easy to understand where MMSA predictions come from. Finally, to prospectively test the power of MMSA, we retested compounds that were strong outliers in the initial predictions and show how the MMSA model can help to identify erroneous data points. © American Chemical Society 2020
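The parallel-SAR idea in the abstract can be illustrated with a minimal sketch. It assumes that cRMSD means the root-mean-square deviation between two property profiles after each profile has been centered on its own mean over the shared substituents, and that a transferred prediction is the reference value shifted by the difference of those means. Both are assumptions made for illustration, not the authors' stated implementation; the function names `crmsd` and `transfer_prediction` and all numbers are hypothetical.

```python
import math


def _shared_means(series_a, series_b):
    """Return the shared substituents and each series' mean over them."""
    shared = sorted(set(series_a) & set(series_b))
    if len(shared) < 2:
        raise ValueError("need at least two shared substituents")
    mean_a = sum(series_a[s] for s in shared) / len(shared)
    mean_b = sum(series_b[s] for s in shared) / len(shared)
    return shared, mean_a, mean_b


def crmsd(series_a, series_b):
    """Centered RMSD between two substituent -> property mappings.

    Assumption: each profile is centered on its mean over the shared
    substituents before the deviation is computed, so a constant offset
    between scaffolds does not count as dissimilarity.
    """
    shared, mean_a, mean_b = _shared_means(series_a, series_b)
    squared = sum(
        ((series_a[s] - mean_a) - (series_b[s] - mean_b)) ** 2 for s in shared
    )
    return math.sqrt(squared / len(shared))


def transfer_prediction(query, reference, substituent):
    """Predict the query-series value for a substituent seen only in the reference.

    Assumption: the reference value is shifted by the mean offset between
    the two series over their shared substituents.
    """
    _, mean_query, mean_reference = _shared_means(query, reference)
    return reference[substituent] + (mean_query - mean_reference)


# Hypothetical logD-like values for the same substituents on two scaffolds.
query_series = {"H": 1.0, "F": 1.5, "Cl": 2.0}
reference_series = {"H": 2.0, "F": 2.5, "Cl": 3.0, "Me": 2.8}

similarity = crmsd(query_series, reference_series)
predicted_me = transfer_prediction(query_series, reference_series, "Me")
```

In this toy case the two profiles differ only by a constant shift, so the centered deviation vanishes and the prediction for the unseen substituent is the reference value moved by that shift. A low cRMSD is what would qualify a series pair for transfer under this sketch.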
Journal / series: Journal of Chemical Information and Modeling
Pages / Article No.
Publisher: American Chemical Society
Organisational unit: 09458 - Riniker, Sereina Z. / Riniker, Sereina Z.