Show simple item record

dc.contributor.author
Tyzack, Jonathan D.
dc.contributor.author
Mussa, Hamse Y.
dc.contributor.author
Williamson, Mark J.
dc.contributor.author
Kirchmair, Johannes
dc.contributor.author
Glen, Robert C.
dc.date.accessioned
2019-06-27T15:24:45Z
dc.date.available
2017-06-11T09:54:55Z
dc.date.available
2019-06-27T15:24:45Z
dc.date.issued
2014-05-27
dc.identifier.issn
1758-2946
dc.identifier.other
10.1186/1758-2946-6-29
en_US
dc.identifier.uri
http://hdl.handle.net/20.500.11850/85604
dc.identifier.doi
10.3929/ethz-b-000085604
dc.description.abstract
Background: The prediction of sites and products of metabolism in xenobiotic compounds is key to the development of new chemical entities, where screening potential metabolites for toxicity or unwanted side-effects is of crucial importance. In this work 2D topological fingerprints are used to encode atomic sites and three probabilistic machine learning methods are applied: Parzen-Rosenblatt Window (PRW), Naive Bayesian (NB) and a novel approach called RASCAL (Random Attribute Subsampling Classification ALgorithm). These are implemented by randomly subsampling descriptor space to alleviate the problem, often suffered by data mining methods, of having to exactly match fingerprints, and in the case of PRW by measuring a distance between feature vectors rather than exact matching. The classifiers have been implemented in CUDA/C++ to exploit the parallel architecture of graphical processing units (GPUs) and are freely available in a public repository.
Results: It is shown that for PRW a SoM (Site of Metabolism) is identified in the top two predictions for 85%, 91% and 88% of the CYP 3A4, 2D6 and 2C9 data sets respectively, with RASCAL giving similar performance of 83%, 91% and 88%, respectively. These results put PRW and RASCAL performance ahead of NB, which gave a much lower classification performance of 51%, 73% and 74%, respectively.
Conclusions: 2D topological fingerprints calculated to a bond depth of 4-6 contain sufficient information to allow the identification of SoMs using classifiers based on relatively small data sets. Thus, the machine learning methods outlined in this paper are conceptually simpler and more efficient than other methods tested, and the use of simple topological descriptors derived from 2D structure gives results competitive with other approaches using more expensive quantum chemical descriptors. The descriptor space subsampling approach and ensemble methodology allow the methods to be applied to molecules more distant from the training data, where data mining would be more likely to fail due to the lack of common fingerprints. The RASCAL algorithm is shown to give equivalent classification performance to PRW but at lower computational expense, allowing it to be applied more efficiently in the ensemble scheme.
en_US
dc.format
application/pdf
en_US
dc.language.iso
en
en_US
dc.publisher
Chemistry Central
en_US
dc.rights.uri
http://creativecommons.org/licenses/by/4.0/
dc.subject
Cytochrome P450
en_US
dc.subject
Metabolism
en_US
dc.subject
Probabilistic
en_US
dc.subject
Classification
en_US
dc.subject
GPU
en_US
dc.subject
CUDA
en_US
dc.subject
2D
en_US
dc.title
Cytochrome P450 site of metabolism prediction from 2D topological fingerprints using GPU accelerated probabilistic classifiers
en_US
dc.type
Journal Article
dc.rights.license
Creative Commons Attribution 4.0 International
ethz.journal.title
Journal of Cheminformatics
ethz.journal.volume
6
en_US
ethz.journal.abbreviated
J Cheminform
ethz.pages.start
29
en_US
ethz.size
14 p.
en_US
ethz.version.deposit
publishedVersion
en_US
ethz.identifier.wos
ethz.identifier.scopus
ethz.identifier.nebis
005800388
ethz.publication.place
London
en_US
ethz.publication.status
published
en_US
ethz.date.deposited
2017-06-11T09:59:59Z
ethz.source
ECIT
ethz.identifier.importid
imp59365203d77a773182
ethz.ecitpid
pub:134893
ethz.eth
yes
en_US
ethz.availability
Open access
en_US
ethz.rosetta.installDate
2017-07-13T00:15:55Z
ethz.rosetta.lastUpdated
2021-02-15T04:55:25Z
ethz.rosetta.versionExported
true
ethz.COinS
ctx_ver=Z39.88-2004&rft_val_fmt=info:ofi/fmt:kev:mtx:journal&rft.atitle=Cytochrome%20P450%20site%20of%20metabolism%20prediction%20from%202D%20topological%20fingerprints%20using%20GPU%20accelerated%20probabilistic%20classifiers&rft.jtitle=Journal%20of%20Cheminformatics&rft.date=2014-05-27&rft.volume=6&rft.spage=29&rft.issn=1758-2946&rft.au=Tyzack,%20Jonathan%20D.&Mussa,%20Hamse%20Y.&Williamson,%20Mark%20J.&Kirchmair,%20Johannes&Glen,%20Robert%20C.&rft.genre=article&rft_id=info:doi/10.1186/1758-2946-6-29&