Accurate indel prediction using paired-end short reads
OPEN ACCESS
Date
2013-02
Publication Type
Journal Article
ETH Bibliography
yes
Abstract
Background
One of the major open challenges in next generation sequencing (NGS) is the accurate identification of structural variants such as insertions and deletions (indels). Current methods for indel calling assign scores to different types of evidence or counter-evidence for the presence of an indel, such as the number of split read alignments spanning the boundaries of a deletion candidate or reads that map within a putative deletion. Candidates with a score above a manually defined threshold are then predicted to be true indels. As a consequence, structural variants detected in this manner contain many false positives.
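The threshold-based calling described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not any specific published caller: the `DeletionCandidate` type, the two evidence counts, the `penalty` weight and the threshold of 3 are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class DeletionCandidate:
    # Hypothetical evidence counts for one deletion candidate.
    split_reads: int   # split-read alignments spanning the deletion boundaries (evidence)
    reads_inside: int  # reads mapping inside the putative deleted region (counter-evidence)


def heuristic_score(c: DeletionCandidate, penalty: float = 1.0) -> float:
    """Evidence minus weighted counter-evidence (illustrative weighting only)."""
    return c.split_reads - penalty * c.reads_inside


def call_by_threshold(cands: list[DeletionCandidate], threshold: float = 3.0) -> list[DeletionCandidate]:
    """Keep every candidate whose score reaches a manually chosen threshold."""
    return [c for c in cands if heuristic_score(c) >= threshold]


candidates = [
    DeletionCandidate(split_reads=8, reads_inside=0),  # score 8.0, called
    DeletionCandidate(split_reads=5, reads_inside=4),  # score 1.0, rejected
    DeletionCandidate(split_reads=4, reads_inside=1),  # score 3.0, called
]
called = call_by_threshold(candidates)
print(len(called))  # 2
```

The weakness the abstract points to is visible here: both the weighting and the cutoff are set by hand, so a candidate with slightly more spurious split-read support passes just as easily as a true deletion.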
Results
Here, we present a machine-learning-based method that discovers indel candidates and distinguishes true from false ones in order to reduce the false positive rate. Our method identifies indel candidates using a discriminative classifier based on features of split read alignment profiles and trained on true and false indel candidates that were validated by Sanger sequencing. We demonstrate the usefulness of our method with paired-end Illumina reads from 80 genomes of the first phase of the 1001 Genomes Project (http://www.1001genomes.org) in Arabidopsis thaliana.
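The idea of replacing a hand-set threshold with a classifier learned from validated candidates can be sketched as follows. This is a hedged illustration, not the SV-M implementation: plain logistic regression trained by batch gradient descent stands in for the paper's discriminative classifier, the two features (normalized split-read support and normalized coverage inside the candidate) are hypothetical summaries of a split-read alignment profile, and the six labelled examples are toy data standing in for Sanger-validated true (1) and false (0) candidates.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: list[list[float]], y: list[int], lr: float = 0.1, epochs: int = 500):
    """Fit weights and bias by batch gradient descent on the logistic loss."""
    n_features = len(X[0])
    w = [0.0] * n_features
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one labelled candidate.
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n_features):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the averaged gradient.
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: list[float], b: float, x: list[float], cutoff: float = 0.5) -> bool:
    """True if the candidate is classified as a real indel."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= cutoff


# Toy training data: [split-read support, coverage inside candidate], both in [0, 1].
X = [[0.9, 0.1], [0.8, 0.0], [0.7, 0.2],   # hypothetical validated true indels
     [0.2, 0.9], [0.3, 0.8], [0.1, 0.7]]   # hypothetical validated false candidates
y = [1, 1, 1, 0, 0, 0]

w, b = train_logreg(X, y)
print(predict(w, b, [0.85, 0.05]), predict(w, b, [0.15, 0.85]))
```

Unlike the fixed-threshold score, the relative weight of evidence and counter-evidence here is learned from labelled examples; a real pipeline would use many more features and validated candidates.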
Conclusion
In this work we show that indel classification is a necessary step to reduce the number of false positive candidates. We further demonstrate that omitting this classification step may lead to spurious biological interpretations. The software is available at: http://agkb.is.tuebingen.mpg.de/Forschung/SV-M/.
Publication status
published
Volume
14
Pages / Article No.
132
Publisher
BioMed Central
Subject
Next generation sequencing; Indel detection; Discriminative machine learning; Paired-end short reads; Split-read mapping
Organisational unit
09486 - Borgwardt, Karsten M. (ehemalig) / Borgwardt, Karsten M. (former)