Graph-based modeling of tandem repeats improves global multiple sequence alignment

Open access
Date
2013-09-01
Type
- Journal Article
ETH Bibliography
yes
Abstract
Tandem repeats (TRs) are often present in proteins with crucial functions, including proteins responsible for resistance and pathogenicity and proteins associated with infectious or neurodegenerative diseases. This motivates numerous studies of TRs and their evolution, which require accurate multiple sequence alignment. TR units may be lost or inserted at any position of a TR region by replication slippage or recombination, yet current methods assume fixed unit boundaries and are nevertheless of high complexity. We present a new global graph-based alignment method that does not restrict TR unit indels to unit boundaries. TR indels are modeled separately and penalized using the phylogeny-aware alignment algorithm. This improves the accuracy of the reconstructed alignments, disentangles TRs, and allows indel events and rates to be measured in a biologically meaningful way. Our method detects not only duplication events but also all changes in TR regions owing to recombination, strand slippage and other events that insert or delete TR units. We evaluate our method by simulation incorporating TR evolution, either by sampling TRs from a profile hidden Markov model or by mimicking strand slippage with duplications. The new method is illustrated on a family of type III effectors, a pathogenicity determinant in the agriculturally important bacterium Ralstonia solanacearum. We show that TR indel rate variation contributes to the diversification of this protein family.
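The point that TR units can be gained or lost at any position of a TR region, not only at unit boundaries, can be illustrated with a small toy simulation. The sketch below is not the authors' simulator or alignment method; the function names (`slippage_event`, `simulate`), the parameters, and the equal duplication/deletion probability are all assumptions made for illustration.

```python
import random


def slippage_event(seq, tr_start, tr_end, unit_len, rng, p_dup=0.5):
    """Apply one toy slippage-like event inside seq[tr_start:tr_end].

    Hypothetical helper: a window of unit_len residues starting at ANY
    offset of the TR region (not only at a unit boundary) is either
    duplicated in tandem or deleted.
    Returns the new sequence and the new end coordinate of the TR region.
    """
    if tr_end - tr_start < unit_len:
        raise ValueError("TR region is shorter than one unit")
    # Any start position such that the window still fits in the TR region.
    pos = rng.randrange(tr_start, tr_end - unit_len + 1)
    window = seq[pos:pos + unit_len]
    if rng.random() < p_dup:
        # Duplication: insert a copy of the window directly after itself.
        new_seq = seq[:pos + unit_len] + window + seq[pos + unit_len:]
        return new_seq, tr_end + unit_len
    # Deletion: remove the window.
    return seq[:pos] + seq[pos + unit_len:], tr_end - unit_len


def simulate(seq, tr_start, tr_end, unit_len, n_events, seed=0):
    """Apply up to n_events events; stop early if the TR region is used up."""
    rng = random.Random(seed)
    for _ in range(n_events):
        if tr_end - tr_start < unit_len:
            break
        seq, tr_end = slippage_event(seq, tr_start, tr_end, unit_len, rng)
    return seq, tr_end


if __name__ == "__main__":
    # Flanks "MK" and "QR" around three perfect copies of the unit "ABCD".
    ancestor = "MK" + "ABCD" * 3 + "QR"
    descendant, new_end = simulate(ancestor, 2, 14, unit_len=4, n_events=5, seed=1)
    print(ancestor)
    print(descendant, new_end)
```

For a perfect repeat, an event at any offset leaves a whole number of intact units, so the true event position cannot be recovered from the sequences alone. This is one way to see why an alignment model that fixes unit boundaries in advance is restrictive.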
Permanent link
https://doi.org/10.3929/ethz-b-000072947
Publication status
published
Journal / series
Nucleic Acids Research
Publisher
Oxford University Press
Notes
It was possible to publish this article open access thanks to a Swiss National Licence with the publisher.