Journal: Algorithms for Molecular Biology

Loading...

Abbreviation

Algorithms Mol Biol

Publisher

BioMed Central

Journal Volumes

ISSN

1748-7188

Description

Search Results

Publications 1 - 6 of 6
  • Jahn, Katharina; Beerenwinkel, Niko; Zhang, Louxin (2021)
    Algorithms for Molecular Biology
    Background: Mutation trees are rooted trees in which nodes are of arbitrary degree and labeled with a mutation set. These trees, also referred to as clonal trees, are used in computational oncology to represent the mutational history of tumours. Classical tree metrics such as the popular Robinson–Foulds distance are of limited use for the comparison of mutation trees. One reason is that mutation trees inferred with different methods or for different patients often contain different sets of mutation labels. Results: We generalize the Robinson–Foulds distance into a set of distance metrics called Bourque distances for comparing mutation trees. We show the basic version of the Bourque distance for mutation trees can be computed in linear time. We also make a connection between the Robinson–Foulds distance and the nearest neighbor interchange distance.
  • Stadler, Tanja; Degnan, James H. (2012)
    Algorithms for Molecular Biology
    Background The ancestries of genes form gene trees which do not necessarily have the same topology as the species tree due to incomplete lineage sorting. Available algorithms determining the probability of a gene tree given a species tree require exponential computational runtime. Results In this paper, we provide a polynomial time algorithm to calculate the probability of a ranked gene tree topology for a given species tree, where a ranked tree topology is a tree topology with the internal vertices being ordered. The probability of a gene tree topology can thus be calculated in polynomial time if the number of orderings of the internal vertices is a polynomial number. However, the complexity of calculating the probability of a gene tree topology with an exponential number of rankings for a given species tree remains unknown. Conclusions Polynomial algorithms for calculating ranked gene tree probabilities may become useful in developing methodology to infer species trees based on a collection of gene trees, leading to a more accurate reconstruction of ancestral species relationships.
  • Gavryushkina, Alexandra; Welch, David; Drummond, Alexei J. (2013)
    Algorithms for Molecular Biology
    Background: In Bayesian phylogenetic inference we are interested in distributions over a space of trees. The number of trees in a tree space is an important characteristic of the space and is useful for specifying prior distributions. When all samples come from the same time point and no prior information available on divergence times, the tree counting problem is easy. However, when fossil evidence is used in the inference to constrain the tree or data are sampled serially, new tree spaces arise and counting the number of trees is more difficult. Results: We describe an algorithm that is polynomial in the number of sampled individuals for counting of resolutions of a constraint tree assuming that the number of constraints is fixed. We generalise this algorithm to counting resolutions of a fully ranked constraint tree. We describe a quadratic algorithm for counting the number of possible fully ranked trees on n sampled individuals. We introduce a new type of tree, called a fully ranked tree with sampled ancestors, and describe a cubic time algorithm for counting the number of such trees on n sampled individuals. Conclusions: These algorithms should be employed for Bayesian Markov chain Monte Carlo inference when fossil data are included or data are serially sampled.
  • Tschager, Thomas; Rösch, Simon; Gillet, Ludovic; et al. (2017)
    Algorithms for Molecular Biology
    Background Given a peptide as a string of amino acids, the masses of all its prefixes and suffixes can be found by a trivial linear scan through the amino acid masses. The inverse problem is the ideal de novo peptide sequencing problem: Given all prefix and suffix masses, determine the string of amino acids. In biological reality, the given masses are measured in a lab experiment, and measurements by necessity are noisy. The (real, noisy) de novo peptide sequencing problem therefore has a noisy input: a few of the prefix and suffix masses of the peptide are missing and a few other masses are given in addition. For this setting, we ask for an amino acid string that explains the given masses as accurately as possible. Results Past approaches interpreted accuracy by searching for a string that explains as many masses as possible. We feel, however, that it is not only bad to not explain a mass that appears, but also to explain a mass that does not appear. We propose to minimize the symmetric difference between the set of given masses and the set of masses that the string explains. For this new optimization problem, we propose an efficient algorithm that computes both the best and the k best solutions. Proof-of-concept experiments on measurements of synthesized peptides show that our approach leads to better results compared to finding a string that explains as many given masses as possible. Conclusions We conclude that considering the symmetric difference as optimization goal can improve the identification rates for de novo peptide sequencing. A preliminary version of this work has been presented at WABI 2016.
  • Frank, Yves; Hruz, Tomas; Tschager, Thomas; et al. (2018)
    Algorithms for Molecular Biology
    Background Liquid chromatography combined with tandem mass spectrometry is an important tool in proteomics for peptide identification. Liquid chromatography temporally separates the peptides in a sample. The peptides that elute one after another are analyzed via tandem mass spectrometry by measuring the mass-to-charge ratio of a peptide and its fragments. De novo peptide sequencing is the problem of reconstructing the amino acid sequences of a peptide from this measurement data. Past de novo sequencing algorithms solely consider the mass spectrum of the fragments for reconstructing a sequence. Results We propose to additionally exploit the information obtained from liquid chromatography. We study the problem of computing a sequence that is not only in accordance with the experimental mass spectrum, but also with the chromatographic retention time. We consider three models for predicting the retention time and develop algorithms for de novo sequencing for each model. Conclusions Based on an evaluation for two prediction models on experimental data from synthesized peptides we conclude that the identification rates are improved by exploiting the chromatographic information. In our evaluation, we compare our algorithms using the retention time information with algorithms using the same scoring model, but not the retention time.
  • The open-closed mod-minimizer algorithm
    Item type: Journal Article
    Groot Koerkamp, Ragnar; Liu, Daniel; Pibiri, Giulio Ermanno (2025)
    Algorithms for Molecular Biology
    Sampling algorithms that deterministically select a subset of k-mers are an important building block in bioinformat-ics applications. For example, they are used to index large textual collections, like DNA, and to compare sequences quickly. In such applications, a sampling algorithm is required to select one k-mer out of every window of w consecu-tive k-mers. The folklore and most used scheme is the random minimizer that selects the smallest k-mer in the win-dow according to some random order. This scheme is remarkably simple and versatile, and has a density (expected fraction of selected k-mers) of 2/(w+1) . In practice, lower density leads to faster methods and smaller indexes, and it turns out that the random minimizer is not the best one can do. Indeed, some schemes are known to approach optimal density 1/w when k ->infinity , like the recently introduced mod-minimizer (Groot Koerkamp and Pibiri, WABI 2024). In this work, we study methods that achieve low density when k <= w . In this small-k regime, a practical method with provably better density than the random minimizer is the miniception (Zheng et al., Bioinformatics 2021). This method can be elegantly described as sampling the smallest closed sycnmer (Edgar, PeerJ 2021) in the window according to some random order. We show that extending the miniception to prefer sampling open syncmers yields much better density. This new method-the open-closed minimizer-offers improved density for small k <= w while being as fast to compute as the random minimizer. Compared to methods based on decycling sets, that achieve very low density in the small-k regime, our method has comparable density while being computation-ally simpler and intuitive. Furthermore, we extend the mod-minimizer to improve density of any scheme that works well for small k to also work well when k>w is large. We hence obtain the open-closed mod-minimizer, a practical method that improves over the mod-minimizer for all k
Publications 1 - 6 of 6