Journal: BMC Bioinformatics

Publisher: BioMed Central
ISSN: 1471-2105

Search Results

Publications 1 - 10 of 68
  • Clough, Timothy; Thaminy, Safia; Ragg, Susanne; et al. (2012)
    BMC Bioinformatics
    Background Liquid chromatography coupled with tandem mass spectrometry (LC-MS/MS) is widely used for quantitative proteomic investigations. The typical output of such studies is a list of identified and quantified peptides. The biological and clinical interest is, however, usually focused on quantitative conclusions at the protein level. Furthermore, many investigations ask complex biological questions by studying multiple interrelated experimental conditions. Therefore, there is a need in the field for generic statistical models to quantify protein levels even in complex study designs. Results We propose a general statistical modeling approach for protein quantification in arbitrary complex experimental designs, such as time course studies, or those involving multiple experimental factors. The approach summarizes the quantitative experimental information from all the features and all the conditions that pertain to a protein. It enables both protein significance analysis between conditions, and protein quantification in individual samples or conditions. We implement the approach in an open-source R-based software package MSstats suitable for researchers with a limited statistics and programming background. Conclusions We demonstrate, using as examples two experimental investigations with complex designs, that a simultaneous statistical modeling of all the relevant features and conditions yields a higher sensitivity of protein significance analysis and a higher accuracy of protein quantification as compared to commonly employed alternatives. The software is available at http://www.stat.purdue.edu/~ovitek/Software.html.
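The core idea — fitting one linear model over all the features and conditions that pertain to a protein, then reading off the condition effect — can be sketched in a few lines. This is a toy least-squares analogue with made-up intensities, not the MSstats implementation:

```python
import numpy as np

def condition_effect(y, feature, cond):
    """Estimate the condition effect (log2 fold change) for one protein by
    fitting log-intensity ~ feature + condition with ordinary least squares.
    A toy analogue of feature-level linear modeling, not MSstats itself."""
    feats = sorted(set(feature))
    cols = [np.ones(len(y)), np.asarray(cond, dtype=float)]       # intercept, condition
    cols += [(np.asarray(feature) == f).astype(float) for f in feats[1:]]  # feature dummies
    X = np.column_stack(cols)
    beta, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return beta[1]  # coefficient of the condition indicator

# Hypothetical protein with two peptide features, two conditions, duplicates:
y       = [10.0, 10.1, 12.0, 12.1, 8.0, 8.1, 10.0, 10.1]
feature = ["p1", "p1", "p1", "p1", "p2", "p2", "p2", "p2"]
cond    = [0, 0, 1, 1, 0, 0, 1, 1]
effect = condition_effect(y, feature, cond)
```

Because the model pools both peptide features, the estimated effect is a single protein-level log fold change rather than one estimate per feature.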
  • Hatakeyama, Masaomi; Opitz, Lennart; Russo, Giancarlo; et al. (2016)
    BMC Bioinformatics
    Background Next generation sequencing (NGS) produces massive datasets consisting of billions of reads and up to thousands of samples. Subsequent bioinformatic analysis is typically done with the help of open source tools, where each application performs a single step towards the final result. This situation leaves the bioinformaticians with the tasks to combine the tools, manage the data files and meta-information, document the analysis, and ensure reproducibility. Results We present SUSHI, an agile data analysis framework that relieves bioinformaticians from the administrative challenges of their data analysis. SUSHI lets users build reproducible data analysis workflows from individual applications and manages the input data, the parameters, meta-information with user-driven semantics, and the job scripts. As distinguishing features, SUSHI provides an expert command line interface as well as a convenient web interface to run bioinformatics tools. SUSHI datasets are self-contained and self-documented on the file system. This makes them fully reproducible and ready to be shared. With the associated meta-information being formatted as plain text tables, the datasets can be readily further analyzed and interpreted outside SUSHI. Conclusion SUSHI provides an exquisite recipe for analysing NGS data. By following the SUSHI recipe, SUSHI makes data analysis straightforward and takes care of documentation and administration tasks. Thus, the user can fully dedicate his time to the analysis itself. SUSHI is suitable for use by bioinformaticians as well as life science researchers. It is targeted for, but by no means constrained to, NGS data analysis. Our SUSHI instance is in productive use and has served as data analysis interface for more than 1000 data analysis projects. SUSHI source code as well as a demo server are freely available.
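The self-documented, plain-text dataset idea can be illustrated with a short sketch; the column names below are hypothetical, not SUSHI's actual schema:

```python
import csv
import io

def register_dataset(rows):
    """Write dataset meta-information as a plain-text, tab-separated table,
    in the spirit of SUSHI's self-documented datasets. Column names are
    illustrative assumptions."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["Name", "Read1", "Species"],
                            delimiter="\t")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

table = register_dataset([
    {"Name": "sample1", "Read1": "sample1_R1.fastq.gz", "Species": "Arabidopsis"},
])
```

Keeping the meta-information as a plain table is what makes such a dataset readable and analyzable outside the framework that produced it.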
  • Lyngsø, Rune B.; Anderson, James W.J.; Sizikova, Elena; et al. (2012)
    BMC Bioinformatics
    Background RNA secondary structure prediction, or folding, is a classic problem in bioinformatics: given a sequence of nucleotides, the aim is to predict the base pairs formed in its three dimensional conformation. The inverse problem of designing a sequence folding into a particular target structure has only more recently received notable interest. With a growing appreciation and understanding of the functional and structural properties of RNA motifs, and a growing interest in utilising biomolecules in nano-scale designs, the interest in the inverse RNA folding problem is bound to increase. However, whereas the RNA folding problem from an algorithmic viewpoint has an elegant and efficient solution, the inverse RNA folding problem appears to be hard. Results In this paper we present a genetic algorithm approach to solve the inverse folding problem. The main aims of the development were to address the hitherto mostly ignored extension of solving the inverse folding problem, the multi-target inverse folding problem, while simultaneously designing a method with superior performance when measured on the quality of designed sequences. The genetic algorithm has been implemented as a Python program called Frnakenstein. It was benchmarked against four existing methods and several data sets totalling 769 real and predicted single structure targets, and on 292 two structure targets. It performed as well as or better at finding sequences which folded in silico into the target structure than all existing methods, without the heavy bias towards CG base pairs that was observed for all other top performing methods. On the two structure targets it also performed well, generating a perfect design for about 80% of the targets. Conclusions Our method illustrates that successful designs for the inverse RNA folding problem do not necessarily have to rely on heavy biases in base pair and unpaired base distributions.
The design problem seems to become more difficult on larger structures when the target structures are real structures, while no deterioration was observed for predicted structures. Design for two structure targets is considerably more difficult, but far from impossible, demonstrating the feasibility of automated design of artificial riboswitches. The Python implementation is available at http://www.stats.ox.ac.uk/research/genome/software/frnakenstein.
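As an illustration of the genetic-algorithm idea, here is a deliberately tiny sketch. Real methods such as Frnakenstein score candidates by folding them in silico; the toy fitness below merely checks whether the bases at each target pair are complementary, so it is a stand-in, not the published algorithm:

```python
import random

# Watson-Crick and wobble pairs
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}

def pairs_from_dotbracket(s):
    """Map a dot-bracket secondary structure to its list of base-pair indices."""
    stack, out = [], []
    for i, c in enumerate(s):
        if c == "(":
            stack.append(i)
        elif c == ")":
            out.append((stack.pop(), i))
    return out

def fitness(seq, pairs):
    """Toy objective: fraction of target pairs whose bases are compatible
    (a crude stand-in for folding the sequence in silico)."""
    return sum((seq[i], seq[j]) in PAIRS for i, j in pairs) / max(len(pairs), 1)

def design(target, pop=30, gens=200, seed=1):
    """Minimal genetic algorithm: mutate, score against the target,
    keep the fittest half each generation (elitist selection)."""
    rng = random.Random(seed)
    pairs = pairs_from_dotbracket(target)
    n = len(target)
    popn = ["".join(rng.choice("ACGU") for _ in range(n)) for _ in range(pop)]
    for _ in range(gens):
        popn.sort(key=lambda s: -fitness(s, pairs))
        survivors = popn[: pop // 2]
        children = []
        for s in survivors:
            i = rng.randrange(n)
            children.append(s[:i] + rng.choice("ACGU") + s[i + 1:])
        popn = survivors + children
    return max(popn, key=lambda s: fitness(s, pairs))

best = design("((((....))))")
```

A real implementation would also need crossover, a proper energy-based folding oracle, and multi-target scoring, but the select-mutate loop is the same.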
  • Wang, Xiaochuan; Chen, Li; Li, Fuyi; et al. (2019)
    BMC Bioinformatics
    Background S-sulphenylation is a ubiquitous protein post-translational modification (PTM) where an S-hydroxyl (−SOH) bond is formed via the reversible oxidation of the sulfhydryl group of cysteine (C). Recent experimental studies have revealed that S-sulphenylation plays critical roles in many biological functions, such as protein regulation and cell signaling. State-of-the-art bioinformatic advances have facilitated high-throughput in silico screening of protein S-sulphenylation sites, thereby significantly reducing the time and labour costs traditionally required for the experimental investigation of S-sulphenylation. Results In this study, we have proposed a novel hybrid computational framework, termed SIMLIN, for accurate prediction of protein S-sulphenylation sites using a multi-stage neural-network based ensemble-learning model integrating both protein sequence derived and protein structural features. Benchmarking experiments against the current state-of-the-art predictors for S-sulphenylation demonstrated that SIMLIN delivered competitive prediction performance. The empirical studies on the independent testing dataset demonstrated that SIMLIN achieved 88.0% prediction accuracy and an AUC score of 0.82, which outperforms currently existing methods. Conclusions In summary, SIMLIN predicts human S-sulphenylation sites with high accuracy thereby facilitating biological hypothesis generation and experimental validation. The web server, datasets, and online instructions are freely available at http://simlin.erc.monash.edu/ for academic purposes.
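The multi-stage ensemble idea — base learners trained on different feature views, combined by a second-stage learner — can be sketched with tiny logistic models standing in for the neural networks. The data are synthetic and nothing below comes from the SIMLIN code:

```python
import numpy as np

def logistic_fit(X, y, lr=0.5, steps=500):
    """Tiny logistic regression trained by gradient descent; a stand-in
    for the neural-network base learners described in the paper."""
    Xb = np.column_stack([np.ones(len(X)), X])
    w = np.zeros(Xb.shape[1])
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-Xb @ w))
        w -= lr * Xb.T @ (p - y) / len(y)
    return w

def logistic_predict(w, X):
    Xb = np.column_stack([np.ones(len(X)), X])
    return 1.0 / (1.0 + np.exp(-Xb @ w))

def stacked_predict(X_tr, y_tr, X_new, views):
    """Two-stage ensemble: stage-1 models see separate feature views (here a
    stand-in for sequence-derived vs. structural features); a stage-2 model
    combines their probability outputs."""
    stage1 = [logistic_fit(X_tr[:, v], y_tr) for v in views]
    meta_X = np.column_stack([logistic_predict(w, X_tr[:, v])
                              for w, v in zip(stage1, views)])
    meta = logistic_fit(meta_X, y_tr)
    new_X = np.column_stack([logistic_predict(w, X_new[:, v])
                             for w, v in zip(stage1, views)])
    return logistic_predict(meta, new_X)

# Synthetic binary data whose label depends on one feature from each view:
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 2] > 0).astype(float)
preds = stacked_predict(X[:150], y[:150], X[150:], views=[[0, 1], [2, 3]])
acc = ((preds > 0.5) == (y[150:] > 0.5)).mean()
```

The stage-2 model only ever sees the stage-1 probabilities, which is what lets the ensemble weight heterogeneous feature sources against each other.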
  • Meyer, Lasse; Eling, Nils; Bodenmiller, Bernd (2024)
    BMC Bioinformatics
    Background: Highly multiplexed imaging enables single-cell-resolved detection of numerous biological molecules in their spatial tissue context. Interactive visualization of multiplexed imaging data is crucial at any step of data analysis to facilitate quality control and the spatial exploration of single cell features. However, tools for interactive visualization of multiplexed imaging data are not available in the statistical programming language R. Results: Here, we describe cytoviewer, an R/Bioconductor package for interactive visualization and exploration of multi-channel images and segmentation masks. The cytoviewer package supports flexible generation of image composites, allows side-by-side visualization of single channels, and facilitates the spatial visualization of single-cell data in the form of segmentation masks. As such, cytoviewer improves image and segmentation quality control, the visualization of cell phenotyping results and qualitative validation of hypotheses at any step of data analysis. The package operates on standard data classes of the Bioconductor project and therefore integrates with an extensive framework for single-cell and image analysis. The graphical user interface allows intuitive navigation and little coding experience is required to use the package. We showcase the functionality and biological application of cytoviewer by analysis of an imaging mass cytometry dataset acquired from cancer samples. Conclusions: The cytoviewer package offers a rich set of features for highly multiplexed imaging data visualization in R that seamlessly integrates with the workflow for image and single-cell data analysis. It can be installed from Bioconductor via
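The image-composite idea can be sketched independently of R: colour each channel and sum into one RGB image. The marker names below are hypothetical and this is not the cytoviewer code:

```python
import numpy as np

def composite(channels, colors):
    """Normalize each channel, tint it with an RGB colour, and sum the
    tinted layers into one composite image (a sketch of the idea only)."""
    out = np.zeros(channels[0].shape + (3,))
    for ch, col in zip(channels, colors):
        top = ch.max()
        scaled = ch / top if top > 0 else ch
        out += scaled[:, :, None] * np.asarray(col, dtype=float)
    return np.clip(out, 0.0, 1.0)

# Two hypothetical 2x2 marker channels rendered in red and green:
marker1 = np.array([[0.0, 2.0], [4.0, 0.0]])
marker2 = np.array([[1.0, 0.0], [0.0, 1.0]])
rgb = composite([marker1, marker2], colors=[(1, 0, 0), (0, 1, 0)])
```

Per-channel normalization before tinting is what keeps a bright channel from drowning out the others in the composite.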
  • Cardner, Mathias; Marass, Francesco; Gedvilaite, Erika; et al. (2023)
    BMC Bioinformatics
    Background: Liquid biopsy is a minimally-invasive method of sampling bodily fluids, capable of revealing evidence of cancer. The distribution of cell-free DNA (cfDNA) fragment lengths has been shown to differ between healthy subjects and cancer patients, whereby the distributional shift correlates with the sample’s tumour content. These fragmentomic data have not yet been utilised to directly quantify the proportion of tumour-derived cfDNA in a liquid biopsy. Results: We used statistical learning to predict tumour content from Fourier and wavelet transforms of cfDNA length distributions in samples from 118 cancer patients. The model was validated on an independent dilution series of patient plasma. Conclusions: This proof of concept suggests that our fragmentomic methodology could be useful for predicting tumour content in liquid biopsies.
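A minimal sketch of the approach — Fourier features of a fragment-length distribution fed to a linear model — using simulated Gaussian length profiles. The modal lengths, feature count, and plain least-squares model are illustrative assumptions, not the authors' pipeline:

```python
import numpy as np

def length_profile(peak, lengths=np.arange(100, 221)):
    """Toy fragment-length density: Gaussian around a modal length (bp)."""
    p = np.exp(-0.5 * ((lengths - peak) / 10.0) ** 2)
    return p / p.sum()

def fourier_features(profile, k=8):
    """Low-order Fourier coefficients of the length distribution, in the
    spirit of the transforms the model uses as input."""
    coeffs = np.fft.rfft(profile)[:k]
    return np.concatenate([coeffs.real, coeffs.imag])

# Hypothetical mixtures: tumour-derived fragments are shorter (~145 bp mode)
# than healthy ones (~166 bp); train on mixtures of known tumour fraction.
healthy, tumour = length_profile(166), length_profile(145)
train_f = np.linspace(0.0, 0.5, 11)
A = np.array([np.concatenate([[1.0],
                              fourier_features(f * tumour + (1 - f) * healthy)])
              for f in train_f])
coef, *_ = np.linalg.lstsq(A, train_f, rcond=None)

# Predict the tumour fraction of an unseen mixture (true value 0.23):
mix = 0.23 * tumour + 0.77 * healthy
pred = float(np.concatenate([[1.0], fourier_features(mix)]) @ coef)
```

In this noiseless toy the features are linear in the tumour fraction, so the regression recovers it exactly; real plasma samples require the validation on dilution series that the abstract describes.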
  • Malmström, Lars; Marko-Varga, György; Westergren-Thorsson, Gunilla; et al. (2006)
    BMC Bioinformatics
    Background We present 2DDB, a bioinformatics solution for storage, integration and analysis of quantitative proteomics data. As the data complexity and the rate with which it is produced increases in the proteomics field, the need for flexible analysis software increases. Results 2DDB is based on a core data model describing fundamentals such as experiment description and identified proteins. The extended data models are built on top of the core data model to capture more specific aspects of the data. A number of public databases and bioinformatical tools have been integrated giving the user access to large amounts of relevant data. A statistical and graphical package, R, is used for statistical and graphical analysis. The current implementation handles quantitative data from 2D gel electrophoresis and multidimensional liquid chromatography/mass spectrometry experiments. Conclusion The software has successfully been employed in a number of projects ranging from quantitative liquid-chromatography-mass spectrometry based analysis of transforming growth factor-beta stimulated fibroblasts to 2D gel electrophoresis/mass spectrometry analysis of biopsies from human cervix. The software is available for download at SourceForge.
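A core-plus-extension data model of this kind can be sketched as relational tables; the schema below is a hypothetical illustration, not the actual 2DDB schema:

```python
import sqlite3

# Hypothetical tables echoing the described core model: an experiment
# description plus the proteins identified and quantified in it.
con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE experiment (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    method TEXT          -- e.g. '2D gel' or 'LC-MS/MS'
);
CREATE TABLE identified_protein (
    id INTEGER PRIMARY KEY,
    experiment_id INTEGER REFERENCES experiment(id),
    accession TEXT NOT NULL,
    quantity REAL
);
""")
con.execute("INSERT INTO experiment (name, method) VALUES (?, ?)",
            ("TGF-beta stimulated fibroblasts", "LC-MS/MS"))
con.execute("INSERT INTO identified_protein (experiment_id, accession, quantity)"
            " VALUES (1, 'P01137', 3.2)")
rows = con.execute("""
    SELECT e.name, p.accession, p.quantity
    FROM identified_protein p JOIN experiment e ON p.experiment_id = e.id
""").fetchall()
```

Extended models would simply add tables that reference these core ones, which is what keeps technique-specific data (gels, LC-MS runs) separate from the shared fundamentals.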
  • Zamboni, Nicola; Kümmel, Anne; Heinemann, Matthias (2008)
    BMC Bioinformatics
    Background Compared to other omics techniques, quantitative metabolomics is still in its infancy. Complex sample preparation and analytical procedures render exact quantification extremely difficult. Furthermore, not only the actual measurement but also the subsequent interpretation of quantitative metabolome data to obtain mechanistic insights still lags behind current expectations. Recently, the method of network-embedded thermodynamic (NET) analysis was introduced to address some of these open issues. Building upon principles of thermodynamics, this method allows for a quality check of measured metabolite concentrations and makes it possible to spot metabolic reactions where active regulation potentially controls metabolic flux. So far, however, widespread application of NET analysis in metabolomics labs was hindered by the absence of suitable software. Results We have developed in Matlab a generalized software called 'anNET' that affords a user-friendly implementation of the NET analysis algorithm. anNET supports the analysis of any metabolic network for which a stoichiometric model can be compiled. The model size can span from a single reaction to a complete genome-wide network reconstruction including compartments. anNET can (i) test quantitative data sets for thermodynamic consistency, (ii) predict metabolite concentrations beyond the actually measured data, (iii) identify putative sites of active regulation in the metabolic reaction network, and (iv) help in localizing errors in data sets that were found to be thermodynamically infeasible. We demonstrate the application of anNET with three published Escherichia coli metabolome data sets. Conclusion Our user-friendly and generalized implementation of the NET analysis method in the software anNET allows users to rapidly integrate quantitative metabolome data obtained from virtually any organism. We envision that use of anNET in labs working on quantitative metabolomics will provide the systems biology and metabolic engineering communities with a means to prove the quality of metabolome data sets and with all the further benefits of the NET analysis approach.
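The central consistency check in NET-style analysis — a reaction carrying flux in its stated direction must have a negative transformed Gibbs energy at the measured concentrations — follows directly from ΔG = ΔG°' + RT ln Q. A minimal sketch; the ΔG°' value and concentrations below are illustrative, not from the paper's data sets:

```python
import math

R, T = 8.314e-3, 298.15  # gas constant in kJ/(mol*K), temperature in K

def reaction_dG(dG0, substrates, products, conc):
    """Transformed Gibbs energy of a reaction at the given metabolite
    concentrations (mol/L): dG = dG0 + R*T*ln(Q)."""
    lnQ = (sum(math.log(conc[m]) for m in products)
           - sum(math.log(conc[m]) for m in substrates))
    return dG0 + R * T * lnQ

def thermodynamically_consistent(reactions, conc):
    """Simplified NET-style quality check: every reaction carrying flux in
    its stated direction must have dG < 0."""
    return all(reaction_dG(dG0, subs, prods, conc) < 0
               for dG0, subs, prods in reactions)

# Hypothetical one-reaction network (G6P -> F6P, dG0' of roughly +2.5 kJ/mol):
conc = {"G6P": 1.0e-3, "F6P": 1.0e-4}
ok = thermodynamically_consistent([(2.5, ["G6P"], ["F6P"])], conc)
```

The full method additionally propagates concentration bounds through the whole stoichiometric network, which is what turns this per-reaction check into predictions for unmeasured metabolites.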
  • Polit, Lélia; Kerdivel, Gwenneg; Gregoricchio, Sebastian; et al. (2021)
    BMC Bioinformatics
    Background Multiple studies rely on ChIP-seq experiments to assess the effect of gene modulation and drug treatments on protein binding and chromatin structure. However, most methods commonly used for the normalization of ChIP-seq binding intensity signals across conditions, e.g., the normalization to the same number of reads, either assume a constant signal-to-noise ratio across conditions or base the estimates of correction factors on genomic regions with intrinsically different signals between conditions. Inaccurate normalization of ChIP-seq signal may, in turn, lead to erroneous biological conclusions. Results We developed a new R package, CHIPIN, that allows normalizing ChIP-seq signals across different conditions/samples when spike-in information is not available, but gene expression data are at hand. Our normalization technique is based on the assumption that, on average, no differences in ChIP-seq signals should be observed in the regulatory regions of genes whose expression levels are constant across samples/conditions. In addition to normalizing ChIP-seq signals, CHIPIN provides as output a number of graphs and calculates statistics allowing the user to assess the efficiency of the normalization and qualify the specificity of the antibody used. In addition to ChIP-seq, CHIPIN can be used without restriction on open chromatin ATAC-seq or DNase hypersensitivity data. We validated the CHIPIN method on several ChIP-seq data sets and documented its superior performance in comparison to several commonly used normalization techniques. Conclusions The CHIPIN method provides a new way for ChIP-seq signal normalization across conditions when spike-in experiments are not available. The method is implemented in a user-friendly R package available on GitHub: https://github.com/BoevaLab/CHIPIN
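The stated normalization assumption — equal average ChIP-seq signal over the regulatory regions of constantly expressed genes — reduces to a single scaling factor per sample. A sketch with synthetic coverage tracks, not the CHIPIN implementation:

```python
import numpy as np

def chipin_like_factor(signal_a, signal_b, constant_gene_regions):
    """Scaling factor for sample B so that, on average, its signal over the
    regulatory regions of constantly expressed genes matches sample A.
    A simplified sketch of the stated assumption, not the CHIPIN code."""
    mean_a = np.mean([signal_a[s:e].mean() for s, e in constant_gene_regions])
    mean_b = np.mean([signal_b[s:e].mean() for s, e in constant_gene_regions])
    return mean_a / mean_b

# Hypothetical per-base coverage tracks; sample B was sequenced ~2x deeper:
rng = np.random.default_rng(0)
base = rng.uniform(1.0, 3.0, size=1000)
sig_a, sig_b = base, 2.0 * base
factor = chipin_like_factor(sig_a, sig_b, [(100, 200), (400, 500)])
normalized_b = factor * sig_b  # now comparable to sig_a
```

Anchoring the factor to constant-expression genes, rather than to total read counts, is what protects the correction from genuine global signal differences between conditions.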
  • Gronwald, Wolfram; Hohm, Tim; Hoffmann, Daniel (2008)
    BMC Bioinformatics
    Background As a rule, peptides are more flexible and unstructured than proteins with their substantial stabilizing hydrophobic cores. Nevertheless, a few stably folding peptides have been discovered. This raises the question whether there may be more such peptides that are unknown as yet. These molecules could be helpful in basic research and medicine. Results As a method to explore the space of conformationally stable peptides, we have developed an evolutionary algorithm that allows optimization of sequences with respect to several criteria simultaneously, for instance stability, accessibility of arbitrary parts of the peptide, etc. In a proof-of-concept experiment we have perturbed the sequence of the peptide Villin Headpiece, known to be stable in vitro. Starting from the perturbed sequence we applied our algorithm to optimize peptide stability and accessibility of a loop. Unexpectedly, two clusters of sequences were generated in this way that, according to our criteria, should form structures with higher stability than the wild-type. The structures in one of the clusters possess a fold that markedly differs from the native fold of Villin Headpiece. One of the mutants predicted to be stable was selected for synthesis, its molecular 3D-structure was characterized by nuclear magnetic resonance spectroscopy, and its stability was measured by circular dichroism. Predicted structure and stability were in good agreement with experiment. Eight other sequences and structures, including five with a non-native fold, are provided as bona fide predictions. Conclusion The results suggest that many more conformationally stable peptides may exist than are known so far, and that small fold classes could comprise well-separated sub-folds.
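Selection under several simultaneous criteria is commonly handled with Pareto dominance: keep every candidate that no other candidate beats on all criteria at once. A sketch with hypothetical candidate scores (stability, accessibility), not the authors' algorithm:

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b: at least as good on
    every criterion and strictly better on one (higher is better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(population, objectives):
    """Non-dominated candidates under several simultaneous criteria, e.g.
    predicted stability and loop accessibility: a sketch of the selection
    step in a multi-objective evolutionary algorithm."""
    scores = {c: objectives(c) for c in population}
    return [c for c in population
            if not any(dominates(scores[o], scores[c])
                       for o in population if o != c)]

# Hypothetical peptide candidates scored on (stability, accessibility):
scores = {"seqA": (0.9, 0.2), "seqB": (0.5, 0.8), "seqC": (0.4, 0.1)}
front = pareto_front(list(scores), scores.get)
```

Keeping the whole front, instead of collapsing the criteria into one weighted score, is what lets such an algorithm surface trade-off solutions like the two distinct sequence clusters reported here.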