Journal: GigaScience
Publisher
Oxford University Press
Publications 1 - 7 of 7
- A dataset profiling the multiomic landscape of the prefrontal cortex in amyotrophic lateral sclerosis
Item type: Journal Article
GigaScience
Hausmann, Fabian; Caldi Gomes, Lucas; Hänzelmann, Sonja; et al. (2024)
Amyotrophic lateral sclerosis (ALS) is the most common motor neuron disease, which still lacks effective disease-modifying therapies. Similar to other neurodegenerative disorders, such as Alzheimer and Parkinson disease, ALS pathology is presumed to propagate over time, originating from the motor cortex and spreading to other cortical regions. Exploring early disease stages is crucial to understand the causative molecular changes underlying the pathology. For this, we sampled human postmortem prefrontal cortex (PFC) tissue from Brodmann area 6, an area that exhibits only moderate pathology at the time of death, and performed a multiomic analysis of 51 patients with sporadic ALS and 50 control subjects. To compare sporadic disease to genetic ALS, we additionally analyzed PFC tissue from 4 transgenic ALS mouse models (C9orf72-, SOD1-, TDP-43-, and FUS-ALS) using the same methods. This multiomic data resource includes transcriptome, small RNAome, and proteome data from female and male samples, aimed at elucidating early and sex-specific ALS mechanisms, biomarkers, and drug targets.
- ricu: R's interface to intensive care data
Item type: Journal Article
GigaScience
Bennett, Nicolas; Plečko, Drago; Ukor, Ida-Fong; et al. (2023)
Objective: To develop a unified framework for analyzing data from 5 large publicly available intensive care unit (ICU) datasets. Findings: Using 3 American (Medical Information Mart for Intensive Care III, Medical Information Mart for Intensive Care IV, electronic ICU) and 2 European (Amsterdam University Medical Center Database, High Time Resolution ICU Dataset) databases, we constructed a mapping for each database to a set of clinically relevant concepts, which are grounded in the Observational Medical Outcomes Partnership Vocabulary wherever possible. Furthermore, we performed synchronization in the units of measurement and data type representation. On top of this, we built functionality, which allows the user to download, set up, and load data from all of the 5 databases, through a unified Application Programming Interface. The resulting ricu R-package represents the computational infrastructure for handling publicly available ICU datasets, and its latest release allows the user to load 119 existing clinical concepts from the 5 data sources. Conclusion: The ricu R-package (available on GitHub and CRAN) is the first tool that enables users to analyze publicly available ICU datasets simultaneously (datasets are available upon request from respective owners). Such an interface saves researchers time when analyzing ICU data and helps reproducibility. We hope that ricu can become a community-wide effort, so that data harmonization is not repeated by each research group separately. One current limitation is that concepts were added on a case-to-case basis, and therefore the resulting dictionary of concepts is not comprehensive. Further work is needed to make the dictionary comprehensive.
- The haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar reveal novel pan-genome and allele-specific transcriptome features
Item type: Journal Article
GigaScience
Qi, Weihong; Lim, Yi-Wen; Patrignani, Andrea; et al. (2022)
Background: Cassava (Manihot esculenta) is an important clonally propagated food crop in tropical and subtropical regions worldwide. Genetic gain by molecular breeding has been limited, partially because cassava is a highly heterozygous crop with a repetitive and difficult-to-assemble genome. Findings: Here we demonstrate that Pacific Biosciences high-fidelity (HiFi) sequencing reads, in combination with the assembler hifiasm, produced genome assemblies at near complete haplotype resolution with higher continuity and accuracy compared to conventional long sequencing reads. We present 2 chromosome-scale haploid genomes phased with Hi-C technology for the diploid African cassava variety TME204. With consensus accuracy >QV46, contig N50 >18 Mb, BUSCO completeness of 99%, and 35k phased gene loci, it is the most accurate, continuous, complete, and haplotype-resolved cassava genome assembly so far. Ab initio gene prediction with RNA-seq data and Iso-Seq transcripts identified abundant novel gene loci, with enriched functionality related to chromatin organization, meristem development, and cell responses. During tissue development, differentially expressed transcripts of different haplotype origins were enriched for different functionality. In each tissue, 20–30% of transcripts showed allele-specific expression (ASE) differences. ASE bias was often tissue specific and inconsistent across different tissues. Direction-shifting was observed in <2% of the ASE transcripts. Despite high gene synteny, the HiFi genome assembly revealed extensive chromosome rearrangements and abundant intra-genomic and inter-genomic divergent sequences, with large structural variations mostly related to LTR retrotransposons.
We use the reference-quality assemblies to build a cassava pan-genome and demonstrate its importance in representing the genetic diversity of cassava for downstream reference-guided omics analysis and breeding. Conclusions: The phased and annotated chromosome pairs allow a systematic view of the heterozygous diploid genome organization in cassava with improved accuracy, completeness, and haplotype resolution. They will be a valuable resource for cassava breeding and research. Our study may also provide insights into developing cost-effective and efficient strategies for resolving complex genomes with high resolution, accuracy, and continuity.
- U-Limb: A multi-modal, multi-center database on arm motion control in healthy and post-stroke conditions
Item type: Journal Article
GigaScience
Averta, Giuseppe; Barontini, Federica; Catrambone, Vincenzo; et al. (2021)
Background: Shedding light on the neuroscientific mechanisms of human upper limb motor control, in both healthy and disease conditions (e.g., after a stroke), can help to devise effective tools for a quantitative evaluation of the impaired conditions, and to properly inform the rehabilitative process. Furthermore, the design and control of mechatronic devices can also benefit from such neuroscientific outcomes, with important implications for assistive and rehabilitation robotics and advanced human-machine interaction. To reach these goals, we believe that an exhaustive data collection on human behavior is a mandatory step. For this reason, we release U-Limb, a large, multi-modal, multi-center data collection on human upper limb movements, with the aim of fostering trans-disciplinary cross-fertilization. Contribution: This collection of signals consists of data from 91 able-bodied and 65 post-stroke participants and is organized at 3 levels: (i) upper limb daily living activities, during which kinematic and physiological signals (electromyography, electro-encephalography, and electrocardiography) were recorded; (ii) force-kinematic behavior during precise manipulation tasks with a haptic device; and (iii) brain activity during hand control using functional magnetic resonance imaging.
- AlcoR: alignment-free simulation, mapping, and visualization of low-complexity regions in biological data
Item type: Journal Article
GigaScience
Silva, Jorge M.; Qi, Weihong; Pinho, Armando J.; et al. (2023)
Background: Low-complexity data analysis is the area that addresses the search and quantification of regions in sequences of elements that contain low-complexity or repetitive elements. For example, these can be tandem repeats, inverted repeats, homopolymer tails, GC-biased regions, similar genes, and hairpins, among many others. Identifying these regions is crucial because of their association with regulatory and structural characteristics. Moreover, their identification provides positional and quantity information where standard assembly methodologies face significant difficulties because of substantial higher depth coverage (mountains), ambiguous read mapping, or where sequencing or reconstruction defects may occur. However, the capability to distinguish low-complexity regions (LCRs) in genomic and proteomic sequences is a challenge that depends on the model’s ability to find them automatically. Low-complexity patterns can be implicit through specific or combined sources, such as algorithmic or probabilistic, and recurring to different spatial distances, namely, local, medium, or distant associations. Findings: This article addresses the challenge of automatically modeling and distinguishing LCRs, providing a new method and tool (AlcoR) for efficient and accurate segmentation and visualization of these regions in genomic and proteomic sequences. The method enables the use of models with different memories, providing the ability to distinguish local from distant low-complexity patterns. The method is reference and alignment free, providing additional methodologies for testing, including a highly flexible simulation method for generating biological sequences (DNA or protein) with different complexity levels, sequence masking, and a visualization tool for automatic computation of the LCR maps into an ideogram style.
We provide illustrative demonstrations using synthetic, nearly synthetic, and natural sequences showing the high efficiency and accuracy of AlcoR. As large-scale results, we use AlcoR to unprecedentedly provide a whole-chromosome low-complexity map of a recent complete human genome and the haplotype-resolved chromosome pairs of a heterozygous diploid African cassava cultivar. Conclusions: The AlcoR method provides the ability of fast sequence characterization through data complexity analysis, ideally for scenarios entangling the presence of new or unknown sequences. AlcoR is implemented in C language using multithreading to increase the computational speed, is flexible for multiple applications, and does not contain external dependencies. The tool accepts any sequence in FASTA format. The source code is freely provided at https://github.com/cobilab/alcor.
- V-pipe 3.0: a sustainable pipeline for within-sample viral genetic diversity estimation
Item type: Journal Article
GigaScience
Fuhrmann, Lara; Jablonski, Kim Philipp; Topolsky, Ivan; et al. (2024)
The large amount and diversity of viral genomic datasets generated by next-generation sequencing technologies poses a set of challenges for computational data analysis workflows, including rigorous quality control, scaling to large sample sizes, and tailored steps for specific applications. Here, we present V-pipe 3.0, a computational pipeline designed for analyzing next-generation sequencing data of short viral genomes. It is developed to enable reproducible, scalable, adaptable, and transparent inference of genetic diversity of viral samples. By presenting 2 large-scale data analysis projects, we demonstrate the effectiveness of V-pipe 3.0 in supporting sustainable viral genomic data science.
- The FIP 1.0 Data Set: Highly resolved annotated image time series of 4,000 wheat plots grown in 6 years
Item type: Journal Article
GigaScience
Roth, Lukas; Boss, Mike; Kirchgessner, Norbert; et al. (2025)
Background: Understanding genotype-environment interactions of plants is crucial for crop improvement, yet limited by the scarcity of quality phenotyping data. This Data Note presents the Field Phenotyping Platform 1.0 data set, a comprehensive resource for winter wheat research that combines imaging, trait, environmental, and genetic data. Findings: We provide time-series data for more than 4,000 wheat plots, including aligned high-resolution image sequences totaling more than 153,000 aligned images across 6 years. Measurement data for 8 key wheat traits are included, namely, canopy cover values, plant heights, wheat head counts, senescence ratings, heading date, final plant height, grain yield, and protein content. Genetic marker information and environmental data complement the time series. Data quality is demonstrated through heritability analyses and genomic prediction models, achieving accuracies aligned with previous research. Conclusions: This extensive data set offers opportunities for advancing crop modeling and phenotyping techniques, enabling researchers to develop novel approaches for understanding genotype-environment interactions, analyzing growth dynamics, and predicting crop performance. By making this resource publicly available, we aim to accelerate research in climate-adaptive agriculture and foster collaboration between plant science and machine learning communities.