Michal Ziemski


Last Name

Ziemski

First Name

Michal

Organisational unit

09714 - Bokulich, Nicholas / Bokulich, Nicholas

Search Results

Publications 1 - 8 of 8
  • Robeson II, Michael S.; O'Rourke, Devon R.; Kaehler, Benjamin D.; et al. (2021)
    PLoS Computational Biology
    Nucleotide sequence and taxonomy reference databases are critical resources for widespread applications including marker-gene and metagenome sequencing for microbiome analysis, diet metabarcoding, and environmental DNA (eDNA) surveys. Reproducibly generating, managing, using, and evaluating nucleotide sequence and taxonomy reference databases creates a significant bottleneck for researchers aiming to generate custom sequence databases. Furthermore, database composition drastically influences results, and lack of standardization limits cross-study comparisons. To address these challenges, we developed RESCRIPt, a Python 3 software package and QIIME 2 plugin for reproducible generation and management of reference sequence taxonomy databases, including dedicated functions that streamline creating databases from popular sources, and functions for evaluating, comparing, and interactively exploring qualitative and quantitative characteristics across reference databases. To highlight the breadth and capabilities of RESCRIPt, we provide several examples for working with popular databases for microbiome profiling (SILVA, Greengenes, NCBI-RefSeq, GTDB), eDNA and diet metabarcoding surveys (BOLD, GenBank), as well as for genome comparison. We show that bigger is not always better, and reference databases with standardized taxonomies and those that focus on type strains have quantitative advantages, though may not be appropriate for all use cases. Most databases appear to benefit from some curation (quality filtering), though sequence clustering appears detrimental to database quality. Finally, we demonstrate the breadth and extensibility of RESCRIPt for reproducible workflows with a comparison of global hepatitis genomes. RESCRIPt provides tools to democratize the process of reference database acquisition and management, enabling researchers to reproducibly and transparently create reference materials for diverse research applications. 
    RESCRIPt is released under a permissive BSD-3 license at https://github.com/bokulich-lab/RESCRIPt.
  • Ziemski, Michal; Adamov, Anja; Kim, Lina; et al. (2022)
    Bioinformatics
    Motivation: The volume of public nucleotide sequence data has blossomed over the past two decades and is ripe for re- and meta-analyses to enable novel discoveries. However, reproducible re-use and management of sequence datasets and associated metadata remain critical challenges. We created the open source Python package q2-fondue to enable user-friendly acquisition, re-use and management of public sequence (meta)data while adhering to open data principles. Results: q2-fondue allows fully provenance-tracked programmatic access to and management of data from the NCBI Sequence Read Archive (SRA). Unlike other packages allowing download of sequence data from the SRA, q2-fondue enables full data provenance tracking from data download to final visualization, integrates with the QIIME 2 ecosystem, prevents data loss upon space exhaustion and allows download of (meta)data given a publication library. To highlight its manifold capabilities, we present executable demonstrations using publicly available amplicon, whole genome and metagenome datasets. Availability and implementation: q2-fondue is available as an open-source BSD-3-licensed Python package at https://github.com/bokulich-lab/q2-fondue. Usage tutorials are available in the same repository. All Jupyter notebooks used in this article are available under https://github.com/bokulich-lab/q2-fondue-examples. Supplementary information: Supplementary data are available at Bioinformatics online.
  • Ziemski, Michal; Wisanwanichthan, Treepop; Bokulich, Nicholas; et al. (2021)
    Frontiers in Microbiology
    Naive Bayes classifiers (NBC) have dominated the field of taxonomic classification of amplicon sequences for over a decade. Apart from having runtime requirements that allow them to be trained and used on modest laptops, they have persistently provided class-topping classification accuracy. In this work we compare NBC with random forest classifiers, neural network classifiers, and a perfect classifier that can only fail when different species have identical sequences, and find that in some practical scenarios there is little scope for improving on NBC for taxonomic classification of 16S rRNA gene sequences. Further improvements in taxonomy classification are unlikely to come from novel algorithms alone, and will need to leverage other technological innovations, such as ecological frequency information.
  • Bokulich, Nicholas; Ziemski, Michal; Robeson II, Michael S.; et al. (2020)
    Computational and Structural Biotechnology Journal
    Microbiomes are integral components of diverse ecosystems, and increasingly recognized for their roles in the health of humans, animals, plants, and other hosts. Given their complexity (both in composition and function), the effective study of microbiomes (microbiomics) relies on the development, optimization, and validation of computational methods for analyzing microbial datasets, such as from marker-gene (e.g., 16S rRNA gene) and metagenome data. This review describes best practices for benchmarking and implementing computational methods (and software) for studying microbiomes, with particular focus on unique characteristics of microbiomes and microbiomics data that should be taken into account when designing and testing microbiomics methods.
  • Ziemski, Michal; Gehret, Liz; Simard, Anthony; et al. (2025)
    bioRxiv
    Metagenome sequencing has revolutionized functional microbiome analysis across diverse ecosystems, but is fraught with technical hurdles. We introduce MOSHPIT (https://moshpit.readthedocs.io), software built on the QIIME 2 framework (Q2F) that integrates best-in-class CAMI2-validated metagenome tools with robust provenance tracking and multiple user interfaces, enabling streamlined, reproducible metagenome analysis for all expertise levels. By building on Q2F, MOSHPIT enhances scalability, interoperability, and reproducibility in complex workflows, democratizing and accelerating discovery at the frontiers of metagenomics.
  • Dillon, Matthew R.; Bolyen, Evan; Adamov, Anja; et al. (2021)
    PLoS Computational Biology
    In October of 2020, in response to the Coronavirus Disease 2019 (COVID-19) pandemic, our team hosted our first fully online workshop teaching the QIIME 2 microbiome bioinformatics platform. We had 75 enrolled participants who joined from at least 25 different countries on 6 continents, and we had 22 instructors on 4 continents. In the 5-day workshop, participants worked hands-on with a cloud-based shared compute cluster that we deployed for this course. The event was well received, and participants provided feedback and suggestions in a postworkshop questionnaire. In January of 2021, we followed this workshop with a second fully online workshop, incorporating lessons from the first. Here, we present details on the technology and protocols that we used to run these workshops, focusing on the first workshop and then introducing changes made for the second workshop. We discuss what worked well, what didn’t work well, and what we plan to do differently in future workshops.
  • Hernández-Velázquez, Rodrigo; Ziemski, Michal; Bokulich, Nicholas (2025)
    Briefings in Bioinformatics
    Viruses play a crucial role in shaping microbial communities and global biogeochemical cycles, yet their vast genetic diversity remains underexplored. Next-generation sequencing technologies allow untargeted profiling of metagenomes from viral communities (viromes). However, existing workflows often lack modularity, flexibility, and seamless integration with other microbiome analysis platforms. Here, we introduce "ViromeXplore," a set of modular Nextflow workflows designed for efficient virome analysis. ViromeXplore incorporates state-of-the-art tools for contamination estimation, viral sequence identification, taxonomic assignment, functional annotation, and host prediction while optimizing computational resources. The workflows are containerized using Docker and Singularity, ensuring reproducibility and ease of deployment. Additionally, ViromeXplore offers optional integration with QIIME 2 and MOSHPIT, facilitating provenance tracking and interoperability with microbiome bioinformatics pipelines. By providing a scalable, user-friendly, and computationally efficient framework, ViromeXplore enhances viral metagenomic analysis and contributes to a deeper understanding of viral ecology. ViromeXplore is freely available at https://github.com/rhernandvel/ViromeXplore.
  • Sebechlebska, Zuzana; Ziemski, Michal; Bokulich, Nicholas A. (2026)
    Microbiology Resource Announcements
    Technical hurdles are a significant barrier for deposition of next-generation sequence data in public repositories. We present q2-ena-uploader, a software package for automated validation and upload of sequence data. It is BSD-3-licensed and available at https://github.com/bokulich-lab/q2-ena-uploader.