Short description
Metagenome analyses explore the functional potential and biodiversity of prokaryotes, eukaryotes, and viruses starting from sequencing data and recovering metagenome-assembled genomes (MAGs). This process involves several complex bioinformatics approaches, such as sequence assembly, genome binning and quality estimation, taxonomic assignment, functional annotation, and data integration with other analyses (metadata or other omics technologies).
Researchers working on metagenomic studies require comparable genome sequences and datasets. Many of the metagenomes deposited in public repositories have insufficient or incomplete metadata. This issue also extends to information on the bioinformatic tools used to generate these metagenomes.
To enable meta-analyses on metagenomes, MetaProv will assess and optimize the scalability and reproducibility of data generation tools and workflows and make them more user-friendly. A dedicated tool to track provenance (e.g., the thresholds, tools, and database versions used) will improve reproducibility and help users estimate the computing resources their data analysis needs. MetaProv will also contribute to a modular implementation of the current standards and analytical services provided by NFDI4Microbiota, making it easier to introduce or update workflows.
Ultimately, MetaProv will showcase how users can easily search for additional metagenomes that could help answer their research question or test their hypothesis.
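To make the idea of provenance tracking concrete, the following is a minimal sketch of what a provenance record for a metagenomic workflow might capture: the tools, their versions, the thresholds used, and the database versions. It is not the MetaProv tool itself; the class names (`StepRecord`, `ProvenanceLog`), the field names, and the tool and database names are all hypothetical and chosen only for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class StepRecord:
    """One workflow step. All field names are illustrative assumptions."""
    tool: str
    version: str
    parameters: dict = field(default_factory=dict)  # e.g. thresholds
    databases: dict = field(default_factory=dict)   # database name -> version


@dataclass
class ProvenanceLog:
    """Collects step records for one workflow run (hypothetical structure)."""
    workflow: str
    steps: list = field(default_factory=list)

    def record(self, tool, version, parameters=None, databases=None):
        # Append one step with its tool version, parameters, and databases.
        self.steps.append(
            StepRecord(tool, version, parameters or {}, databases or {})
        )

    def to_json(self):
        # Serialize the whole log so it can be stored next to the results.
        return json.dumps(asdict(self), indent=2, sort_keys=True)


# Placeholder tool and database names, not real software or releases.
log = ProvenanceLog(workflow="example-mag-recovery")
log.record("example-assembler", "1.0.0",
           parameters={"min_contig_length": 1000})
log.record("example-quality-check", "2.1.0",
           parameters={"min_completeness": 50.0, "max_contamination": 10.0},
           databases={"example-marker-db": "2024-01"})

print(log.to_json())
```

Storing such a record alongside each set of recovered genomes would let a later meta-analysis check whether two datasets were produced with comparable settings.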
Graphical abstract
Graphical abstract “Use Case MetaProv” by Ulisses Nunes da Rocha and Jonas Coelho Kasmanas with visual adaptation by Charlie Pauvert is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).
How you can contribute
You are a …
- metagenome researcher looking for sequence data:
Find a unified and standardized metadata database for SRA metagenomic sequences where you can easily select relevant samples according to your research question.
- microbiologist with raw metagenomic sequences:
Contribute relevant metadata. Go from raw sequences to annotated prokaryotic and eukaryotic genomes and viral sequences, with consistent workflow provenance collection and with data and metadata standards ready for submission. Get trained on using a complete metagenomic workflow.
- metagenome analyst frustrated with their workflow reproducibility:
Get an automatic report of the workflow provenance, contributing to the transparency and reproducibility of the metagenomic analysis.
- computer scientist, computational biologist, or bioinformatician who wants to start working with metagenome data:
Get trained on understanding the biological meaning of your data and the parameters required for a metagenomic study.
Planned output
Database
- Establish unified, standardized metadata databases for metagenomic samples available in public repositories
- Provide structured databases of recovered reference prokaryotic metagenome-assembled genomes (MAGs)
- Provide a structured database of recovered reference uncultivated viral sequences from whole-genome sequencing (WGS) samples
Recovery workflow
- Optimize the recovery of metagenomic sequences from whole-genome sequencing (WGS) samples deposited in public repositories
- Establish an easily configurable, modular recovery workflow for metagenome-assembled genomes that creates annotated prokaryotic, viral, and eukaryotic sequences from raw reads and is optimized for different ecological or biotechnological applications
- Release and maintain the automated provenance collection tool for metagenomic workflows
Training
- Create teaching material on performing scalable and reproducible metagenomic studies, enabling users to build their own metagenome data analysis pipelines
Achievements