Provenance and usage of metagenome data

Short description 

Metagenome analyses explore the functional potential and biodiversity of prokaryotes, eukaryotes, and viruses, starting from sequencing data and recovering metagenome-assembled genomes (MAGs). This process involves several complex bioinformatics steps, such as sequence assembly, genome binning and quality estimation, taxonomic assignment, functional annotation, and integration with other data (metadata or other omics technologies). Researchers working on metagenomic studies require comparable genome sequences and datasets, yet many of the metagenomes deposited in public repositories have insufficient or incomplete metadata. The same applies to information on the bioinformatic tools used to generate these metagenomes.

To enable meta-analyses of metagenomes, MetaProv will assess and optimize the scalability and reproducibility of data generation tools and workflows and make them more user-friendly. A suitable tool for tracking provenance (e.g. the thresholds, tools, and database versions used) will enhance reproducibility and help users estimate the computing resources needed for their data analysis. MetaProv will contribute to a modular implementation of the current standards and analytical services provided by NFDI4Microbiota, making it easier to introduce or update workflows. Ultimately, MetaProv will enable users to easily search for additional metagenomes that could help answer their research questions or test their hypotheses.

Graphical abstract

Graphical abstract “Use Case MetaProv” by Ulisses Nunes da Rocha and Jonas Coelho Kasmanas with visual adaptation by Charlie Pauvert is licensed under a Creative Commons Attribution 4.0 International License (CC BY 4.0).

How you can contribute 

You are a … 

  • metagenome researcher looking for sequence data:
    Find a unified and standardized metadata database for metagenomic sequences from the Sequence Read Archive (SRA), where you can easily select relevant samples according to your research question.
  • microbiologist with raw metagenomic sequences:
    Contribute relevant metadata. Go from raw sequences to annotated prokaryotic and eukaryotic genomes and viral sequences, with consistent collection of workflow provenance and with data and metadata standards ready for submission. Get trained in using a complete metagenomic workflow.
  • metagenome analyst frustrated with their workflow reproducibility:
    Get an automatic report of the workflow provenance, contributing to the transparency and reproducibility of the metagenomic analysis.
  • computational scientist, computational biologist, or bioinformatician who wants to start working with metagenome data:
    Get trained to understand the biological meaning of your data and the parameters required for a metagenomic study.


Below, you will find the outputs provided by the Use Case. The verb form of each item indicates its development stage:
outputs already established for the research community use the past tense (-ed), outputs currently in progress use the present progressive (-ing), and outputs that will be set up soon use the present tense.

Database as output
  • Establish unified, standardized metadata databases for metagenomic samples available in public repositories
  • Provide structured databases of recovered reference prokaryotic metagenome-assembled genomes (MAGs)
  • Provide a structured database of recovered reference uncultivated viral sequences from whole-genome sequencing (WGS) samples
Recovery workflow as output
  • Optimizing the recovery of metagenomic sequences from whole-genome sequencing (WGS) samples deposited in public repositories
  • Establishing an easily configurable, modular workflow for recovering metagenome-assembled genomes (MAGs) that produces annotated prokaryotic, viral, and eukaryotic sequences from raw reads and is optimized for different ecological or biotechnological applications
Provenance tool as output
  • Release and maintain the automated provenance collection tool for metagenomic workflows
  • Creating teaching material on performing scalable and reproducible metagenomic studies, enabling users to build their own metagenome data analysis pipelines


Project Lead

Ulisses Nunes da Rocha

ORCID ID: 0000-0001-6972-6692
Helmholtz Centre for Environmental Research (UFZ), Department of Environmental Microbiology (UMB)

Jonas Coelho Kasmanas

ORCID ID: 0000-0001-6513-5350
Helmholtz Centre for Environmental Research (UFZ), Department of Environmental Microbiology (UMB)
