Frequently Asked Questions

General Topics

The NFDI is a non-profit association that aims to manage research data systematically, preserve it in the long term, and make it accessible both nationally and internationally. For more information, please refer to the NFDI website: NFDI | Nationale Forschungsdateninfrastruktur e. V..

NFDI4Microbiota is a consortium within the NFDI that specializes in microbiological data. It is made up of ten German research institutions with extensive expertise in microbiology. The consortium aims to advance microbiological research through digital transformation. For more information, please refer to the NFDI4Microbiota Knowledge Base.

If you would like to get involved with NFDI4Microbiota, please visit our website: NFDI4Microbiota-Home. There, you will find information on ways to participate, such as our Ambassador program, and on the services and infrastructures we offer, including ARUNA Object Storage, the Cloud-based Workflow Manager (CloWM), training events, the Knowledge Base and the Helpdesk.

If you would like to register as a participant in NFDI4Microbiota, please follow the instructions on this page: Participants.

The NFDI4Microbiota Ambassador program aims to connect and train early-career researchers within the microbiology research community. Our goal is to help these researchers expand their networks and teach them best practices for handling data, metadata standards, standardized bioinformatic workflows, and related topics. For more information on the program, please refer to this page: The Ambassador program.

If you would like to become an NFDI4Microbiota ambassador, please register here: NFDI4Microbiota Ambassador Registration

We welcome questions from all individuals working with microbial data – whether students, early-career researchers, senior scientists, or data stewards. Support is provided irrespective of the organism (e.g. bacteria, archaea, eukaryotic microbes or viruses), environment (e.g. soil, aquatic, host-associated or plant), or data type (e.g. nucleic acid sequences, protein data, functional genomics, image data).

NFDI4Microbiota Services

You can find news, events and newsletters in the ‘Newsroom’ tab of the main NFDI4Microbiota page: NFDI4Microbiota-Home. You can also subscribe to our Newsletter and follow us on LinkedIn, Mastodon, and Bluesky.

NFDI4Microbiota supports a variety of microbial data, including, but not limited to, nucleic acid sequences, protein data, functional genomics and image data. You can find a list of common microbiology data types in our Knowledge Base: Research Data.

NFDI4Microbiota offers a range of specialized services and tools to support you throughout the research data lifecycle. These include a Data Management Plan (DMP) template, a collection of experimental protocols, recommendations on metadata standards, and the databases StrainInfo and VirJenDB. You can find out more about all our services on our website: NFDI4Microbiota Services.

No. All of NFDI4Microbiota’s services and platforms are free of charge, as the consortium is funded by the German Research Foundation (Deutsche Forschungsgemeinschaft, DFG).

The Cloud-based Workflow Manager (CloWM)

The Cloud-based Workflow Manager (CloWM) is a fully open platform that the research community and non-profit organisations can use free of charge. To make your workflow available on CloWM, you must first apply for the developer role by emailing info@clowm.bi.denbi.de. The workflow must be written in the Nextflow workflow language and follow the nf-core standard: every step must be containerized to guarantee maximum portability, and the workflow must be well documented. Please note that every workflow must pass a review by workflow reviewers, which ensures that CloWM compute resources are used appropriately.
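Before applying, a rough self-check can confirm that a repository contains the top-level files that nf-core-style pipelines usually have. This is a minimal sketch: the file list is an assumption modelled on the nf-core pipeline template, not CloWM’s review checklist, and it does not verify that each step is actually containerized.

```python
from pathlib import Path

# Top-level files that nf-core-style pipelines conventionally contain.
# ASSUMPTION: illustrative list only, not CloWM's official review checklist.
EXPECTED_FILES = [
    "main.nf",               # workflow entry point
    "nextflow.config",       # default configuration and profiles
    "nextflow_schema.json",  # parameter schema (documents and validates inputs)
    "README.md",             # user-facing documentation
]


def missing_files(present: set[str]) -> list[str]:
    """Return the expected file names that are not in `present`."""
    return [name for name in EXPECTED_FILES if name not in present]


def check_repository(repo_dir: str) -> list[str]:
    """List the expected files missing from the top level of `repo_dir`."""
    present = {p.name for p in Path(repo_dir).iterdir() if p.is_file()}
    return missing_files(present)
```

For example, check_repository("path/to/pipeline") returns the names of any expected files that are absent; an empty list means all of them were found.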

In the unlikely event that CloWM is no longer operational or available, your workflows would not be directly affected. A workflow registered on CloWM remains stored in its original GitHub or GitLab repository, from which it can be accessed, downloaded and modified completely independently of CloWM.

nf-core, EPI2ME Labs and CloWM all adhere to the same workflow standard, the nf-core standard. If developers follow this standard and the recommended best practices, a workflow should run on any of these platforms. Technically, the workflows can also be added to other public workflow collections, such as nf-core; note, however, that these collections have their own review processes.

Currently, no data encryption is used on the platform. However, the platform provides secure, personalized access.

Although we do not explicitly prohibit this, we would like to point out that we do not take responsibility for any user data stored on the platform.

CloWM relies on a highly scalable execution layer powered by the de.NBI Cloud, one of the largest academic clouds in Europe. As a result, resources are rarely a limiting factor. If you are planning something particularly demanding in terms of computing or storage, or if the current quotas are insufficient for your needs, please contact info@clowm.bi.denbi.de.

None. Workflows can be executed via a user-friendly web interface.

Training

In general, anyone can attend the training events advertised on our website, unless otherwise specified. Some events are organized for specific research groups or institutions only, and these are indicated as ‘closed’. You can view upcoming training events here: Training.


We currently offer on-demand training on topics related to research data management, such as data management plans (DMPs), electronic lab notebooks (ELNs), data organisation, data documentation, data sharing and publishing, and data discovery and reuse.

Past training materials are archived in the Zenodo Community.

Research Data Management (RDM)

Research Data Management (RDM) is the care and maintenance required to (1) obtain high-quality data, (2) make the data available and usable in the long term and (3) make research results reproducible beyond the research project. For more information on RDM, check out our Knowledge Base: The NFDI4Microbiota Knowledge Base.

For general recommendations on metadata standards, please refer to our Knowledge Base. We have also collected suggestions for important ethical, biological and environmental minimal metadata on our GitHub page.

Third parties, such as research funders, institutions and publishers, may have specific requirements regarding how researchers should handle their data. One example of such a requirement in the field of microbiology is Nature Microbiology’s policy on reporting standards and availability of data, materials, code and protocols.

  • RDM platforms:

    • BExIS2 by NFDI4Biodiversity at FSU Jena
    • Coscine by RWTH Aachen
      • Coscine is a research data management platform for your research projects. Coscine adheres to the FAIR principles with structured storage, metadata management, collaborative working and long-term storage of your data from research projects in accordance with good scientific practice. To get started, simply register with your university account or ORCID and create a project.
      • Coscine is available for use by employees of participating universities or research institutions in North Rhine-Westphalia (NRW). Usage is also permitted by third parties who have been invited by an employee of a participating university or research institution in NRW to collaborate on a Coscine project.
      • For more information check out the Coscine documentation page: About Coscine - Documentation | Coscine
    • GfBio consortium services
    • Research Data Management Competence Base (RDM Compas) by KonsortSWD (social, behavioural, educational and economic sciences)
  • Tools:

    • bio.tools (ELIXIR): essential scientific and technical information on software tools, databases and services for bioinformatics and the life sciences.

    • The Research Data Management toolkit for Life Sciences (RDMkit by ELIXIR)

    • ToolPool Gesundheitsforschung (TMF): The TMF-Portal was launched in 2017 and is operated by the Technologie- und Methodenplattform für die vernetzte medizinische Forschung e.V. (TMF). It provides a collection of IT infrastructure-related products for networked medical research. There are products from the TMF and from other providers such as companies and research institutions. There are over 80 products, more than half of which are software tools. Other product categories include eServices, reports and expert opinions, working materials and checklists, consultancy services and training courses. Products can be filtered by category, topic, project phase, keywords, provider and year. Similar products can also be compared using a feature matrix. On each product page you will find information about the use of the product in projects, testimonials from other users and references. New products can be submitted by anyone. Each product is then reviewed by a team of TMF members against a set of criteria before being added to the portal.

      To use the portal, follow this link. Many offerings are free and can be accessed directly from the portal. Software products usually require local installation and configuration.

  • SOPs:

  • Caliskan, A., Dangwal, S., & Dandekar, T. (2023). Metadata integrity in bioinformatics: Bridging the gap between data and knowledge. Computational and Structural Biotechnology Journal, 21, 4895–4913. https://doi.org/10.1016/j.csbj.2023.10.006
  • Egli, A., Schrenzel, J., & Greub, G. (2020). Digital microbiology. Clinical Microbiology and Infection, 26(10), 1324–1331. https://doi.org/10.1016/j.cmi.2020.06.023
  • Kyrpides, N. C., Eloe-Fadrosh, E. A., & Ivanova, N. N. (2016). Microbiome Data Science: Understanding Our Microbial Planet. Trends in Microbiology, 24(6), 425–427. https://doi.org/10.1016/j.tim.2016.02.011
  • Nasr, E., Amato, P., Bernt, M., Bhardwaj, A., Blankenberg, D., Brites, D., Cumbo, F., Do, K., Ferrari, E., Griffin, T. J., Gruening, B., Hiltemann, S., Hyde, C. J., Jagtap, P., Mehta, S., Métris, K. L., Momin, S., Oba, A., Pavloudi, C., … Batut, B. (2024). The Microbiology Galaxy Lab: A community-driven gateway to tools, workflows, and training for reproducible and FAIR analysis of microbial data. Cold Spring Harbor Laboratory. https://doi.org/10.1101/2024.12.23.629682
  • Zhou, R., Ng, S. K., Sung, J. J. Y., Goh, W. W. B., & Wong, S. H. (2023). Data pre-processing for analyzing microbiome data – A mini review. Computational and Structural Biotechnology Journal, 21, 4804–4815. https://doi.org/10.1016/j.csbj.2023.10.001

Plan

A Data Management Plan (DMP) is a formal and living document that defines responsibilities and provides guidance. It describes data and data management during a project as well as measures for archiving and making the data and research results available, usable and understandable after the project has ended.

The NFDI4Microbiota DMP template is available on Zenodo.

You do not have to write a DMP in a plain text editor: many dedicated tools are available. These tools offer similar functions and benefits and differ mainly in which funding agencies’ DMP requirements they support. Using a DMP tool makes it much easier to manage a DMP and to collaborate on it.

The Research Data Management Organiser (RDMO) is the most widely used DMP tool in Germany. It is an open-source web application developed to support the structured and collaborative planning and implementation of RDM. It allows users to create DMPs in text format and offers templates for questionnaires, project descriptions, tasks, and DMPs. Input is collected through a structured interview, and all responses are stored in a database. Question catalogues can be modified without losing information, and many questions allow dataset-specific answers. Key features include versioning, import/export functions, collaborative editing, snapshots, a timeline of RDM-related tasks, and notifications for upcoming events. DMP4NFDI offers demonstrations of how to set up DMPs using RDMO.

DMPonline was developed by the Digital Curation Centre in the UK. It is an open-source, web-based tool designed for researchers, primarily those working on UK-funded projects, though it is also used internationally. DMPonline enables users to create, review, and share DMPs that comply with institutional and funder requirements.

The Data Stewardship Wizard (DSW) was developed by ELIXIR Netherlands and ELIXIR Czech Republic. It is an open-source, dynamic web-based system aimed at data stewards who support researchers in creating machine-readable DMPs. The DSW is recommended by the Horizon Europe Programme Guide. It features user-friendly questionnaires, a variety of built-in templates, and the ability to develop custom templates. Various ELIXIR nodes offer training on how to use the DSW.

Other DMP tools include ARBOS, DataPLAN, DataWiz, DMPRoadMap, DMPTool, GFBio DMPT and TUB-DMP. A comprehensive guide to DMP tools is available on Zenodo.

Collect

Microbial data are highly heterogeneous, as are the methods used to collect them. The following list comprises examples of microbial data and the collection method(s) associated with each:

  • Microbiome data: High-Throughput Sequencing (HTS), Next Generation Sequencing (NGS)
  • Crystallographic data for small molecules: Single crystal X-ray diffraction
  • Protein sequences: Mass spectrometry, Edman degradation using a protein sequenator
  • Nucleic acid sequences: (RT-)PCR, sequencing, …
  • Linked genotype and phenotype data: High-throughput genotyping and ongoing patient care/clinical trial 
  • Macromolecular structures: Diffraction, electron cryo-microscopy
  • Clinical data: Ongoing patient care, clinical trial
  • Functional genomics and gene expression data: High-throughput functional genomics experiments
  • Standardized bacterial information: Culture collections, species descriptions

Protocols for collecting microbial data can be found, for example, on the NFDI4Microbiota protocols.io workspace and on the websites of the International Human Microbiome Standards (IHMS) and the Earth Microbiome Project (EMP). IHMS’s protocols focus on the collection, identification, extraction, sequencing and analysis of faecal samples. The EMP’s protocols focus on the extraction and sequencing of DNA from environmental samples. The NFDI4Microbiota’s protocols.io instance aims to collect relevant protocols from the community for the community.

To select an ELN, we recommend that you define selection criteria that reflect the needs of your institution and labs. You can then use these criteria to compare the available ELNs with your requirements, for example, by entering the criteria into the ELN Finder. The ELN Finder is a tool developed by the University and State Library Darmstadt and ZB MED – Information Centre for Life Sciences. It is an interactive tool for filtering ELNs based on 40 criteria.

Important criteria to consider include discipline, whether the ELN is proprietary or open-source, whether it is a cloud-computing service (SaaS) or self-hosted, and performance and stability. Other important criteria to consider when selecting an ELN include your lab’s established practices and preferences, your institution’s ELN policy, the security level needed for your data and your budget.
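The criteria-based comparison described above can be sketched as a simple filter: hard requirements exclude candidates, and weighted preferences rank the rest. This is a minimal illustration only; the candidate names, scores and weights are hypothetical placeholders, not assessments of real ELNs, and the ELN Finder remains the practical tool for this task.

```python
from dataclasses import dataclass, field


@dataclass
class ElnCandidate:
    name: str
    open_source: bool
    self_hosted: bool
    # Preference scores from 0 (poor) to 5 (excellent), from your own evaluation.
    scores: dict[str, int] = field(default_factory=dict)


def rank_elns(candidates, must_have, weights):
    """Drop candidates that fail any hard requirement, then rank by weighted score.

    must_have: attribute name (e.g. "open_source") -> required value.
    weights:   preference criterion -> relative weight.
    """
    eligible = [
        c for c in candidates
        if all(getattr(c, attr) == value for attr, value in must_have.items())
    ]

    def total(c):
        return sum(w * c.scores.get(criterion, 0) for criterion, w in weights.items())

    return sorted(eligible, key=total, reverse=True)


# Hypothetical candidates and weights, for illustration only.
candidates = [
    ElnCandidate("ELN A", open_source=True, self_hosted=True,
                 scores={"usability": 4, "stability": 3}),
    ElnCandidate("ELN B", open_source=False, self_hosted=False,
                 scores={"usability": 5, "stability": 5}),
    ElnCandidate("ELN C", open_source=True, self_hosted=True,
                 scores={"usability": 3, "stability": 4}),
]
ranking = rank_elns(candidates,
                    must_have={"open_source": True, "self_hosted": True},
                    weights={"usability": 2, "stability": 1})
print([c.name for c in ranking])  # ['ELN A', 'ELN C']
```

Separating hard requirements (such as an institutional policy requiring a self-hosted system) from weighted preferences keeps a highly rated but non-compliant option from outranking acceptable ones.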

Once you have selected an ELN, you need to obtain a licence (if required) and introduce the ELN to your institution’s research groups. This involves ensuring that all technical requirements are met (e.g. a stable wireless connection), creating and implementing a roll-out plan, training users, and setting up support services. Finally, you will need to monitor how the ELN is used.

If you would like to find out more about working with eLabFTW, take a look at this demo with eLabFTW or this video tutorial from ZB MED, which explores both eLabFTW and Labfolder. You can also request NFDI4Microbiota training on how to work with eLabFTW if you are interested.

If your home institution does not support an open-source ELN and your research group would like to set up a proprietary solution, we can offer insights and suggestions based on our experience with eLabJournal, for instance.

For data collection that does not require wet lab experiments, there are alternative documentation methods for both the collection and subsequent analysis, such as README files, literate programming, narrative descriptions, data dictionaries and codebooks.
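A data dictionary can also be made machine-checkable. The sketch below pairs a small data dictionary (column names, types, units and descriptions) with a function that reports values violating it. The columns and values are hypothetical, chosen only to illustrate the idea.

```python
import csv
import io

# Hypothetical data dictionary: column -> (type, unit, description).
DATA_DICTIONARY = {
    "sample_id": (str, None, "Unique sample identifier"),
    "od600": (float, "absorbance units", "Optical density at 600 nm"),
    "temperature_c": (float, "°C", "Incubation temperature"),
}


def validate_rows(csv_text: str) -> list[str]:
    """Check CSV text against DATA_DICTIONARY and return human-readable problems."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = set(DATA_DICTIONARY) - set(reader.fieldnames or [])
    if missing:
        return [f"missing columns: {sorted(missing)}"]
    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for column, (expected_type, _unit, _description) in DATA_DICTIONARY.items():
            value = row[column]
            try:
                expected_type(value)
            except (TypeError, ValueError):  # TypeError: field absent in a short row
                problems.append(
                    f"line {line_no}: {column}={value!r} is not {expected_type.__name__}"
                )
    return problems


example = "sample_id,od600,temperature_c\nS1,0.42,37\nS2,n/a,30\n"
print(validate_rows(example))  # ["line 3: od600='n/a' is not float"]
```

Keeping the dictionary next to the data, and checking against it after each export, turns the documentation into a lightweight quality control step.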

Process

The most common formats for microbial data are as follows:

  • FASTA (*.fasta, *.fas, *.fa, *.fna, *.ffn, *.faa, *.mpfa, *.frn) for nucleotide and protein sequences.
  • FASTQ (*.fq, *.fastq) for raw biological sequences and their corresponding quality scores.
  • General Feature Format (GFF) (*.gff, *.gff3) for sequence annotations.
  • Sequence Alignment Map (SAM) (*.sam) and Binary Alignment Map (BAM) (*.bam) for biological sequences aligned to a reference sequence.
  • Variant Call Format (VCF) (*.vcf) for sequence variants.
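To make the two most common text formats concrete, here is a minimal, dependency-free reader for FASTA and FASTQ. It is a sketch that assumes the common four-line FASTQ layout with Phred+33 quality encoding; for real analyses, an established library such as Biopython’s SeqIO module is the safer choice.

```python
from typing import Iterable, Iterator


def read_fasta(lines: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield (header, sequence) pairs; a sequence may span several lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def read_fastq(lines: Iterable[str]) -> Iterator[tuple[str, str, list[int]]]:
    """Yield (header, sequence, Phred scores) from four-line FASTQ records.

    Assumes Phred+33 encoding: score = ASCII code of the quality character - 33.
    """
    it = (line.rstrip("\r\n") for line in lines)
    for header in it:
        if not header:
            continue
        try:
            seq, plus, qual = next(it), next(it), next(it)
        except StopIteration:
            raise ValueError(f"truncated FASTQ record: {header!r}") from None
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield header[1:], seq, [ord(ch) - 33 for ch in qual]
```

For example, the FASTQ record "@read1 / ACGT / + / II#5" yields the scores [40, 40, 2, 20], where "#" (score 2) marks a low-confidence base call.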

For more information on suitable file formats for long-term archiving, please refer to the Digital Preservation page of our Knowledge Base.

Analyzing Data

If you would like to learn more about using Bash, Python and R for data analysis, please take a look at our training calendar to see if there are any upcoming training events on these topics, or take a look at the training materials we have published on Zenodo. You can also take a look at The Carpentries’ teaching materials.

One way to share your scripts is to archive your GitHub repository on Zenodo and assign it a Digital Object Identifier (DOI). To do so, follow these steps:

  1. Create a Zenodo account.
  2. Create a Binder-ready repository on GitHub (see here for instructions).
  3. Make sure your repository is ready to be published.
  4. Create a Zenodo DOI for your repository (see here for best practices).
  5. Create a Binder link for your Zenodo DOI (see here for the form).
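Step 5 can also be done without the form. As far as we know, mybinder.org accepts Zenodo DOIs in launch URLs of the form https://mybinder.org/v2/zenodo/<DOI>/; treat this pattern as an assumption and confirm it in the Binder documentation. The sketch below builds such a link; the DOI in the usage note is a placeholder.

```python
import re

# Zenodo DOIs use the prefix 10.5281 followed by "zenodo.<record id>".
ZENODO_DOI = re.compile(r"^10\.5281/zenodo\.\d+$")


def binder_link_for_zenodo(doi: str, base_url: str = "https://mybinder.org") -> str:
    """Build a Binder launch URL for a Zenodo DOI (URL pattern is an assumption)."""
    doi = doi.strip()
    # Accept DOIs given as resolver URLs, e.g. https://doi.org/10.5281/zenodo.123
    doi = re.sub(r"^https?://(dx\.)?doi\.org/", "", doi)
    if not ZENODO_DOI.match(doi):
        raise ValueError(f"not a Zenodo DOI: {doi!r}")
    return f"{base_url.rstrip('/')}/v2/zenodo/{doi}/"
```

For the placeholder DOI 10.5281/zenodo.1234567, binder_link_for_zenodo returns https://mybinder.org/v2/zenodo/10.5281/zenodo.1234567/. Linking to the archived DOI rather than to the live GitHub repository means the Binder session always starts from the exact published snapshot.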

Workflow Standards and Provenance

We currently offer a guided evaluation of workflows and tools against provenance standards. If you are interested in having your workflows and tools rated, please contact the helpdesk. We recommend evaluating the five aspects below individually to check whether a workflow is easily reproducible and verifiable. More detailed information on the guidelines and rating system is available in the Provenance Guidelines for Workflow and Tool Developers.

(1) Improve reproducibility
(2) Version report (output file: versions.yml)
(3) Data and metadata management (output file: provenance.yml)
(4) Documentation
(5) Validation, cooperation and sharing
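Aspect (2) refers to a versions.yml report. The guideline’s exact schema is not reproduced here; the sketch below assumes the layout used by nf-core pipelines (each process name mapped to the tools it ran and their versions) and writes it without a YAML library. The process names and versions in the example are hypothetical.

```python
import json


def render_versions_yml(versions: dict[str, dict[str, str]]) -> str:
    """Render {process: {tool: version}} as YAML in the style of nf-core's versions.yml.

    Keys and values are written as JSON strings, which are also valid YAML
    double-quoted scalars; this stops versions such as "1.10" being read as numbers.
    """
    lines = []
    for process in sorted(versions):
        lines.append(f"{json.dumps(process)}:")
        for tool, version in sorted(versions[process].items()):
            lines.append(f"    {json.dumps(tool)}: {json.dumps(version)}")
    return "\n".join(lines) + "\n"


# Hypothetical process names and versions, for illustration only.
report = render_versions_yml({
    "MULTIQC": {"multiqc": "1.21"},
    "FASTQC": {"fastqc": "0.12.1"},
})
print(report, end="")
```

Sorting processes and tools makes the report deterministic, so two runs with the same software produce byte-identical files that can be compared directly.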

Preserving Data

A proper backup and storage strategy for any type of research should include the following:

  • Consult your local IT or library staff to learn about backup and storage options.
  • Follow the 3-2-1 rule when backing up your data: keep 3 copies of any important file; store your files on 2 different types of media; and keep at least 1 copy offsite or in the cloud.
  • Keep versioned backups, so that earlier versions of your files can be restored.
  • Use incremental backups or specialized storage systems for large data sets.
  • Generate checksums for files and compare them after data transfers to verify data integrity. A checksum is a short value calculated from a file’s contents (for example, an MD5 or SHA-256 hash). In practice, any change to the file produces a different checksum, so a mismatch after a transfer indicates that the data were corrupted.
  • Plan for regular updates and migration to newer technologies.
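Two of the practices above lend themselves to simple automated checks. The sketch below computes SHA-256 checksums with Python’s standard hashlib module, reading files in chunks so that large sequencing files do not have to fit in memory, and tests a list of backup copies against the 3-2-1 rule. The BackupCopy record is a hypothetical structure for illustration, and the list of copies is assumed to include the working copy.

```python
import hashlib
from dataclasses import dataclass


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute a file's SHA-256 checksum, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def transfer_ok(source_path: str, copy_path: str) -> bool:
    """True if the copy has the same checksum as the source."""
    return sha256_of(source_path) == sha256_of(copy_path)


@dataclass
class BackupCopy:
    location: str  # free-text description, e.g. "lab NAS" (hypothetical)
    media: str     # media type, e.g. "disk", "tape", "cloud"
    offsite: bool  # stored outside the building or in the cloud


def satisfies_3_2_1(copies: list[BackupCopy]) -> bool:
    """At least 3 copies, on at least 2 media types, with at least 1 offsite."""
    return (
        len(copies) >= 3
        and len({c.media for c in copies}) >= 2
        and any(c.offsite for c in copies)
    )
```

The hexadecimal digests match those produced by command-line tools such as sha256sum, so checksums generated in Python can be verified on the receiving side with standard utilities.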

As a researcher, if you want to ensure that your data is preserved in the long term, you must handle it sustainably. This includes complying with community standards (e.g. your discipline’s metadata standard), providing curated and extensive metadata and contextual information for your data (e.g. comments, detailed descriptions of methods, units and formats, and usage licences), organizing your data, validating your data (i.e. cleaning and quality-controlling your data), and using acceptable file formats.

Sharing Data

If you want to share your microbial data while you are still working on your research project, you can use tools such as Academic Torrents, B2DROP or the Open Science Framework (OSF). There are also Git-based tools, such as GitHub, GitLab and DataLad. For large datasets, take a look at Git-annex and Git Large File Storage, which provide file management and versioning systems without requiring you to check the file contents into Git. If you are working with health-related data, take a look at the Framework for the Responsible Sharing of Genomic and Health-Related Data. This framework centers on human rights and is intended for researchers, clinicians, data generators and others. Its foundational principles are to respect individuals, families and communities, advance research and scientific knowledge, promote health and well-being, and foster trust, integrity and reciprocity.

Publishing Open Access means that a research publication or dataset is freely and publicly available online, so that anyone can access it without charge.

If you are looking for a trustworthy repository for your microbial data, please refer to our Knowledge Base page on Data Repositories or visit re3data.org.

Reusing Data

If you are looking for existing, trusted microbial datasets to reuse, please refer to our ‘Resources to Facilitate Data Reuse in Microbiology’ list on our Knowledge Base page on Data Reuse.