Quickstart

For a rapid and simple quickstart that enables to understand most of the available features we will use as input the Escherichia coli reference genome.

Required inputs

To run the pipeline, we basically need a samplesheet describing the genomes to be samples to be analysed (--input) and the path to the directory containing the databases used by bacannot (--bacannot_db).

Downloading/Generating the inputs

Input genome and samplesheet

First we need to download the genome:

# Download the ecoli ref genome
wget -O ecoli_ref.fna.gz https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/865/GCF_000008865.2_ASM886v2/GCF_000008865.2_ASM886v2_genomic.fna.gz
gzip -d ecoli_ref.fna.gz

After downloading it, we must create a samplesheet for the input data as described in the samplesheet manual page. A proper formated file for this data would look like that:

samplesheet: # this header is required
  - id: ecoli
    assembly: ecoli_ref.fna
    resfinder: Escherichia coli

Tip

Download this file and save it as bacannot_samplesheet.yaml to help on later reference to it

Bacannot databases

Bacannot databases are not inside the docker images anymore to avoid huge images and problems with connections and limit rates with dockerhub.

Pre-formatted

Users can directly download pre-formatted databases from Zenodo: https://doi.org/10.5281/zenodo.7615811

Useful for standardization and also overcoming known issues that may arise when formatting databases with singularity profile.

A module to download the latest pre-formatted database has also been made available:

# Download pipeline pre-built databases
nextflow run fmalmeida/bacannot --get_zenodo_db --output ./ -profile <docker/singularity>

I want to generate a new formatted database

# Download pipeline databases
nextflow run fmalmeida/bacannot \
    --get_dbs \
    --output bacannot_dbs \
    -profile docker

About profiles

Users must select one of the available profiles: docker or singularity. Conda may come in future. Please read more about how to proper select NF profiles

Run the pipeline

In this step we will get a major overview of the main pipeline's steps. To run it, we will use the databases (bacannot_dbs) downloaded in the previous step.

# Run the pipeline using the Escherichia coli resfinder database
nextflow run fmalmeida/bacannot \
    --input bacannot_samplesheet.yaml \
    --output _ANNOTATION \
    --bacannot_db ./bacannot_dbs \
    --max_cpus 10 \
    -profile docker

About resfinder

The resfinder species could also be selected via the command line with --resfinder_species. Please, read more about it at manual and samplesheet reference pages.

Outputs

A glimpse over the main outputs produced by bacannot is given at outputs section.

Testing more workflows

Moreover, we have also made available a few example datasets in the pipeline so users can test all capabilities at once, from assembling raw reads to annotating genomes. To test it users must run:

# Run the pipeline using the provided (bigger) test dataset
nextflow run fmalmeida/bacannot -profile docker,test --bacannot_db ./bacannot_dbs --max_cpus 10

# Or run the quick test
nextflow run fmalmeida/bacannot -profile docker,quicktest --bacannot_db ./bacannot_dbs ---max_cpus 10

Unfortunately, due to file sizes, we could not provide fast5 files for users to check on the methylation step.

Annotation with bakta

User can also perform the core generic annotation with bakta instead of prokka. Please read the manual.