For a rapid and simple quickstart that covers most of the available features, we will use the Escherichia coli reference genome as input.
To run the pipeline, we basically need a samplesheet describing the samples to be analysed (`--input`) and the path to the directory containing the databases used by bacannot (`--bacannot_db`).
Downloading/Generating the inputs
Input genome and samplesheet
First we need to download the genome:
```shell
# Download the ecoli ref genome
wget -O ecoli_ref.fna.gz https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/008/865/GCF_000008865.2_ASM886v2/GCF_000008865.2_ASM886v2_genomic.fna.gz
gzip -d ecoli_ref.fna.gz
```
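Before writing the samplesheet, it may be worth confirming that the decompressed file is a non-empty FASTA. The sketch below defines a hypothetical helper, `count_fasta_records`, which is not part of bacannot; it simply counts header lines starting with `>` and fails if the file is missing, empty, or has no headers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bacannot): count the FASTA records in a file.
# Prints the number of header lines (lines starting with '>') and returns
# a non-zero status if the file is missing, empty, or has no headers.
count_fasta_records() {
    local fasta="$1"
    if [ ! -s "$fasta" ]; then
        echo "error: '$fasta' is missing or empty" >&2
        return 1
    fi
    local n
    # grep -c prints 0 and exits 1 when nothing matches; '|| true' keeps the count
    n=$(grep -c '^>' "$fasta" || true)
    if [ "$n" -eq 0 ]; then
        echo "error: no FASTA headers found in '$fasta'" >&2
        return 1
    fi
    echo "$n"
}
```

For example, `count_fasta_records ecoli_ref.fna` should print one count covering every replicon (chromosome and any plasmids) in the downloaded assembly.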
After downloading it, we must create a samplesheet for the input data as described in the samplesheet manual page. A properly formatted file for this data would look like this:
```yaml
samplesheet: # this header is required
  - id: ecoli
    assembly: ecoli_ref.fna
    resfinder: Escherichia coli
```
Download this file and save it as `bacannot_samplesheet.yaml` so that it is easy to reference in later steps.
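If you prefer to create the samplesheet from the terminal instead of downloading it, a heredoc is one way to write the same content shown above. This is only a convenience sketch; the samplesheet manual page remains the reference for the format.

```shell
#!/usr/bin/env bash
# Write the quickstart samplesheet to bacannot_samplesheet.yaml.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > bacannot_samplesheet.yaml <<'EOF'
samplesheet: # this header is required
  - id: ecoli
    assembly: ecoli_ref.fna
    resfinder: Escherichia coli
EOF

# Quick visual check of what was written
cat bacannot_samplesheet.yaml
```

Note that the YAML indentation matters: each sample is a list item (`- id: ...`) nested under the `samplesheet:` key.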
Bacannot databases are no longer shipped inside the Docker images, to avoid huge images as well as connection problems and rate limits on Docker Hub.
Users can directly download pre-formatted databases from Zenodo: https://doi.org/10.5281/zenodo.7615811
This is useful for standardization, and also for overcoming known issues that may arise when formatting the databases locally.
Alternatively, if you want to generate a newly formatted database yourself, run:
```shell
# Download pipeline databases
nextflow run fmalmeida/bacannot \
    --get_dbs \
    --output bacannot_dbs \
    -profile docker
```
Users must select one of the available profiles: docker or singularity. Conda support may come in the future. Please read more about how to properly select NF profiles.
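Whichever option you choose, the pipeline run expects `--bacannot_db` to point at a populated directory. A hypothetical pre-flight check such as `check_db_dir` below (not part of bacannot) can catch a wrong path or an interrupted download early. It only looks at the top level of the directory and does not validate the individual databases.

```shell
#!/usr/bin/env bash
# Hypothetical pre-flight check (not part of bacannot): make sure the database
# directory exists and is not empty before launching the pipeline.
# It only checks the top level; it does not verify each database's contents.
check_db_dir() {
    local db_dir="$1"
    if [ ! -d "$db_dir" ]; then
        echo "error: database directory '$db_dir' not found" >&2
        return 1
    fi
    if [ -z "$(ls -A "$db_dir")" ]; then
        echo "error: database directory '$db_dir' is empty" >&2
        return 1
    fi
    echo "ok: using databases in '$db_dir'"
}

# Example usage with the directory name used in this quickstart:
# check_db_dir ./bacannot_dbs
```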
Run the pipeline
This step gives an overview of the pipeline's main steps. To run it, we will use the databases (`bacannot_dbs`) downloaded in the previous step.
```shell
# Run the pipeline using the Escherichia coli resfinder database
nextflow run fmalmeida/bacannot \
    --input bacannot_samplesheet.yaml \
    --output _ANNOTATION \
    --bacannot_db ./bacannot_dbs \
    --max_cpus 10 \
    -profile docker
```
The resfinder species can also be selected via the command line with `--resfinder_species`. Please read more about it in the manual and samplesheet reference pages.
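Because the species name contains a space, it must be quoted when passed with `--resfinder_species`. One way to keep such arguments intact is to assemble the command in a bash array. The sketch below is an illustrative dry run: it only prints the command, and the `species` and `cmd` variable names are arbitrary. We do not assert here which value takes precedence when the species is set both in the samplesheet and on the command line; please check the manual for that.

```shell
#!/usr/bin/env bash
# Illustrative dry run: build the quickstart command in an array so that
# arguments containing spaces (like the species name) stay a single word.
species="Escherichia coli"

cmd=(
    nextflow run fmalmeida/bacannot
    --input bacannot_samplesheet.yaml
    --output _ANNOTATION
    --bacannot_db ./bacannot_dbs
    --resfinder_species "$species"
    --max_cpus 10
    -profile docker
)

# Print the command with shell quoting; replace these two lines with
# "${cmd[@]}" to actually run it.
printf '%q ' "${cmd[@]}"
echo
```

Expanding the array as `"${cmd[@]}"` passes `Escherichia coli` to Nextflow as one argument, which is what unquoted string concatenation would get wrong.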
A glimpse of the main outputs produced by bacannot is given in the outputs section.
Testing more workflows
We have also made a few example datasets available in the pipeline so that users can test all of its capabilities at once, from assembling raw reads to annotating genomes. To test them, run:
```shell
# Run the pipeline using the provided (bigger) test dataset
nextflow run fmalmeida/bacannot -profile docker,test --bacannot_db ./bacannot_dbs --max_cpus 10

# Or run the quick test
nextflow run fmalmeida/bacannot -profile docker,quicktest --bacannot_db ./bacannot_dbs --max_cpus 10
```
Unfortunately, due to file sizes, we could not provide fast5 files that would let users test the methylation step.
Annotation with bakta
Users can also perform the core generic annotation with bakta instead of prokka. Please read the manual for details.