For a quick and simple start that covers most of the available features, we will use the Escherichia coli reference genome as input.
To run the pipeline, we basically need a samplesheet describing the samples to be analysed (--input) and the path to the directory containing the databases used by bacannot.
Downloading/Generating the inputs
Input genome and samplesheet
First we need to download the genome:
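For example, the genome can be fetched from the NCBI FTP server. The assembly and URL below are illustrative assumptions (E. coli K-12 MG1655, accession GCF_000005845.2); any E. coli reference assembly works:

```shell
# Download the E. coli K-12 MG1655 reference assembly from NCBI
# (accession GCF_000005845.2 / ASM584v2 -- adjust if you prefer another assembly)
wget https://ftp.ncbi.nlm.nih.gov/genomes/all/GCF/000/005/845/GCF_000005845.2_ASM584v2/GCF_000005845.2_ASM584v2_genomic.fna.gz

# Decompress it so it can be referenced in the samplesheet
gunzip GCF_000005845.2_ASM584v2_genomic.fna.gz
```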
After downloading it, we must create a samplesheet for the input data as described in the samplesheet manual page. A properly formatted file for this data would look like this:
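Assuming the genome was saved as `ecoli_ref.fna` (the sample name and file path below are placeholders; check the samplesheet manual page for the full schema), a minimal samplesheet would be:

```yaml
# bacannot_samplesheet.yaml -- minimal example for a single assembled genome
samplesheet:
  - name: ecoli                 # sample identifier (placeholder)
    assembly: ./ecoli_ref.fna   # path to the downloaded genome (placeholder)
```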
Download this file and save it as bacannot_samplesheet.yaml so it can be easily referenced later.
Bacannot databases are no longer shipped inside the Docker images, to avoid huge images and problems with connection and rate limits on Docker Hub.
Users can directly download pre-formatted databases from Zenodo: https://doi.org/10.5281/zenodo.7615811
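One way to fetch and unpack the pre-built databases is shown below. The archive file name is an assumption; check the Zenodo record page for the exact file available in the current version:

```shell
# Download the pre-formatted databases archive from the Zenodo record
# (the exact file name may differ between record versions -- check the record page)
wget https://zenodo.org/record/7615811/files/bacannot_dbs.tar.gz

# Unpack into a directory that will later be passed to the pipeline
tar -xzf bacannot_dbs.tar.gz
```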
This is useful for standardization and also for overcoming known issues that may arise when formatting the databases yourself.
I want to generate a new formatted database
Users must select one of the available profiles: docker or singularity. Conda support may come in the near future. Please read more about how to properly select Nextflow profiles.
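Database formatting is done with the pipeline itself. A sketch of the command, assuming the database download mode (--get_dbs) and output directory name described in the pipeline docs:

```shell
# Download and format all databases from scratch using the pipeline's database module
# (--get_dbs and --output are taken from the pipeline docs; verify them with --help.
#  Swap -profile docker for singularity if needed.)
nextflow run fmalmeida/bacannot --get_dbs --output bacannot_dbs -profile docker
```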
Run the pipeline
In this step we will get an overview of the pipeline's main steps. To run it, we will use the databases (bacannot_dbs) downloaded in the previous step.
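A sketch of the annotation run, assuming the samplesheet and databases from the previous steps. The parameter names --input, --output and --bacannot_db are taken from the pipeline docs; double-check them against your installed version:

```shell
# Run the main annotation workflow
# (swap -profile docker for singularity if needed;
#  'annotation_results' is a placeholder output directory name)
nextflow run fmalmeida/bacannot \
    --input bacannot_samplesheet.yaml \
    --output annotation_results \
    --bacannot_db ./bacannot_dbs \
    -profile docker
```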
A glimpse of the main outputs produced by bacannot is given in the outputs section.
Testing more workflows
Moreover, we have also made a few example datasets available in the pipeline so users can test all its capabilities at once, from assembling raw reads to annotating genomes. To test them, users must run:
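A sketch of such a test run, assuming a bundled test profile as described in the pipeline docs (the profile name may differ between versions):

```shell
# Run the bundled example datasets end-to-end
# (the 'test' profile name is an assumption -- check the pipeline docs or --help)
nextflow run fmalmeida/bacannot \
    --bacannot_db ./bacannot_dbs \
    -profile docker,test
```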
Unfortunately, due to file sizes, we could not provide fast5 files for users to try the methylation step.
Annotation with bakta
Users can also perform the core generic annotation with bakta instead of prokka. Please read the manual.
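A sketch of such a run, assuming bakta is selected by passing a pre-downloaded bakta database via a --bakta_db parameter (parameter name taken from the pipeline manual; verify it in your version):

```shell
# Same run as before, but using bakta for the generic annotation step;
# passing a bakta database (--bakta_db) switches the pipeline from prokka to bakta
nextflow run fmalmeida/bacannot \
    --input bacannot_samplesheet.yaml \
    --output annotation_results \
    --bacannot_db ./bacannot_dbs \
    --bakta_db ./bakta_db \
    -profile docker
```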