Skip to content

Manual

# Get help in the command line
nextflow run fmalmeida/bacannot --help

Tip

All these parameters are configurable through a configuration file. We encourage users to use the configuration file since it will keep your execution cleaner and more readable. See a config example.

Input description

Required

To execute the annotation pipeline users must provide genomic data as either raw reads or assembled genomes as input. When raw reads are used, Unicycler and Flye assemblers are used to create, respectively, shortreads-only and hybrid assemblies, or longreads-only assemblies for the annotation process. Which means, the minimum required input files are:

  • An assembled genome in FASTA format, or;
  • Raw sequencing reads.

Optional

The pipeline accepts as input two other input files types that are used to perform additional annotation processes, they are:

  • path to a directory of FAST5
    • Then used together with nanopore reads it will call DNA methylation with Nanopolish.
  • path to custom databases as described in custom-db reference page
    • These custom databases will be used to perform additional annotation processes using BLAST. Please check the both the explanation about the parameters and about its configuration.

Input/output options

Parameter
Required Default Description
--input NA Input samplesheet describing all the samples to be analysed
--enable_deduplication false Run deduplication command on input reads before assembly. Only useful for samples where reads are given instead of a genome fasta.
--output results Name of directory to store output values. A sub-directory for each genome will be created inside this main directory.
--bacannot_db NA Path for root directory containing required bacannot databases

About the samplesheet

Please read the samplesheet manual page to better understand its format.

Database download options

Parameter
Required Default Description
--get_dbs false Instead of running the analysis workflow, it will try to download required databases and save them in --output
--force_update false Instead of only downloading missing databases, download everything again and overwrite.
--get_zenodo_db false Download pre-built databases stored in zenodo. See quickstart.

The quickstart shows a common usage of these parameters.

Prokka annotation

Parameter
Required Default Description
--prokka_kingdom Bacteria Prokka annotation mode. Possibilities: Archaea
--prokka_genetic_code 11 Genetic Translation code. Must be set if a different kingdom is customized.
--prokka_use_rnammer false Tells Prokka whether to use rnammer instead of barrnap
--prokka_use_pgap false Include comprehensive PGAP hmm database in prokka annotation instead of TIGRFAM. Although comprehensive it increases runtime

About prokka annotation

In order to increase the accuracy of prokka annotation, this pipeline includes an additional HMM database to prokka's defaults. It can be either TIGRFAM (smaller but curated) or PGAP (bigger comprehensive NCBI database that contains TIGRFAM).

Bakta annotation

Using Bakta

If desired, users can use bakta instead of prokka to perform the core generic annotation of their prokaryotic genomes. For that, users must simply download and store bakta database in their machine, and pass its path to bacannot with the --bakta_db parameter.

We opted for having it like this because bakta database is quite big.

Parameter
Required Default Description
--bakta_db NA Path to bakta database. If given, bacannot will use bakta instead of prokka.

Resfinder annotation

The use of this parameter sets a default value for input samples. If a sample has a different value given inside the samplesheet, the pipeline will use, for that sample, the value found inside the samplesheet.

Parameter
Required Default Description
--resfinder_species NA Resfinder species panel. It activates the resfinder annotation process using the given species panel. Check the available species at their main page and in their repository page. If your species is not available in Resfinder panels, you may use it with the "Other" panel (--resfinder_species "Other").

Sourmash comparison

The parameteers below, configure how sourmash is executed in the pipeline. They are relatively simple, and have sensible defaults.

Parameter
Required Default Description
--sourmash_kmer 31 Kmer size for sourmash genome comparison
--sourmash_scale 1000 Scale for for sourmash genome comparison. A scale 1000 on a 5Mb genome will generate 5000 hashes. 1000 is generally recommended by the tool's developers

On/Off processes

Parameter
Required Default Description
--skip_virulence_search false Tells whether not to run virulence factors annotation. It skips both vfdb and victors annotation
--skip_plasmid_search false Tells whether not to run plasmid detection/typing modules
--skip_resistance_search false Tells whether not to run resistance genes annotation modules
--skip_iceberg_search false Tells whether not to run mobile genetic elements annotation with ICEberg
--skip_prophage_search false Tells whether not to run prophage annotation modules
--skip_kofamscan false Tells whether not to run KEGG orthology (KO) annotation with KofamScan
--skip_antismash false Tells whether or not to run antiSMASH (secondary metabolite) annotation. AntiSMASH is executed using only its core annotation modules in order to keep it fast.
--skip_sourmash false Tells whether or not to run sourmash to compare input genomes and closest reference genomes
--skip_circos false Tells whether or not to run the final CIRCOS module. When the input genome has many contigs, its results are not meaningful.
--skip_integron_finder false Tells whether or not to run the integron finder tool.

Custom databases

Users can give fasta files (nucl or prot) properly formatted or a text file containing a list of NCBI protein IDs (one per line). Please check the custom db manual for more information. Sequences are searched against the genome, with blastn for nucl sequences and tblastn for prot sequences.

Parameter
Required Default Description
--custom_db NA Custom gene nucleotide/protein databases to be used for additional annotations. N files are accepted separated by commas. E.g. --custom_db db1.fasta,db2.fasta,db3.fasta.
--ncbi_proteins NA Path to file with NCBI protein IDs. The pipeline will download, format and use them for additional annotation.

Annotation thresholds

Parameter
Required Default Description
--blast_virulence_minid 90 Identity (%) threshold to be used when annotating virulence factors from VFDB and Victors
--blast_virulence_mincov 90 Coverage (%) threshold to be used when annotating virulence factors from VFDB and Victors
--blast_resistance_minid 90 Identity (%) threshold to be used when annotating AMR genes with CARD-RGI, Resfinder, ARGminer and AMRFinderPlus.
--blast_resistance_mincov 90 Coverage (%) threshold to be used when annotating AMR genes with Resfinder, ARGminer and AMRFinderPlus. CARD-RGI is not affected.
--plasmids_minid 90 Identity (%) threshold to be used when detecting plasmids with Plasmidfinder
--plasmids_mincov 60 Coverage (%) threshold to be used when detecting plasmids with Plasmidfinder
--blast_MGEs_minid 85 Coverage (%) threshold to be used when annotating AMR genes with Resfinder, ARGminer and AMRFinderPlus. CARD-RGI is not affected.
--blast_MGEs_mincov 85 Coverage (%) threshold to be used when annotating prophages and mobile elements from PHAST and ICEberg databases
--blast_custom_minid 65 Identity (%) threshold to be used when annotating with user's custom databases
--blast_custom_mincov 65 Coverage (%) threshold to be used when annotating with user's custom databases

Merge distance

Parameter
Required Default Description
--bedtools_merge_distance NA Minimum number of required overlapping bases to merge genes. By default it is not executed.

Non-core tools versions

Users can now select the version of the non-core tools Bakta, Unicyler and Flye. These tools now have a parameter which controls which tag, thus version, from quay.io to use.

Parameter Default Description
--bakta_version 1.7.0--pyhdfd78af_1 Bakta tool version
--flye_version 2.9--py39h39abbe0_0 Flye tool version
--unicycler_version 0.4.8--py38h8162308_3 Unicycler tool version

Max job request options

Set the top limit for requested resources for any single job. If you are running on a smaller system, a pipeline step requesting more resources than are available may cause the Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.

Note

Note that you can not increase the resources requested by any job using these options. For that you will need your own configuration file. See the nf-core website for details.

Parameter Default Description
--max_cpus 16 Maximum number of CPUs that can be requested for any single job
--max_memory 20.GB Maximum amount of memory that can be requested for any single job
--max_time 40.h Maximum amount of time that can be requested for any single job