Manual
Tip
All these parameters are configurable through a configuration file. We encourage users to use the configuration file since it will keep your execution cleaner and more readable. See a config example.
Input description
Required
To execute the annotation pipeline, users must provide genomic data as input, either as raw reads or as assembled genomes. When raw reads are given, the Unicycler and Flye assemblers are used to create the assemblies for the annotation process: Unicycler for short-reads-only and hybrid assemblies, and Flye for long-reads-only assemblies. Therefore, the minimum required input is one of:
- An assembled genome in FASTA format, or;
- Raw sequencing reads.
Optional
The pipeline accepts two other input file types, which are used to perform additional annotation processes:
- path to a directory of FAST5 files
  - When used together with nanopore reads, the pipeline will call DNA methylation with Nanopolish.
- path to custom databases, as described in the custom-db reference page
  - These custom databases will be used to perform additional annotation processes using BLAST. Please check both the explanation of the parameters and the explanation of their configuration.
Input/output options
Parameter | Required | Default | Description |
---|---|---|---|
--input | | NA | Input samplesheet describing all the samples to be analysed |
--enable_deduplication | | false | Run deduplication command on input reads before assembly. Only useful for samples where reads are given instead of a genome fasta. |
--output | | results | Name of directory to store output values. A sub-directory for each genome will be created inside this main directory. |
--bacannot_db | | NA | Path for root directory containing required bacannot databases |
About the samplesheet
Please read the samplesheet manual page to better understand its format.
Database download options
Parameter | Required | Default | Description |
---|---|---|---|
--get_dbs | | false | Instead of running the analysis workflow, it will try to download required databases and save them in --output |
--force_update | | false | Instead of only downloading missing databases, download everything again and overwrite. |
--get_zenodo_db | | false | Download pre-built databases stored in zenodo. See quickstart. |
The quickstart shows a common usage of these parameters.
Prokka annotation
Parameter | Required | Default | Description |
---|---|---|---|
--prokka_kingdom | | Bacteria | Prokka annotation mode. Possibilities: Archaea, Bacteria, Mitochondria, Viruses |
--prokka_genetic_code | | 11 | Genetic translation code. Must be set if a different kingdom is selected. |
--prokka_use_rnammer | | false | Tells Prokka whether to use rnammer instead of barrnap |
--prokka_use_pgap | | false | Include the comprehensive PGAP hmm database in prokka annotation instead of TIGRFAM. Although comprehensive, it increases runtime |
About prokka annotation
To increase the accuracy of prokka annotation, this pipeline adds an HMM database to prokka's defaults. It can be either TIGRFAM (smaller but curated) or PGAP (a larger, comprehensive NCBI database that contains TIGRFAM).
Bakta annotation
Using Bakta
If desired, users can use bakta instead of prokka to perform the core generic annotation of their prokaryotic genomes. To do so, users simply download and store the bakta database on their machine, and pass its path to bacannot with the --bakta_db parameter.
We chose this approach because the bakta database is quite large.
Parameter | Required | Default | Description |
---|---|---|---|
--bakta_db | | NA | Path to bakta database. If given, bacannot will use bakta instead of prokka. |
Resfinder annotation
The use of this parameter sets a default value for input samples. If a sample has a different value given inside the samplesheet, the pipeline will use, for that sample, the value found inside the samplesheet.
Parameter | Required | Default | Description |
---|---|---|---|
--resfinder_species | | NA | Resfinder species panel. It activates the resfinder annotation process using the given species panel. Check the available species at their main page and in their repository page. If your species is not available in Resfinder panels, you may use it with the "Other" panel (--resfinder_species "Other"). |
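The precedence between the pipeline-wide default and a per-sample value can be pictured with a minimal Python sketch. This is only an illustration of the rule stated above, not the pipeline's actual code: the function name `resolve_resfinder_species` and the way a missing value is represented (`None`) are assumptions made for the example.

```python
from typing import Optional


def resolve_resfinder_species(
    pipeline_default: Optional[str], sample_value: Optional[str]
) -> Optional[str]:
    """Pick the Resfinder species panel for one sample (hypothetical helper).

    pipeline_default: value given with --resfinder_species (None if not set).
    sample_value: value found for this sample in the samplesheet (None if absent).
    """
    if sample_value:
        # A value set inside the samplesheet wins for that sample.
        return sample_value
    # Otherwise fall back to the pipeline-wide default (which may be None).
    return pipeline_default


# Illustrative values only
print(resolve_resfinder_species("Other", None))
print(resolve_resfinder_species("Other", "Escherichia coli"))
```

The first call falls back to the default panel, while the second keeps the sample's own value.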
Sourmash comparison
The parameters below configure how sourmash is executed in the pipeline. They are relatively simple and have sensible defaults.
Parameter | Required | Default | Description |
---|---|---|---|
--sourmash_kmer | | 31 | Kmer size for sourmash genome comparison |
--sourmash_scale | | 1000 | Scale for sourmash genome comparison. A scale of 1000 on a 5Mb genome will generate about 5000 hashes. 1000 is generally recommended by the tool's developers |
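The arithmetic behind the scale value can be checked with a short Python sketch. It assumes the simple approximation used in the description (hashes ≈ genome size / scale); the real number of hashes depends on the k-mer content of the genome, and the function name `expected_hashes` is hypothetical.

```python
def expected_hashes(genome_size_bp: int, scale: int = 1000) -> int:
    """Approximate number of hashes kept for a genome at a given scale."""
    if scale <= 0:
        raise ValueError("scale must be a positive integer")
    # Roughly one hash is kept for every `scale` k-mers in the genome.
    return genome_size_bp // scale


# A 5 Mb genome with the default scale of 1000
print(expected_hashes(5_000_000, 1000))  # 5000
```

Under the same approximation, a larger scale keeps fewer hashes and a smaller scale keeps more.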
On/Off processes
Parameter | Required | Default | Description |
---|---|---|---|
--skip_virulence_search | | false | If set, skips the virulence factors annotation. It skips both the VFDB and Victors annotations |
--skip_plasmid_search | | false | If set, skips the plasmid detection/typing modules |
--skip_resistance_search | | false | If set, skips the resistance genes annotation modules |
--skip_iceberg_search | | false | If set, skips the mobile genetic elements annotation with ICEberg |
--skip_prophage_search | | false | If set, skips the prophage annotation modules |
--skip_kofamscan | | false | If set, skips the KEGG orthology (KO) annotation with KofamScan |
--skip_antismash | | false | If set, skips the antiSMASH (secondary metabolite) annotation. When run, antiSMASH is executed using only its core annotation modules in order to keep it fast. |
--skip_sourmash | | false | If set, skips the sourmash comparison between input genomes and the closest reference genomes |
--skip_circos | | false | If set, skips the final CIRCOS module. When the input genome has many contigs, its results are not meaningful. |
--skip_integron_finder | | false | If set, skips the integron finder tool. |
Custom databases
Users can give fasta files (nucl or prot) properly formatted or a text file containing a list of NCBI protein IDs (one per line). Please check the custom db manual for more information. Sequences are searched against the genome, with blastn for nucl sequences and tblastn for prot sequences.
Parameter | Required | Default | Description |
---|---|---|---|
--custom_db | | NA | Custom gene nucleotide/protein databases to be used for additional annotations. N files are accepted separated by commas. E.g. --custom_db db1.fasta,db2.fasta,db3.fasta. |
--ncbi_proteins | | NA | Path to file with NCBI protein IDs. The pipeline will download, format and use them for additional annotation. |
Annotation thresholds
Parameter | Required | Default | Description |
---|---|---|---|
--blast_virulence_minid | | 90 | Identity (%) threshold to be used when annotating virulence factors from VFDB and Victors |
--blast_virulence_mincov | | 90 | Coverage (%) threshold to be used when annotating virulence factors from VFDB and Victors |
--blast_resistance_minid | | 90 | Identity (%) threshold to be used when annotating AMR genes with CARD-RGI, Resfinder, ARGminer and AMRFinderPlus. |
--blast_resistance_mincov | | 90 | Coverage (%) threshold to be used when annotating AMR genes with Resfinder, ARGminer and AMRFinderPlus. CARD-RGI is not affected. |
--plasmids_minid | | 90 | Identity (%) threshold to be used when detecting plasmids with Plasmidfinder |
--plasmids_mincov | | 60 | Coverage (%) threshold to be used when detecting plasmids with Plasmidfinder |
--blast_MGEs_minid | | 85 | Identity (%) threshold to be used when annotating prophages and mobile elements from PHAST and ICEberg databases |
--blast_MGEs_mincov | | 85 | Coverage (%) threshold to be used when annotating prophages and mobile elements from PHAST and ICEberg databases |
--blast_custom_minid | | 65 | Identity (%) threshold to be used when annotating with user's custom databases |
--blast_custom_mincov | | 65 | Coverage (%) threshold to be used when annotating with user's custom databases |
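As a rough illustration of how a pair of identity and coverage thresholds acts on alignment hits, consider the Python sketch below. It is not the pipeline's implementation: the `Hit` class, the `filter_hits` helper, the inclusive comparison, and the example hits are all assumptions made for the example. Only the default threshold values come from the table above.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hit:
    gene: str
    identity: float  # percent identity of the alignment
    coverage: float  # percent of the reference sequence covered


def filter_hits(hits: List[Hit], min_id: float, min_cov: float) -> List[Hit]:
    """Keep hits that reach both thresholds (inclusive comparison is an assumption)."""
    return [h for h in hits if h.identity >= min_id and h.coverage >= min_cov]


# Made-up hits, for illustration only
hits = [
    Hit("geneA", identity=95.0, coverage=92.0),
    Hit("geneB", identity=91.0, coverage=70.0),
    Hit("geneC", identity=80.0, coverage=99.0),
]

# Virulence defaults: minid 90, mincov 90
print([h.gene for h in filter_hits(hits, 90, 90)])
# Plasmid defaults: minid 90, mincov 60
print([h.gene for h in filter_hits(hits, 90, 60)])
```

With these made-up hits, the stricter 90/90 pair keeps only `geneA`, while the 90/60 pair also keeps `geneB`. Lowering a threshold therefore trades specificity for sensitivity.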
Merge distance
Parameter | Required | Default | Description |
---|---|---|---|
--bedtools_merge_distance | | NA | Minimum number of required overlapping bases to merge genes. By default it is not executed. |
Non-core tools versions
Users can select the version of the non-core tools Bakta, Unicycler and Flye. Each of these tools has a parameter that controls which tag, and thus which version, is pulled from quay.io.
Parameter | Default | Description |
---|---|---|
--bakta_version | 1.7.0--pyhdfd78af_1 | Bakta tool version |
--flye_version | 2.9--py39h39abbe0_0 | Flye tool version |
--unicycler_version | 0.4.8--py38h8162308_3 | Unicycler tool version |
Max job request options
Set the upper limit for the resources requested by any single job. If you are running on a smaller system, a pipeline step requesting more resources than are available may cause Nextflow to stop the run with an error. These options allow you to cap the maximum resources requested by any single job so that the pipeline will run on your system.
Note
Note that you cannot increase the resources requested by any job using these options. For that, you will need your own configuration file. See the nf-core website for details.
Parameter | Default | Description |
---|---|---|
--max_cpus | 16 | Maximum number of CPUs that can be requested for any single job |
--max_memory | 20.GB | Maximum amount of memory that can be requested for any single job |
--max_time | 40.h | Maximum amount of time that can be requested for any single job |
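The capping behaviour described in this section can be summarised with a minimal Python sketch. It is a simplified model, not the pipeline's code: the `cap_resources` helper and the plain-number representation of memory (GB) and time (hours) are assumptions made for the example. The limits are the defaults from the table above.

```python
from typing import Dict

# Defaults from the table: --max_cpus 16, --max_memory 20.GB, --max_time 40.h
DEFAULT_LIMITS: Dict[str, int] = {"cpus": 16, "memory_gb": 20, "time_h": 40}


def cap_resources(
    requested: Dict[str, int], limits: Dict[str, int] = DEFAULT_LIMITS
) -> Dict[str, int]:
    """Return the request with every value lowered to its limit if needed.

    A request below the limit is left untouched: the cap never raises it.
    """
    return {name: min(value, limits[name]) for name, value in requested.items()}


# A job asking for more CPUs and memory than allowed, but less time
print(cap_resources({"cpus": 32, "memory_gb": 64, "time_h": 24}))
```

Here the CPU and memory requests are lowered to 16 and 20, while the 24-hour time request is kept as it is. This is why these options can only reduce what a job asks for, never increase it.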