Samplesheet (input files)
The samplesheet is a required YAML document that describes the input samples and, if desired, their sample-specific configuration. The input samplesheet is expected with the
Get a template
A samplesheet template can be downloaded with:
nextflow run fmalmeida/bacannot --get_samplesheet
The first line of the file must be the header (samplesheet:), followed by an indentation:
Each indentation level is set by two blank spaces
Each sample must be identified by the tag id in the YAML file, followed by the sample's input tags (keys), which hold the files and values the pipeline will use for that sample.
Input tags (keys)
Input tags set the inputs to be used for each sample. For the resfinder species panel, if the value is not set inside the samplesheet, the pipeline uses the default set via the Nextflow config file or the command line. If it is set inside the samplesheet, it overrides the pipeline's configuration for that specific sample.
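The precedence described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the pipeline's own code: the function name effective_resfinder_species and the default panel value are hypothetical.

```python
def effective_resfinder_species(sample, pipeline_default):
    """Return the resfinder species panel that applies to one sample.

    `sample` is a dict holding one samplesheet entry; `pipeline_default`
    stands for the value set in the config file or on the command line.
    Both names are assumptions made for this sketch.
    """
    # A value set inside the samplesheet wins, for that sample only.
    return sample.get("resfinder", pipeline_default)


samples = [
    {"id": "sample_1", "nanopore": "sample_1/ont.fastq"},
    {"id": "sample_2", "resfinder": "Klebsiella"},
]
default_panel = "Escherichia coli"  # hypothetical global default
resolved = {s["id"]: effective_resfinder_species(s, default_panel) for s in samples}
print(resolved)
```

Only sample_2 departs from the global value here; every sample without its own resfinder key falls back to the default.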
Whenever an assembled genome is given with the assembly key, the pipeline will not perform genome assembly even if reads are given. Users may use the assembly tag together with fast5 tags, which will trigger methylation calling with Nanopolish.
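As a rough sketch of how these keys interact, the Python function below lists the steps implied by a single samplesheet entry. The function name planned_steps and the step labels are hypothetical, and the sketch ignores any other requirement the pipeline may place on FAST5 data.

```python
def planned_steps(sample):
    """List the steps implied by one samplesheet entry (illustrative only)."""
    steps = []
    # A provided assembly is used as-is, even when reads are also given.
    if "assembly" in sample:
        steps.append("use provided assembly")
    else:
        steps.append("assemble genome from reads")
    # FAST5 data triggers methylation calling with Nanopolish.
    if "fast5" in sample:
        steps.append("methylation calling")
    return steps


sample_2 = {
    "id": "sample_2",
    "assembly": "sample_2/assembly.fasta",
    "nanopore": "sample_2/ont.fastq",
    "fast5": "sample_2/fast5_pass",
}
print(planned_steps(sample_2))
```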
Please check the manual reference page to understand the global/default configurations.
The available keys (input tags) are:
| Input tags (YAML keys) | Description |
|---|---|
| assembly | Used to set path to genomic FASTA of an assembled bacterial genome |
| illumina | Used to set path to illumina raw reads (paired, unpaired or both) |
| pacbio | Used to set path to pacbio raw reads (mutually excludable with |
| nanopore | Used to set path to nanopore raw reads (mutually excludable with |
| fast5 | Used to set path to nanopore raw FAST5 data (used together with |
| resfinder | Used to set resfinder species panel for resistance annotation with resfinder (must be exactly as shown in their web page). If your species is not available in Resfinder panels, you may use it with the |
About illumina tag/key
- When using both paired and unpaired reads, the paired reads must be given first, in the order: pair 1, pair 2, unpaired.
- Otherwise, if using only paired reads, they must be given in the order: pair 1, pair 2.
- If using only unpaired reads, only one entry is expected. See samples 1, 4 and 5 in the template for examples of each case.
- The illumina tag is the only one that must be set in indented newlines
- two white spaces relative to the
- one line per read as shown in the complete samplesheet example.
All the other input tags must be set on the same line, right after the separator (":"), without quotation marks, white spaces or signs.
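The ordering rules for the illumina key can be summarised as a small Python sketch that interprets the list purely by position. The function name split_illumina is hypothetical and the pipeline may parse the list differently; the sketch only mirrors the three cases described above.

```python
def split_illumina(entries):
    """Split an illumina list into (pair_1, pair_2, unpaired) by position."""
    n = len(entries)
    if n == 1:
        # Only unpaired reads: a single entry.
        return None, None, entries[0]
    if n == 2:
        # Only paired reads: pair 1, then pair 2.
        return entries[0], entries[1], None
    if n == 3:
        # Both: pair 1, pair 2, then unpaired.
        return entries[0], entries[1], entries[2]
    raise ValueError(f"expected 1, 2 or 3 illumina entries, got {n}")


print(split_illumina(["sample_4/merged_unpaired.fastq"]))
print(split_illumina(["sample_1/1.fastq", "sample_1/2.fastq"]))
print(split_illumina(["sample_5/1.fastq", "sample_5/2.fastq", "sample_5/merged.fastq"]))
```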
Complete samplesheet example
samplesheet:
  - id: sample_1
    illumina:
      - sample_1/1.fastq
      - sample_1/2.fastq
    nanopore: sample_1/ont.fastq
  - id: sample_2
    assembly: sample_2/assembly.fasta
    nanopore: sample_2/ont.fastq
    fast5: sample_2/fast5_pass
    resfinder: Klebsiella # this tells the pipeline a different value for only this sample
  - id: sample_3
    nanopore: sample_3/ont.fastq
    fast5: sample_3/fast5_pass
  - id: sample_4
    pacbio: sample_4/pacbio.fastq
    illumina:
      - sample_4/merged_unpaired.fastq
  - id: sample_5
    illumina:
      - sample_5/1.fastq
      - sample_5/2.fastq
      - sample_5/merged.fastq