
Output files

Using the results produced in the quickstart section, this page gives an overview of the main outputs produced by bacannot. The command used in the quickstart wrote its results under the _ANNOTATION directory.

Note

Note that the pipeline uses the directory set with the --output parameter as a storage location, in which it creates one folder per sample, named after the sample id. Therefore, the same --output can be used for different annotations.
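
As a minimal sketch of that layout, the shell function below lists the per-sample folders found under an output directory. The name list_samples is hypothetical and not part of bacannot; the sketch only assumes that every immediate subdirectory of the --output directory corresponds to one sample id.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bacannot): print one sample id per line,
# assuming every immediate subdirectory of the output directory is a sample.
list_samples() {
  local outdir="$1"
  local d
  for d in "$outdir"/*/; do
    [ -d "$d" ] || continue   # skip the unexpanded glob when there are no subdirectories
    basename "$d"             # e.g. _ANNOTATION/ecoli/ -> ecoli
  done
}

# Usage, with the quickstart output directory:
#   list_samples _ANNOTATION
```

Plain files stored next to the sample folders are ignored, since only directories match the trailing-slash glob.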

Directory tree

After a successful execution, you will have something like this:

# Directory tree from the running dir
.
├── _ANNOTATION
│   ├── ecoli_ref.fna
│   └── ecoli
│       ├── assembly          # Assembly files (when raw reads are given)
│       ├── annotation        # Prokka annotation files
│       ├── antiSMASH         # antiSMASH secondary annotation files
│       ├── circos            # circos conf files
│       ├── digIS             # Insertion sequences predicted with digIS
│       ├── gbk               # Gbk files produced from the resulting GFF
│       ├── gffs              # A copy of the main GFF files produced during the annotation
│       ├── genomic_islands   # Genomic Islands predicted with IslandPath-DIMOB
│       ├── ICEs              # Results from ICEberg database annotation
│       ├── integron_finder   # Results from Integron Finder tool annotation
│       ├── jbrowse           # The files that set up the JBrowse genome browser
│       ├── KOfamscan         # Results from annotation with KEGG database
│       ├── methylations      # Methylated sites predicted with Nanopolish (if fast5 is given)
│       ├── MLST              # MLST results with mlst pipeline
│       ├── plasmids          # Plasmid annotation results from Platon, Plasmidfinder and MOB Suite
│       ├── prophages         # Prophage annotation results from PhiSpy, Phigaro and PHAST
│       ├── refseq_masher     # Closest NCBI RefSeq genomes identified with refseq_masher
│       ├── report_files      # Annotation reports in HTML format
│       ├── resistance        # AMR annotation results from ARGminer, AMRFinderPlus, RGI and Resfinder
│       ├── rRNA              # barrnap annotation results
│       ├── SequenceServerDBs # SequenceServer pre-formatted databases to be used with SequenceServer blast application
│       ├── SQLdb             # The SQLdb of the annotation used by the shiny server for rapid parsing
│       ├── tools_versioning  # Versions of tools and databases used (whenever available)
│       ├── virulence         # Virulence genes annotation results from Victors and VFDB databases
│       └── run_server.sh     # The shiny parser runner that enables a rapid and simple exploration of the results (see below)

KEGG KO annotation heatmap

Using both KofamScan and KEGGDecoder, bacannot is capable of annotating KOs and plotting a heatmap of detected pathways as exemplified below.


Bacannot automatic reports

Bacannot will use R Markdown to produce automatic annotation reports. To date, the available reports are:

Genome Browser

With the aid of JBrowse, Bacannot gives users a fully customised and rendered genome browser for exploring the annotation results.

Warning

The JBrowse wrapper in the shiny server cannot display the GC content and methylation plots when they are available. It can only display the simpler tracks. Users who want to visualise and interrogate the GC or methylation tracks must open JBrowse outside the shiny server. For that, two options are available:

  • Navigate to the jbrowse directory under your sample's output folder and execute http-server. This command can be found at: https://www.npmjs.com/package/http-server
  • Or download the JBrowse Desktop app (https://jbrowse.org/docs/jbrowse_desktop.html) and, from inside the app, select the folder jbrowse/data that is available in your sample's output directory.

To provide an integrated solution, the genome browser is already packed inside the shiny app, which can be launched with the run_server.sh script that loads the server docker image (see Bacannot shiny parser below).

Bacannot shiny parser

The bacannot shiny server is basically a wrapper around the main outputs of the pipeline, packed in a docker image called fmalmeida/bacannot:server. To start it, go into the results folder (in the quickstart case, the _ANNOTATION/ecoli folder) and execute the command:

# Trigger the server
./run_server.sh -s

# This will open the pipeline in localhost:3838
# log message:
The server has started in: http://localhost:3838/
When finished, run the command:
        docker rm -f ServerBacannot

# To stop the server you just need to execute
docker rm -f ServerBacannot

Server homepage

On the first page of the shiny app, the main HTML reports and the JBrowse genome browser are indexed as URL links for quick access (see the image below).

Server SQLdb parser

In the second page, the SQL database (SQLdb) produced in the pipeline is used to provide a rapid and simple way to query and filter the genome annotation.

Note

The SQLdb parser contains a set of features that enable users to filter the annotation as needed. It is possible to filter based on contigs, sources, start, end, strand and more.

Additionally, it accepts as input a file of patterns. These patterns are used to filter the annotation based on the values available in the attributes column of the GFF (9th column).

Any value available in this column can be used as a filter. The only requirement is to write one pattern per line, exactly as it appears in the annotation result. For example, this can be used to select only a few genes based on their IDs.
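
As an illustration, a patterns file of gene IDs could be assembled from one of the GFF files with a sketch like the one below. The function name extract_ids and the file names in the usage comment are hypothetical. It is also an assumption that the parser should receive the full `ID=...` string as written in the attributes column, so adjust the output if the server expects a different form.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bacannot): print the ID=... attribute of
# every feature in a GFF file, one per line, copied verbatim from column 9.
extract_ids() {
  awk -F'\t' '
    /^#/ { next }                      # skip header and comment lines
    NF >= 9 {
      n = split($9, attrs, ";")        # attributes are separated by ";"
      for (i = 1; i <= n; i++)
        if (attrs[i] ~ /^ID=/) print attrs[i]
    }
  ' "$1"
}

# Usage (file names are placeholders):
#   extract_ids my_sample.gff > patterns.txt
```

The resulting file can then be trimmed by hand to keep only the genes of interest before it is given to the parser.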

Server BLAST (for intersection) app

On its third page, the server provides a simple way to BLAST the genome with new queries and to automatically identify intersections between the BLAST results and the main annotation.

Server BLAST (SequenceServer) app

On its last page, the server provides an implementation of SequenceServer, which allows users to BLAST their samples and visualise the alignments produced.

Circos plot

The automatic circos plot is generated with the aid of the easy_circos package. For now it is very minimal, but it already produces a sketch that users can further customize with the circos visualization tool.

  • For now, it only contains:
    • forward features
    • reverse features
    • rRNA
    • tRNA
    • AMRFinderPlus and VFDB annotated genes (as labels)
    • PhiSpy annotated prophages
    • GC Skew

The pipeline will automatically generate a plot like the following:

Configuration files

The output directory looks like this:

circos/
├── concatenated_genomes.fasta
├── conf
│   ├── bacannot_labels.txt
│   ├── circos.conf
│   ├── circos.png
│   ├── circos.sequences.txt
│   ├── circos.svg
│   ├── forward_features.txt
│   ├── GC_skew.txt
│   ├── links_concatenated_colored_no_intrachr.txt
│   ├── links_concatenated_colored.txt
│   ├── mges.txt
│   ├── reverse_features.txt
│   ├── rrna.txt
│   └── trna.txt
└── input.fofn

For now, the pipeline generates a single plot with all the available contigs. However, users can freely customize the data to their heart's content, since these files are meant only as a starting point. The key file is circos.conf, which controls how your data is plotted.

For example, suppose you have a very fragmented assembly and only want to see one contig in your plot. In that case, look at lines 7-10 in the config file. By default you will have this:

# Show all chromosomes in karyotype file. By default, this is
# true. If you want to explicitly specify which chromosomes
# to draw, set this to 'no' and use the 'chromosomes' parameter.
chromosomes_display_default = yes

But, by changing to:

# Show all chromosomes in karyotype file. By default, this is
# true. If you want to explicitly specify which chromosomes
# to draw, set this to 'no' and use the 'chromosomes' parameter.
chromosomes_display_default = no
chromosomes = contig_1

and running circos again, you will render a plot with only that contig. So, have fun, and use this as a start to customize your visualizations!
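
The same change can also be scripted. The sketch below is only an illustration: the function name restrict_to_contig and the output file name in the usage comment are hypothetical, and it assumes the config contains a single chromosomes_display_default line and no existing chromosomes line. It prints a modified copy of the config instead of editing the original.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of bacannot): print a copy of a circos config
# in which only the given contig is drawn.
restrict_to_contig() {
  local conf="$1" contig="$2"
  awk -v c="$contig" '
    /^chromosomes_display_default[ \t]*=/ {
      print "chromosomes_display_default = no"
      print "chromosomes = " c         # draw only this contig
      next
    }
    { print }                          # keep every other line unchanged
  ' "$conf"
}

# Usage, assuming circos is installed and is run from inside the conf directory:
#   restrict_to_contig circos.conf contig_1 > circos.contig_1.conf
#   circos -conf circos.contig_1.conf
```

Writing to a separate file keeps the pipeline-generated circos.conf intact, so the full plot can still be regenerated later.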

Note

To understand more about circos configurations, please refer to the circos manual: http://circos.ca/