Here, using the results produced in the quickstart section, we give users a glimpse of the main outputs produced by bacannot. The command used in the quickstart wrote the results under the
Please take note that the pipeline uses the directory set with the
--output parameter as a storage place in which it will create a folder for each sample using its
id. Therefore, the same
--output can be used for different annotations.
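The per-sample layout described above can be explored with a few lines of Python. This is only a minimal sketch: the helper name `list_sample_dirs` and the second sample id are made up for illustration, and the directory is created in a temporary location so the snippet runs on its own.

```python
from pathlib import Path
import tempfile


def list_sample_dirs(output_dir):
    """Return the sorted names of the sub-directories found in output_dir.

    Assumption: every sub-directory of the --output directory is one sample,
    named after the sample id, as described in the text above.
    """
    return sorted(p.name for p in Path(output_dir).iterdir() if p.is_dir())


# Build a throwaway directory that mimics two annotations sharing one --output.
with tempfile.TemporaryDirectory() as tmp:
    output_dir = Path(tmp) / "_ANNOTATION"
    for sample_id in ("ecoli", "kpneumoniae"):  # "kpneumoniae" is hypothetical
        (output_dir / sample_id / "annotation").mkdir(parents=True)
    samples = list_sample_dirs(output_dir)
    print(samples)
```

Pointing `list_sample_dirs` at a real output directory would list one entry per annotated sample.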
After a successful execution, you will have something like this:
# Directory tree from the running dir
.
├── _ANNOTATION
│   ├── ecoli_ref.fna
│   └── ecoli
│       ├── assembly             # Assembly files (when raw reads are given)
│       ├── annotation           # Prokka annotation files
│       ├── antiSMASH            # antiSMASH secondary annotation files
│       ├── circos               # circos conf files
│       ├── digIS                # Insertion sequences predicted with digIS
│       ├── gbk                  # Gbk files produced from the resulting GFF
│       ├── gffs                 # A copy of the main GFF files produced during the annotation
│       ├── genomic_islands      # Genomic Islands predicted with IslandPath-DIMOB
│       ├── ICEs                 # Results from ICEberg database annotation
│       ├── jbrowse              # The files that set up the JBrowse genome browser
│       ├── KOfamscan            # Results from annotation with KEGG database
│       ├── methylations         # Methylated sites predicted with Nanopolish (if fast5 is given)
│       ├── MLST                 # MLST results with mlst pipeline
│       ├── plasmids             # Plasmid annotation results from Platon and Plasmidfinder
│       ├── prophages            # Prophage annotation results from PhiSpy, Phigaro and PHAST
│       ├── refseq_masher        # Closest NCBI RefSeq genomes identified with refseq_masher
│       ├── report_files         # Annotation reports in HTML format
│       ├── resistance           # AMR annotation results from ARGminer, AMRFinderPlus, RGI and Resfinder
│       ├── rRNA                 # barrnap annotation results
│       ├── SequenceServerDBs    # SequenceServer pre-formatted databases to be used with SequenceServer blast application
│       ├── SQLdb                # The SQLdb of the annotation used by the shiny server for rapid parsing
│       ├── tools_versioning     # Versions of tools and databases used (whenever available)
│       ├── virulence            # Virulence genes annotation results from Victors and VFDB databases
│       └── run_server.sh        # The shiny parser runner that enables a rapid and simple exploration of the results (see below)
KEGG KO annotation heatmap
Bacannot automatic reports
Bacannot uses R Markdown to produce automatic annotation reports. To date, the available reports are:
- Report of general annotation features
- Report of Antimicrobial resistance (AMR) genes annotation
- Report of virulence genes annotation
- Report of mobile genetic elements annotation
  - Including plasmids, prophages, ICEs and genomic islands.
- Report of user's custom db annotations
  - The quickstart does not produce an example; however, the report is similar to the ICEberg section in the MGE example report.
  - See the custom-db reference page
- Report of antiSMASH annotation
  - The annotation report is provided by the antiSMASH tool
With the aid of JBrowse, Bacannot gives users a fully customised and rendered genome browser for exploring the annotation results.
The JBrowse wrapper in the shiny server cannot display the GC content and methylation plots, even when they are available. It can only display the simpler tracks. Users who want to visualise and interrogate the GC or methylation tracks must open JBrowse outside the shiny server. For that, two options are available:
- You can navigate to the
jbrowse directory under your sample's output folder and simply execute
http-server. This command can be found at: https://www.npmjs.com/package/http-server
- Or, you can download the
JBrowse Desktop app (https://jbrowse.org/docs/jbrowse_desktop.html) and, from inside the app, select the folder jbrowse/data that is available in your sample's output directory.
To provide an integrated solution, the genome browser is already packed inside the shiny app, which can be launched with the
run_server.sh script that loads the server docker image (see the Bacannot shiny parser section below).
Bacannot shiny parser
The bacannot shiny server is essentially a wrapper around the main outputs of the pipeline, packed in a docker image called
fmalmeida/bacannot:server. To start the server, go to the results folder (in our quickstart case, the
_ANNOTATION/ecoli folder) and execute the command:
On the first page of the shiny app, the main HTML reports and the JBrowse genome browser are listed as URL links for quick opening (see the image below).
Server SQLdb parser
On the second page, the SQL database (SQLdb) produced by the pipeline is used to provide a rapid and simple way to query and filter the genome annotation.
The SQLdb parser provides a set of features that let users filter the annotation as needed. It is possible to filter based on
strand and more.
Additionally, it accepts a file of patterns as input. These patterns are used to filter the annotation based on the values available in the attributes column of the GFF (9th column).
Any value available in this column can be used as a filter. The only requirement is to write each pattern on its own line, exactly as it appears in the annotation result. For example, it can be used to select only a few genes based on their IDs.
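To make the idea of a pattern file concrete, here is a minimal Python sketch of this kind of filtering done outside the server. It is not the server's own implementation: the function names, the example GFF lines and the choice of plain substring matching are all assumptions made for illustration.

```python
import io


def load_patterns(handle):
    """Read one pattern per line, ignoring blank lines."""
    return [line.strip() for line in handle if line.strip()]


def filter_gff_by_attributes(gff_lines, patterns):
    """Keep feature lines whose 9th column contains any pattern verbatim.

    Assumption: a pattern matches when it occurs as a substring of the
    attributes column; the server may apply a different matching rule.
    """
    kept = []
    for line in gff_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) < 9:
            continue  # not a complete GFF feature line
        attributes = columns[8]
        if any(pattern in attributes for pattern in patterns):
            kept.append(line)
    return kept


# Made-up GFF content and pattern file, for illustration only.
example_gff = [
    "##gff-version 3",
    "contig_1\tProkka\tgene\t1\t900\t.\t+\t.\tID=GENE_00001;Name=dnaA",
    "contig_1\tProkka\tgene\t1000\t2100\t.\t-\t.\tID=GENE_00002;Name=dnaN",
    "contig_2\tProkka\tgene\t50\t500\t.\t+\t.\tID=GENE_00003;Name=recF",
]
example_patterns = load_patterns(io.StringIO("ID=GENE_00001\nName=recF\n"))

selected = filter_gff_by_attributes(example_gff, example_patterns)
for feature in selected:
    print(feature)
```

With substring matching, a short pattern such as `ID=GENE_0000` would select all three example genes, which is why each pattern should be written exactly as it appears in the annotation.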
Server BLAST (for intersection) app
On its third page, the server provides a simple way to BLAST the genome with new queries and to automatically identify intersections between the BLAST results and the main annotation.
Server BLAST (SequenceServer) app
On its last page, the server provides an implementation of SequenceServer, which allows users to BLAST their samples and visualise the alignments produced.
The automatic circos plot is generated with the aid of the
easy_circos package. For now it is very minimal, but it already creates a sketch that users can further customize with the
circos visualization tool.
- For now, it only contains:
  - forward features
  - reverse features
  - AMRFinderPlus and VFDB annotated genes (as labels)
  - PhiSpy annotated prophages
  - GC Skew
The pipeline will automatically generate a plot like the following:
The output directory looks like this:
circos/
├── concatenated_genomes.fasta
├── conf
│   ├── bacannot_labels.txt
│   ├── circos.conf
│   ├── circos.png
│   ├── circos.sequences.txt
│   ├── circos.svg
│   ├── forward_features.txt
│   ├── GC_skew.txt
│   ├── links_concatenated_colored_no_intrachr.txt
│   ├── links_concatenated_colored.txt
│   ├── mges.txt
│   ├── reverse_features.txt
│   ├── rrna.txt
│   └── trna.txt
└── input.fofn
For now, the pipeline generates a single plot with all the available contigs. However, users can freely work with the data and customize the plot to their heart's content. These files are meant to be only a starting point. The key file for this is
circos.conf, which controls how your data is plotted.
For example, suppose you have a very fragmented assembly and only want to see one contig in your plot. In that case, you should look at lines 7-10 in the config file. By default you will have this:
But, by changing to:
and running the
circos tool again, you will render a plot with only that contig. So, have fun, and use this as a starting point to customize your visualizations!
To learn more about
circos configuration, please refer to the tool's documentation: http://circos.ca/