A Nextflow pipeline for genome and metagenome annotation. Combines structural annotation (Bakta / Pyrodigal / Prodigal / Prokka), functional annotation (EggNOG-mapper, KofamScan, dbCAN, VFDB), and optional ARG screening (AMRFinderPlus, fARGene, RGI, DeepARG, ABRicate).

    nextflow run main.nf -profile standard --mode single --manifest samples.csv

The manifest is a CSV with at least two columns:

    ID,assembly
    sample1,/path/to/sample1.fasta
    sample2,/path/to/sample2.fasta

The `--mode` parameter controls which annotation tool is used and sets sensible defaults for the rest of the pipeline. There are two named presets and four direct tool selectors.
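
For a large number of assemblies, the manifest described above can be generated rather than written by hand. The following is a minimal sketch, assuming all assemblies sit in one directory with a `.fasta` extension and that the file name without the extension is an acceptable sample ID. The `assembly_dir` variable and the placeholder files are illustrative only and are not part of the pipeline.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical input directory. It is created here with two empty placeholder
# files so the sketch runs on its own; point it at your real assemblies instead.
assembly_dir="$(mktemp -d)"
touch "$assembly_dir/sample1.fasta" "$assembly_dir/sample2.fasta"

manifest="$assembly_dir/samples.csv"

# Header row: the two columns the manifest needs at minimum.
echo "ID,assembly" > "$manifest"

# One row per FASTA file: ID is the file name without its extension,
# assembly is the absolute path to the file.
for fasta in "$assembly_dir"/*.fasta; do
    id="$(basename "$fasta" .fasta)"
    echo "${id},${fasta}" >> "$manifest"
done

cat "$manifest"
```

The resulting file can then be passed to the pipeline with `--manifest`.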

`--mode single`: for finished or high-quality draft isolate genomes.

- Annotator: Pyrodigal (`-p single`, fast ORF prediction)
- Protein clustering: off

Example:

    nextflow run main.nf -profile standard --mode single --manifest samples.csv

`--mode meta`: for metagenome-assembled genomes (MAGs) or any set of assemblies to be treated as a comparative/metagenomic dataset.

- Annotator: Pyrodigal (`-p meta`, fast ORF prediction)
- Protein clustering: on (MMseqs2 `easy-linclust` before functional annotation)

Example:

    nextflow run main.nf -profile standard --mode meta --manifest samples.csv

Pass the tool name as `--mode` to bypass the presets entirely. Pipeline-level switches (`cluster_proteome`) remain at their defaults (off) unless you also set them.

| `--mode` | Annotator | `-p` flag |
|---|---|---|
| `bakta` | Bakta | n/a |
| `pyrodigal` | Pyrodigal | `-p single` |
| `prodigal` | Prodigal | `-p single` |
| `prokka` | Prokka | n/a |

    # Prokka with ARG screening
    nextflow run main.nf -profile standard --mode prokka --manifest samples.csv --arg_annotate

    # Pyrodigal with protein clustering
    nextflow run main.nf -profile standard --mode pyrodigal --manifest samples.csv --cluster_proteome

| Parameter | Default | Description |
|---|---|---|
| `--manifest` | none | Path to input CSV (required) |
| `--outdir` | `results` | Output directory |
| `--mode` | `single` | Run mode (see above) |
| `--arg_annotate` | `false` | Run the ARG screening subworkflow |
| `--func_annotate` | `false` | Run the functional annotation subworkflow |
| `--cluster_proteome` | `false` | Cluster proteins with MMseqs2 before functional annotation |

| Parameter | Default path |
|---|---|
| `--annotation_bakta_db` | `/data/pam/software/bakta/v6.0/` |
| `--arg_abricate_db` | `/data/pam/software/abricate/db/` |
| `--arg_amrfinderplus_db` | `/data/pam/software/amrfinder/latest/` |
| `--eggnog_data_dir` | `/data/pam/software/eggnog/v5.0/` |
| `--dbcan_db` | `/data/pam/software/run_dbcan/5.2.5/db/` |
| `--vfdb_db` | `/data/pam/team230/sd28/scratch/secondment_162/dbs/vfdb/VFDB_setB_pro.dmnd` |
| `--kofam_db` | `/data/pam/software/kofam_scan/` |
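
The default database paths above point at one particular installation, so on another system they will likely need overriding. One way to avoid a long command line is Nextflow's `-params-file` option, which loads parameters from a YAML or JSON file. The sketch below writes such a file; every path in it is a hypothetical placeholder, and the file name `site-params.yaml` is an arbitrary choice.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Write a params file with site-specific database locations.
# All paths below are hypothetical placeholders; substitute your own.
params_file="$(mktemp -d)/site-params.yaml"
cat > "$params_file" <<'EOF'
annotation_bakta_db: /path/to/bakta/db/
eggnog_data_dir: /path/to/eggnog/
kofam_db: /path/to/kofam_scan/
EOF

# Print the command instead of executing it, so the sketch has no side effects.
echo "nextflow run main.nf -profile standard --mode bakta --manifest samples.csv -params-file $params_file"
```

Keys in the file are the parameter names without the leading `--`.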

| Parameter | Default | Description |
|---|---|---|
| `--module_completeness` | `0.5` | KEGG module completeness threshold |
| `--kofamscan_eval` | `0.00001` | KofamScan e-value cutoff |
| `--vfdb_identity` | `50` | VFDB minimum identity (%) |
| `--vfdb_coverage` | `80` | VFDB minimum subject coverage (%) |
| `--vfdb_e_value` | `1e-10` | VFDB e-value cutoff |
| `--cazyme_hmm_eval` | default | dbCAN HMM e-value cutoff |
| `--cazyme_hmm_cov` | default | dbCAN HMM minimum coverage |
| `--mmseqs_min_id` | `0.8` | MMseqs2 minimum sequence identity |
| `--mmseqs_min_cov` | `0.9` | MMseqs2 minimum coverage |

Tool parameters (translation table, HMM thresholds, per-tool ARG flags, etc.) are defined with sensible defaults in `conf/advanced.config`. Override them on the CLI as needed; there is no need to edit the file directly.

To see the full list of these parameters, use the `--helpFull` option:

    nextflow run main.nf --helpFull

When using a direct tool selector (`--mode pyrodigal`, `--mode prodigal`, `--mode bakta`), the tool defaults to single-genome mode. Use the corresponding flag to force meta mode:

| `--mode` | Override flag | Effect |
|---|---|---|
| `pyrodigal` | `--annotation_pyrodigal_meta` | `-p meta` |
| `prodigal` | `--annotation_prodigal_meta` | `-p meta` |
| `bakta` | `--annotation_bakta_meta` | `--meta` |

    nextflow run main.nf -profile standard --mode pyrodigal --manifest samples.csv \
        --annotation_pyrodigal_meta

    nextflow run main.nf -profile standard --mode bakta --manifest samples.csv \
        --annotation_bakta_meta

When `--arg_annotate` is set, all five ARG tools run by default. To skip individual tools:

    nextflow run main.nf -profile standard --mode single --manifest samples.csv \
        --arg_annotate \
        --arg_skip_deeparg \
        --arg_skip_fargene

Output directory layout:

    results/
        annotation/
            pyrodigal/<sample_id>/        # or bakta/, prodigal/, prokka/
        func/
            kofamscan/
            dbcan/
            vfdb/
            eggnogmapper/
        arg/                              # only when --arg_annotate
            amrfinderplus/<sample_id>/
            abricate/<sample_id>/
            rgi/<sample_id>/
            deeparg/<sample_id>/
            fargene/<sample_id>/
            hamronization/
            argnorm/
        mmseqs/                           # only when --cluster_proteome
        reports/
            hamronization_summarize/
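
After a run, it can be useful to confirm that every sample in the manifest produced annotation output. The sketch below is one way to do that, assuming annotation results land in `annotation/<tool>/<sample_id>/` as in the layout above and that Pyrodigal was the annotator. It builds a mock output tree and manifest so it runs standalone; `outdir`, `manifest`, and the sample names are illustrative.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Mock output tree and manifest so the sketch runs on its own.
# Replace both with your real --outdir and --manifest.
workdir="$(mktemp -d)"
outdir="$workdir/results"
manifest="$workdir/samples.csv"
mkdir -p "$outdir/annotation/pyrodigal/sample1" "$outdir/annotation/pyrodigal/sample2"
printf 'ID,assembly\n' > "$manifest"
printf '%s\n' \
    'sample1,/path/to/sample1.fasta' \
    'sample2,/path/to/sample2.fasta' \
    'sample3,/path/to/sample3.fasta' >> "$manifest"

# Collect manifest IDs that have no annotation directory.
missing=()
while IFS=, read -r id _; do
    if [ "$id" = "ID" ]; then
        continue  # skip the header row
    fi
    if [ ! -d "$outdir/annotation/pyrodigal/$id" ]; then
        missing+=("$id")
    fi
done < "$manifest"

echo "samples without annotation output: ${missing[*]:-none}"
```

For a different annotator, swap `pyrodigal` for `bakta`, `prodigal`, or `prokka` in the path.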