CLI Reference

AlleleFlux ships a main alleleflux entrypoint plus console scripts that power the Snakemake workflow. Most users only need alleleflux run; the other tools are available for advanced or ad hoc use.

Main commands

alleleflux run — execute the workflow

Execute the complete AlleleFlux pipeline with flexible resource control and scheduling options.

alleleflux run --config config.yml [options] [-- <extra snakemake args>]

Arguments

  • -c, --config (required): Path to the AlleleFlux configuration YAML.

  • -w, --working-dir (default: .): Working directory for Snakemake execution.

  • -j, --jobs (default: None): Maximum number of concurrent jobs (local runs only; ignored when --profile is set).

  • -t, --threads (default: None): Total threads available for local runs.

  • -m, --memory (default: None): Total memory available for local runs (e.g., 64G, 128GB).

  • -p, --profile (default: None): Snakemake profile directory for cluster/HPC execution (e.g., slurm_profile/).

  • -n, --dry-run (default: False): Plan the DAG without running any jobs.

  • --unlock (default: False): Unlock a working directory left locked by a crashed run.

  • --snakemake-args (default: None): Quoted string of extra Snakemake flags (an alternative to --).

Examples

# Run with a config file
alleleflux run --config config.yml

# Run with limited resources
alleleflux run --config config.yml --threads 16 --memory 64G

# Dry run to see what would be executed
alleleflux run --config config.yml --dry-run

# Run with SLURM profile
alleleflux run --config config.yml --profile slurm_profile/

# Force all jobs to rerun and print the reason each job is scheduled
alleleflux run --config config.yml -- --forceall --reason

# Run with specific working directory
alleleflux run --config config.yml --working-dir /path/to/workdir

Notes

  • Pass additional Snakemake flags either after -- or via --snakemake-args.

  • When using --profile, the profile's settings override --jobs, --threads, and --memory.

  • See Running the Workflow for detailed scheduling instructions.

alleleflux init — create a config

Create a new AlleleFlux configuration file interactively or from a template.

alleleflux init [--template] [--output alleleflux_config.yml]

Arguments

  • --template (default: False): Print a template config to stdout instead of running interactively.

  • --output (default: alleleflux_config.yml): Path of the configuration file to write.

Examples

# Interactive mode (prompts for settings)
alleleflux init

# Print template to stdout
alleleflux init --template

# Interactive mode with custom output file
alleleflux init --output my_alleleflux_config.yml

# Save template to file
alleleflux init --template > my_template.yml

alleleflux info — show install paths

Display version, package location, and Snakefile paths. Useful for debugging installation issues.

alleleflux info

alleleflux tools — list console scripts

List all available console scripts grouped by functional category.

alleleflux tools [--category {Analysis,Preprocessing,Statistics,Evolution,Accessory,Visualization}]

Arguments

  • --category (default: None): Optional category filter; all tools are listed if omitted.

Examples

# List all tools
alleleflux tools

# List only Analysis tools
alleleflux tools --category Analysis

# List Preprocessing tools
alleleflux tools --category Preprocessing

Console scripts by stage

These scripts are invoked automatically by the workflow but can also be run manually for testing or custom tasks. Run any script with --help to see its full list of arguments.


Analysis tools

alleleflux-profile — profile BAM files into per-MAG allele tables

Extract base-level coverage and allele information from aligned BAM files.

alleleflux-profile --bam-path BAM --fasta-path FASTA --prodigal-fasta GENES \
  --mag-mapping-file MAPPING --output-dir DIR [options]

Required Arguments

  • --bam-path: Path to a sorted BAM file.

  • --fasta-path: Path to the reference FASTA file (must match the reference the BAM was aligned to).

  • --prodigal-fasta: Path to Prodigal-predicted genes (nucleotide FASTA).

  • --mag-mapping-file: Tab-separated file mapping contigs to MAG IDs (columns: contig_name, mag_id).

  • --output-dir: Output directory for profiles.

Optional Arguments

  • --cpus (default: all available CPUs): Number of processors to use.

  • --sampleID (default: derived from the BAM filename): Sample identifier (extracted automatically if not provided).

  • --min-base-quality (default: 30): Minimum base quality score for a base to be counted.

  • --min-mapping-quality (default: 2): Minimum mapping quality score for a read to be counted.

  • --no-ignore-orphans (default: False): Include orphan reads (reads without a properly paired mate).

  • --no-ignore-overlaps (default: False): Do not ignore overlapping segments of read pairs (bases may be counted twice).

  • --log-level (default: INFO): Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL).

Output

Creates {output_dir}/{sampleID}/{sampleID}_{mag_id}_profiled.tsv.gz with columns:

  • contig: Contig identifier

  • position: 0-based genomic position

  • ref_base: Reference base at position

  • total_coverage: Total read coverage

  • A, C, G, T, N: Base counts

  • mapq_scores: MAPQ scores of the reads covering the position

  • gene_id: Overlapping gene identifier (if any)
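Because the profile stores raw base counts rather than frequencies, per-position allele frequencies can be derived from it directly. The sketch below is a minimal illustration, assuming a tab-separated file with a header row: it finds the A, C, G, T columns by name and prints the most common base and its frequency among A/C/G/T reads. Ignoring N counts and skipping positions with no A/C/G/T reads are assumptions, and major_allele_freq is a hypothetical helper, not how alleleflux-allele-freq computes its output.

```shell
# Hypothetical helper (not part of AlleleFlux): for each position in a
# *_profiled.tsv.gz file, print the most common base among A/C/G/T and its
# frequency. Columns are found by header name; N counts are ignored and
# positions with no A/C/G/T reads are skipped (both are assumptions).
major_allele_freq() {
  gzip -dc "$1" | awk -F'\t' -v OFS='\t' '
    BEGIN { nb = split("A C G T", bases, " ") }
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      print "contig", "position", "major_allele", "major_freq"
      next
    }
    {
      depth = 0; max = -1; best = ""
      for (j = 1; j <= nb; j++) {
        c = $(col[bases[j]]) + 0
        depth += c
        if (c > max) { max = c; best = bases[j] }
      }
      if (depth > 0)
        printf "%s\t%s\t%s\t%.4f\n", $(col["contig"]), $(col["position"]), best, max / depth
    }'
}
```

For example, major_allele_freq profiles/sample1/sample1_MAG000001_profiled.tsv.gz prints one row per covered position.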

Examples

# Basic profiling
alleleflux-profile --bam-path sample1.bam --fasta-path reference.fa \
  --prodigal-fasta genes.fna --mag-mapping-file mag_mapping.tsv \
  --output-dir profiles/

# With custom sample ID and resource limits
alleleflux-profile --bam-path sample1.bam --fasta-path reference.fa \
  --prodigal-fasta genes.fna --mag-mapping-file mag_mapping.tsv \
  --output-dir profiles/ --sampleID my_sample --cpus 8

# Stricter quality filtering
alleleflux-profile --bam-path sample1.bam --fasta-path reference.fa \
  --prodigal-fasta genes.fna --mag-mapping-file mag_mapping.tsv \
  --output-dir profiles/ --min-base-quality 35 --min-mapping-quality 10
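Before profiling, it can be worth confirming that the contig names in --fasta-path and --mag-mapping-file agree. The sketch below is a hypothetical pre-flight check (check_mag_mapping is not an AlleleFlux tool). It assumes contig names are the first word of each FASTA header and that the mapping file's first column holds the contig name, skipping an optional contig_name header line. It reports contigs missing from the mapping and contigs mapped more than once.

```shell
# Hypothetical pre-flight check: compare FASTA contig names with the first
# column of a contig-to-MAG mapping file. Prints problems; returns 1 if any.
# Usage: check_mag_mapping REFERENCE_FASTA MAPPING_TSV
check_mag_mapping() {
  local fasta=$1 mapping=$2
  awk -F'\t' '
    FNR == NR {                       # first file: the mapping TSV
      if (FNR == 1 && $1 == "contig_name") next
      if (seen[$1]++) { print "duplicate mapping: " $1; bad = 1 }
      next
    }
    /^>/ {                            # second file: FASTA header lines
      name = substr($0, 2); sub(/[ \t].*/, "", name)
      if (!(name in seen)) { print "unmapped contig: " name; bad = 1 }
    }
    END { exit bad }
  ' "$mapping" "$fasta"
}
```

Because it exits non-zero on any problem, it can guard a profiling run, for example: check_mag_mapping reference.fa mag_mapping.tsv && alleleflux-profile ...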

alleleflux-allele-freq — compute allele frequencies per MAG

Compute allele frequencies for a single MAG across samples and timepoints.

alleleflux-allele-freq --mag-id MAG --mag-metadata-file METADATA \
  --fasta FASTA --output-dir DIR [options]

Required Arguments

  • --mag-id: MAG identifier to process.

  • --mag-metadata-file: Path to the MAG metadata file (TSV with sample_id, file_path, group, time).

  • --fasta: Path to the reference FASTA file.

  • --output-dir: Output directory for results.

Optional Arguments

  • --breadth-threshold (default: 0.1): Minimum breadth of coverage (0-1).

  • --data-type (default: longitudinal): Analysis type: single or longitudinal.

  • --disable-zero-diff-filtering (default: False): Keep constant positions (where all samples carry the same allele) instead of filtering them out.

Output

Creates {mag_id}_allele_freq.tsv.gz with allele frequency data per position/sample.


alleleflux-scores — calculate parallelism and divergence scores

Derive MAG-level scores from statistical test results.

alleleflux-scores --rootDir DIR --output-dir DIR [options]

Arguments

  • --rootDir (required): Directory containing *_metadata.tsv files.

  • --output-dir (required): Output directory.

  • --cpus (default: all available): Number of processors.

Examples

# Score all MAGs
alleleflux-scores --rootDir metadata/ --output-dir scores/

# With custom CPU count
alleleflux-scores --rootDir metadata/ --output-dir scores/ --cpus 16

alleleflux-cmh-scores — CMH-specific score aggregation

Calculate CMH test scores for a MAG. See also: alleleflux-cmh for running CMH tests.

alleleflux-cmh-scores --cmh-df INPUT --mag-id MAG --output-dir DIR [options]

Preprocessing tools

alleleflux-metadata — build MAG metadata from profiles

Generate MAG metadata files from sample profiles and a sample sheet.

alleleflux-metadata --metadata-file INPUT --profiles-dir DIR \
  --mag-id MAG --output-dir DIR [options]

Arguments

  • --metadata-file: Input sample metadata file (CSV/TSV).

  • --profiles-dir: Directory containing profile files.

  • --mag-id: MAG ID to process.

  • --output-dir: Output directory.


alleleflux-qc — quality control on profiles

Perform coverage and breadth QC on MAG profiles.

alleleflux-qc --root-dir PROFILES --mag-id MAG --output-dir DIR [options]

Required Arguments

  • --root-dir: Directory containing profile files.

  • --mag-id: MAG ID to process.

  • --output-dir: Output directory.

Optional Arguments

  • --fasta (default: None): Path to the reference FASTA (optional).

  • --mag-mapping-file (default: None): Contig-to-MAG mapping file (optional).

  • --breadth-threshold (default: 0.1): Minimum breadth of coverage (0-1).

  • --coverage-threshold (default: 1.0): Minimum average coverage depth.

  • --data-type (default: longitudinal): Analysis type: single or longitudinal.

Output

Creates {mag_id}_QC.tsv with QC results including breadth_threshold_passed column.

Examples

# Basic QC
alleleflux-qc --root-dir profiles/ --mag-id MAG000001 --output-dir qc/

# Custom thresholds
alleleflux-qc --root-dir profiles/ --mag-id MAG000001 --output-dir qc/ \
  --breadth-threshold 0.2 --coverage-threshold 5.0
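To get a feel for what --breadth-threshold and --coverage-threshold measure, the sketch below computes breadth (fraction of genome positions with coverage of at least 1) and mean depth (summed total_coverage divided by genome length) from one profile. The genome length is passed in explicitly, and treating positions absent from the profile as zero coverage is an assumption; alleleflux-qc may define or aggregate these quantities differently. profile_breadth_depth is a hypothetical name.

```shell
# Hypothetical helper: breadth and mean depth for one *_profiled.tsv.gz.
# Usage: profile_breadth_depth PROFILE GENOME_LENGTH
# Positions missing from the profile count as zero coverage (assumption).
profile_breadth_depth() {
  gzip -dc "$1" | awk -F'\t' -v len="$2" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      cov = $(col["total_coverage"]) + 0
      sum += cov
      if (cov >= 1) covered++
    }
    END { printf "breadth=%.4f mean_depth=%.4f\n", covered / len, sum / len }'
}
```

With the defaults above, a sample would pass if breadth is at least 0.1 and mean depth is at least 1.0, under the assumption that both thresholds are applied to these quantities.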

alleleflux-eligibility — generate MAG eligibility tables

Create eligibility tables for statistical tests based on QC results.

alleleflux-eligibility --qc-dir QC_DIR --output-file OUTPUT [options]

Arguments

  • --qc-dir (required): Directory containing QC files.

  • --output-file (required): Path of the output eligibility file.

  • --min-sample-num (default: 4): Minimum number of samples required.

  • --data-type (default: longitudinal): Analysis type: single or longitudinal.

Output

Creates eligibility table with columns:

  • mag_id: MAG identifier

  • unpaired_test_eligible: Eligible for unpaired tests

  • paired_test_eligible: Eligible for paired tests

  • single_sample_eligible_*: Per-group single-sample eligibility
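The core idea, requiring a minimum number of samples that pass QC, can be sketched as a count over one {mag_id}_QC.tsv file. This assumes one row per sample and that breadth_threshold_passed holds True/False (or 1/0). It deliberately ignores the group and pairing logic that distinguishes the unpaired, paired, and single-sample columns, so it is a simplification rather than what alleleflux-eligibility does. count_passing_samples is hypothetical.

```shell
# Hypothetical helper: count QC rows whose breadth_threshold_passed is true
# and compare against a minimum (default 4, matching --min-sample-num).
# Usage: count_passing_samples QC_TSV [MIN_SAMPLES]
count_passing_samples() {
  awk -F'\t' -v min="${2:-4}" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    tolower($(col["breadth_threshold_passed"])) ~ /^(true|1)$/ { n++ }
    END { printf "passed=%d eligible=%s\n", n, (n + 0 >= min + 0 ? "yes" : "no") }' "$1"
}
```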


Statistical test tools

alleleflux-cmh — Cochran-Mantel-Haenszel stratified test

Run CMH tests for stratified allele frequency analysis (typically stratified by replicate).

alleleflux-cmh --input-df INPUT --mag-id MAG --output-dir DIR [options]

Required Arguments

  • --input-df: Path to the input allele frequency dataframe.

  • --mag-id: MAG ID to process.

  • --output-dir: Output directory.

Optional Arguments

  • --preprocessed-df (default: None): Path to a filtered dataframe used to restrict which positions are tested.

  • --min-sample-num (default: 4): Minimum number of strata (replicates) required.

  • --data-type (default: longitudinal): Analysis mode: single, longitudinal, or across_time.

  • --group (default: None): Group name for across_time mode.

  • --cpus (default: all available): Number of processors.

Output

Creates {mag_id}_cmh.tsv.gz with columns:

  • mag_id: MAG identifier

  • contig: Contig identifier

  • gene_id: Gene identifier

  • position: 0-based position

  • num_pairs: Number of replicate pairs tested

  • p_value_CMH: CMH test p-value

  • time: Timepoint (for longitudinal data)

  • notes: Error messages or warnings

Examples

# Basic CMH test
alleleflux-cmh --input-df allele_freq.tsv --mag-id MAG000001 --output-dir cmh_results/

# Across timepoints mode
alleleflux-cmh --input-df allele_freq.tsv --mag-id MAG000001 \
  --output-dir cmh_results/ --data-type across_time --group fat

# With preprocessing filter
alleleflux-cmh --input-df allele_freq.tsv --mag-id MAG000001 \
  --output-dir cmh_results/ --preprocessed-df preproc.tsv --cpus 16
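For intuition about p_value_CMH, the sketch below computes the textbook Cochran-Mantel-Haenszel statistic (no continuity correction) for K stratified 2×2 tables, one line "a b c d" per stratum (for example, per replicate). It is an illustration only: how AlleleFlux builds its tables from allele counts, and whether it applies a correction or handles more than two alleles, is not described here. Under the null hypothesis the statistic is approximately chi-square with 1 degree of freedom, so values above about 3.841 correspond to p < 0.05. cmh_stat is a hypothetical name.

```shell
# Hypothetical helper: CMH chi-square statistic (no continuity correction)
# for 2x2xK tables. Each input line is one stratum: a b c d, where
#   a b = allele 1 / allele 2 counts in condition 1
#   c d = allele 1 / allele 2 counts in condition 2
cmh_stat() {
  awk '
    NF == 4 {
      a = $1; b = $2; c = $3; d = $4; n = a + b + c + d
      if (n < 2) next                       # stratum carries no information
      num += a - (a + b) * (a + c) / n      # observed minus expected a
      var += (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
    }
    END { if (var > 0) printf "%.4f\n", num * num / var; else print "NA" }'
}
```

For example, two identical strata "10 5 5 10" give a statistic of 6.4444, above the 3.841 cut-off.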

alleleflux-lmm — linear mixed models for longitudinal analysis

Run linear mixed model (LMM) tests on longitudinal allele frequency data.

alleleflux-lmm --input-df INPUT --preprocessed-df PREPROCESSED \
  --group GROUP --mag-id MAG --output-dir DIR [options]

Arguments

  • --input-df: Path to the input allele frequency dataframe.

  • --preprocessed-df: Path to the filtered dataframe.

  • --group: Group name to analyze.

  • --mag-id: MAG ID to process.

  • --output-dir: Output directory.


alleleflux-two-sample-unpaired — unpaired two-sample tests

Perform unpaired Mann-Whitney U tests comparing two groups.

alleleflux-two-sample-unpaired --input-df INPUT --mag-id MAG \
  --output-dir DIR [options]

Arguments

  • --input-df: Path to the input allele frequency dataframe.

  • --mag-id: MAG ID to process.

  • --output-dir: Output directory.
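The statistic behind an unpaired Mann-Whitney U test is easy to state: over all cross-group pairs, count how often a value from group 1 exceeds one from group 2, scoring ties as one half. The sketch below computes U for two space-separated lists of allele frequencies. It omits the p-value and is not AlleleFlux's implementation; mann_whitney_u is a hypothetical name.

```shell
# Hypothetical helper: Mann-Whitney U statistic for group 1 vs group 2,
# counting pairs where group 1 is larger and scoring ties as 0.5.
# Usage: mann_whitney_u "0.1 0.4 0.5" "0.2 0.3"
mann_whitney_u() {
  awk -v g1="$1" -v g2="$2" 'BEGIN {
    n1 = split(g1, x, " "); n2 = split(g2, y, " "); u = 0
    for (i = 1; i <= n1; i++)
      for (j = 1; j <= n2; j++) {
        xi = x[i] + 0; yj = y[j] + 0
        if (xi > yj) u += 1
        else if (xi == yj) u += 0.5
      }
    printf "U=%g (n1=%d, n2=%d)\n", u, n1, n2
  }'
}
```

U ranges from 0 to n1 × n2; under the null hypothesis of no shift between groups its expected value is n1 × n2 / 2.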


alleleflux-two-sample-paired — paired two-sample tests

Perform paired Wilcoxon signed-rank tests on matched samples.

alleleflux-two-sample-paired --input-df INPUT --mag-id MAG \
  --output-dir DIR [options]

Arguments

  • --input-df: Path to the input allele frequency dataframe.

  • --mag-id: MAG ID to process.

  • --output-dir: Output directory.


Evolution tools

alleleflux-dnds-from-timepoints — calculate dN/dS ratios

Compute dN/dS ratios from sites identified as significant.

alleleflux-dnds-from-timepoints --input INPUT --output OUTPUT [options]

Arguments

  • --input (required): Path to the input table of significant sites.

  • --output (required): Path of the output dN/dS results file.

See the dN/dS Analysis Guide for the detailed workflow.
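As a reminder of what the ratio expresses, dN/dS compares nonsynonymous changes per nonsynonymous site with synonymous changes per synonymous site. The sketch below computes the uncorrected form, (Nd / N) / (Sd / S), from four counts. It applies no multiple-hit (e.g., Jukes-Cantor) correction, and how alleleflux-dnds-from-timepoints counts sites and substitutions is not specified here. dnds_ratio is a hypothetical name.

```shell
# Hypothetical helper: uncorrected dN/dS = (Nd / N) / (Sd / S)
#   Nd, Sd: observed nonsynonymous / synonymous changes
#   N,  S:  nonsynonymous / synonymous sites
# Usage: dnds_ratio Nd N Sd S
dnds_ratio() {
  awk -v nd="$1" -v n="$2" -v sd="$3" -v s="$4" 'BEGIN {
    if (n + 0 <= 0 || s + 0 <= 0 || sd + 0 <= 0) { print "NA"; exit }
    printf "%.4f\n", (nd / n) / (sd / s)
  }'
}
```

Values below 1 suggest purifying selection and values above 1 suggest positive selection, subject to the usual caveats about small counts.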


Accessory tools

alleleflux-create-mag-mapping — generate MAG mapping and combined FASTA

Create contig-to-MAG mapping file and concatenate individual MAG FASTA files.

alleleflux-create-mag-mapping --dir MAG_DIR --extension EXT \
  --output-fasta COMBINED --output-mapping MAPPING [options]

Required Arguments

  • --dir: Directory containing individual MAG FASTA files.

  • --extension: File extension of the MAG files (e.g., fa, fasta).

  • --output-fasta: Path for the combined output FASTA.

  • --output-mapping: Path for the contig-to-MAG mapping file (TSV).

Output

  • Combined FASTA: all contigs from all MAGs concatenated

  • Mapping file: contig_name\tmag_id (tab-separated)

Examples

# Create mapping from directory of MAG FASTAs
alleleflux-create-mag-mapping --dir mags/ --extension fa \
  --output-fasta combined_reference.fa --output-mapping mag_mapping.tsv

# With different extension
alleleflux-create-mag-mapping --dir mags/ --extension fasta \
  --output-fasta reference.fasta --output-mapping mapping.tsv
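The mapping format is simple enough to reproduce with standard tools, which can help when checking the tool's output or preparing inputs by hand. The sketch assumes each MAG ID is the FASTA filename without its extension, each contig name is the first word of its header line, and the mapping file starts with a contig_name/mag_id header. alleleflux-create-mag-mapping may differ (for example, it may rename contigs), so treat this as an illustration of the format only; build_mag_mapping is hypothetical.

```shell
# Hypothetical re-implementation of the format: one "contig_name<TAB>mag_id"
# row per contig, with the MAG ID taken from the FASTA filename.
# Usage: build_mag_mapping MAG_DIR EXT OUT_FASTA OUT_MAPPING
build_mag_mapping() {
  local dir=$1 ext=$2 out_fasta=$3 out_map=$4 f mag
  printf 'contig_name\tmag_id\n' > "$out_map"
  : > "$out_fasta"
  for f in "$dir"/*."$ext"; do
    [ -e "$f" ] || continue                 # no files matched the glob
    mag=$(basename "$f" ".$ext")
    awk 1 "$f" >> "$out_fasta"              # awk 1 adds a missing final newline
    awk -v mag="$mag" -v OFS='\t' '/^>/ {
      name = substr($0, 2); sub(/[ \t].*/, "", name); print name, mag
    }' "$f" >> "$out_map"
  done
}
```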

alleleflux-add-bam-path — add BAM file paths to metadata

Fill the bam_path column of a sample metadata file by matching samples to BAM files.

alleleflux-add-bam-path --metadata INPUT --output OUTPUT \
  --bam-dir DIR [options]

Arguments

  • --metadata (required): Path to the input metadata file.

  • --output (required): Path to save the updated metadata.

  • --bam-dir (default: .): Directory containing BAM files.

  • --bam-extension (default: .bam): Extension of BAM files.

  • --drop-missing (default: False): Drop samples without a matching BAM file.
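The matching step can be pictured as looking up {bam-dir}/{sample_id}{bam-extension} for each sample. The sketch below assumes a tab-separated metadata file whose header includes sample_id and matches by exact filename; alleleflux-add-bam-path may match more flexibly and may fill an existing bam_path column rather than append one. add_bam_paths is hypothetical.

```shell
# Hypothetical sketch: append a bam_path column by looking for
# BAM_DIR/<sample_id><EXT>; with DROP=1, samples without a BAM are dropped.
# Usage: add_bam_paths METADATA_TSV BAM_DIR [EXT] [DROP]
add_bam_paths() {
  local meta=$1 dir=$2 ext=${3:-.bam} drop=${4:-0}
  local header col line sid path
  IFS= read -r header < "$meta"
  # 1-based index of the sample_id column in the header
  col=$(printf '%s\n' "$header" | tr '\t' '\n' | grep -nx 'sample_id' | cut -d: -f1)
  [ -n "$col" ] || { echo "no sample_id column" >&2; return 1; }
  printf '%s\tbam_path\n' "$header"
  tail -n +2 "$meta" | while IFS= read -r line || [ -n "$line" ]; do
    sid=$(printf '%s\n' "$line" | cut -f"$col")
    path="$dir/$sid$ext"
    if [ -e "$path" ]; then
      printf '%s\t%s\n' "$line" "$path"
    elif [ "$drop" != 1 ]; then
      printf '%s\t\n' "$line"               # keep the sample, leave bam_path empty
    fi
  done
}
```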


alleleflux-coverage-allele-stats — compute coverage and allele statistics

Compute summary coverage and allele statistics for all MAGs.

alleleflux-coverage-allele-stats --input-dir DIR --output-file OUTPUT [options]

Arguments

  • --input-dir (required): Directory containing profile files.

  • --output-file (required): Path of the output statistics file.

  • --cpus (default: all available): Number of processors.

Output

Summary statistics per MAG: mean coverage, breadth, allele diversity metrics.


alleleflux-list-mags — enumerate MAG IDs

List all unique MAG IDs from a directory of profile files.

alleleflux-list-mags --input-dir DIR [--output-file FILE] [--pattern GLOB]

Arguments

  • --input-dir (required): Directory containing MAG profile files.

  • --output-file (default: None): Optional output file; MAG IDs are printed to stdout if omitted.

  • --pattern (default: *): Glob pattern for matching files.
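Given the {output_dir}/{sampleID}/{sampleID}_{mag_id}_profiled.tsv.gz layout documented for alleleflux-profile, MAG IDs can be recovered from filenames by stripping the sample prefix (known from the directory name) and the fixed suffix. This stays correct even when sample or MAG IDs contain underscores. The sketch assumes that layout; list_mag_ids is hypothetical, and alleleflux-list-mags may use a different rule.

```shell
# Hypothetical helper: print unique MAG IDs found under a profiles directory
# laid out as DIR/<sampleID>/<sampleID>_<mag_id>_profiled.tsv.gz
list_mag_ids() {
  local root=$1 f sample base
  for f in "$root"/*/*_profiled.tsv.gz; do
    [ -e "$f" ] || continue                 # no files matched the glob
    sample=$(basename "$(dirname "$f")")
    base=$(basename "$f" _profiled.tsv.gz)
    printf '%s\n' "${base#"${sample}"_}"    # drop the literal "<sampleID>_" prefix
  done | LC_ALL=C sort -u
}
```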


Additional accessory tools

  • alleleflux-positions-qc — Position-level QC filtering

  • alleleflux-copy-profiles — Copy or symlink profile files

  • alleleflux-single-sample — Within-group single-sample test

  • alleleflux-preprocess-between-groups — Position filtering between groups

  • alleleflux-preprocess-within-group — Position filtering within groups

  • alleleflux-preprocessing-eligibility — Aggregate preprocessing status

  • alleleflux-p-value-summary — Summarize p-values across tests

  • alleleflux-outliers — Flag outlier genes

  • alleleflux-taxa-scores — Derive taxa-level scores

  • alleleflux-gene-scores — Derive gene-level scores


Visualization tools

alleleflux-plot-trajectories — plot allele frequency trajectories

Generate allele frequency trajectory visualizations from tracked allele data.

alleleflux-plot-trajectories --input-file FILE [options]

Required Arguments

  • --input-file: Long-format frequency table produced by alleleflux-track-alleles.

Optional Arguments

  • --value-col (default: min_p_value): Column used to rank sites: min_p_value or q_value.

  • --n-sites-line (default: 10): Number of top-ranked sites to draw in line plots, or all.

  • --n-sites-dist (default: all): Number of sites to include in box/violin plots.

  • --x-col (default: time): X-axis column: time or day.

  • --x-order (default: None): Custom x-axis order (space-separated values).

  • --plot-types (default: line): Plot types to generate: line, box, violin (space-separated).

  • --per-site (default: False): Generate an individual plot for each site.

  • --n-sites-per-site (default: None): Number of sites to plot individually when --per-site is set.

  • --output-dir (default: ./plots): Output directory.

  • --output-format (default: png): Output format: png, pdf, or svg.

  • --group-by-replicate (default: False): Aggregate trajectories by replicate.

  • --bin-width (default: None): Width of day bins (requires a day column).

  • --min-samples-per-bin (default: 1): Minimum number of samples per time bin.

  • --line-alpha (default: 0.8): Line transparency (0-1).

Output

  • {mag_id}_line_plot.{format}: Combined line trajectories

  • {mag_id}_box_plot.{format}: Box plots by timepoint

  • {mag_id}_violin_plot.{format}: Violin plots by timepoint

  • per_site/{contig}_{position}_{gene}_line.{format}: Per-site plots (if enabled)

Examples

# Basic plotting
alleleflux-plot-trajectories --input-file tracked_alleles.tsv

# Multiple plot types with custom output
alleleflux-plot-trajectories --input-file tracked_alleles.tsv \
  --plot-types line box violin --output-dir results/plots/ \
  --output-format pdf

# Per-site plots for top 5 sites
alleleflux-plot-trajectories --input-file tracked_alleles.tsv \
  --per-site --n-sites-per-site 5 --output-format svg

# With binning and custom axis order
alleleflux-plot-trajectories --input-file tracked_alleles.tsv \
  --bin-width 7 --x-order "baseline week1 week2 week4 week8"
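Ranking by --value-col and keeping the top --n-sites-line sites amounts to sorting sites by their smallest p-value (or q-value) and keeping the first N. The sketch assumes the long-format table is tab-separated with contig, position, and the chosen value column in its header, and that a site's rank uses its minimum value across rows. The actual ranking in alleleflux-plot-trajectories may differ; top_sites is hypothetical.

```shell
# Hypothetical helper: print the N sites (contig<TAB>position) with the
# smallest value in VALUE_COL of a long-format table.
# Usage: top_sites TABLE [VALUE_COL] [N]
top_sites() {
  awk -F'\t' -v vc="${2:-min_p_value}" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; next }
    {
      site = $(col["contig"]) "\t" $(col["position"]); v = $(col[vc]) + 0
      if (!(site in best) || v < best[site]) best[site] = v
    }
    END { for (s in best) printf "%s\t%.10g\n", s, best[s] }' "$1" |
    LC_ALL=C sort -t "$(printf '\t')" -k3,3g | head -n "${3:-10}" | cut -f1,2
}
```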

alleleflux-track-alleles — track allele trajectories

Track anchor allele frequencies across all samples and timepoints.

alleleflux-track-alleles --mag-id MAG --anchor-file FILE \
  --metadata META --output-dir DIR [options]

Required Arguments

  • --mag-id: MAG identifier to process.

  • --anchor-file: Path to the terminal nucleotides file (from alleleflux-terminal-nucleotide).

  • --metadata: Metadata file with a sample_profile_dir column (such as the output of alleleflux-prepare-metadata).

  • --output-dir: Output directory.

Optional Arguments

  • --anchor-column (default: terminal_nucleotide_mean_freq): Anchor column to use for tracking.

  • --min-cov-per-site (default: 0): Minimum coverage required per site.

  • --cpus (default: all available): Number of processors.

Output

  • {mag_id}_frequency_table.wide.tsv: Sites × samples matrix

  • {mag_id}_frequency_table.long.tsv: Tidy format (for plotting)


alleleflux-prepare-metadata — prepare metadata for visualization

Standardize and combine metadata tables for visualization workflows.

alleleflux-prepare-metadata --metadata-in INPUT --metadata-out OUTPUT \
  --base-profile-dir DIR [options]

Required Arguments

  • --metadata-in: Input metadata table (TSV).

  • --metadata-out: Output path for the standardized metadata file.

  • --base-profile-dir: Base directory containing sample profile subdirectories.

Optional Arguments

  • --sample-col (default: sample_id): Column name for sample IDs.

  • --group-col (default: group): Column name for experimental groups.

  • --time-col (default: time): Column name for timepoints.

  • --day-col (default: day): Column name for day/order (optional).

  • --replicate-col (default: replicate): Column name for replicates (optional).

  • --subject-col (default: subjectID): Column name for subject IDs.

Output

Standardized metadata with columns: sample_id, group, time, subjectID, sample_profile_dir.
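Adding sample_profile_dir can be pictured as joining --base-profile-dir with each sample ID, which matches the {output_dir}/{sampleID}/ layout written by alleleflux-profile. The sketch assumes exactly that join and a tab-separated input with a sample ID column. It does not rename columns or validate groups and timepoints as alleleflux-prepare-metadata may. add_profile_dirs is hypothetical.

```shell
# Hypothetical helper: append sample_profile_dir = BASE_DIR/<sample id>.
# Usage: add_profile_dirs METADATA_TSV BASE_DIR [SAMPLE_COL]
add_profile_dirs() {
  awk -F'\t' -v OFS='\t' -v base="${2%/}" -v sc="${3:-sample_id}" '
    NR == 1 {
      for (i = 1; i <= NF; i++) col[$i] = i
      if (!(sc in col)) { print "missing column: " sc > "/dev/stderr"; exit 1 }
      print $0, "sample_profile_dir"; next
    }
    { print $0, base "/" $(col[sc]) }' "$1"
}
```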


alleleflux-terminal-nucleotide — identify terminal nucleotides

Find dominant terminal alleles at significant genomic sites.

alleleflux-terminal-nucleotide --significant-sites SITES \
  --profile-dir DIR --metadata META --group GROUP \
  --timepoint TP --output DIR [options]

Required Arguments

  • --significant-sites: Path to the significant sites table (produced by the p-value summary step).

  • --profile-dir: Directory containing sample profile subdirectories.

  • --metadata: Sample metadata file.

  • --group: Target group for the terminal nucleotide calculation.

  • --timepoint: Target timepoint (typically the endpoint).

  • --output: Output directory.

Optional Arguments

  • --p-value-column (default: q_value): Significance column: min_p_value or q_value.

  • --p-value-threshold (default: 0.05): Maximum value in the significance column for a site to be included.

  • --test-type (default: two_sample_paired_tTest): Test type used to select sites.

  • --group-filter (default: None): Optional additional group filter.

  • --cpus (default: all available): Number of processors.

  • --log-level (default: INFO): Logging level.

Output

  • {mag_id}/{mag_id}_terminal_nucleotides.tsv: Terminal alleles per site

  • {mag_id}/{mag_id}_frequencies.tsv: Full frequency data

  • terminal_nucleotide_analysis_summary.tsv: Summary across MAGs
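The site-selection step controlled by --p-value-column, --p-value-threshold, and --test-type can be approximated with awk. The sketch assumes a tab-separated significant-sites table whose header contains a test_type column (an assumed name) and the chosen significance column. Whether the real threshold is inclusive is not documented; this sketch keeps sites at or below it and skips non-numeric values such as nan. select_sites is hypothetical.

```shell
# Hypothetical helper: keep the header plus rows whose test_type matches and
# whose significance column is a number <= THRESHOLD.
# Usage: select_sites TABLE [P_COL] [THRESHOLD] [TEST_TYPE]
select_sites() {
  awk -F'\t' -v pc="${2:-q_value}" -v thr="${3:-0.05}" \
      -v tt="${4:-two_sample_paired_tTest}" '
    NR == 1 { for (i = 1; i <= NF; i++) col[$i] = i; print; next }
    $(col["test_type"]) == tt &&
    $(col[pc]) ~ /^[0-9.eE+-]+$/ &&
    $(col[pc]) + 0 <= thr + 0' "$1"
}
```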


Getting help

View detailed help for any tool:

# Main command help
alleleflux --help

# Subcommand help
alleleflux run --help
alleleflux init --help

# Console script help
alleleflux-profile --help
alleleflux-cmh --help
alleleflux-plot-trajectories --help

For configuration details, see Configuration Reference. For how to run the workflow end to end, see Running the Workflow.