# CLI Reference AlleleFlux ships a main `alleleflux` entrypoint plus console scripts that power the Snakemake workflow. Most users only need `alleleflux run`; the other tools are available for advanced or ad‑hoc use. ## Main commands ### `alleleflux run` — execute the workflow Execute the complete AlleleFlux pipeline with flexible resource control and scheduling options. ```bash alleleflux run --config config.yml [options] [-- ] ``` #### Arguments | Option | Default | Description | |--------|---------|-------------| | `-c, --config` | (required) | Path to the AlleleFlux configuration YAML. | | `-w, --working-dir` | `.` | Working directory for Snakemake execution. | | `-j, --jobs` | None | Max concurrent jobs (local only; ignored when `--profile` is set). | | `-t, --threads` | None | Total threads available for local runs. | | `-m, --memory` | None | Total memory for local runs (e.g., `64G`, `128GB`). | | `-p, --profile` | None | Snakemake profile directory for cluster/HPC execution (e.g., `slurm_profile/`). | | `-n, --dry-run` | False | Plan the DAG without running jobs. | | `--unlock` | False | Unlock a previously crashed working directory. | | `--snakemake-args` | None | Quoted string of extra Snakemake flags (alternative to `--`). | #### Examples ```bash # Run with a config file alleleflux run --config config.yml # Run with limited resources alleleflux run --config config.yml --threads 16 --memory 64G # Dry run to see what would be executed alleleflux run --config config.yml --dry-run # Run with SLURM profile alleleflux run --config config.yml --profile slurm_profile/ # Force rerun all jobs with reasoning alleleflux run --config config.yml -- --forceall --reason # Run with specific working directory alleleflux run --config config.yml --working-dir /path/to/workdir ``` #### Notes - Pass additional Snakemake flags either after `--` or via `--snakemake-args`. - When using `--profile`, job/thread/memory parameters are overridden by profile settings. - See [Running the Workflow](../usage/running_workflow.md) for detailed scheduling instructions. ### `alleleflux init` — create a config Create a new AlleleFlux configuration file interactively or from a template. ```bash alleleflux init [--template] [--output alleleflux_config.yml] ``` #### Arguments | Argument | Default | Description | |----------|---------|-------------| | `--template` | False | Print a template config to stdout instead of interactive mode. | | `--output` | `alleleflux_config.yml` | Output configuration file path. | #### Examples ```bash # Interactive mode (prompts for settings) alleleflux init # Print template to stdout alleleflux init --template # Interactive mode with custom output file alleleflux init --output my_alleleflux_config.yml # Save template to file alleleflux init --template > my_template.yml ``` ### `alleleflux info` — show install paths Display version, package location, and Snakefile paths. Useful for debugging installation issues. ```bash alleleflux info ``` ### `alleleflux tools` — list console scripts List all available console scripts grouped by functional category. ```bash alleleflux tools [--category {Analysis,Preprocessing,Statistics,Evolution,Accessory,Visualization}] ``` #### Arguments | Argument | Default | Description | |----------|---------|-------------| | `--category` | None | Filter by category (optional). Lists all if not specified. | #### Examples ```bash # List all tools alleleflux tools # List only Analysis tools alleleflux tools --category Analysis # List Preprocessing tools alleleflux tools --category Preprocessing ``` ## Console scripts by stage These are invoked automatically by the workflow but can be run manually for testing or custom tasks. Run any script with `--help` for full arguments. --- ## Analysis tools ### `alleleflux-profile` — profile BAM files into per-MAG allele tables Extract base-level coverage and allele information from aligned BAM files. ```bash alleleflux-profile --bam-path BAM --fasta-path FASTA --prodigal-fasta GENES \ --mag-mapping-file MAPPING --output-dir DIR [options] ``` #### Required Arguments | Argument | Description | |----------|-------------| | `--bam-path` | Path to sorted BAM file. | | `--fasta-path` | Path to reference FASTA file (must match BAM alignment reference). | | `--prodigal-fasta` | Path to Prodigal predicted genes (DNA FASTA format). | | `--mag-mapping-file` | Tab-separated file mapping contigs to MAG IDs (columns: `contig_name`, `mag_id`). | | `--output-dir` | Output directory for profiles. | #### Optional Arguments | Argument | Default | Description | |----------|---------|-------------| | `--cpus` | All available CPUs | Number of processors to use. | | `--sampleID` | From BAM filename | Sample identifier (auto-extracted if not provided). | | `--min-base-quality` | 30 | Minimum base quality score to include a base. | | `--min-mapping-quality` | 2 | Minimum mapping quality score to include a read. | | `--no-ignore-orphans` | False | Include reads without properly paired mate. | | `--no-ignore-overlaps` | False | Do not ignore overlapping read segments (may double-count). | | `--log-level` | INFO | Logging level (DEBUG, INFO, WARNING, ERROR, CRITICAL). | #### Output Creates `{output_dir}/{sampleID}/{sampleID}_{mag_id}_profiled.tsv.gz` with columns: - `contig`: Contig identifier - `position`: 0-based genomic position - `ref_base`: Reference base at position - `total_coverage`: Total read coverage - `A`, `C`, `G`, `T`, `N`: Base counts - `mapq_scores`: MAPQ scores for reads - `gene_id`: Overlapping gene identifier (if any) #### Examples ```bash # Basic profiling alleleflux-profile --bam-path sample1.bam --fasta-path reference.fa \ --prodigal-fasta genes.fna --mag-mapping-file mag_mapping.tsv \ --output-dir profiles/ # With custom sample ID and resource limits alleleflux-profile --bam-path sample1.bam --fasta-path reference.fa \ --prodigal-fasta genes.fna --mag-mapping-file mag_mapping.tsv \ --output-dir profiles/ --sampleID my_sample --cpus 8 # Stricter quality filtering alleleflux-profile --bam-path sample1.bam --fasta-path reference.fa \ --prodigal-fasta genes.fna --mag-mapping-file mag_mapping.tsv \ --output-dir profiles/ --min-base-quality 35 --min-mapping-quality 10 ``` --- ### `alleleflux-allele-freq` — compute allele frequencies per MAG Analyze allele frequencies across samples and timepoints. ```bash alleleflux-allele-freq --mag-id MAG --mag-metadata-file METADATA \ --fasta FASTA --output-dir DIR [options] ``` #### Required Arguments | Argument | Description | |----------|-------------| | `--mag-id` | MAG identifier to process. | | `--mag-metadata-file` | Path to MAG metadata file (TSV with sample_id, file_path, group, time). | | `--fasta` | Path to reference FASTA file. | | `--output-dir` | Output directory for results. | #### Optional Arguments | Argument | Default | Description | |----------|---------|-------------| | `--breadth-threshold` | 0.1 | Minimum breadth of coverage (0-1). | | `--data-type` | `longitudinal` | Analysis type: `single` or `longitudinal`. | | `--disable-zero-diff-filtering` | False | Keep constant positions (all samples same allele). | #### Output Creates `{mag_id}_allele_freq.tsv.gz` with allele frequency data per position/sample. --- ### `alleleflux-scores` — calculate parallelism and divergence scores Derive MAG-level scores from statistical test results. ```bash alleleflux-scores --rootDir DIR --output-dir DIR [options] ``` #### Arguments | Argument | Default | Description | |----------|---------|-------------| | `--rootDir` | Required | Directory containing `*_metadata.tsv` files. | | `--output-dir` | Required | Output directory. | | `--cpus` | All available | Number of processors. | #### Examples ```bash # Score all MAGs alleleflux-scores --rootDir metadata/ --output-dir scores/ # With custom CPU count alleleflux-scores --rootDir metadata/ --output-dir scores/ --cpus 16 ``` --- ### `alleleflux-cmh-scores` — CMH-specific score aggregation Calculate CMH test scores for a MAG. See also: `alleleflux-cmh` for running CMH tests. ```bash alleleflux-cmh-scores --cmh-df INPUT --mag-id MAG --output-dir DIR [options] ``` --- ## Preprocessing tools ### `alleleflux-metadata` — build MAG metadata from profiles Generate MAG metadata files from sample profiles and sample sheet. ```bash alleleflux-metadata --metadata-file INPUT --profiles-dir DIR \ --mag-id MAG --output-dir DIR [options] ``` #### Arguments | Argument | Description | |----------|-------------| | `--metadata-file` | Input sample metadata file (CSV/TSV). | | `--profiles-dir` | Directory containing profile files. | | `--mag-id` | MAG ID to process. | | `--output-dir` | Output directory. | --- ### `alleleflux-qc` — quality control on profiles Perform coverage and breadth QC on MAG profiles. ```bash alleleflux-qc --root-dir PROFILES --mag-id MAG --output-dir DIR [options] ``` #### Required Arguments | Argument | Description | |----------|-------------| | `--root-dir` | Directory containing profile files. | | `--mag-id` | MAG ID to process. | | `--output-dir` | Output directory. | #### Optional Arguments | Argument | Default | Description | |----------|---------|-------------| | `--fasta` | None | Path to reference FASTA (optional). | | `--mag-mapping-file` | None | Contig-to-MAG mapping file (optional). | | `--breadth-threshold` | 0.1 | Minimum breadth of coverage (0-1). | | `--coverage-threshold` | 1.0 | Minimum average coverage depth. | | `--data-type` | `longitudinal` | Analysis type: `single` or `longitudinal`. | #### Output Creates `{mag_id}_QC.tsv` with QC results including `breadth_threshold_passed` column. #### Examples ```bash # Basic QC alleleflux-qc --root-dir profiles/ --mag-id MAG000001 --output-dir qc/ # Custom thresholds alleleflux-qc --root-dir profiles/ --mag-id MAG000001 --output-dir qc/ \ --breadth-threshold 0.2 --coverage-threshold 5.0 ``` --- ### `alleleflux-eligibility` — generate MAG eligibility tables Create eligibility tables for statistical tests based on QC results. ```bash alleleflux-eligibility --qc-dir QC_DIR --output-file OUTPUT [options] ``` #### Arguments | Argument | Default | Description | |----------|---------|-------------| | `--qc-dir` | Required | Directory containing QC files. | | `--output-file` | Required | Output eligibility file path. | | `--min-sample-num` | 4 | Minimum number of samples required. | | `--data-type` | `longitudinal` | Analysis type: `single` or `longitudinal`. | #### Output Creates eligibility table with columns: - `mag_id`: MAG identifier - `unpaired_test_eligible`: Eligible for unpaired tests - `paired_test_eligible`: Eligible for paired tests - `single_sample_eligible_*`: Per-group single-sample eligibility --- ## Statistical test tools ### `alleleflux-cmh` — Cochran-Mantel-Haenszel stratified test Run CMH tests for stratified allele frequency analysis (typically stratified by replicate). ```bash alleleflux-cmh --input-df INPUT --mag-id MAG --output-dir DIR [options] ``` #### Required Arguments | Argument | Description | |----------|-------------| | `--input-df` | Path to input allele frequency dataframe. | | `--mag-id` | MAG ID to process. | | `--output-dir` | Output directory. | #### Optional Arguments | Argument | Default | Description | |----------|---------|-------------| | `--preprocessed-df` | None | Path to filtered dataframe for position filtering. | | `--min-sample-num` | 4 | Minimum number of strata (replicates) required. | | `--data-type` | `longitudinal` | Analysis mode: `single`, `longitudinal`, or `across_time`. | | `--group` | None | Group name for `across_time` mode. | | `--cpus` | All available | Number of processors. | #### Output Creates `{mag_id}_cmh.tsv.gz` with columns: - `mag_id`: MAG identifier - `contig`: Contig identifier - `gene_id`: Gene identifier - `position`: 0-based position - `num_pairs`: Number of replicate pairs tested - `p_value_CMH`: CMH test p-value - `time`: Timepoint (for longitudinal data) - `notes`: Error messages or warnings #### Examples ```bash # Basic CMH test alleleflux-cmh --input-df allele_freq.tsv --mag-id MAG000001 --output-dir cmh_results/ # Across timepoints mode alleleflux-cmh --input-df allele_freq.tsv --mag-id MAG000001 \ --output-dir cmh_results/ --data-type across_time --group fat # With preprocessing filter alleleflux-cmh --input-df allele_freq.tsv --mag-id MAG000001 \ --output-dir cmh_results/ --preprocessed-df preproc.tsv --cpus 16 ``` --- ### `alleleflux-lmm` — Linear mixed models for longitudinal analysis Run LMM tests for longitudinal data with mixed effects. ```bash alleleflux-lmm --input-df INPUT --preprocessed-df PREPROCESSED \ --group GROUP --mag-id MAG --output-dir DIR [options] ``` #### Arguments | Argument | Description | |----------|-------------| | `--input-df` | Path to input allele frequency dataframe. | | `--preprocessed-df` | Path to filtered dataframe. | | `--group` | Group name to analyze. | | `--mag-id` | MAG ID to process. | | `--output-dir` | Output directory. | --- ### `alleleflux-two-sample-unpaired` — unpaired two-sample tests Perform unpaired Mann-Whitney U tests comparing two groups. ```bash alleleflux-two-sample-unpaired --input-df INPUT --mag-id MAG \ --output-dir DIR [options] ``` #### Arguments | Argument | Description | |----------|-------------| | `--input-df` | Path to input allele frequency dataframe. | | `--mag-id` | MAG ID to process. | | `--output-dir` | Output directory. | --- ### `alleleflux-two-sample-paired` — paired two-sample tests Perform paired Wilcoxon signed-rank tests on matched samples. ```bash alleleflux-two-sample-paired --input-df INPUT --mag-id MAG \ --output-dir DIR [options] ``` #### Arguments | Argument | Description | |----------|-------------| | `--input-df` | Path to input allele frequency dataframe. | | `--mag-id` | MAG ID to process. | | `--output-dir` | Output directory. | --- ## Evolution tools ### `alleleflux-dnds-from-timepoints` — calculate dN/dS ratios Compute dN/dS ratios from significant evolutionary sites. ```bash alleleflux-dnds-from-timepoints --input INPUT --output OUTPUT [options] ``` #### Arguments | Argument | Default | Description | |----------|---------|-------------| | `--input` | Required | Path to input significant sites table. | | `--output` | Required | Output dN/dS results file. | See [dN/dS Analysis Guide](../usage/dnds_analysis.md) for detailed workflow. --- ## Accessory tools ### `alleleflux-create-mag-mapping` — generate MAG mapping and combined FASTA Create contig-to-MAG mapping file and concatenate individual MAG FASTA files. ```bash alleleflux-create-mag-mapping --dir MAG_DIR --extension EXT \ --output-fasta COMBINED --output-mapping MAPPING [options] ``` #### Required Arguments | Argument | Description | |----------|-------------| | `--dir` | Directory containing individual MAG FASTA files. | | `--extension` | File extension of MAG files (e.g., `fa`, `fasta`). | | `--output-fasta` | Path for combined output FASTA. | | `--output-mapping` | Path for contig-to-MAG mapping file (TSV). | #### Output - Combined FASTA: all contigs from all MAGs concatenated - Mapping file: `contig_name\tmag_id` (tab-separated) #### Examples ```bash # Create mapping from directory of MAG FASTAs alleleflux-create-mag-mapping --dir mags/ --extension fa \ --output-fasta combined_reference.fa --output-mapping mag_mapping.tsv # With different extension alleleflux-create-mag-mapping --dir mags/ --extension fasta \ --output-fasta reference.fasta --output-mapping mapping.tsv ``` --- ### `alleleflux-add-bam-path` — add BAM file paths to metadata Fill `bam_path` column in sample metadata by matching with BAM files. ```bash alleleflux-add-bam-path --metadata INPUT --output OUTPUT \ --bam-dir DIR [options] ``` #### Arguments | Argument | Default | Description | |----------|---------|-------------| | `--metadata` | Required | Path to input metadata file. | | `--output` | Required | Path to save updated metadata. | | `--bam-dir` | `.` | Directory containing BAM files. | | `--bam-extension` | `.bam` | Extension of BAM files. | | `--drop-missing` | False | Drop samples without matching BAM files. | --- ### `alleleflux-coverage-allele-stats` — compute coverage and allele statistics Calculate coverage and allele statistics summary for all MAGs. ```bash alleleflux-coverage-allele-stats --input-dir DIR --output-file OUTPUT [options] ``` #### Arguments | Argument | Default | Description | |----------|---------|-------------| | `--input-dir` | Required | Directory containing profile files. | | `--output-file` | Required | Output statistics file path. | | `--cpus` | All available | Number of processors. | #### Output Summary statistics per MAG: mean coverage, breadth, allele diversity metrics. --- ### `alleleflux-list-mags` — enumerate MAG IDs List all unique MAG IDs from a directory of profile files. ```bash alleleflux-list-mags --input-dir DIR [--output-file FILE] ``` #### Arguments | Argument | Default | Description | |----------|---------|-------------| | `--input-dir` | Required | Directory containing MAG profile files. | | `--output-file` | None | Optional output file (prints to stdout if not specified). | | `--pattern` | `*` | Glob pattern for file matching. | --- ### Additional accessory tools - `alleleflux-positions-qc` — Position-level QC filtering - `alleleflux-copy-profiles` — Copy or symlink profile files - `alleleflux-single-sample` — Within-group single-sample test - `alleleflux-preprocess-between-groups` — Position filtering between groups - `alleleflux-preprocess-within-group` — Position filtering within groups - `alleleflux-preprocessing-eligibility` — Aggregate preprocessing status - `alleleflux-p-value-summary` — Summarize p-values across tests - `alleleflux-outliers` — Flag outlier genes - `alleleflux-taxa-scores` — Derive taxa-level scores - `alleleflux-gene-scores` — Derive gene-level scores --- ## Visualization tools ### `alleleflux-plot-trajectories` — plot allele frequency trajectories Generate allele frequency trajectory visualizations from tracked allele data. ```bash alleleflux-plot-trajectories --input-file FILE [options] ``` #### Required Arguments | Argument | Description | |----------|-------------| | `--input-file` | Long-format frequency table from `alleleflux-track-alleles`. | #### Optional Arguments | Argument | Default | Description | |----------|---------|-------------| | `--value-col` | `min_p_value` | Column for ranking sites: `min_p_value` or `q_value`. | | `--n-sites-line` | 10 | Number of top sites for line plots (or `all`). | | `--n-sites-dist` | `all` | Number of sites for box/violin plots. | | `--x-col` | `time` | X-axis column: `time` or `day`. | | `--x-order` | None | Custom x-axis order (space-separated values). | | `--plot-types` | `line` | Plot types: `line`, `box`, `violin` (space-separated). | | `--per-site` | False | Generate individual plots per site. | | `--n-sites-per-site` | None | Number of sites for per-site plots. | | `--output-dir` | `./plots` | Output directory. | | `--output-format` | `png` | Format: `png`, `pdf`, `svg`. | | `--group-by-replicate` | False | Aggregate trajectories by replicate. | | `--bin-width` | None | Day binning width (requires `day` column). | | `--min-samples-per-bin` | 1 | Minimum samples per time bin. | | `--line-alpha` | 0.8 | Line transparency (0-1). | #### Output - `{mag_id}_line_plot.{format}`: Combined line trajectories - `{mag_id}_box_plot.{format}`: Box plots by timepoint - `{mag_id}_violin_plot.{format}`: Violin plots by timepoint - `per_site/{contig}_{position}_{gene}_line.{format}`: Per-site plots (if enabled) #### Examples ```bash # Basic plotting alleleflux-plot-trajectories --input-file tracked_alleles.tsv # Multiple plot types with custom output alleleflux-plot-trajectories --input-file tracked_alleles.tsv \ --plot-types line box violin --output-dir results/plots/ \ --output-format pdf # Per-site plots for top 5 sites alleleflux-plot-trajectories --input-file tracked_alleles.tsv \ --per-site --n-sites-per-site 5 --output-format svg # With binning and custom axis order alleleflux-plot-trajectories --input-file tracked_alleles.tsv \ --bin-width 7 --x-order "baseline week1 week2 week4 week8" ``` --- ### `alleleflux-track-alleles` — track allele trajectories Track anchor allele frequencies across all samples and timepoints. ```bash alleleflux-track-alleles --mag-id MAG --anchor-file FILE \ --metadata META --output-dir DIR [options] ``` #### Required Arguments | Argument | Description | |----------|-------------| | `--mag-id` | MAG identifier to process. | | `--anchor-file` | Path to terminal nucleotides file (from `alleleflux-terminal-nucleotide`). | | `--metadata` | Enhanced metadata file with `sample_profile_dir` column. | | `--output-dir` | Output directory. | #### Optional Arguments | Argument | Default | Description | |----------|---------|-------------| | `--anchor-column` | `terminal_nucleotide_mean_freq` | Anchor column to use for tracking. | | `--min-cov-per-site` | 0 | Minimum coverage required per site. | | `--cpus` | All available | Number of processors. | #### Output - `{mag_id}_frequency_table.wide.tsv`: Sites × samples matrix - `{mag_id}_frequency_table.long.tsv`: Tidy format (for plotting) --- ### `alleleflux-prepare-metadata` — prepare metadata for visualization Standardize and combine metadata tables for visualization workflows. ```bash alleleflux-prepare-metadata --metadata-in INPUT --metadata-out OUTPUT \ --base-profile-dir DIR [options] ``` #### Required Arguments | Argument | Description | |----------|-------------| | `--metadata-in` | Input metadata table (TSV). | | `--metadata-out` | Output standardized metadata file. | | `--base-profile-dir` | Base directory containing sample profile subdirectories. | #### Optional Arguments | Argument | Default | Description | |----------|---------|-------------| | `--sample-col` | `sample_id` | Column name for sample IDs. | | `--group-col` | `group` | Column name for experimental groups. | | `--time-col` | `time` | Column name for timepoints. | | `--day-col` | `day` | Column name for day/order (optional). | | `--replicate-col` | `replicate` | Column name for replicates (optional). | | `--subject-col` | `subjectID` | Column name for subject IDs. | #### Output Standardized metadata with columns: `sample_id`, `group`, `time`, `subjectID`, `sample_profile_dir`. --- ### `alleleflux-terminal-nucleotide` — identify terminal nucleotides Find dominant terminal alleles at significant genomic sites. ```bash alleleflux-terminal-nucleotide --significant-sites SITES \ --profile-dir DIR --metadata META --group GROUP \ --timepoint TP --output DIR [options] ``` #### Required Arguments | Argument | Description | |----------|-------------| | `--significant-sites` | Path to significant sites table (from p-value summary). | | `--profile-dir` | Directory containing sample profile subdirectories. | | `--metadata` | Sample metadata file. | | `--group` | Target group name for terminal nucleotide calculation. | | `--timepoint` | Target timepoint (typically endpoint). | | `--output` | Output directory. | #### Optional Arguments | Argument | Default | Description | |----------|---------|-------------| | `--p-value-column` | `q_value` | Significance column: `min_p_value` or `q_value`. | | `--p-value-threshold` | 0.05 | Maximum p-value to include site. | | `--test-type` | `two_sample_paired_tTest` | Test type to filter sites. | | `--group-filter` | None | Optional additional group filter. | | `--cpus` | All available | Number of processors. | | `--log-level` | INFO | Logging level. | #### Output - `{mag_id}/{mag_id}_terminal_nucleotides.tsv`: Terminal alleles per site - `{mag_id}/{mag_id}_frequencies.tsv`: Full frequency data - `terminal_nucleotide_analysis_summary.tsv`: Summary across MAGs --- ## Getting help View detailed help for any tool: ```bash # Main command help alleleflux --help # Subcommand help alleleflux run --help alleleflux init --help # Console script help alleleflux-profile --help alleleflux-cmh --help alleleflux-plot-trajectories --help ``` For configuration details, see [Configuration Reference](configuration.md). For how to run the workflow end to end, see [Running the Workflow](../usage/running_workflow.md).