# Quickstart

Get started with AlleleFlux in minutes. This guide walks through configuration, execution, and output inspection.

## Prerequisites

Ensure you have:

- AlleleFlux installed (see [Installation](installation.md))
- Input files prepared:
  - BAM files (sorted and indexed, with `.bai` index files)
  - Reference FASTA (combined MAG contigs)
  - Prodigal gene predictions (`.fna` nucleotide output)
  - Sample metadata TSV (with `sample_id`, `bam_path`, `group`, `time` columns)
  - MAG mapping file (`contig_name` to `mag_id` assignments)

See [Input Preparation](../usage/input_preparation.md) for detailed file format specifications.

## 1. Initialize Configuration

The easiest way to create a configuration file is with the interactive wizard:

```bash
# Interactive configuration wizard (recommended)
alleleflux init
```

This prompts you for input file paths, analysis type, and parameters, then writes a ready-to-use `alleleflux_config.yml`.

Alternatively, generate a template to edit manually:

```bash
# Print the template to a file
alleleflux init --template > config.yml

# Or copy the template manually
cp $(python -c "import alleleflux; print(alleleflux.__path__[0])")/smk_workflow/config.template.yml config.yml
```

:::{tip}
`alleleflux init` is the recommended starting point for new users. It validates file paths and sets sensible defaults based on your analysis type (single vs. longitudinal).
:::

## 2. Edit Configuration

Open `config.yml` and set your file paths and analysis parameters. A minimal longitudinal configuration:

```yaml
run_name: "my_analysis"

input:
  fasta_path: "/data/combined_mags.fasta"
  prodigal_path: "/data/prodigal_genes.fna"
  metadata_path: "/data/sample_metadata.tsv"
  gtdb_path: "/data/gtdbtk.bac120.summary.tsv"
  mag_mapping_path: "/data/mag_mapping.tsv"

output:
  root_dir: "./alleleflux_output"

analysis:
  data_type: "longitudinal"
  use_significance_tests: true
  use_lmm: true
  use_cmh: true
  timepoints_combinations:
    - timepoint: ["pre", "post"]
      focus: "post"
  groups_combinations:
    - ["treatment", "control"]
```

For a single-timepoint study:

```yaml
analysis:
  data_type: "single"
  timepoints_combinations:
    - timepoint: ["baseline"]
  groups_combinations:
    - ["disease", "healthy"]
```

See [Configuration Reference](../reference/configuration.md) for all options.

## 3. Run the Pipeline

```bash
# Run locally with 16 threads
alleleflux run --config config.yml --threads 16

# Dry run to preview the execution plan
alleleflux run --config config.yml --dry-run

# Run with memory limit
alleleflux run --config config.yml --threads 16 --memory 64G
```

For HPC clusters with SLURM:

```bash
# Copy the bundled SLURM profile
cp -r $(python -c "import alleleflux; print(alleleflux.__path__[0])")/smk_workflow/slurm_profile ./

# Run with SLURM job submission
alleleflux run --config config.yml --profile ./slurm_profile
```

## 4. Examine Output

Results are organized under the output directory:

```text
alleleflux_output/
└── longitudinal/                # Subdirectory matches data_type
    ├── profiles/                # Per-sample allele frequency profiles
    ├── inputMetadata/           # Per-MAG metadata tables
    ├── QC/                      # Quality control metrics per MAG
    ├── eligibility_table_*.tsv  # Which MAGs qualify for each test
    ├── allele_analysis/         # Allele frequency analysis results
    ├── significance_tests/      # Statistical test results
    │   ├── two_sample_paired_*/ #   Paired t-test & Wilcoxon
    │   ├── two_sample_unpaired_*/ # Unpaired t-test & Mann-Whitney
    │   ├── single_sample_*/     #   Single-sample t-test
    │   ├── lmm_*/               #   Linear mixed models
    │   └── cmh_*/               #   CMH tests
    ├── scores/                  # Parallelism & divergence scores
    │   ├── intermediate/        #   Per-MAG raw scores
    │   └── processed/           #   Combined & taxonomic aggregations
    │       ├── combined/        #     MAG-level score tables
    │       └── gene_scores_*/   #     Gene-level score tables
    ├── outlier_genes/           # High-scoring genes (selection targets)
    └── dnds_analysis/           # dN/dS evolutionary rate analysis
```

**Key output files to check first:**

```bash
OUT=alleleflux_output/longitudinal

# 1. Check MAG eligibility
column -t -s $'\t' $OUT/eligibility_table_pre_post-treatment_control.tsv | head

# 2. View MAG-level scores (parallelism and divergence)
column -t -s $'\t' \
  $OUT/scores/processed/combined/MAG/scores_two_sample_paired-pre_post-treatment_control-MAGs.tsv

# 3. Inspect outlier genes for a specific MAG
zcat $OUT/outlier_genes/pre_post-treatment_control/*_outlier_genes.tsv.gz | head

# 4. View dN/dS results
zcat $OUT/dnds_analysis/pre_post-treatment_control/*_gene_summary_ng86.tsv.gz | head
```

See [Output Files Reference](../reference/outputs.md) for complete file format specifications.

## Understanding the Output

After a run completes, here is a recommended order for inspecting results:

1. **Eligibility tables** (`eligibility_table_*.tsv`): Start here. These tables show which MAGs passed quality control and are eligible for each statistical test. If a MAG you expect is missing from downstream results, check its eligibility status and QC metrics first.

2. **Scores** (`scores/processed/combined/`): The MAG-level score tables rank populations by parallelism and divergence. High parallelism scores indicate allele frequency changes occurring consistently across replicates -- a hallmark of selection. Sort by `parallelism_score` descending to find the most interesting MAGs.

3. **Outlier genes** (`outlier_genes/`): These are genes with scores significantly above the genome-wide background, representing candidate targets of selection. Each file contains gene IDs, scores, and functional annotations (if GTDB taxonomy was provided).

4. **Statistical tests** (`significance_tests/`): Per-position p-values from the various tests (paired/unpaired, LMM, CMH). These are primarily consumed by the scoring step but can be inspected directly for specific positions of interest.

5. **dN/dS analysis** (`dnds_analysis/`): Evolutionary rate estimates per gene, useful for distinguishing positive selection (dN/dS > 1) from neutral drift.

## 5. Run Individual Tools

Each analysis step is also available as a standalone command:

```bash
# Create MAG mapping from individual FASTA files
alleleflux-create-mag-mapping \
    --dir mag_fastas/ --extension fa \
    --output-fasta combined.fasta --output-mapping mapping.tsv

# Profile a single BAM file
alleleflux-profile \
    --bam_path sample.bam \
    --fasta_path reference.fa \
    --prodigal_fasta genes.fna \
    --mag_mapping_file mapping.tsv \
    --output_dir profiles/

# Quality control for a single MAG
alleleflux-qc \
    --root_dir profiles/ \
    --fasta reference.fa \
    --mag_mapping_file mapping.tsv \
    --mag_id Bacteroides_001 \
    --output_dir qc/ \
    --breadth_threshold 0.1

# Calculate dN/dS for a MAG
alleleflux-dnds-from-timepoints \
    --input p_value_summary.tsv \
    --output dnds/ \
    --mag_id Bacteroides_001 \
    --profiles_dir profiles/ \
    --prodigal_fasta genes.fna \
    --fasta reference.fa \
    --ancestral_timepoint pre \
    --derived_timepoint post
```

For complete CLI documentation: `alleleflux-<tool> --help` or see [CLI Reference](../reference/cli_reference.md).

## What's Next?

Now that you have run your first analysis, here is a suggested progression through the documentation:

- **[Running the Workflow](../usage/running_workflow.md)** -- Advanced execution options, resuming failed runs, and cluster configuration
- **[Input Preparation](../usage/input_preparation.md)** -- Detailed file format specifications and tips for preparing your data
- **[Interpreting Results](../usage/interpreting_results.md)** -- In-depth guide to understanding scores, p-values, and outlier detection
- **[Visualization Guide](../usage/visualization_guide.md)** -- Plot allele frequency trajectories and generate publication-ready figures
- **[Configuration Reference](../reference/configuration.md)** -- Full documentation of every configuration option
- **[Tutorial](../examples/tutorial.md)** -- End-to-end walkthrough with example data