Interpreting Results
This guide explains how to interpret the results produced by AlleleFlux.
Output Structure
AlleleFlux organizes results by analysis type:
output/
├── profiles/                    # Per-sample allele counts
├── metadata/                    # Per-MAG sample metadata
├── QC/                          # Quality control results
├── eligibility_table_*.tsv      # MAG eligibility for tests
├── allele_analysis/             # Allele frequency analysis
├── significance_tests/          # Statistical test results
│   ├── two_sample_unpaired/
│   ├── two_sample_paired/
│   ├── single_sample/
│   ├── lmm/
│   └── cmh/
├── scores/
│   ├── intermediate/            # Per-MAG scores
│   └── processed/
│       ├── combined/            # Aggregated MAG/taxa scores
│       └── gene_scores/         # Gene-level scores
└── outliers/                    # High-scoring outlier genes
Key Files
1. Eligibility Table
eligibility_table_{timepoints}-{groups}.tsv - Lists which MAGs qualify for each test, based on coverage breadth and the number of samples
2. Statistical Tests (in significance_tests/)
Per-MAG files with p-values and test statistics:
- {mag}_two_sample_unpaired.tsv.gz - Unpaired group comparisons
- {mag}_lmm.tsv.gz - Linear mixed models
- {mag}_cmh.tsv.gz - Cochran-Mantel-Haenszel tests
Key columns: contig, position, gene_id, p_value_{test}, q_value_{test}
3. Scores (in scores/processed/combined/)
- scores_{test}-{tp}-{gr}-MAGs.tsv - MAG-level parallelism/divergence scores
- scores_{test}-{tp}-{gr}-{taxon}.tsv - Taxonomic aggregations (phylum to species)
4. Gene Scores (in scores/processed/gene_scores/)
- {mag}_{test}_gene_scores_individual.tsv - Per-gene scores
- {mag}_{test}_outlier_genes.tsv - High-scoring genes under selection
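The naming patterns above can be expanded in a few lines, which helps when looping over several tests or taxonomic levels. This is a minimal sketch: the helper names are hypothetical, and the patterns are copied from the list above rather than taken from AlleleFlux's code.

```python
def eligibility_table_name(timepoints, groups):
    """Expand the pattern eligibility_table_{timepoints}-{groups}.tsv."""
    return f"eligibility_table_{timepoints}-{groups}.tsv"


def combined_scores_name(test, timepoints, groups, level="MAGs"):
    """Expand the pattern scores_{test}-{tp}-{gr}-{level}.tsv.

    `level` is "MAGs" for MAG-level scores or a taxonomic rank such as "family".
    """
    return f"scores_{test}-{timepoints}-{groups}-{level}.tsv"


# Hypothetical helpers; the arguments mirror the example file names in this guide.
print(eligibility_table_name("pre_post", "treatment_control"))
print(combined_scores_name("two_sample_unpaired", "pre_post", "treatment_control", level="family"))
```

With these arguments the helpers reproduce the example file names used in the Analysis Workflow section.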
Score Interpretation
Parallelism Score (0-100%): Measures how consistently alleles change across replicates within a group. High scores point to deterministic evolution rather than random drift.
Divergence Score (0-100%): Quantifies allele frequency differences between groups. High scores point to differential selection between conditions.
CMH Test: Detects parallel allele changes across timepoints while controlling for individual variation. Particularly powerful for longitudinal studies.
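The percentage scale can be illustrated with a small worked example. This sketch assumes that a score is the share of tested sites that reach significance, which is consistent with the total_sites, significant_sites and score_% columns of the gene score files but is not confirmed here as AlleleFlux's exact formula. The function name is hypothetical.

```python
def percent_score(significant_sites, total_sites):
    """Percentage of tested sites that are significant (assumed definition)."""
    if total_sites <= 0:
        raise ValueError("total_sites must be positive")
    if not 0 <= significant_sites <= total_sites:
        raise ValueError("significant_sites must lie between 0 and total_sites")
    return 100.0 * significant_sites / total_sites


# 12 significant sites out of 400 tested sites (made-up numbers)
print(percent_score(12, 400))  # 3.0
```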
File Format Details
Profile files (profiles/{sample}_{mag}_profiled.tsv.gz):
contig, position, ref_base, total_coverage, A, C, G, T, gene_id
Statistical test results (significance_tests/{test}/{mag}_{test}.tsv.gz):
contig, position, gene_id, p_value_{test}, q_value_{test}
Gene scores (scores/processed/gene_scores/{mag}_{test}_gene_scores_individual.tsv):
gene_id, total_sites, significant_sites, score_%
Outliers (outliers/{mag}_{test}_outlier_genes.tsv):
gene_id, gene_score_%, mag_score_%, p_value_binomial, p_value_poisson
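As a minimal sketch of working with the statistical test format above, the following reads a per-MAG results table and keeps the sites whose q-value falls below a cutoff. The column names follow the pattern listed above. The function name, the 0.05 cutoff and the file path in the commented usage are illustrative assumptions, not part of AlleleFlux.

```python
import csv
import gzip


def significant_sites(lines, test, q_cutoff=0.05):
    """Return rows whose q_value_{test} is below q_cutoff.

    `lines` is any iterable of tab-separated text lines with a header row,
    for example a file opened in text mode. Rows with a missing or
    non-numeric q-value are skipped.
    """
    q_col = f"q_value_{test}"
    kept = []
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            q_value = float(row[q_col])
        except (KeyError, TypeError, ValueError):
            continue
        if q_value < q_cutoff:  # NaN compares False, so NaN rows are dropped
            kept.append(row)
    return kept


# Usage (the path is a hypothetical example that follows the pattern above):
# with gzip.open("significance_tests/cmh/MAG123_cmh.tsv.gz", "rt") as handle:
#     for row in significant_sites(handle, "cmh"):
#         print(row["contig"], row["position"], row["gene_id"])
```

Each returned row is a dictionary keyed by column name, so downstream code can group by gene_id or contig as needed.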
Analysis Workflow
Step 1: Check Eligibility
cat eligibility_table_pre_post-treatment_control.tsv
Identify MAGs with sufficient coverage for statistical tests.
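Filtering the table can also be scripted. The sketch below assumes the table has one identifier column and one true/false column per test. The column names MAG_ID and unpaired_test_eligible are placeholders, so check the header of your own table before using them.

```python
import csv

TRUE_VALUES = {"true", "1", "yes"}


def eligible_mags(lines, flag_column, id_column="MAG_ID"):
    """Return the MAG identifiers whose flag column holds a true-like value.

    Both column names are assumptions about the table layout, not
    documented AlleleFlux names.
    """
    reader = csv.DictReader(lines, delimiter="\t")
    return [
        row[id_column]
        for row in reader
        if (row.get(flag_column) or "").strip().lower() in TRUE_VALUES
    ]


# Usage with the example table from this step:
# with open("eligibility_table_pre_post-treatment_control.tsv") as handle:
#     print(eligible_mags(handle, "unpaired_test_eligible"))
```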
Step 2: Examine Scores
# MAG-level scores
head scores_two_sample_unpaired-pre_post-treatment_control-MAGs.tsv
# Taxonomic aggregation (family level)
head scores_two_sample_unpaired-pre_post-treatment_control-family.tsv
Focus on MAGs/taxa with high parallelism or divergence scores.
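Because head shows rows in file order, sorting by score is often more informative. The following sketch ranks the rows of a combined scores table by a numeric column. The column name passed as score_column is an assumption, since the exact headers are not listed in this guide, and the function name is hypothetical.

```python
import csv
import math


def top_by_score(lines, score_column, n=10):
    """Return the n rows with the highest value in score_column.

    Rows with a missing, non-numeric or NaN score are ignored.
    """
    scored = []
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            value = float(row[score_column])
        except (KeyError, TypeError, ValueError):
            continue
        if math.isnan(value):
            continue
        scored.append((value, row))
    # Sort on the numeric score only, highest first
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [row for _, row in scored[:n]]


# Usage (replace "score_column_name" with the header found in your file):
# with open("scores_two_sample_unpaired-pre_post-treatment_control-MAGs.tsv") as handle:
#     for row in top_by_score(handle, "score_column_name", n=5):
#         print(row)
```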
Step 3: Investigate Genes
# Gene scores for a high-scoring MAG
head MAG123_two_sample_unpaired_gene_scores_individual.tsv
# Outlier genes
head MAG123_two_sample_unpaired_outlier_genes.tsv
Identify candidate genes under strong selection.
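One way to shortlist candidates is sketched below, using the outlier columns listed under File Format Details. Requiring both the binomial and the Poisson p-value to pass is a deliberately conservative choice made for this example, not a rule stated by AlleleFlux. The function name and the 0.05 cutoff are likewise assumptions.

```python
import csv


def strong_outliers(lines, alpha=0.05):
    """Return gene IDs passing both outlier tests, highest gene score first."""
    hits = []
    for row in csv.DictReader(lines, delimiter="\t"):
        try:
            p_binomial = float(row["p_value_binomial"])
            p_poisson = float(row["p_value_poisson"])
            gene_score = float(row["gene_score_%"])
        except (KeyError, TypeError, ValueError):
            continue  # skip rows with missing or non-numeric values
        if p_binomial < alpha and p_poisson < alpha:
            hits.append((gene_score, row["gene_id"]))
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [gene_id for _, gene_id in hits]


# Usage with the example file from this step:
# with open("MAG123_two_sample_unpaired_outlier_genes.tsv") as handle:
#     print(strong_outliers(handle))
```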
Step 4: Compare Tests
Check consistency across statistical approaches (two-sample, LMM, CMH). Genes that are significant in several tests are the most robust candidates.
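The comparison can be sketched as a count of how many tests support each gene. The example assumes you have already collected the significant gene IDs per test. The function names, the min_tests default and the gene IDs are made up for illustration.

```python
from collections import Counter


def count_support(genes_by_test):
    """Count, for each gene, the number of tests in which it is significant."""
    counts = Counter()
    for genes in genes_by_test.values():
        counts.update(set(genes))  # set() guards against duplicate IDs within a test
    return counts


def robust_genes(genes_by_test, min_tests=2):
    """Return the sorted gene IDs supported by at least min_tests tests."""
    counts = count_support(genes_by_test)
    return sorted(gene for gene, n in counts.items() if n >= min_tests)


# Made-up gene IDs, for illustration only
example = {
    "two_sample_unpaired": {"geneA", "geneB"},
    "lmm": {"geneB", "geneC"},
    "cmh": {"geneA", "geneB"},
}
print(robust_genes(example))               # ['geneA', 'geneB']
print(robust_genes(example, min_tests=3))  # ['geneB']
```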
Step 5: Functional Analysis
- Annotate outlier genes (KEGG, COG, Pfam)
- Check biological relevance to experimental conditions
- Consider genomic context (operons, mobile elements)
Troubleshooting
No results / empty files
- Check eligibility table: MAGs may not meet min_sample_num or breadth_threshold
- Verify input file paths in configuration
- Check log files in logs/ directory
Low scores across all MAGs
- Insufficient selective pressure or inappropriate timepoints
- Try relaxing p_value_threshold (e.g., 0.1 instead of 0.05), keeping in mind that a looser cutoff also admits more false positives
- Check if experimental conditions are strong enough
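To see how much the cutoff matters before rerunning anything, it can help to count sites at several thresholds. A looser p_value_threshold such as 0.1 counts at least as many sites as 0.05. This is a minimal sketch with made-up p-values and a hypothetical function name.

```python
def count_significant(p_values, thresholds=(0.05, 0.1)):
    """Map each threshold to the number of p-values strictly below it."""
    values = list(p_values)
    return {t: sum(1 for p in values if p < t) for t in thresholds}


# Made-up p-values, for illustration only
example_p_values = [0.001, 0.03, 0.07, 0.2]
print(count_significant(example_p_values))  # {0.05: 2, 0.1: 3}
```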
Inconsistent results between tests
- LMM is sensitive to experimental design complexity
- Two-sample tests affected by unbalanced groups
- CMH best for detecting consistent directional changes
- Use multiple tests for robust conclusions
Missing gene IDs
- Ensure Prodigal predictions match reference FASTA contig names
- Verify prodigal_path in configuration
- Check gene FASTA headers match contig naming
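The header checks above can be sketched as follows. The example relies on Prodigal's convention of naming each gene as the contig name followed by an underscore and a running number, and on the FASTA convention that the sequence ID is the first whitespace-delimited token of the header. The function names and file paths are hypothetical.

```python
def fasta_ids(lines):
    """Return the first whitespace-delimited token of every FASTA header."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.append(parts[0])
    return ids


def genes_with_unknown_contig(gene_fasta_lines, reference_fasta_lines):
    """Return gene IDs whose contig prefix is absent from the reference FASTA."""
    contigs = set(fasta_ids(reference_fasta_lines))
    missing = []
    for gene_id in fasta_ids(gene_fasta_lines):
        # Strip the trailing "_<number>" that Prodigal appends to the contig name
        contig = gene_id.rsplit("_", 1)[0]
        if contig not in contigs:
            missing.append(gene_id)
    return missing


# Usage (both paths are hypothetical):
# with open("genes.fna") as genes, open("reference.fasta") as reference:
#     print(genes_with_unknown_contig(genes, reference))
```

An empty result means every predicted gene maps back to a contig in the reference. Any IDs returned indicate a naming mismatch worth fixing before rerunning.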
For visualization of results, see Visualization Guide.