Output Files Reference
AlleleFlux generates structured outputs organized by analysis type and data type.
Output Directory Structure
{root_dir}/
└── {data_type}/                      # "single" or "longitudinal"
    ├── profiles/                     # Sample profiles (per MAG)
    ├── inputMetadata/                # MAG-sample mappings
    ├── QC/                           # Quality control metrics
    ├── eligibility_table_*.tsv       # MAG test eligibility
    ├── allele_analysis/              # Allele frequencies
    ├── significance_tests/           # Statistical test results
    │   ├── two_sample_unpaired_*/
    │   ├── two_sample_paired_*/
    │   ├── single_sample_*/
    │   ├── lmm_*/
    │   └── cmh_*/
    ├── scores/                       # Parallelism & divergence scores
    │   ├── intermediate/MAG_scores_*/
    │   └── processed/
    │       ├── combined/             # MAG-level summaries
    │       └── gene_scores_*/        # Gene-level summaries
    └── outlier_genes/                # Outlier gene detection
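As a rough illustration of how this layout maps onto concrete file paths, the sketch below assembles the profile path for one sample-MAG pair from the documented pattern `profiles/{sample}/{sample}_{mag}_profiled.tsv.gz`. The helper name `profile_path` and the example values (`results`, `S1`, `MAG_001`) are hypothetical and not part of AlleleFlux.

```python
from pathlib import PurePosixPath


def profile_path(root_dir: str, data_type: str, sample: str, mag: str) -> PurePosixPath:
    """Build {root_dir}/{data_type}/profiles/{sample}/{sample}_{mag}_profiled.tsv.gz.

    Hypothetical helper for illustration; AlleleFlux does not ship this function.
    """
    # The directory tree documents exactly two data types.
    if data_type not in ("single", "longitudinal"):
        raise ValueError(f"unexpected data_type: {data_type!r}")
    return (
        PurePosixPath(root_dir)
        / data_type
        / "profiles"
        / sample
        / f"{sample}_{mag}_profiled.tsv.gz"
    )


# Example values are made up.
print(profile_path("results", "longitudinal", "S1", "MAG_001"))
# prints: results/longitudinal/profiles/S1/S1_MAG_001_profiled.tsv.gz
```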
Core Output Files
Profile Files
Path: profiles/{sample}/{sample}_{mag}_profiled.tsv.gz
Base-level coverage and allele counts per sample-MAG pair.
| Column | Type | Description |
|---|---|---|
| | str | Contig identifier |
| | int | 0-based position |
| | str | Reference base (A/C/G/T/N) |
| | int | Total read depth |
| | int | Base counts |
| | str | Overlapping gene (null if intergenic) |
Quality Control Files
Path: QC/QC_{timepoints}-{groups}/{mag}_qc.tsv
Sample-level QC metrics.
| Column | Type | Description |
|---|---|---|
| | str | Sample identifier |
| | float | Fraction of genome covered (0-1) |
| | float | Average depth |
| | bool | Whether sample passed QC |
Eligibility Table
Path: eligibility_table_{timepoints}-{groups}.tsv
Determines which MAGs qualify for each statistical test.
| Column | Description |
|---|---|
| | MAG identifier |
| | Eligible for unpaired tests and LMM |
| | Eligible for paired tests and CMH |
| | Per-group single-sample eligibility |
Allele Frequency Files
Path: allele_analysis/allele_analysis_{timepoints}-{groups}/{mag}_allele_frequency_*.tsv.gz
Position-level allele frequencies across samples.
| Column | Type | Description |
|---|---|---|
| | str | Contig identifier |
| | int | 0-based position |
| | str | Reference base |
| | float | Allele frequency in sample (0-1) |
| | str | Most common non-reference allele |
| | int | Read depth |
| | str | Overlapping gene |
Statistical Test Results
Two-Sample Tests
Paths:
significance_tests/two_sample_unpaired_{timepoints}-{groups}/{mag}_*.tsv.gz
significance_tests/two_sample_paired_{timepoints}-{groups}/{mag}_*.tsv.gz
| Column | Type | Description |
|---|---|---|
| | str, int | Genomic location |
| | str | Overlapping gene |
| | float | T-test p-value |
| | float | Mann-Whitney U p-value |
| | float | Mean allele frequency difference |
| | float | Effect size |
Single-Sample Test
Path: significance_tests/single_sample_{timepoints}-{groups}/{mag}_*.tsv.gz
Tests deviation from reference within each group.
| Column | Description |
|---|---|
| | Mean allele frequency in group |
| | One-sample t-test p-value |
CMH Test
Path: significance_tests/cmh_{timepoints}-{groups}/{mag}_*.tsv.gz
Cochran-Mantel-Haenszel test stratified by replicate/timepoint.
| Column | Description |
|---|---|
| | CMH test p-value |
| | |
| Stratum columns | Allele counts per stratum (replicate or timepoint) |
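For readers unfamiliar with the test, the following is a minimal sketch of the textbook Cochran-Mantel-Haenszel statistic for a set of 2x2 tables, one table per stratum. It is not AlleleFlux's implementation. The table layout `(a, b, c, d)` (for example, reference and alternate allele counts in two groups), the function name `cmh_test`, and the optional continuity correction are assumptions made for illustration.

```python
import math


def cmh_test(strata, correction=True):
    """Textbook CMH test over 2x2 tables given as (a, b, c, d) tuples.

    Assumed layout per stratum (hypothetical, for illustration only):
        row 1 = (a, b), row 2 = (c, d).
    Returns (chi-square statistic with 1 df, p-value).
    """
    diff_sum = 0.0  # sum over strata of a - E[a]
    var_sum = 0.0   # sum over strata of Var[a]
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 counts carries no information
        expected_a = (a + b) * (a + c) / n
        variance_a = (a + b) * (c + d) * (a + c) * (b + d) / (n * n * (n - 1))
        diff_sum += a - expected_a
        var_sum += variance_a
    if var_sum == 0:
        return float("nan"), float("nan")
    numerator = abs(diff_sum)
    if correction:
        numerator = max(numerator - 0.5, 0.0)  # continuity correction
    statistic = numerator * numerator / var_sum
    # Survival function of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Two made-up strata (e.g. two replicates) with allele counts in each group.
stat, p = cmh_test([(30, 10, 12, 28), (25, 15, 10, 30)])
print(f"CMH statistic = {stat:.3f}, p = {p:.3g}")
```

Pooling evidence across strata this way keeps replicate-specific baseline differences from being mistaken for a group effect, which is the usual motivation for a stratified test.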
LMM Test
Path: significance_tests/lmm_{timepoints}-{groups}/{mag}_*.tsv.gz
Linear mixed-effects model for longitudinal data.
| Column | Description |
|---|---|
| | Fixed-effect p-value |
| | Estimated effect size |
Score Files
MAG-Level Scores
Path: scores/processed/combined/{timepoints}-{groups}/{test_type}_mag_scores.tsv.gz
Parallelism and divergence scores per MAG.
| Column | Description |
|---|---|
| | MAG identifier |
| | Count of significant positions |
| | Score for within-group consistency (0-1) |
| | Score for between-group difference (0-1) |
| | |
Gene-Level Scores
Path: scores/processed/gene_scores_{timepoints}-{groups}/{test_type}_gene_scores.tsv.gz
Scores aggregated by gene.
| Column | Description |
|---|---|
| | Identifiers |
| | Mean parallelism across gene positions |
| | Mean divergence across gene positions |
| | Count of significant sites in gene |
Outlier Gene Files
Path: outlier_genes/{timepoints}-{groups}/{test_type}_outlier_genes.tsv.gz
Genes with exceptionally high scores (potential adaptive targets).
| Column | Description |
|---|---|
| | Identifiers |
| | Gene-level parallelism score |
| | |
| | Standard deviations from MAG mean |
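The "standard deviations from MAG mean" value can be read as a z-score of each gene's score against the other genes in the same MAG. The sketch below shows that calculation under stated assumptions: the function name `z_scores`, the use of the sample standard deviation, and the example gene names and scores are all hypothetical, and AlleleFlux's exact definition may differ.

```python
import statistics


def z_scores(gene_scores):
    """Return gene -> (score - mean) / sd across all genes of one MAG.

    Assumes the sample standard deviation (n - 1 denominator); this choice is
    an illustration, not a statement about AlleleFlux's implementation.
    """
    values = list(gene_scores.values())
    if len(values) < 2:
        # A standard deviation needs at least two genes.
        return {gene: float("nan") for gene in gene_scores}
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    if sd == 0:
        # All genes share the same score, so none deviates from the mean.
        return {gene: 0.0 for gene in gene_scores}
    return {gene: (score - mean) / sd for gene, score in gene_scores.items()}


# Made-up scores for four genes in one MAG.
scores = {"gene_1": 0.1, "gene_2": 0.1, "gene_3": 0.1, "gene_4": 0.9}
for gene, z in z_scores(scores).items():
    print(gene, round(z, 2))
```

In this toy input, `gene_4` sits well above the other genes, which is the pattern an outlier-gene table is meant to surface.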
dN/dS Analysis Outputs
Generated by alleleflux-dnds-from-timepoints (see dN/dS Analysis Guide).
Codon Events: {mag}_codon_events_ng86.tsv.gz – Path-averaged S/NS counts per codon
Gene Summary: {mag}_gene_summary_ng86.tsv.gz – dN/dS ratios per gene
MAG Summary: {mag}_mag_summary_ng86.tsv.gz – Overall MAG dN/dS
Global Summary: {mag}_global_summary_ng86.tsv – Aggregate statistics
Key columns:
dN_dS: dN/dS ratio (>1 = positive selection, <1 = purifying)
potential_S, potential_N: Expected synonymous/non-synonymous sites
observed_S, observed_N: Fractional observed counts (path-averaged)
k: Number of positions changed in codon (1, 2, or 3)
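To show how these columns relate, here is a hedged sketch of a dN/dS calculation in the style of Nei and Gojobori (1986), which the `ng86` file suffix presumably refers to. The function name `ng86_dnds` is hypothetical, and whether AlleleFlux applies the Jukes-Cantor correction is an assumption, so the correction is exposed as an option rather than presented as the tool's behaviour.

```python
import math


def ng86_dnds(observed_N, observed_S, potential_N, potential_S, jukes_cantor=True):
    """Compute a dN/dS ratio from observed counts and potential sites.

    pN = observed_N / potential_N and pS = observed_S / potential_S.
    With jukes_cantor=True, each proportion p is corrected to
    d = -3/4 * ln(1 - 4p/3). Returns NaN when the ratio is undefined.
    """
    if potential_N <= 0 or potential_S <= 0:
        return float("nan")
    pN = observed_N / potential_N
    pS = observed_S / potential_S
    if jukes_cantor:
        if pN >= 0.75 or pS >= 0.75:
            return float("nan")  # the correction is undefined at p >= 3/4
        dN = -0.75 * math.log(1 - 4 * pN / 3)
        dS = -0.75 * math.log(1 - 4 * pS / 3)
    else:
        dN, dS = pN, pS
    if dS == 0:
        return float("nan")  # no synonymous change, so the ratio is undefined
    return dN / dS


# Made-up counts: 3 non-synonymous changes over 300 sites, 1 synonymous over 50.
print(ng86_dnds(3, 1, 300, 50, jukes_cantor=False))
```

Values below 1 from such a calculation would be read as purifying selection and values above 1 as positive selection, matching the interpretation of `dN_dS` given above.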
File Format Notes
Most files are gzip-compressed TSV (.tsv.gz)
Position numbering is 0-based
Missing values: NaN or empty string
p-values: [0, 1] range; significant sites typically p < 0.05
Allele frequencies: [0, 1] range (proportion of reads)
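The notes above can be turned into a small reader. The sketch below uses only the Python standard library to open either a plain or gzip-compressed TSV and to map the two documented missing-value spellings (`NaN` and the empty string) to `None`. The function names `parse_value` and `read_tsv` are hypothetical, and the blanket numeric coercion is a convenience for illustration rather than a description of any AlleleFlux schema.

```python
import csv
import gzip


def parse_value(text):
    """Convert one TSV cell: missing -> None, numeric -> int/float, else str."""
    if text is None or text.strip() == "" or text.strip().lower() == "nan":
        return None  # documented missing values: NaN or empty string
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text  # keep non-numeric cells (identifiers, bases) as strings


def read_tsv(path):
    """Yield each row of a .tsv or .tsv.gz file as a dict keyed by header name."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt", newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            yield {key: parse_value(value) for key, value in row.items()}


print(parse_value("NaN"), parse_value(""), parse_value("0.05"))
# prints: None None 0.05
```

Because positions are 0-based, add 1 to a position before comparing it against tools or coordinates that use 1-based numbering.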
See also: Interpreting Results, CLI Reference