# Input Preparation
AlleleFlux requires several input files. This guide covers preparation and formatting.
## Required Files
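| File | Format | Description |
|---|---|---|
| BAM files | BAM | Sorted and indexed alignments of metagenomic reads to MAGs |
| Reference FASTA | FASTA | Combined MAG contigs. Header format: … |
| Prodigal genes | FASTA (nucleotide) | Nucleotide ORF predictions whose contig IDs match the reference FASTA |
| Metadata TSV | TSV | Sample information, one row per sample (see Metadata File Format) |
| MAG mapping | TSV | Contig → MAG assignments (can be generated with `alleleflux-create-mag-mapping`) |
| GTDB taxonomy | TSV | GTDB taxonomy classifications for the MAGs (optional) |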
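BAM files must be sorted and indexed. As a pre-flight check, the Python sketch below looks for an index next to each BAM using common naming conventions (`sample.bam.bai`, `sample.bai`, `sample.bam.csi`). It is not part of AlleleFlux and the function names are hypothetical; it also cannot confirm that a BAM is coordinate-sorted (`samtools sort` followed by `samtools index` is the usual way to produce sorted, indexed BAMs).

```python
from pathlib import Path
from typing import Dict, Iterable, Optional


def find_bam_index(bam_path: str) -> Optional[Path]:
    """Return the first index file found next to a BAM, or None."""
    bam = Path(bam_path)
    candidates = [
        bam.with_name(bam.name + ".bai"),  # S1.sorted.bam.bai (samtools default)
        bam.with_suffix(".bai"),           # S1.sorted.bai
        bam.with_name(bam.name + ".csi"),  # S1.sorted.bam.csi
    ]
    for candidate in candidates:
        if candidate.exists():
            return candidate
    return None


def check_bams(bam_paths: Iterable[str]) -> Dict[str, str]:
    """Map each problematic BAM path to a short description of the problem."""
    problems = {}
    for path in bam_paths:
        if not Path(path).exists():
            problems[path] = "BAM file not found"
        elif find_bam_index(path) is None:
            problems[path] = "no .bai/.csi index found"
    return problems
```

For example, `check_bams(["/data/S1.sorted.bam", "/data/S2.sorted.bam"])` returns an empty dict when both BAMs exist and are indexed.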
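## Metadata File Format

The metadata file is tab-separated, with a header row and one row per sample. Samples from the same subject share a `subjectID`; longitudinal studies add a `time` column.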
Longitudinal study:

```
sample_id bam_path subjectID group replicate time
S1 /data/S1.sorted.bam mouse1 control A pre
S2 /data/S2.sorted.bam mouse2 control B pre
S3 /data/S3.sorted.bam mouse3 treatment A pre
S4 /data/S4.sorted.bam mouse4 treatment B pre
S5 /data/S5.sorted.bam mouse1 control A post
S6 /data/S6.sorted.bam mouse2 control B post
S7 /data/S7.sorted.bam mouse3 treatment A post
S8 /data/S8.sorted.bam mouse4 treatment B post
```
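In this example every `subjectID` is sampled exactly once at `pre` and once at `post`. A minimal sketch that flags subjects breaking that pattern, assuming a tab-separated file and the `pre`/`post` labels used above. Whether AlleleFlux itself requires this strict pairing is an assumption, and `unpaired_subjects` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Sequence


def unpaired_subjects(metadata_tsv: str,
                      timepoints: Sequence[str] = ("pre", "post")) -> Dict[str, Dict[str, int]]:
    """Return {subjectID: {timepoint: sample count}} for subjects that do not
    have exactly one sample at every expected timepoint.

    Samples at timepoints outside `timepoints` are ignored.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for row in csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"):
        counts[row["subjectID"]][row["time"]] += 1

    problems = {}
    for subject, per_time in counts.items():
        observed = {t: per_time.get(t, 0) for t in timepoints}
        if any(n != 1 for n in observed.values()):
            problems[subject] = observed
    return problems
```

Usage: `unpaired_subjects(Path("metadata.tsv").read_text())`; an empty dict means every subject is paired.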
Single timepoint:

```
sample_id bam_path subjectID group replicate
S1 /data/S1.sorted.bam subject1 disease A
S2 /data/S2.sorted.bam subject2 disease B
S3 /data/S3.sorted.bam subject3 healthy A
S4 /data/S4.sorted.bam subject4 healthy B
```
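A quick way to catch header typos, or a file saved with spaces instead of tabs, is to compare the header row against the columns used in the examples above. This sketch assumes those columns are the required set for each `data_type`; AlleleFlux's own validation may accept or require other columns, and `missing_columns` is a hypothetical helper.

```python
import csv
import io
from typing import List

# Columns taken from the example tables above; treating them as the exact
# required set is an assumption, not AlleleFlux's documented contract.
EXAMPLE_COLUMNS = {
    "single": ["sample_id", "bam_path", "subjectID", "group", "replicate"],
    "longitudinal": ["sample_id", "bam_path", "subjectID", "group",
                     "replicate", "time"],
}


def missing_columns(metadata_tsv: str, data_type: str) -> List[str]:
    """Return example columns absent from the TSV header, in example order."""
    header = next(csv.reader(io.StringIO(metadata_tsv), delimiter="\t"), [])
    # A space-delimited file parses as a single header field, so every
    # column is reported missing: a hint that tabs were not used.
    return [col for col in EXAMPLE_COLUMNS[data_type] if col not in header]
```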
## Minimal Configuration
Create `config.yml` with the paths to your files:

```yaml
data_type: "longitudinal"  # or "single"
input:
  fasta_path: "reference.fa"
  prodigal_path: "genes.fna"
  metadata_path: "metadata.tsv"
  mag_mapping_path: "mag_mapping.tsv"
  gtdb_path: "gtdbtk.tsv"  # optional
output:
  root_dir: "output/"
analysis:
  timepoints_combinations:
    - timepoint: ["pre", "post"]
      focus: "post"
  groups_combinations:
    - ["treatment", "control"]
  use_lmm: true
  use_significance_tests: true
  use_cmh: true
```
See Configuration Reference for all options.
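The values in `timepoints_combinations` and `groups_combinations` presumably need to match the `time` and `group` values in the metadata. The sketch below cross-checks them, taking the parsed `analysis` section as a plain dict (for example from PyYAML's `yaml.safe_load`). The rule that `focus` must be one of the listed timepoints is an assumption drawn from the example, and the helper name is hypothetical.

```python
import csv
import io
from typing import List


def check_config_against_metadata(analysis: dict, metadata_tsv: str) -> List[str]:
    """Return human-readable problems; an empty list means none were found."""
    rows = list(csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t"))
    groups = {r["group"] for r in rows}
    times = {r["time"] for r in rows if r.get("time")}  # empty for single-timepoint data

    problems = []
    for combo in analysis.get("groups_combinations", []):
        for group in combo:
            if group not in groups:
                problems.append(f"group {group!r} not found in metadata")
    for combo in analysis.get("timepoints_combinations", []):
        tps = combo.get("timepoint", [])
        for tp in tps:
            if tp not in times:
                problems.append(f"timepoint {tp!r} not found in metadata")
        focus = combo.get("focus")
        if focus is not None and focus not in tps:  # assumed rule
            problems.append(f"focus {focus!r} is not one of {tps}")
    return problems
```

With PyYAML installed, `analysis` could be obtained as `yaml.safe_load(open("config.yml"))["analysis"]`.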
## Preparation Utilities
Combine individual MAG FASTA files into a single reference FASTA and create the contig-to-MAG mapping:

```bash
alleleflux-create-mag-mapping --dir mag_fastas/ --extension fa \
  --output-fasta combined.fasta --output-mapping mapping.tsv
```
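To illustrate what this step produces, here is a hypothetical re-implementation of the idea in Python: concatenate the per-MAG FASTA files and record which MAG each contig came from, using the file name as the MAG ID. It is not the tool's actual code; the real `alleleflux-create-mag-mapping` may rename contig headers (see the header format requirement in the Required Files table) and may use different column names than the placeholder `contig_id`/`mag_id` used here.

```python
from pathlib import Path


def combine_mag_fastas(mag_dir: str, extension: str,
                       out_fasta: str, out_mapping: str) -> int:
    """Concatenate per-MAG FASTAs and write a contig -> MAG mapping (sketch).

    MAG ID = file name without ".<extension>"; headers are copied unchanged.
    Raises ValueError on duplicate contig IDs. Returns the number of MAG files.
    """
    mag_files = sorted(Path(mag_dir).glob(f"*.{extension}"))
    seen = set()
    with open(out_fasta, "w") as fasta_out, open(out_mapping, "w") as map_out:
        map_out.write("contig_id\tmag_id\n")  # placeholder column names
        for mag_file in mag_files:
            mag_id = mag_file.name[: -(len(extension) + 1)]  # strip ".<extension>"
            with open(mag_file) as fh:
                for line in fh:
                    if not line.endswith("\n"):
                        line += "\n"  # keep the next file's header on its own line
                    if line.startswith(">"):
                        contig_id = line[1:].split()[0]
                        if contig_id in seen:
                            raise ValueError(f"duplicate contig ID {contig_id!r}")
                        seen.add(contig_id)
                        map_out.write(f"{contig_id}\t{mag_id}\n")
                    fasta_out.write(line)
    return len(mag_files)
```

Rejecting duplicate contig IDs is a design choice of this sketch: a contig assigned to two MAGs would make the mapping ambiguous.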
Add BAM paths to an existing metadata file:

```bash
alleleflux-add-bam-path --metadata metadata.tsv \
  --bam-dir bamfiles/ --output metadata_with_bam.tsv
```
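The rule `alleleflux-add-bam-path` uses to match BAM files to samples is not described here. Purely as an illustration, the sketch below uses one plausible rule: a BAM belongs to a sample if its file name starts with `<sample_id>.` (so `S1.sorted.bam` matches `S1` but not `S10`). Both the function and the matching rule are hypothetical.

```python
import csv
import io
from pathlib import Path


def add_bam_paths(metadata_tsv: str, bam_dir: str) -> str:
    """Return metadata TSV text with an absolute bam_path filled in per sample.

    Hypothetical rule: a BAM matches if its name starts with "<sample_id>.".
    Raises ValueError if a sample matches zero or several BAMs.
    """
    bams = sorted(Path(bam_dir).glob("*.bam"))
    reader = csv.DictReader(io.StringIO(metadata_tsv), delimiter="\t")
    fieldnames = list(reader.fieldnames or [])
    if "bam_path" not in fieldnames:
        fieldnames.insert(1, "bam_path")  # second column, as in the examples

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in reader:
        sample_id = row["sample_id"]
        hits = [b for b in bams if b.name.startswith(sample_id + ".")]
        if len(hits) != 1:
            raise ValueError(f"{sample_id}: expected 1 BAM, found {len(hits)}")
        row["bam_path"] = str(hits[0].resolve())
        writer.writerow(row)
    return out.getvalue()
```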
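Generate Prodigal gene predictions in metagenomic mode (`-p meta`). `-d` writes the nucleotide gene sequences (the file referenced by `prodigal_path`) and `-a` the protein translations:

```bash
prodigal -i combined.fasta -d genes.fna -a genes.faa -p meta
```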
For all options of the AlleleFlux utilities, run them with `--help`, e.g. `alleleflux-create-mag-mapping --help`.
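The Prodigal genes must refer to the same contig IDs as the reference FASTA. Prodigal names each gene after its source contig with a numeric suffix (`<contig_id>_1`, `<contig_id>_2`, ...), so a consistency check can strip the final `_<n>` and look the contig up in the reference. The helper names below are hypothetical; this is a sketch, not an AlleleFlux command.

```python
from typing import List


def fasta_ids(path: str) -> List[str]:
    """Return the first whitespace-delimited token of every FASTA header."""
    with open(path) as fh:
        return [line[1:].split()[0] for line in fh if line.startswith(">")]


def genes_on_unknown_contigs(genes_fna: str, reference_fasta: str) -> List[str]:
    """Return Prodigal gene IDs whose parent contig is absent from the reference."""
    contigs = set(fasta_ids(reference_fasta))
    # "<contig_id>_<n>" -> "<contig_id>"; rsplit keeps underscores inside contig IDs
    return [g for g in fasta_ids(genes_fna) if g.rsplit("_", 1)[0] not in contigs]
```

For example, `genes_on_unknown_contigs("genes.fna", "combined.fasta")` returns an empty list when every gene maps back to a reference contig.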
## Next Steps
Once your inputs are prepared:

1. Create the configuration file: Configuration Reference
2. Run the pipeline: Running the Workflow
3. Explore the example data: Example Data