build_features

build_features converts raw genomic sequences into numeric feature vectors. It calculates the signatures, such as amino acid frequencies and density metrics, that the model trains on to distinguish translation tables.
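To make the idea of a "signature" concrete, here is a minimal sketch of two such features. It is not gTranslate's implementation: the function names are hypothetical, and reading "density metrics" as coding density (the fraction of the genome covered by predicted genes) is an assumption made for illustration.

```python
from collections import Counter

# The 20 standard amino acids, in a fixed order so every vector lines up.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(proteins):
    """Return the relative frequency of each standard amino acid.

    Hypothetical helper: characters outside AMINO_ACIDS (for example the
    stop symbol '*') are ignored.
    """
    counts = Counter()
    for protein in proteins:
        counts.update(protein.upper())
    total = sum(counts[aa] for aa in AMINO_ACIDS)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [counts[aa] / total for aa in AMINO_ACIDS]


def coding_density(gene_coords, genome_length):
    """Fraction of the genome covered by predicted genes (assumed metric).

    gene_coords holds (start, end) pairs, 1-based and inclusive. Overlapping
    genes are merged so that no base is counted twice.
    """
    if genome_length <= 0:
        raise ValueError("genome_length must be positive")
    covered = 0
    last_end = 0
    for start, end in sorted(gene_coords):
        start = max(start, last_end + 1)  # skip bases already counted
        if end >= start:
            covered += end - start + 1
            last_end = end
    return covered / genome_length
```

A feature vector for one genome could then be assembled as `amino_acid_frequencies(proteins) + [coding_density(coords, length)]`, computed once per candidate translation table so the values can be compared across tables.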

Arguments

usage: gtranslate build_features
                                 (--genome_dir GENOME_DIR | --batchfile BATCHFILE)
                                 --out_dir OUT_DIR [--cpus CPUS]
                                 [-x EXTENSION] [--force] [-h]

mutually exclusive required arguments

--genome_dir

directory containing genome files in FASTA format

--batchfile

path to a tab-separated file describing the genomes, with 2 columns (FASTA file, genome ID)
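A batchfile in this two-column layout can be generated from a directory of FASTA files. The sketch below is one possible way to do it. The helper name `write_batchfile` is hypothetical, and using the file name without its extension as the genome ID is an assumption, so substitute your own IDs if you have them.

```python
import csv
from pathlib import Path


def write_batchfile(genome_dir, batchfile, extension="fna"):
    """Write one 'FASTA path <TAB> genome ID' row per matching file.

    Returns the number of genomes written.
    """
    genome_paths = sorted(Path(genome_dir).glob(f"*.{extension}"))
    with open(batchfile, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        for path in genome_paths:
            # Assumption: the file name without its extension is the genome ID.
            writer.writerow([str(path), path.stem])
    return len(genome_paths)
```

Calling `write_batchfile("genomes", "batchfile.tsv")` would list every `*.fna` file in `genomes`; the resulting file can then be passed to `--batchfile`.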

required named arguments

--out_dir

directory to output files

optional arguments

--cpus

number of CPUs to use

Default: 1

-x, --extension

extension of files to process, e.g., “fna”, “fasta”, “gz” for gzipped files

Default: 'fna'

--force

continue processing if an error occurs on a single genome
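When the command is launched from a script, the argument rules above can be checked before anything runs. The following is a minimal sketch using only the flags listed in this section. The wrapper functions `build_command` and `run_build_features` are hypothetical and not part of gTranslate.

```python
import subprocess


def build_command(out_dir, genome_dir=None, batchfile=None,
                  cpus=1, extension="fna", force=False):
    """Assemble the argument list for `gtranslate build_features`."""
    # --genome_dir and --batchfile are mutually exclusive, and one is required.
    if (genome_dir is None) == (batchfile is None):
        raise ValueError("provide exactly one of genome_dir or batchfile")
    cmd = ["gtranslate", "build_features"]
    if genome_dir is not None:
        cmd += ["--genome_dir", str(genome_dir)]
    else:
        cmd += ["--batchfile", str(batchfile)]
    cmd += ["--out_dir", str(out_dir), "--cpus", str(cpus),
            "--extension", extension]
    if force:
        cmd.append("--force")
    return cmd


def run_build_features(**kwargs):
    """Run the command; subprocess raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(**kwargs), check=True)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.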

Files output

Example

Input

gtranslate build_features --batchfile 1000genomes/5K_batchfile.tsv --out_dir features_test --cpus 90

Output

[2026-04-09 23:09:40] INFO: gTranslate v0.0.2
[2026-04-09 23:09:40] INFO: gtranslate build_features --batchfile 1000genomes/5K_batchfile.tsv --out_dir features_test --cpus 90
[2026-04-09 23:09:40] INFO: Generating feature vectors for training models.
[2026-04-09 23:10:45] TASK: Running Prodigal V2.6.3 to identify genes.
[2026-04-09 23:36:35] INFO: Completed 5,000 genomes in 25.84 minutes (193.47 genomes/minute).
[2026-04-09 23:36:37] INFO: Done.