build_features
build_features turns raw genomic sequences into numeric feature vectors.
It calculates the specific signatures, such as amino acid frequencies and density metrics, that the model trains on to distinguish translation tables.
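To make the two kinds of signature named above concrete, here is a minimal sketch of how an amino acid frequency vector and a coding density value could be computed. It is an illustration only, not gTranslate's actual feature code: the function names `amino_acid_frequencies` and `coding_density` are hypothetical, and the assumption that density means "fraction of the genome covered by predicted genes" is ours.

```python
from collections import Counter

# The 20 standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_frequencies(proteins):
    """Return the relative frequency of each standard amino acid.

    `proteins` is an iterable of protein sequences (strings). Characters
    outside the 20 standard amino acids (e.g. '*' or 'X') are ignored.
    """
    counts = Counter()
    for seq in proteins:
        counts.update(aa for aa in seq.upper() if aa in AMINO_ACIDS)
    total = sum(counts.values())
    return {aa: (counts[aa] / total if total else 0.0) for aa in AMINO_ACIDS}


def coding_density(gene_coords, genome_length):
    """Return the fraction of the genome covered by at least one gene.

    `gene_coords` holds (start, end) pairs, 1-based and inclusive.
    Overlapping genes are counted once. This definition is an assumption.
    """
    if genome_length <= 0:
        return 0.0
    covered = 0
    current_end = 0  # right-most position already counted
    for start, end in sorted(gene_coords):
        start = max(start, current_end + 1)  # skip bases already counted
        if end >= start:
            covered += end - start + 1
            current_end = end
    return covered / genome_length
```

Gene coordinates of this kind would typically come from a gene caller, since the log in the Example section shows that build_features runs Prodigal to identify genes.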
Arguments
usage: gtranslate build_features
(--genome_dir GENOME_DIR | --batchfile BATCHFILE)
--out_dir OUT_DIR [--cpus CPUS]
[-x EXTENSION] [--force] [-h]
mutually exclusive required arguments
- --genome_dir
directory containing genome files in FASTA format
- --batchfile
path to file describing genomes - tab separated in 2 columns (FASTA file, genome ID)
required named arguments
- --out_dir
directory to output files
optional arguments
- --cpus
number of CPUs to use
Default: 1
- -x, --extension
extension of files to process, e.g., “fna”, “fasta”, “gz” for gzipped files
Default: 'fna'
- --force
continue processing if an error occurs on a single genome
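The --batchfile format described above (two tab-separated columns: FASTA file, genome ID) is easy to generate from a directory of genomes. The following is a minimal sketch, not part of gTranslate: the helper name `write_batchfile` is hypothetical, and deriving the genome ID from the file name minus its extension is an assumption, since any ID scheme that fits the two-column layout should do.

```python
from pathlib import Path


def write_batchfile(genome_dir, batchfile, extension="fna"):
    """Write one 'FASTA path<TAB>genome ID' row per matching file.

    Returns the number of rows written.
    """
    genome_dir = Path(genome_dir)
    suffix = "." + extension.lstrip(".")
    rows = []
    for fasta in sorted(genome_dir.iterdir()):
        if fasta.is_file() and fasta.name.endswith(suffix):
            # Assumption: genome ID = file name without the extension.
            genome_id = fasta.name[: -len(suffix)]
            rows.append(f"{fasta.resolve()}\t{genome_id}")
    Path(batchfile).write_text("".join(row + "\n" for row in rows))
    return len(rows)
```

The resulting file can then be passed to `gtranslate build_features --batchfile` in place of `--genome_dir`.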
Files output
Example
Input
gtranslate build_features --batchfile 1000genomes/5K_batchfile.tsv --out_dir features_test --cpus 90
Output
[2026-04-09 23:09:40] INFO: gTranslate v0.0.2
[2026-04-09 23:09:40] INFO: gtranslate build_features --batchfile 1000genomes/5K_batchfile.tsv --out_dir features_test --cpus 90
[2026-04-09 23:09:40] INFO: Generating feature vectors for training models.
[2026-04-09 23:10:45] TASK: Running Prodigal V2.6.3 to identify genes.
[2026-04-09 23:36:35] INFO: Completed 5,000 genomes in 25.84 minutes (193.47 genomes/minute).
[2026-04-09 23:36:37] INFO: Done.
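When comparing runs with different --cpus values, the summary line in the log is the most useful part. The sketch below pulls the genome count and elapsed minutes out of such a line and recomputes the rate. It assumes the "Completed N genomes in M minutes" wording shown in the output above stays stable across versions, which is not guaranteed; the function name `parse_summary` is hypothetical.

```python
import re

# Matches e.g. "Completed 5,000 genomes in 25.84 minutes".
SUMMARY = re.compile(r"Completed ([\d,]+) genomes in (\d+(?:\.\d+)?) minutes")


def parse_summary(log_line):
    """Return (genomes, minutes, genomes_per_minute), or None if no match."""
    match = SUMMARY.search(log_line)
    if match is None:
        return None
    genomes = int(match.group(1).replace(",", ""))
    minutes = float(match.group(2))
    if minutes == 0:
        return None  # avoid dividing by zero on very short runs
    return genomes, minutes, genomes / minutes
```

For the run above, recomputing from the rounded 25.84 minutes gives roughly 193.5 genomes per minute, close to the logged 193.47; the small gap is consistent with the log rounding the elapsed time.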