ground_truth

Before training, you need to know the “right” answers. This command uses the taxonomic classification of each genome to assign it a ground-truth translation table (genetic code). The result is the labeled dataset that the model learns from.
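As a rough illustration of taxonomy-based labeling, the sketch below assigns a translation table by looking for specific taxa in a GTDB-style lineage string (`d__Bacteria;p__...;c__...`). The rule list `RULES`, the taxa it matches, the example lineages, and the decision to call a genome `UNRESOLVED` when its lineage is empty are all assumptions made for this example; they are not gTranslate's actual rules. For reference, NCBI table 11 is the bacterial and archaeal code, table 4 includes the Mycoplasma/Spiroplasma code, and table 25 is the Candidate Division SR1 and Gracilibacteria code.

```python
# Hypothetical rules: (taxon to look for in the lineage, translation table).
# Illustrative only; these are NOT gTranslate's real rules.
RULES = [
    ("o__Mycoplasmatales", "4"),
    ("c__Gracilibacteria", "25"),
]
DEFAULT_TABLE = "11"  # standard bacterial/archaeal code


def assign_table(lineage: str) -> str:
    """Return a translation table for a GTDB-style lineage string."""
    # Split the lineage into ranks and drop empty fields.
    ranks = [r.strip() for r in lineage.split(";") if r.strip()]
    if not ranks:
        # Assumption for this sketch: no lineage means no decision.
        return "UNRESOLVED"
    for taxon, table in RULES:
        if taxon in ranks:
            return table
    return DEFAULT_TABLE


# Made-up example lineages.
print(assign_table("d__Bacteria;p__Bacillota;c__Bacilli;o__Mycoplasmatales"))  # → 4
print(assign_table("d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria"))  # → 11
print(assign_table(""))  # → UNRESOLVED
```

Checking rules in order means the first matching taxon wins, which keeps the behavior predictable if a real rule set ever contains overlapping taxa.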

Arguments

usage: gtranslate ground_truth --taxonomy_file TAXONOMY_FILE --output_file
                               OUTPUT_FILE [--manual_gt_file MANUAL_GT_FILE]
                               [-h]

required named arguments

--taxonomy_file

File giving the taxonomic classification of each genome.

--output_file

Path to the output file.
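The taxonomy file in the example below is gzip-compressed (`taxonomy_file_r226.tsv.gz`). The sketch that follows reads such a file into a dictionary, assuming a GTDB-style layout of one genome per line, with a tab between the genome ID and its semicolon-separated lineage. That layout and the function names `parse_taxonomy` and `read_taxonomy` are assumptions; check the format gTranslate actually expects before relying on it.

```python
import gzip
import io
from typing import Dict, TextIO


def parse_taxonomy(handle: TextIO) -> Dict[str, str]:
    """Parse 'genome_id<TAB>lineage' lines into {genome_id: lineage}."""
    taxonomy: Dict[str, str] = {}
    for line in handle:
        line = line.rstrip("\r\n")
        if not line:
            continue  # skip blank lines
        # Raises ValueError if a line has no tab separator.
        genome_id, lineage = line.split("\t", 1)
        taxonomy[genome_id] = lineage
    return taxonomy


def read_taxonomy(path: str) -> Dict[str, str]:
    """Open a plain or gzip-compressed taxonomy file and parse it."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return parse_taxonomy(handle)


# Demonstration on an in-memory sample (made-up IDs and lineages).
sample = io.StringIO(
    "GB_GCA_000000001.1\td__Bacteria;p__Bacillota;c__Bacilli\n"
    "RS_GCF_000000002.1\td__Archaea;p__Thermoproteota\n"
)
print(parse_taxonomy(sample))
```

Separating parsing from opening keeps the parser testable on in-memory text while `read_taxonomy` handles the `.gz` case transparently.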

optional arguments

--manual_gt_file

File giving manually specified ground truth for selected genomes.
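How gTranslate combines `--manual_gt_file` with the taxonomy-derived labels is not described here. One plausible reading is that manual entries take precedence for the genomes they list; the sketch below shows that merge, assuming both sources are already loaded as `{genome_id: table}` dictionaries. The precedence rule, the handling of genomes that appear only in the manual file, and the name `merge_ground_truth` are all assumptions.

```python
from typing import Dict


def merge_ground_truth(
    taxonomy_gt: Dict[str, str], manual_gt: Dict[str, str]
) -> Dict[str, str]:
    """Combine labels, letting manual entries override taxonomy-derived ones.

    Assumptions: the manual label wins on conflict, and genomes listed only
    in the manual file are added to the result.
    """
    merged = dict(taxonomy_gt)  # copy so the caller's dict is not modified
    merged.update(manual_gt)    # manual labels replace or extend
    return merged


taxonomy_gt = {"genome_A": "11", "genome_B": "UNRESOLVED"}
manual_gt = {"genome_B": "4"}
print(merge_ground_truth(taxonomy_gt, manual_gt))
# → {'genome_A': '11', 'genome_B': '4'}
```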

Files output

Example

Input

gtranslate ground_truth --taxonomy_file taxonomy_file_r226.tsv.gz --output_file ground_truth_results.tsv

Output

[2026-05-01 01:50:19] INFO: gTranslate v0.0.2
[2026-05-01 01:50:19] INFO: gtranslate ground_truth --taxonomy_file taxonomy_file_r226.tsv.gz --output_file ground_truth_results.tsv
[2026-05-01 01:50:19] INFO: Selecting Ground Truth translation tables based on taxonomic classification.
[2026-05-01 01:50:19] INFO: Determining ground truth for genomes:
[2026-05-01 01:50:20] INFO:  - determined ground truth for 116,508 genomes
[2026-05-01 01:50:20] INFO: Table 11: 115,941 (99.51%)
[2026-05-01 01:50:20] INFO: Table 25: 121 (0.10%)
[2026-05-01 01:50:20] INFO: Table 4: 445 (0.38%)
[2026-05-01 01:50:20] INFO: Table UNRESOLVED: 1 (0.00%)
[2026-05-01 01:50:20] INFO: Done.
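Each summary line in the log gives a table's genome count and its share of the 116,508 genomes processed, rounded to two decimal places. The sketch below recomputes those lines from the counts shown in the log; the helper name `summarise` is hypothetical. Sorting the table labels as strings is simply what reproduces the order seen in this log ("11" < "25" < "4" < "UNRESOLVED").

```python
from collections import Counter


def summarise(counts: Counter) -> list:
    """Format 'Table X: count (pct%)' lines like the ground_truth log."""
    total = sum(counts.values())
    lines = []
    for table in sorted(counts, key=str):  # lexicographic order of labels
        n = counts[table]
        lines.append(f"Table {table}: {n:,} ({n / total * 100:.2f}%)")
    return lines


counts = Counter({"11": 115941, "25": 121, "4": 445, "UNRESOLVED": 1})
for line in summarise(counts):
    print(line)
# Table 11: 115,941 (99.51%)
# Table 25: 121 (0.10%)
# Table 4: 445 (0.38%)
# Table UNRESOLVED: 1 (0.00%)
```

The four counts sum to 116,508, matching the "determined ground truth for 116,508 genomes" line, so the percentages are taken over all processed genomes, including the unresolved one.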