ground_truth
Before training, you need to know the “right” answers. This command uses the taxonomic classification of your genomes to assign each one its confirmed genetic code (translation table). The result is the labeled dataset the model learns from.
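The assignment step can be pictured as a lookup from lineage to translation table. The following is a minimal sketch under stated assumptions, not gTranslate's implementation. It assumes a GTDB-style taxonomy file (genome ID, a tab, then a semicolon-separated lineage such as `d__Bacteria;p__...;s__...`), optionally gzipped like the `.tsv.gz` used in the example. The rule set `EXAMPLE_RULES` and the helpers `read_taxonomy`, `assign_table` and `ground_truth` are hypothetical names chosen for illustration; the tool's real rules are not documented here.

```python
import gzip
from typing import Dict

# HYPOTHETICAL rank -> translation table rules, for illustration only.
# These are NOT gTranslate's actual rules.
EXAMPLE_RULES = {
    "o__Mycoplasmatales": "4",   # Mycoplasma/Spiroplasma code (NCBI table 4)
    "c__Gracilibacteria": "25",  # SR1 and Gracilibacteria code (NCBI table 25)
}
# Standard bacterial, archaeal and plant plastid code (NCBI table 11).
DEFAULT_TABLE = "11"


def read_taxonomy(path: str) -> Dict[str, str]:
    """Read an assumed GTDB-style file: genome ID <TAB> d__...;p__...;..."""
    opener = gzip.open if path.endswith(".gz") else open
    taxonomy = {}
    with opener(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            genome_id, lineage = line.split("\t", 1)
            taxonomy[genome_id] = lineage
    return taxonomy


def assign_table(lineage: str) -> str:
    """Return the table of the first matching rule, else the default table."""
    ranks = [rank.strip() for rank in lineage.split(";")]
    for rank, table in EXAMPLE_RULES.items():
        if rank in ranks:
            return table
    return DEFAULT_TABLE


def ground_truth(path: str) -> Dict[str, str]:
    """Map every genome in the taxonomy file to a translation table."""
    return {gid: assign_table(lin) for gid, lin in read_taxonomy(path).items()}
```

A real rule set would also need a policy for genomes whose lineage cannot be resolved to a single table; this sketch simply falls back to table 11.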
Arguments
usage: gtranslate ground_truth --taxonomy_file TAXONOMY_FILE --output_file OUTPUT_FILE [--manual_gt_file MANUAL_GT_FILE] [-h]
required named arguments
- --taxonomy_file
File indicating the taxonomic classification of each genome.
- --output_file
Path to the output file.
optional arguments
- --manual_gt_file
File indicating manually specified ground truth for select genomes.
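One plausible reading of `--manual_gt_file` is that its entries take precedence over the taxonomy-derived assignments for the genomes it lists. The sketch below illustrates that idea only. The two-column layout (genome ID, a tab, translation table, with `#` comment lines) and the helper names `read_manual_gt` and `apply_manual_overrides` are assumptions for illustration, not the documented file format or API.

```python
from typing import Dict


def read_manual_gt(path: str) -> Dict[str, str]:
    """Parse an ASSUMED two-column TSV: genome ID <TAB> translation table."""
    manual = {}
    with open(path, "rt") as handle:
        for line in handle:
            line = line.strip()
            if not line or line.startswith("#"):
                continue  # skip blank lines and comment lines
            genome_id, table = line.split("\t")[:2]
            manual[genome_id] = table
    return manual


def apply_manual_overrides(
    taxonomy_gt: Dict[str, str], manual_gt: Dict[str, str]
) -> Dict[str, str]:
    """Return a copy of taxonomy_gt where manual entries win (assumed precedence)."""
    merged = dict(taxonomy_gt)
    merged.update(manual_gt)  # also adds genomes only present in the manual file
    return merged
```

Whether gTranslate adds genomes that appear only in the manual file, or only overrides existing ones, is not stated here; the sketch adds them.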
Files output
Example
Input
gtranslate ground_truth --taxonomy_file taxonomy_file_r226.tsv.gz --output_file ground_truth_results.tsv
Output
[2026-05-01 01:50:19] INFO: gTranslate v0.0.2
[2026-05-01 01:50:19] INFO: gtranslate ground_truth --taxonomy_file taxonomy_file_r226.tsv.gz --output_file ground_truth_results.tsv
[2026-05-01 01:50:19] INFO: Selecting Ground Truth translation tables based on taxonomic classification.
[2026-05-01 01:50:19] INFO: Determining ground truth for genomes:
[2026-05-01 01:50:20] INFO: - determined ground truth for 116,508 genomes
[2026-05-01 01:50:20] INFO: Table 11: 115,941 (99.51%)
[2026-05-01 01:50:20] INFO: Table 25: 121 (0.10%)
[2026-05-01 01:50:20] INFO: Table 4: 445 (0.38%)
[2026-05-01 01:50:20] INFO: Table UNRESOLVED: 1 (0.00%)
[2026-05-01 01:50:20] INFO: Done.
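Each percentage in the log is that table's count divided by the total number of genomes with a ground truth (115,941 + 121 + 445 + 1 = 116,508). The summary lines can be reproduced from the logged counts with a short sketch; `summarise` is a hypothetical helper, and the two-decimal rounding and text-sorted order (which would explain why Table 4 follows Table 25) are inferred from the log rather than taken from gTranslate's code.

```python
from typing import Dict, List


def summarise(table_counts: Dict[str, int]) -> List[str]:
    """Format per-table counts like the log, sorted by table label as text."""
    total = sum(table_counts.values())
    lines = []
    for table, count in sorted(table_counts.items()):
        share = 100 * count / total
        lines.append(f"Table {table}: {count:,} ({share:.2f}%)")
    return lines


# Counts copied from the log above.
logged_counts = {"11": 115941, "25": 121, "4": 445, "UNRESOLVED": 1}
for line in summarise(logged_counts):
    print(line)
# Table 11: 115,941 (99.51%)
# Table 25: 121 (0.10%)
# Table 4: 445 (0.38%)
# Table UNRESOLVED: 1 (0.00%)
```

Given per-genome assignments, the same counts could be obtained with `collections.Counter` over the assigned tables.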