translation_table_summary.tsv

The output of gTranslate is provided in a tab-separated file (typically gtranslate.tsv). This file contains the final taxonomic predictions, the biological features calculated from the sequences, and the specific outputs of the machine learning ensemble.

The columns in this file are as follows: * user_genome: Unique identifier of the query genome. * best_tln_table: The final predicted Genetic Translation Table (GTT). This is the “consensus” choice recommended by the tool (e.g., 11 for standard, 4 for UGA=Trp, or 25 for UGA=Gly). * coding_density_4: The calculated coding density of the genome assuming the translation table is Table 4. * coding_density_11: The calculated coding density assuming the translation table is Table 11 (Standard code). * gc_percent: The percentage of Guanine and Cytosine in the genome. * n50: N50 of the genome assembly. * genome_size: The total length of the genome assembly in base pairs. * contig_count: The total number of contigs (fragments) in the user’s genome file. * confidence: A score ranging from 0.0 to 1.0 representing the level of agreement across the internal machine learning models. A score of 1.0 indicates that all classifiers in the ensemble (Adaboost, MLP, etc.) predicted the same table. * adaboost_pred: The specific GTT prediction made by the AdaBoost classifier. * decisiontree_pred: The specific GTT prediction made by the Decision Tree classifier. * kneighbors_pred: The specific GTT prediction made by the K-Nearest Neighbors (KNN) classifier. * xgb_pred: The specific GTT prediction made by the XGBoost (Extreme Gradient Boosting) classifier. * mlp_pred: The specific GTT prediction made by the Multi-Layer Perceptron (Neural Network) classifier. * warnings: Flags any unusual features or inconsistencies, such as extreme coding density differences or low confidence scores, that may suggest the prediction should be manually reviewed. “N/A” indicates no issues were detected.

Produced by

Example

user_genome best_tln_table  coding_density_4        coding_density_11       gc_percent      n50     genome_size     contig_count    confidence      adaboost_pred   decisiontree_predkneighbors_pred        xgb_pred        mlp_pred        warnings
GCA_000145705.TT4   4       90.29257        64.14511        25.88353        839615  839615  1       1.0     4       4       4       4       4       N/A
GCA_027046965.1_TT25        25      94.21967        56.0663 32.96163        50860   854813  32      1.0     25      25      25      25      25      N/A
GCA_910575315.1_TT11        11      87.75128        87.91231        44.09593        73133   4990987 224     1.0     11      11      11      11      11      N/A