feature_summary.tsv
A tab-delimited file summarizing the features computed for each genome in the training dataset.
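A minimal sketch of loading this file with Python's standard `csv` module. It assumes the first line is a header row carrying the column names listed below and that the numeric columns parse as floats. The names `load_feature_summary`, `NUMERIC_COLUMNS` and `Row` are illustrative and are not part of the tool. `best_tln_table` is treated as optional because it is missing from build_features output.

```python
import csv
import io
from typing import Dict, List, TextIO, Union

# Numeric columns, named as in the column list; user_genome stays a string.
NUMERIC_COLUMNS = (
    "Coding_density_4", "Coding_density_11", "Density_Diff", "GC",
    "Trp_ratio", "Trp_magnitude", "Gly_ratio", "UGG_density",
)

Row = Dict[str, Union[str, float, int, None]]


def load_feature_summary(handle: TextIO) -> List[Row]:
    """Parse feature_summary.tsv (header row assumed) into one dict per genome."""
    rows: List[Row] = []
    for record in csv.DictReader(handle, delimiter="\t"):
        row: Row = dict(record)
        for name in NUMERIC_COLUMNS:
            # Convert only columns that are present and non-empty.
            if row.get(name) not in (None, ""):
                row[name] = float(row[name])
        # best_tln_table is absent from build_features output, so keep it optional.
        table = row.get("best_tln_table")
        row["best_tln_table"] = int(table) if table not in (None, "") else None
        rows.append(row)
    return rows


if __name__ == "__main__":
    # Synthetic two-line example, not real data.
    demo = io.StringIO(
        "user_genome\tbest_tln_table\tCoding_density_4\tCoding_density_11\n"
        "genome_A\t11\t0.88\t0.87\n"
    )
    first = load_feature_summary(demo)[0]
    print(first["best_tln_table"], first["Coding_density_4"])  # prints: 11 0.88
```

For a file on disk, open it with `open("feature_summary.tsv", newline="")` and pass the handle in, as the `csv` module recommends.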
The columns in this file are as follows:
* user_genome (or genome ID for build_features): unique identifier of the query genome.
* best_tln_table (missing for build_features): the final predicted genetic translation table (GTT), i.e. the “consensus” choice recommended by the tool (e.g. 11 for the standard bacterial and archaeal code, 4 for UGA = Trp, or 25 for UGA = Gly).
* Coding_density_4: gene coding density when predicting genes with Prodigal (Hyatt et al., 2010) using translation table 4 or, equivalently, 25; both tables reassign the UGA stop codon (to tryptophan and glycine, respectively) and share the same stop codons, UAA and UAG.
* Coding_density_11: gene coding density when predicting genes with Prodigal using translation table 11.
* Density_Diff: the difference in coding density between translation table 4/25 and translation table 11, i.e. Coding_density_4 - Coding_density_11.
* GC: percentage of guanine (G) and cytosine (C) nucleotides in a genome.
* Trp_ratio: log-transformed ratio of UGA to UGG codon counts when predicting genes with Prodigal under translation table 4. The log ratio is clipped to the range [-6, 5] to limit the effect of extreme outliers.
* Trp_magnitude: log-transformed count of all UGA and UGG tryptophan codons when predicting genes with Prodigal under translation table 4.
* Gly_ratio: log-transformed ratio of UGA codon counts to total glycine codon counts (i.e. GGN codons) when predicting genes with Prodigal under translation table 4. The log ratio is clipped to the range [-10, 0] to limit the effect of extreme outliers.
* UGG_density: ratio of UGG tryptophan codons to glycine codons (GGN) when predicting genes with Prodigal under translation table 4.
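The codon-usage features and Density_Diff described above can be sketched as follows. The sketch starts from the in-frame codons of genes predicted under translation table 4, where UGA is a sense codon and therefore occurs inside genes. It is a hypothetical illustration, not the tool's implementation. The log base is not stated above, so the natural log is assumed. Zero-count handling is undocumented, so a pseudocount of 1 is assumed inside every log term, and UGG_density falls back to 0 when there are no GGN codons. All function names are made up. Sequences use the DNA alphabet, so UGA, UGG and GGN are counted as TGA, TGG and GGA/GGC/GGG/GGT.

```python
import math
from collections import Counter
from typing import Dict, Iterable

# GGN glycine codons in the DNA alphabet (U is written as T).
GLY_CODONS = ("GGA", "GGC", "GGG", "GGT")
# Assumption: zero-count handling is not documented; add 1 inside every log term.
PSEUDOCOUNT = 1.0


def clip(value: float, low: float, high: float) -> float:
    """Cap value to the closed interval [low, high]."""
    return max(low, min(high, value))


def count_codons(cds_sequences: Iterable[str]) -> Counter:
    """Count in-frame codons across coding sequences (e.g. table-4 Prodigal genes)."""
    counts: Counter = Counter()
    for seq in cds_sequences:
        seq = seq.upper().replace("U", "T")
        # Step through complete codons only; a trailing partial codon is ignored.
        for i in range(0, len(seq) - len(seq) % 3, 3):
            counts[seq[i:i + 3]] += 1
    return counts


def codon_features(counts: Counter) -> Dict[str, float]:
    """Hypothetical re-derivation of the codon features (natural log assumed)."""
    uga = counts["TGA"]  # read as Trp under table 4, so it occurs in-frame
    ugg = counts["TGG"]
    ggn = sum(counts[c] for c in GLY_CODONS)
    return {
        "Trp_ratio": clip(math.log((uga + PSEUDOCOUNT) / (ugg + PSEUDOCOUNT)), -6.0, 5.0),
        "Trp_magnitude": math.log(uga + ugg + PSEUDOCOUNT),
        "Gly_ratio": clip(math.log((uga + PSEUDOCOUNT) / (ggn + PSEUDOCOUNT)), -10.0, 0.0),
        # Assumption: report 0 when the genome has no GGN codons.
        "UGG_density": ugg / ggn if ggn else 0.0,
    }


def density_diff(coding_density_4: float, coding_density_11: float) -> float:
    """Density_Diff = Coding_density_4 - Coding_density_11."""
    return coding_density_4 - coding_density_11


if __name__ == "__main__":
    toy = count_codons(["ATGTGGTGAGGCGGATAA"])  # synthetic toy gene, not real data
    print(codon_features(toy))
```

Under these assumptions, the toy gene (codons ATG TGG TGA GGC GGA TAA) gives Trp_ratio = ln(2/2) = 0, Trp_magnitude = ln 3 ≈ 1.10, Gly_ratio = ln(2/3) ≈ -0.41 and UGG_density = 0.5. The clipping bounds are the ones stated above; the other choices would need to be checked against the tool's source.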