Installing gTranslate

gTranslate is available through multiple sources; you only need to choose one. If you are unsure which to choose, Bioconda is generally the easiest.

Sources

Hardware requirements

Python libraries

gTranslate is designed for Python >=3.12 and requires the following libraries, which will be automatically installed:

| Library | Version | Reference |
|---|---|---|
| NumPy | >= 1.26.0 | Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2 |
| tqdm | >= 4.67.0 | DOI: 10.5281/zenodo.595120 |
| Pandas | >= 2.2.0 | McKinney W. 2010. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, 51-56. |
| scikit-learn | >= 1.6.1 | Pedregosa F, et al. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. |
| joblib | >= 1.3.2 | Joblib: https://joblib.readthedocs.io/en/latest/ |
| scipy | >= 1.12.0 | Virtanen P, et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261–272 (2020). DOI: 10.1038/s41592-019-0686-2 |
| mlxtend | >= 0.22.0 | Raschka S. 2018. MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. Journal of Open Source Software, 3(24), 638, https://doi.org/10.21105/joss.00638 |
| plotly | >= 5.15.0 | Plotly Technologies Inc. 2015. Collaborative data science. Montréal, QC: Plotly Technologies Inc. https://plot.ly |
| xgboost | >= 2.0.0 | Chen T, et al. 2016. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. DOI: 10.1145/2939672.2939785 |
| lightgbm | >= 3.3.5 | Ke G, et al. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems, 30, 3146–3154. |
| requests | >= 2.31.0 | Reitz K. 2023. Requests: HTTP for Humans. https://docs.python-requests.org/en/latest/ |

Please cite these libraries if you use gTranslate in your work.
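If you want to confirm that an existing environment already meets the minimum versions listed above, a small script along these lines may help. It is only a sketch, not part of gTranslate: the helper names `release_tuple` and `find_problems` are hypothetical, and the version comparison is deliberately simplified (it looks only at the first three numeric components and ignores suffixes such as `rc1`).

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# Minimum versions copied from the library table in this section.
MINIMUM_VERSIONS = {
    "numpy": "1.26.0",
    "tqdm": "4.67.0",
    "pandas": "2.2.0",
    "scikit-learn": "1.6.1",
    "joblib": "1.3.2",
    "scipy": "1.12.0",
    "mlxtend": "0.22.0",
    "plotly": "5.15.0",
    "xgboost": "2.0.0",
    "lightgbm": "3.3.5",
    "requests": "2.31.0",
}


def release_tuple(version_string, width=3):
    """Turn '1.26.0' into (1, 26, 0); simplified, ignores non-numeric suffixes."""
    numbers = []
    for piece in version_string.split(".")[:width]:
        digits = ""
        for char in piece:
            if not char.isdigit():
                break
            digits += char
        if not digits:
            break
        numbers.append(int(digits))
    # Pad short versions such as '5.15' so tuples compare component by component.
    numbers += [0] * (width - len(numbers))
    return tuple(numbers)


def find_problems(minimums, get_version=version):
    """Return one message per library that is missing or older than required."""
    problems = []
    for name, minimum in minimums.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems.append(f"{name}: not installed (need >= {minimum})")
            continue
        if release_tuple(installed) < release_tuple(minimum):
            problems.append(f"{name}: found {installed}, need >= {minimum}")
    return problems


if __name__ == "__main__":
    if sys.version_info < (3, 12):
        print("Python 3.12 or newer is expected.")
    messages = find_problems(MINIMUM_VERSIONS)
    for line in messages or ["All listed libraries meet the minimum versions."]:
        print(line)
```

Because the libraries are installed automatically with gTranslate, this check is mainly useful when debugging a hand-built or shared environment.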

Third-party software

gTranslate uses the following third-party dependencies and assumes they are on your system path:

Tip

The check_install command will verify that all of the programs are on the path.

| Software | Version | Reference |
|---|---|---|
| Prodigal | >= 2.6.2 | Hyatt D, et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11:119. doi: 10.1186/1471-2105-11-119. |

Please cite these tools if you use gTranslate in your work.
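The idea of a program being "on your system path" can be illustrated with Python's standard `shutil.which`, which searches `PATH` the same way a shell would. The sketch below is not the `check_install` command and makes no claim about how that command works; it assumes the Prodigal executable is named `prodigal`, and `missing_programs` is a hypothetical helper name.

```python
import shutil

# Assumption: the Prodigal executable is installed under the name "prodigal".
REQUIRED_PROGRAMS = ["prodigal"]


def missing_programs(programs, which=shutil.which):
    """Return the programs that cannot be found on the current PATH."""
    return [name for name in programs if which(name) is None]


if __name__ == "__main__":
    missing = missing_programs(REQUIRED_PROGRAMS)
    if missing:
        print("Not found on PATH: " + ", ".join(missing))
    else:
        print("All required programs were found on PATH.")
```

This only confirms that an executable can be located; it does not check the installed version against the minimum in the table.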

gTranslate Models

Due to file size limits, most of the gTranslate models (including the R220 and R232 classifiers) are provided as a separate download.

Downloading the Models

We provide three mirrors for downloading the classifiers. For optimal speeds, choose the mirror geographically closest to you:

You can download and extract the archive directly from the command line. Choose one of the following commands based on your preferred mirror:

# Download from GTDB (Australia)
wget https://data.gtdb.ecogenomic.org/tools/gtranslate/gtranslate_models.tar.gz

# OR Download from Aalborg University (Europe)
# wget https://data.gtdb.aau.ecogenomic.org/tools/gtranslate/gtranslate_models.tar.gz

# Extract the downloaded archive
tar xvzf gtranslate_models.tar.gz

Configuring the Model Path

gTranslate requires the GTRANSLATE_MODEL_PATH environment variable to be set to the directory containing the extracted models.
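Before running gTranslate, it can be worth checking that the variable is set and points at a real directory. The following is a minimal sketch under stated assumptions: `resolve_model_path` is a hypothetical helper, not a gTranslate function, and it checks only that the directory exists, not that it contains valid models.

```python
import os
from pathlib import Path


def resolve_model_path(environ=os.environ):
    """Return the directory named by GTRANSLATE_MODEL_PATH, or raise RuntimeError."""
    raw = environ.get("GTRANSLATE_MODEL_PATH", "").strip()
    if not raw:
        raise RuntimeError("GTRANSLATE_MODEL_PATH is not set.")
    path = Path(raw).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"GTRANSLATE_MODEL_PATH is not a directory: {path}")
    return path


if __name__ == "__main__":
    try:
        print(f"Models directory: {resolve_model_path()}")
    except RuntimeError as error:
        print(error)
```

Note that a variable set from inside a Python process is visible only to that process and its children, so the variable itself still needs to be set in your shell or environment configuration.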

For detailed instructions on setting this variable, please refer to the documentation for your specific setup: