Installing gTranslate
gTranslate is available from multiple sources; you only need to choose one. If you are unsure which to choose, Bioconda is generally the easiest.
Sources
Hardware requirements
Python libraries
gTranslate is designed for Python >=3.12 and requires the following libraries, which will be automatically installed:
| Library | Version | Reference |
|---|---|---|
| NumPy | >= 1.26.0 | Harris, C.R., Millman, K.J., van der Walt, S.J. et al. Array programming with NumPy. Nature 585, 357–362 (2020). DOI: 10.1038/s41586-020-2649-2 |
| | >= 4.67.0 | |
| pandas | >= 2.2.0 | McKinney W. 2010. Data Structures for Statistical Computing in Python. Proceedings of the 9th Python in Science Conference, 51-56. |
| scikit-learn | >= 1.6.1 | Pedregosa F, et al. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research, 12, 2825-2830. |
| | >= 1.3.2 | |
| SciPy | >= 1.12.0 | Virtanen P, et al. 2020. SciPy 1.0: fundamental algorithms for scientific computing in Python. Nature Methods, 17, 261–272 (2020). DOI: 10.1038/s41592-019-0686-2 |
| MLxtend | >= 0.22.0 | Raschka S. 2018. MLxtend: Providing machine learning and data science utilities and extensions to Python’s scientific computing stack. Journal of Open Source Software, 3(24), 638, https://doi.org/10.21105/joss.00638 |
| Plotly | >= 5.15.0 | Plotly Technologies Inc. 2015. Collaborative data science. Montréal, QC: Plotly Technologies Inc. https://plot.ly |
| XGBoost | >= 2.0.0 | Chen T, et al. 2016. XGBoost: A Scalable Tree Boosting System. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 785–794. DOI: 10.1145/2939672.2939785 |
| LightGBM | >= 3.3.5 | Ke G, et al. 2017. LightGBM: A Highly Efficient Gradient Boosting Decision Tree. Advances in Neural Information Processing Systems, 30, 3146–3154. |
| Requests | >= 2.31.0 | Reitz K. 2023. Requests: HTTP for Humans. https://docs.python-requests.org/en/latest/ |
Please cite these libraries if you use gTranslate in your work.
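Although the libraries are installed automatically, it can be useful to confirm that an existing environment meets the minimum versions. The following is a minimal sketch, not part of gTranslate: the distribution names are assumptions inferred from the cited references, and `release_tuple` and `find_problems` are hypothetical helpers. It compares only the numeric release part of each version, so pre-release suffixes are ignored.

```python
import re
from importlib import metadata

# Minimum versions taken from the table above. The keys are ASSUMED PyPI
# distribution names inferred from the cited references; rows whose library
# name is not given on this page are left out.
MINIMUMS = {
    "numpy": "1.26.0",
    "pandas": "2.2.0",
    "scikit-learn": "1.6.1",
    "scipy": "1.12.0",
    "mlxtend": "0.22.0",
    "plotly": "5.15.0",
    "xgboost": "2.0.0",
    "lightgbm": "3.3.5",
    "requests": "2.31.0",
}


def release_tuple(version):
    """Return the leading numeric release as a tuple of ints.

    Trailing zeros are dropped so that "2.0" and "2.0.0" compare equal.
    Suffixes such as "rc1" are ignored (a simplification).
    """
    match = re.match(r"\d+(?:\.\d+)*", version)
    if not match:
        return ()
    parts = [int(p) for p in match.group().split(".")]
    while parts and parts[-1] == 0:
        parts.pop()
    return tuple(parts)


def installed_version(name):
    """Return the installed version string of a distribution, or None."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return None


def find_problems(minimums=MINIMUMS, lookup=installed_version):
    """Map each missing or too-old distribution to a short explanation."""
    problems = {}
    for name, minimum in minimums.items():
        found = lookup(name)
        if found is None:
            problems[name] = "not installed"
        elif release_tuple(found) < release_tuple(minimum):
            problems[name] = f"{found} < {minimum}"
    return problems


if __name__ == "__main__":
    for name, reason in find_problems().items():
        print(f"{name}: {reason}")
```

Running the script prints one line per library that is missing or older than the listed minimum, and prints nothing when every listed library satisfies the table.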
Third-party software
gTranslate uses the following third-party dependencies and assumes they are on your system path:
Tip
The check_install command will verify that all of the programs are on the path.
| Software | Version | Reference |
|---|---|---|
| Prodigal | >= 2.6.2 | Hyatt D, et al. 2010. Prodigal: prokaryotic gene recognition and translation initiation site identification. BMC Bioinformatics, 11:119. doi: 10.1186/1471-2105-11-119. |
Please cite these tools if you use gTranslate in your work.
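The check_install command remains the supported way to verify the installation. To illustrate what "on your system path" means in practice, here is a minimal sketch that looks up an executable the same way a shell would. It assumes the Prodigal executable is named `prodigal`; `missing_programs` is a hypothetical helper, not a gTranslate function, and the sketch does not check the program's version.

```python
import shutil


def missing_programs(programs=("prodigal",), which=shutil.which):
    """Return the names of programs that cannot be found on the PATH.

    The default name "prodigal" is an assumption about how the
    executable is installed on your system.
    """
    return [name for name in programs if which(name) is None]


if __name__ == "__main__":
    missing = missing_programs()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required programs were found on PATH.")
```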
gTranslate Models
Due to file size limits, most of the gTranslate models (including the R220 and R232 classifiers) are provided as a separate download.
Downloading the Models
We provide three mirrors for downloading the classifiers. For optimal speeds, choose the mirror geographically closest to you:
Europe (GTDB Main): https://data.gtdb.aau.ecogenomic.org/tools/gtranslate/
Australia (UQ Nectar): https://data.gtdb.ecogenomic.org/tools/gtranslate/
Australia (UQ ACE): https://data.ace.uq.edu.au/public/gtdb/tools/gtranslate/
You can download and extract the archive directly from the command line. Choose one of the following commands based on your preferred mirror:
# Download from GTDB (Australia)
wget https://data.gtdb.ecogenomic.org/tools/gtranslate/gtranslate_models.tar.gz
# OR Download from Aalborg University (Europe)
# wget https://data.gtdb.aau.ecogenomic.org/tools/gtranslate/gtranslate_models.tar.gz
# Extract the downloaded archive
tar xvzf gtranslate_models.tar.gz
Configuring the Model Path
gTranslate requires the GTRANSLATE_MODEL_PATH environment variable to be set to the directory containing the unarchived reference data.
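As a quick sanity check after setting the variable, the sketch below confirms that `GTRANSLATE_MODEL_PATH` is defined and points to an existing directory. It is an illustration only: `resolve_model_path` is a hypothetical helper, and the check does not inspect the directory contents, so it cannot tell whether the models themselves are complete.

```python
import os
from pathlib import Path

ENV_VAR = "GTRANSLATE_MODEL_PATH"


def resolve_model_path(environ=os.environ):
    """Return the model directory named by the environment variable.

    Raises RuntimeError if the variable is unset, empty, or does not
    point to an existing directory.
    """
    value = environ.get(ENV_VAR)
    if not value:
        raise RuntimeError(f"{ENV_VAR} is not set")
    path = Path(value).expanduser()
    if not path.is_dir():
        raise RuntimeError(f"{ENV_VAR}={value} is not an existing directory")
    return path


if __name__ == "__main__":
    try:
        print("Model directory:", resolve_model_path())
    except RuntimeError as error:
        print("Configuration problem:", error)
```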
For detailed instructions on setting this variable, please refer to the documentation for your specific setup: