Conda environments
Conda is an open-source package management system and environment management system that runs on Windows, macOS, and Linux. Conda quickly installs, runs, and updates packages and their dependencies. Its dependency solver allows installing multiple tools into the same environment at once without introducing conflicts between the required versions of their dependencies. This is ideal for scientific projects where reproducibility is key: you can simply create a separate conda environment for each individual project.
Conda was initially created for Python packages, but it can package and distribute software for any language. Conda also doesn't require elevated privileges, so users can install software without administrator rights. Most tools are already available in the default Anaconda repository, and other community-driven channels like bioconda allow installing practically anything else. In comparison to containers, Conda manages dependencies at the package level, while containers also manage dependencies at the base operating system level. Containers and conda environments are therefore often used together to ensure complete reproducibility and portability, see for example the biocontainers.pro project.
Creating an environment
Software installed through conda must always be installed in an environment. Conda itself is already installed and configured on BioCloud, so you don't need to install it first. To create an environment and install some software in it, run for example:
# create the environment, activate it, then install pinned packages
conda create -n myproject
conda activate myproject
conda install -c bioconda somepkg1=1.0 somepkg2=2.0
# or create the environment and install the packages in one command
conda create -n myproject -c bioconda somepkg1 somepkg2
Make sure to add the required conda channels from which to install the software using -c <channel>. Usually the bioconda and conda-forge channels are all you need.
To ensure reproducibility, it is best practice to record all packages used in a project, including their versions, before the details are forgotten. You can always activate a previously created environment and dump the exact versions used into a YAML file with conda env export > requirements.yml. The file could for example look like this:
requirements.yml
To create an environment from the file in the future, simply run conda env create -f requirements.yml.
Note
When you export a conda environment to a file, the file may also contain a host-specific prefix line, which should be removed if you or someone else needs to run it elsewhere.
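As a minimal sketch of removing the prefix line, the example below filters it out with grep. The sample file contents, the path in the prefix line, and the file names example-export.yml and example-portable.yml are hypothetical and only there to make the example self-contained.

```shell
# Hypothetical exported file, written here only so the example is self-contained.
cat > example-export.yml <<'EOF'
name: myproject
channels:
  - bioconda
  - conda-forge
dependencies:
  - samtools=1.18
prefix: /home/user/.conda/envs/myproject
EOF

# Drop the host-specific prefix line and keep everything else unchanged.
grep -v '^prefix:' example-export.yml > example-portable.yml
cat example-portable.yml
```

With conda available, the same filter can be applied directly to the export, for example conda env export | grep -v '^prefix:' > requirements.yml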
To use the software installed in the environment, remember to activate the environment first:
conda activate myproject
List the available environments with:
conda env list
Installing packages using pip within conda environments
Software that can only be installed with pip has to be installed in a conda environment by running pip inside the environment. While issues can arise, the Conda guide for using pip in a Conda environment lists some best practices that reduce their likelihood:
- Use pip only after conda package installs
- Use conda environments for isolation (don't perform pip installs in the base environment)
- Recreate the entire environment if changes are needed after pip packages have been installed
- Use --no-cache-dir with any pip install commands
After activating the conda environment, an install command would look like the following:
pip install --no-cache-dir <package>
If you then export the conda environment to a YAML file using conda env export > requirements.yml, software dependencies installed using pip should show under a separate - pip: field, for example:
name: myproject
channels:
  - bioconda
dependencies:
  - minimap2=2.26
  - samtools=1.18
  - pip:
    - virtualenv==20.25.0
Be aware that pip dependencies specify exact versions using a double ==, whereas conda dependencies use a single =.
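To illustrate the two pinning styles, the sketch below writes the example file above to disk and lists the conda pins and pip pins separately. The file name example-env.yml is hypothetical, and the regular expressions are a simplification that assumes one package per line in the form shown; they are not a YAML parser.

```shell
# Write the example environment file (hypothetical file name).
cat > example-env.yml <<'EOF'
name: myproject
channels:
  - bioconda
dependencies:
  - minimap2=2.26
  - samtools=1.18
  - pip:
    - virtualenv==20.25.0
EOF

# Conda pins use a single "=" between package name and version.
conda_pins=$(grep -E '^[[:space:]]*- [^=[:space:]]+=[^=]+$' example-env.yml)
# Pip pins use a double "==".
pip_pins=$(grep -E '^[[:space:]]*- [^=[:space:]]+==[^=]+$' example-env.yml)

echo "conda pins:"; echo "$conda_pins"
echo "pip pins:"; echo "$pip_pins"
```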
R and installing R packages within conda environments
Use the notation r-{package} to install R and required R packages within an environment, see the list of packages here. Alternatively, using renv is highly recommended for project reproducibility and portability if you need to install many packages.
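As a rough sketch of the r-{package} notation, the helper below derives conda package names from R package names and prints the resulting install command. The helper name r_conda_name, the chosen packages, and the use of the conda-forge channel are illustrative assumptions; the lowercasing reflects the usual conda-forge naming for R packages, so check the package list if a name is not found.

```shell
# Hypothetical helper: map an R package name to its conda package name
# (lowercased and prefixed with "r-").
r_conda_name() {
  printf 'r-%s\n' "$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')"
}

# r-base provides R itself; the remaining names are derived from R package names.
pkgs="r-base"
for p in ggplot2 data.table Matrix; do
  pkgs="$pkgs $(r_conda_name "$p")"
done

# Print the command instead of running it, so it can be reviewed first.
echo "conda install -n myproject -c conda-forge $pkgs"
```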
VS Code and conda environments
To ensure VS Code uses, for example, the R and Python installations in a conda environment, you can create a file .vscode/settings.json in the current project folder and write for example:
{
  "r.rterm.linux": "${userHome}/.conda/envs/myproject/bin/R",
  "python.defaultInterpreterPath": "${userHome}/.conda/envs/myproject/bin/python"
}
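The file can also be created from a terminal. The following is a minimal sketch that assumes it is run from the project folder and that the environment is named myproject as in the example above. Note that it overwrites any existing .vscode/settings.json, so merge by hand if the project already has one.

```shell
# Run from the project folder.
mkdir -p .vscode

# The quoted 'EOF' keeps ${userHome} literal, so VS Code (not the shell) expands it.
cat > .vscode/settings.json <<'EOF'
{
  "r.rterm.linux": "${userHome}/.conda/envs/myproject/bin/R",
  "python.defaultInterpreterPath": "${userHome}/.conda/envs/myproject/bin/python"
}
EOF
```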
You can also place the settings.json file at $HOME/.config/Code/User/settings.json instead to make the settings apply to all projects. If any .vscode/settings.json files are present in individual project folders, they will take precedence over $HOME/.config/Code/User/settings.json.
Read more details here.