Sharing Environments
Overview
Teaching: 30 min
Exercises: 15 minQuestions
Why should I share my Conda environment with others?
How do I share my Conda environment with others?
How do I create an environment file that can be read by Windows, Mac OS, or Linux.
How do I specifying the package version in a Conda environment file.
Objectives
Understand why you would create an Conda environment file.
Create a Conda environment file in a text editor, specifying the channel, packages and their version.
Use the
conda env
subcommand to export a given environment to a environment file.
Reproducible research
Conda environments are useful when making bioinformatics projects reproducible. Full reproducibility requires the ability to recreate the system that was originally used to generate the results. This can, to a large extent, be accomplished by using a Conda environment file to make an environment with specific versions of the packages that are needed in the project. This environment file can then be shared with others users to reproduce your analysis environment containing software with the same version number.
Creating an environment file
Conda uses YAML (“YAML Ain’t Markup Language”) for writing its environment files. YAML is a human-readable language that is commonly used for configuration files and that that uses Python-style indentation to indicate nesting.
Creating your project’s Conda environment from a single environment file is a Conda “best practice”. Not only do you have a file to share with collaborators but you also have a file that can be placed under version control which further enhancing the reproducibility of your research project and workflow.
Default
environment.yml
fileNote that by convention Conda environment files are called
environment.yml
. As such if you use theconda env create
sub-command without passing the--file
option, thenconda
will expect to find a file calledenvironment.yml
in the current working directory and will throw an error if a file with that name can not be found.
Let’s take a look at an example environment.yml
file to give you an idea of how to write your own environment files.
name: rnaseq-env
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- salmon
- fastqc
- multiqc
The first line specifies a default name rnaseq-env
for the environment, however this can be overidden on the command line. The second line specifies a list of channels, listed in priority order, that packages may need to be installed from. Finally the dependencies lists the most current and mutually compatible versions of the listed packages (including all required dependencies) to download.
The newly created environment would be installed inside the conda environment directory e.g. ~/miniconda3/envs/
directory, unless we specified a different path using conda create
command line option --prefix
or -p
.
Since explicit versions numbers for all packages should be preferred a better environment file would be the following.
name: rnaseq-env
channels:
- conda-forge
- bioconda
dependencies:
- salmon=1.5
- fastqc=0.11
- multiqc=1.11
Note that we are only specifying the major and minor version numbers and not the patch or build numbers. Defining the version number by fixing only the major and minor version numbers while allowing the patch version number to vary allows us to use our environment file to update our environment to get any bug fixes whilst still maintaining significant consistency of our Conda environment across updates.
Always version control your
environment.yml
files!While you should never version control the contents of your
env/
environment sub-directory, you should always version control yourenvironment.yml
files. Version controlling yourenvironment.yml
files together with your project’s source code means that you always know which versions of which packages were used to generate your results at any particular point in time.
Let’s suppose that you want to use the environment.yml
file defined above to create a Conda environment in a sub-directory a project directory. Here is how you would accomplish this task.
$ cd ~/
$ mkdir rnaseq-project-2
$ cd rnaseq-project-2
Once your project folder is created, create an environment.yml
file using your favourite editor for instance nano
.
name: rnaseq-env
channels:
- conda-forge
- bioconda
dependencies:
- salmon=1.5
- fastqc=0.11
- multiqc=1.11
Finally create a new conda environment:
$ conda env create --prefix ./env --file environment.yml
$ conda activate ./env
Note that the above sequence of commands assumes that the environment.yml
file is stored within your ` rnaseq-project-2` directory.
Automatically generate an environment.yml
We can automatically generate the contents of an environment file using the conda env export
command. To export the packages installed into the previously created rnaseq-env
you can run the following command:
$ conda env export --name basic-rnaseq-env
When you run this command, you will see the resulting YAML formatted representation of your Conda environment streamed to the terminal. Recall that we only listed three packages when we originally created basic-rnaseq-env
yet from the output of the conda env export
command we see that these packages result in an environment with a large number of dependencies!
To export this list into an environment.yml
file, you can use --file
option to directly save the resulting YAML environment into a file.
$ conda env export --name basic-rnaseq-env --file environment.yml
Make sure you do not have any other environment.yml
file from before in the same directory when running the above command.
This exported environment file will however not consistently produce environments that are reproducible across Mac OS, Windows, and Linux. The reason is, that it may include operating system specific low-level packages which cannot be used by other operating systems.
If you need an environment file that can produce environments that are reproducible across Mac OS, Windows, and Linux, then you are better off just including those packages into the environment file that your have specifically installed using the --from-history
option.
$ conda env export --name basic-rnaseq-env --from-history --file environment.yml
In short: to make sure others can reproduce your environment independent of the operating system they use, make sure to add the --from-history
argument to the conda env export
command.
Pip and conda env export
--from-history
Python packages installed via pip are not exported using the conda env export
--from-history
argument. You can add them to the environment YAML file using the keywordpip:
followed by a list of python packages, For example;name: rnaseq-env channels: - conda-forge - bioconda dependencies: - salmon=1.5 - fastqc=0.11 - multiqc=1.11 pip: - pandas
Create a new environment from a YAML file.
Create a new project directory rnaseq-project-3 and then create a new
environment.yml
file inside your project directory with the following contents.name: rnaseq-project3-env channels: - conda-forge - bioconda dependencies: - salmon=1.5 - fastqc=0.11 - multiqc=1.11
Now use this file to create a new Conda environment. Where is this new environment created? Using the same
environment.yml
file create a Conda environment as a sub-directory calledenv/
inside a newly created project directory. Compare the contents of the two environments.Solution
To create a new environment from a YAML file use the
conda env create
sub-command as follows.$ cd ~/ $ mkdir rnaseq-project-3 $ cd rnaseq-project-3 $ nano environment.yml $ conda env create --file environment.yml
The above sequence of commands will create a new Conda environment inside the
envs_dirs
directory. In order to create the Conda environment inside a sub-directory of the project directory you need to pass the--prefix
to theconda env create
command as follows.$ conda env create --file environment.yml --prefix ./env
You can now run the
conda env list
command and see that these two environments have been created in different locations but contain the same packages.
Updating an environment
You are unlikely to know ahead of time which packages (and version numbers!) you will need to use for your research project. For example it may be the case that
- one of your core dependencies just released a new version (dependency version number update).
- you need an additional package for data analysis (add a new dependency).
- you have found a better visualization package and no longer need to old visualization package (add new dependency and remove old dependency).
If any of these occurs during the course of your research project, all you need to do is update the contents of your environment.yml
file accordingly and then run the following command.
$ cd ~/
$ cd rnaseq-project-2
$ conda env update --prefix ./env --file environment.yml --prune
Note that the --prune
option tells conda to remove any installed packages not defined in environment.yml
Rebuilding a Conda environment from scratch
When working with
environment.yml
files it is often just as easy to rebuild the Conda environment from scratch whenever you need to add or remove dependencies. To rebuild a Conda environment from scratch you can pass the--force
option to theconda env create
command which will remove any existing environment directory before rebuilding it using the provided environment file.$ conda env create --prefix ./env --file environment.yml --force
Update environment from environment.yml
Update the environment file from the previous exercise, rnaseq-project-3, by adding the package kallisto=0.46 and removing the salmon package. Then rebuild the environment.
Solution
The
environment.yml
file should now look as follows.name: rnaseq-env channels: - conda-forge - bioconda dependencies: - fastqc=0.11 - multiqc=1.11 - kallisto=0.46
You could use the following command, that will rebuild the environment from scratch with the new dependencies:
$ cd ~/rnaseq-project-3 $ conda env create --prefix ./env --file environment.yml --force
Or, if you just want to update the environment in-place with the new kallisto dependencies, you can use:
$ conda env update --prefix ./env --file environment.yml --prune
Restoring an environment
Conda keeps a history of all the changes made to your environment, so you can easily “roll back” to a previous version. To list the history of each change to the current environment:
$ conda activate basic-rnaseq-env
$ conda list --revisions
To restore environment to a previous revision:
$ conda install --revision=REVNUM or conda install --rev REVNUM.
For example,
$ conda install --revision=1
List revisions.
Activate the environment inside the
rnaseq-project-3
and list the revisionsSolution
To create a new environment from a YAML file use the
conda env create
sub-command as follows.$ cd ~/ $ cd rnaseq-project-3 $ conda activate ./env $ conda list --revisions
Key Points
Sharing Conda environments with other researchers facilitates the reproducibility of your research.
Conda environment files ,
environment.yml
, describes your project’s software environment.