Install programs

By C.Du @snail123815

For most high-performance computing (HPC) servers, users do not have root access to install software globally. Instead, they can create their own environments and install software within those environments. This approach allows users to manage their software dependencies without affecting other users on the same server.

This tutorial explains how to create environments and install programs in them using micromamba. To install programs that are not available in any conda repository, please ask the administrators for help.

What is an environment

An environment created by conda, micromamba, or pyvenv is essentially just a folder/directory on the disk. This directory contains configuration files and dependency programs, laid out in a specific structure that the environment manager (conda/micromamba/pyvenv) knows how to read.

Read package management concepts for more detail.

Do not change any content in an environment directory manually unless you understand how the environment manager works.
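
Looking at an environment directory without changing it is safe. The sketch below is a minimal read-only check, assuming that conda and micromamba keep their package records in a conda-meta/ subdirectory of each environment (pyvenv environments use a different layout and will not pass this check). The function name is illustrative.

```shell
# Succeed if the given directory looks like a conda/micromamba environment.
# Assumption: such environments contain a conda-meta/ subdirectory.
is_conda_env() {
    [ -d "$1/conda-meta" ]
}
```

For example, is_conda_env /vol/local/conda_envs/multi-omics && echo "looks like an environment".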

Install a program

New users on BLIS should have micromamba ready to use directly. You should see your prompt as:

(base) [user@blis ~]$ 

The (base) in front means you have the base environment activated, which is located in ~/.micromamba. Note that this is BLIS only. If you do not see it, please make sure you have micromamba ready to use. Before setting up virtual environments, it is highly recommended to set up a ~/.mambarc file with the following content (new users on BLIS should have it already, but check for yourself). It is explained in a later section.

envs_dirs:
  - /vol/local/conda_envs
pkgs_dirs:
  - /vol/local/.conda_cache/[USERNAME]
channels:
  - bioconda
  - conda-forge
  - defaults
auto_activate_base: true
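
To check whether your config file already has these entries, something like the following sketch may help. It only looks for the top-level keys from the example above and does not validate their values; the function name is illustrative.

```shell
# Print each recommended top-level key that is missing from a config file.
# Prints all three keys if the file does not exist.
missing_rc_keys() {
    rc_file=$1
    for key in envs_dirs pkgs_dirs channels; do
        # A top-level YAML key starts at column 0 and is followed by a colon.
        grep -q "^${key}:" "$rc_file" 2>/dev/null || echo "$key"
    done
}
```

Running missing_rc_keys ~/.mambarc prints nothing when all three keys are present.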

Create an environment to host the software or pipeline you want to run. You then have full control over the environment you created.

Caution

Do not use -n or --name to create an environment: by default it will be created in your home directory, which has a disk space quota. Putting the environment on the shared drive, as shown below, does not count against your home directory quota.

This block includes prompt, select command to copy
# 0. Make sure you have your shell initiated with micromamba
(base) [user@blis ~]$

# 1. Create environment called multi-omics and activate it
(base) [user@blis ~]$ micromamba create -p /vol/local/conda_envs/multi-omics
(base) [user@blis ~]$ micromamba activate /vol/local/conda_envs/multi-omics

# 2. Install software, e.g. python
(/vol/local/conda_envs/multi-omics) [user@blis ~]$ micromamba install -c conda-forge python

As you may have noticed, the channels are already set up in ~/.mambarc, so most of the time you can omit the -c conda-forge part that explicitly specifies the channel the software comes from. In the example above, python will be installed from conda-forge.
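
If you are unsure whether a prefix you are about to pass to -p would end up inside your home directory, a small check like the one below can tell you before anything is created. This is only a sketch: it assumes GNU readlink (for the -f option) and requires that the parent directory of the prefix already exists. The function name is illustrative.

```shell
# Succeed (0) if the path resolves to a location inside $HOME,
# return 1 if it is outside, and 2 if the path cannot be resolved.
prefix_in_home() {
    # Resolve symlinks so that a link in $HOME pointing elsewhere is not flagged.
    resolved=$(readlink -f -- "$1") || return 2
    home=$(readlink -f -- "$HOME") || return 2
    case "$resolved" in
        "$home"|"$home"/*) return 0 ;;
        *) return 1 ;;
    esac
}
```

For example, prefix_in_home /vol/local/conda_envs/multi-omics || echo "outside HOME, fine".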

Tip

I usually create a “soft link” to /vol/local/conda_envs/ in my home directory for easier access to all the environments. For example,

ln -s /vol/local/conda_envs/ ~/genvs

Then I can replace all /vol/local/conda_envs/ with ~/genvs, much simpler.

An advanced method (do not do this if you don’t know what ln -s means and what its restrictions are) is to soft link the shared environment directory to the micromamba base directory:

ln -s /vol/local/conda_envs/ ~/micromamba-base/envs

This needs to be done while ~/micromamba-base/envs does not exist yet (before creating any “named” environment). The advantage of this method is that you can create environments in the shared environment directory using -n, which may be more compatible with most program tutorials (older ones usually assume you have sudo rights and an unlimited HOME directory, which is not the case on any of the server systems). Use this method with caution!
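
The reason for the "does not exist yet" condition is that ln -s, when given an existing directory as the link path, creates the link inside that directory instead of replacing it. A guarded version might look like the sketch below; the function name is illustrative.

```shell
# Create a soft link only when nothing exists at the link path yet.
link_if_absent() {
    target=$1
    link=$2
    # -e follows symlinks; -L also catches dangling links.
    if [ -e "$link" ] || [ -L "$link" ]; then
        echo "refusing: $link already exists" >&2
        return 1
    fi
    ln -s "$target" "$link"
}
```

For example, link_if_absent /vol/local/conda_envs/ ~/genvs creates the link from the tip above, and refuses if ~/genvs is already there.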

Do not follow tutorials with a yml/yaml file as-is

.yml or .yaml files are usually configuration files written in YAML, a data serialization format. Here they describe the required programs and usually their versions, i.e. the dependencies.

You will often find a tutorial that sets up a conda environment with conda env create -f minimotif.yml minimotif. Please DO NOT follow this by simply replacing conda env with micromamba.

In these cases, the .yml file usually looks like:

name: MiniMotif
channels:
  - conda-forge
  - bioconda
  - defaults
dependencies:
  - _libgcc_mutex=0.1
  - _openmp_mutex=4.5
  - alsa-lib=1.2.8
  - attr=2.5.1
  - biopython=1.81
  - ...

You need to open this file in a text editor, remove the name: line, and save the file. The name: line tells conda or micromamba to create the environment by name, as with the -n switch, which is not compatible with the -p switch.
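
If you prefer the command line over a text editor, the same edit can be sketched with sed. The function below writes the result to standard output and leaves the original file untouched; the function name and the output file name in the usage line are illustrative.

```shell
# Print a copy of an environment file without its top-level "name:" line.
strip_env_name() {
    # Only lines that start with "name:" at column 0 are deleted;
    # indented lines (e.g. inside dependencies) are left alone.
    sed '/^name:/d' "$1"
}
```

For example, strip_env_name minimotif.yml > minimotif.noname.yml, after which you would pass minimotif.noname.yml to -f in the install step.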

Then create the environment:

This block includes prompt, select command to copy
# 1. Create environment using -p
(base) [user@blis ~]$ micromamba create -p /vol/local/conda_envs/MiniMotif
(base) [user@blis ~]$ micromamba activate /vol/local/conda_envs/MiniMotif
# 2. Install all dependencies using the .yml file
(/vol/local/conda_envs/MiniMotif) [user@blis ~]$ micromamba install -f minimotif.yml

The installation should now start; follow the on-screen prompts to continue.

Why not combine these into one single command? Because the -p and -f parameters are not compatible.

After installation, you can try your program to see if the help function works:

This block includes prompt, select command to copy
(/vol/local/conda_envs/MiniMotif) [user@blis ~]$ python minimotif.py -h
usage:
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Generic command:
     minimotif.py -i [binding_profiles] -G [genome] -O [outdir]
__________________________________________________________________________
Mandatory arguments:
    -G  Provide a genbank file of the genome of interest for TFBS scanning
    ...

Do not forget to deactivate your environment before doing anything else, unless you know what you are doing. The currently activated environment is shown in parentheses in front of your command prompt. To deactivate:

This block includes prompt, select command to copy
(/vol/local/conda_envs/MiniMotif) [user@blis ~]$ micromamba deactivate
(base) [user@blis ~]$

Setting up config file

micromamba uses a config file located in your home folder, ~/.mambarc, to store your specific configuration. It reads not only ~/.mambarc but also ~/.condarc (the latter is also used by conda); one of them is enough.

The config file has a few convenient options. On BLIS, please put these contents in the config file:

envs_dirs:
  - /vol/local/conda_envs
pkgs_dirs:
  - /vol/local/.conda_cache/USERNAME
  • envs_dirs allows the micromamba env list command to list all environments, including our shared environments. (Ignore this line if you have soft linked it to ~/micromamba_base/envs)

  • pkgs_dirs sets the cache directory to an easy-to-clean location.

You can also add after the above contents:

channels:
  - bioconda
  - conda-forge
  - defaults
auto_activate_base: true
  • channels allows you to skip the -c option when installing packages

  • auto_activate_base activates your base environment automatically, so that you use e.g. python from your base environment rather than the system one.

If you need more information on how to use micromamba on your own machine, please refer to our micromamba instruction.

Permissions of shared environments on BLIS

All files generated by micromamba, including all environments created, by default belong to the group condablis. All group members can activate these environments. Only the owner who created the environment can add or remove packages. If you want to let others change your environment, you need to explicitly change the permission:

chmod -R g+w /vol/local/conda_envs/yourEnvironment

Anyone who has changed this environment should run this command again so that others can still change it. Alternatively, the owner can remove this permission after the change:

chmod -R g-w /vol/local/conda_envs/yourEnvironment

This restriction comes from the Linux file permission model and is intentional for safety reasons.
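
To see whether an environment is currently writable by the group, you can list the entries that still lack the group write bit. The following is a minimal sketch using find; the function name is illustrative.

```shell
# List every file and directory under the given path that is NOT group-writable.
# An empty result means chmod -R g+w has been applied everywhere.
list_not_group_writable() {
    # -perm -g+w matches entries whose group write bit is set; "!" negates it.
    find "$1" ! -perm -g+w -print
}
```

For example, list_not_group_writable /vol/local/conda_envs/yourEnvironment | head shows the first few entries that other group members cannot modify.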

Monitoring disk space

Disk space is a shared resource on our servers, and it is crucial to monitor it regularly to ensure smooth operation. Here are some tips for monitoring disk space related to your environments and installed programs:

  1. Use df -h to check the overall disk usage.

    • /home is the home directory, which has a quota for each user.

    • /vol/local, /vol/local1, etc. are the shared local storage, which has no quota but is shared by all users. Be mindful of others when using it.

  2. Use du -sh /path/to/your/environment to check the size of your conda environment.

  3. Regularly clean up unused environments and packages to free up space.

    • Run micromamba clean -a to clean up all unused packages and caches.

    • You can simply remove the environment directory to remove an environment.

  4. Be mindful of large files generated by the package, such as downloaded databases, logs, temporary files, and output data. Regularly check and clean these files if they are no longer needed.

    • Well-designed programs usually do not use the environment directory to store these files, but some do. Be careful with programs developed by only a few people, which may lack good design and documentation. If you are not sure where the program stores these files, ask the developers or administrators for help.

    • Some programs have options to specify the location of these files. You can set it to a location with enough space, or to shared_db/ if the files can be used by other programs.

    • Some programs fail because of insufficient space, often due to storing data in your HOME directory, which has a quota.

      • These files may be config, data files, or temporary files.

      • Consult with administrators for how to change the location of these files. Possible solutions include:

        • Change the config file of the program or use environment variables to specify the location of these files.

        • Use a soft link to your shared local storage.

        • Change the default location of these files by changing the program code, if you have the permission to do so.

  5. Consult with your team and administrators on how to store and use databases from the shared database location, to prevent unnecessary duplication.
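
The size check from step 2 can be extended to all environments at once, so that the largest ones are easy to spot. The sketch below assumes that each environment is a direct subdirectory of the directory you pass in; the function name is illustrative.

```shell
# Print the disk usage (in KiB) of each subdirectory, largest first.
env_sizes() {
    # -s: one total per directory; -k: plain KiB numbers so sort -n can order them.
    du -sk "$1"/*/ 2>/dev/null | sort -rn
}
```

For example, env_sizes /vol/local/conda_envs | head lists the ten largest environments. Scanning many environments can take a while and puts load on the shared disk, so run it sparingly.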