Install programs¶
By C.Du @snail123815
For most high-performance computing (HPC) servers, users do not have root access to install software globally. Instead, they can create their own environments and install software within those environments. This approach allows users to manage their software dependencies without affecting other users on the same server.
This tutorial provides guidance of creating environments and install programs in the created environments using micromamba. To install programs that are not available in any conda repositories, please ask administrators for help.
What is an environment¶
An environment created by conda, micromamba, or pyvenv is essentially just a folder/directory on the disk. This directory contains configuration files and dependency programs. It has a special structure to allow environment manager programs (conda/micromamba/pyvenv) to read.
Read package management concepts for more detail.
Do not change any content in an environment directory manually, unless you understand how environment manager works.
Install a program¶
New users on BLIS should have micromamba ready to use directly. You should see your prompt as:
(base) [user@blis ~]$
The (base) in front means you have the base environment activated, which located in ~/.micromamba. Note this is BLIS only. If this is not seen, please make sure you have micromamba ready to use. Before setting up virtual environments, it is highly recommended to setup a ~/.mambarc file with the following content (new users on BLIS should have it already, check by yourself). It is explained in the later section.
envs_dirs:
- /vol/local/conda_envs
pkgs_dirs:
- /vol/local/.conda_cache/[USERNAME]
channels:
- bioconda
- conda-forge
- defaults
auto_activate_base: true
Create an environment to host the software or a pipe line you want to run. Then you have all control over the environment you created.
Caution
Do not use -n or --name to create an environment, it will be created in your home directory by default, which has a quota on disk space. Putting the environment on the shared drive as shown below does not reduce your home directory quota.
# 0. Make sure you have your shell initiated with micromamba
(base) [user@blis ~]$
# 1. Create environment called multi-omics and activate it
(base) [user@blis ~]$ micromamba create -p /vol/local/conda_envs/multi-omics
(base) [user@blis ~]$ micromamba activate /vol/local/conda_envs/multi-omics
# 2. Install software, eg. python
(/vol/local/conda_envs/multi-omics) [user@blis ~]$ micromamba install -c conda-forge python
I guess you have noticed that we have setup the channels in ~/.mambarc, so most of the time you can omit the -c conda-forge part for explicitly specifying the channel where the software comes. In the example above, python will be installed from conda-forge.
Tip
I usually create a “soft link” to /vol/local/conda_envs/ in home directory for easier access to all the environments. For example,
ln -s /vol/local/conda_envs/ ~/genvs
Then I can replace all /vol/local/conda_envs/ with ~/genvs, much simpler.
Advanced method (do not do if you don’t know what ln -s means and its restrictions) is to soft link the shared environment directory to micromamba base directory:
ln -s /vol/local/conda_envs/ ~/micromamba-base/envs
This needs to be done when the target directory does not exist (before creating any “named” environment). The advantage of this method is that you can create environment in the shared environment directory using -n and may be more compatible with most program tutorial (the old ones usually assume you have sudo rights and unlimited HOME directory, which is not the case in any of the server systems.) Use this method with caution!
Do not follow tutorial with yml/yaml file¶
.yml or .yaml file format is usually configuration files written with a variant of markup language, describing the required programs and usually their versions, i.e. dependencies.
Many times you will find a tutorial to setup a conda environment by conda env create -f minimotif.yml minimotif. Please DO NOT follow this by simply replacing conda env with micromamba.
In these cases, the .yml file usually looks like:
name: MiniMotif
channels:
- conda-forge
- bioconda
- defaults
dependencies:
- _libgcc_mutex=0.1
- _openmp_mutex=4.5
- alsa-lib=1.2.8
- attr=2.5.1
- biopython=1.81
- ...
You need to open this file using a text editor, remove the name: line and save the file. The name: line is telling conda or micromamba to install the environment with -n switch, or install the dependencies it is not compatible with -p switch.
Then create the environment:
# 1. Create environment using -p
(base) [user@blis ~]$ micromamba create -p /vol/local/conda_envs/MiniMotif
(base) [user@blis ~]$ micromamba activate /vol/local/conda_envs/MiniMotif
# 2. Install all dependencies using the .yml file
(/vol/local/conda_envs/MiniMotif) [user@blis ~]$ micromamba install -f minimotif.yml
Now it should do the installation, follow the screen to continue.
Why not combine into one single command? Because -p and -f parameters are not compatible.
After installation, you can try your program to see if the help function works:
(/vol/local/conda_envs/MiniMotif) [user@blis ~]$ python minimotif.py -h
usage:
-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_-_
Generic command:
minimotif.py -i [binding_profiles] -G [genome] -O [outdir]
__________________________________________________________________________
Mandatory arguments:
-G Provide a genbank file of the genome of interest for TFBS scanning
...
Do not forget to deactivate your environment before doing anything else, unless you know what you are doing. Your current activated environment is shown in the parenthesis before your command line. To deactivate:
(/vol/local/conda_envs/MiniMotif) [user@blis ~]$ micromamba deactivate
(base) [user@blis ~]$
Setting up config file¶
The program micromamba uses a config file located in your home folder: ~/.mambarc to store your specific configurations. Well micromamba not only check ~/.mambarc file, but also uses ~/.condarc, one of them is enough. (The later is used by conda)
The config file has few convenient options. On BLIS, please put these contents in the config file:
envs_dirs:
- /vol/local/conda_envs
pkgs_dirs:
- /vol/local/.conda_cache/USERNAME
env_dirswill allowmicromamba env listcommand to list all environments including our shared environments. (ignore this line if you have soft linked it to~/micromamba_base/envs)pkgs_dirsset the cache dir, it is a easy-to-clean location.
You can also add after the above contents:
channels:
- bioconda
- conda-forge
- defaults
auto_activate_base: true
channelswill allow you to skip -c option when installing packagesauto_activate_basewill activate your base environment, by this, you will be using eg. python from your base environment rather than a system one.
If you need more information on how to use micromamba on your own machine, please refer to our micromamba instruction.
Monitoring disk space¶
Disk space is a shared resource on our servers, and it is crucial to monitor it regularly to ensure smooth operation. Here are some tips for monitoring disk space related with your environments and installed programs:
Use
df -hto check the overall disk usage./homeis the home directory, which has a quota for each user./vol/localand/vol/local1etc. are the shared local storage, which has no quota but shared by all users. Be mindful of others when using it.
Use
du -sh /path/to/your/environmentto check the size of yourcondaenvironment.Regularly clean up unused environments and packages to free up space.
micromamba clean -ato clean up all unused packages and caches.You can simply remove the environment directory to remove an environment.
Be mindful of large files generated by the package, such as downloaded database, logs, temporary files, and output data. Regularly check and clean these files if they are no longer needed.
Well designed programs usually do not use the environment directory to store these files, but some do. Be aware of the programs developed by only a few people, which may not have good design and documentation. If you are not sure where the program stores these files, ask the developers or administrators for help.
Some programs have options to specify the location of these files, you can set it to a location with enough space,
shared_db/if it can be used by other program.Some programs fail because of insufficient space, often due to storing data in your HOME directory, which has a quota.
These files may be config, data files, or temporary files.
Consult with administrators for how to change the location of these files. Possible solutions include:
Change the config file of the program or use environment variables to specify the location of these files.
Use a soft link to your shared local storage.
Change the default location of these files by changing the program code, if you have the permission to do so.
Consult with your team and administrators for how to store and use database from shared database location, to prevent unnecessary duplication.