2025-03-17 15:46:03 +01:00

title: Python
last_updated: 28 September 2020
keywords: python, anaconda, conda, jupyter, numpy
summary: Running Python on Merlin
sidebar: merlin6_sidebar
permalink: /merlin6/python.html

PSI provides a variety of ways to execute python code.

  1. Anaconda - Custom environments for software installation and development
  2. Jupyterhub - Execute Jupyter notebooks on the cluster
  3. System Python - Do not use! Only for OS applications.

Anaconda

Anaconda ("conda" for short) is a package manager with excellent python integration. Using it you can create isolated environments for each of your python applications, containing exactly the dependencies needed for that app. It is similar to the virtualenv python package, but can also manage non-python requirements.

Loading conda

Conda is loaded from the module system:

module load anaconda

Using pre-made environments

Loading the module provides the conda command, but does not otherwise change your environment. First an environment needs to be activated. Available environments can be seen with conda info --envs and include many specialized environments for software installs. After activating you should see the environment name in your prompt:

~ $ conda activate datascience_py37
(datascience_py37) ~ $
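To confirm from inside Python which environment is actually in use, a minimal sketch like the following may help. It assumes Python 3 and relies on the CONDA_DEFAULT_ENV variable that conda sets on activation; the helper name describe_interpreter is hypothetical and not part of any PSI tooling.

```python
import os
import sys


def describe_interpreter(environ=None):
    """Return a small report about the running interpreter."""
    environ = os.environ if environ is None else environ
    return {
        # Path of the python binary that is executing this script.
        "executable": sys.executable,
        # Root of the installation or environment the binary belongs to.
        "prefix": sys.prefix,
        # Set by `conda activate`; reported as "(none)" if it is missing.
        "conda_env": environ.get("CONDA_DEFAULT_ENV", "(none)"),
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

After activating an environment, the reported prefix should point into that environment rather than at /usr.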

CondaRC file

Creating a ~/.condarc file is recommended if you want to create new environments on merlin. Environments can grow quite large, so you will need to change the storage location from the default (your home directory) to a larger volume (usually /data/user/$USER).

Save the following as $HOME/.condarc:

always_copy: true

envs_dirs:
  - /data/user/$USER/conda/envs

pkgs_dirs:
  - /data/user/$USER/conda/pkgs
  - $ANACONDA_PREFIX/conda/pkgs

channels:
  - conda-forge
  - nodefaults

Run conda info to check that the variables are being set correctly.
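The same check can be scripted. The sketch below assumes Python 3.7 or newer, that the conda command is on the PATH (i.e. the anaconda module is loaded), and that the JSON printed by conda info --json carries the keys envs_dirs, pkgs_dirs and channels. The function name summarize_conda_info is hypothetical.

```python
import json
import subprocess


def summarize_conda_info(json_text):
    """Pick the .condarc-controlled settings out of `conda info --json` output."""
    info = json.loads(json_text)
    # Key names are an assumption about conda's JSON output; missing keys give [].
    return {key: info.get(key, []) for key in ("envs_dirs", "pkgs_dirs", "channels")}


if __name__ == "__main__":
    try:
        result = subprocess.run(
            ["conda", "info", "--json"],
            check=True, capture_output=True, text=True,
        )
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not run conda (is the anaconda module loaded?): {exc}")
    else:
        for key, values in summarize_conda_info(result.stdout).items():
            print(key)
            for value in values:
                print(f"  - {value}")
```

The first entry listed under envs_dirs should be the /data/user location from your .condarc, not a path in your home directory.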

Creating environments

We will create an environment named myenv1 which uses an older version of numpy, e.g. to test for backwards compatibility of our code (the -q and --yes switches just disable the progress bar and the confirmation prompt, respectively). The environment will be created in the default location as defined by the .condarc configuration file (see above).

~ $ conda create -q --yes -n 'myenv1' numpy=1.8 scipy ipython

Fetching package metadata: ...
Solving package specifications: .
Package plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:

The following NEW packages will be INSTALLED:

    ipython:    2.3.0-py27_0
    numpy:      1.8.2-py27_0
    openssl:    1.0.1h-1
    pip:        1.5.6-py27_0
    python:     2.7.8-1
    readline:   6.2-2
    scipy:      0.14.0-np18py27_0
    setuptools: 5.8-py27_0
    sqlite:     3.8.4.1-0
    system:     5.8-1
    tk:         8.5.15-0
    zlib:       1.2.7-0

To activate this environment, use:
$ source activate myenv1

To deactivate this environment, use:
$ source deactivate

The created environment contains just the packages that are needed to satisfy the requirements and it is local to your installation. The python installation is even independent of the central installation, i.e. your code will still work in such an environment, even if you are offline or AFS is down. However, you need the central installation if you want to use the conda command itself.

Packages for your new environment are either copied from the central installation or, if newer packages are available from anaconda and you did not pin the exact version provided centrally, downloaded from the web. This will require significant space in the envs_dirs that you defined in .condarc. If you create other environments on the same local disk, they will share the packages using hard links.
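To illustrate what sharing through hard links means for disk usage, here is a small sketch that is independent of conda. It creates a file and a hard link to it in a temporary directory and shows that both names refer to the same inode, so the data is stored only once. The file names are hypothetical stand-ins, and Python 3 is assumed. Note that conda can only link when it is allowed to do so; an always_copy: true setting asks it to copy files instead.

```python
import os
import tempfile


def demo_hard_link():
    """Create a file plus a hard link to it; return (link_count, same_inode)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "pkgs_copy.bin")  # stands in for a file in pkgs_dirs
        linked = os.path.join(tmp, "env_copy.bin")     # stands in for the same file in an env
        with open(original, "wb") as handle:
            handle.write(b"x" * 1024)
        os.link(original, linked)  # second name for the same data on disk
        first, second = os.stat(original), os.stat(linked)
        return first.st_nlink, first.st_ino == second.st_ino


if __name__ == "__main__":
    count, same = demo_hard_link()
    print(f"link count: {count}, same inode: {same}")
```

Hard links only work within one file system, which is why the sharing applies to environments on the same disk.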

We can switch to the newly created environment with the conda activate command.

$ conda activate myenv1

{% include callout.html type="info" content="Note that anaconda's activate/deactivate scripts are compatible with the bash and zsh shells but not with [t]csh." %}

Let's test whether we indeed got the desired numpy version:

$ python -c 'import numpy as np; print(np.version.version)'

1.8.2
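For a backwards-compatibility test it can be useful to have the script itself verify that it runs against the intended numpy series. The following is a minimal sketch; the helpers version_tuple and matches_series are hypothetical names, and the code avoids f-strings so that it also runs in a Python 2.7 environment like the one created above.

```python
def version_tuple(text):
    """Turn '1.8.2' into (1, 8, 2); stop at the first non-numeric part."""
    parts = []
    for piece in text.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_series(installed, wanted):
    """True if `installed` belongs to the release series `wanted` (1.8.2 vs 1.8)."""
    want = version_tuple(wanted)
    return version_tuple(installed)[: len(want)] == want


if __name__ == "__main__":
    try:
        import numpy as np
    except ImportError:
        print("numpy is not installed in this environment")
    else:
        ok = matches_series(np.version.version, "1.8")
        print("numpy {}: {}".format(np.version.version, "ok" if ok else "unexpected version"))
```

Comparing integer tuples rather than strings keeps 1.18 from being mistaken for a member of the 1.8 series.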

You can install additional packages into the active environment using the conda install command.

$ conda install --yes -q bottle

Fetching package metadata: ...
Solving package specifications: .
Package plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:

The following NEW packages will be INSTALLED:

    bottle: 0.12.5-py27_0

Jupyterhub

Jupyterhub is a service for running code notebooks on the cluster, particularly in python. It is a powerful tool for data analysis and prototyping. For more information see the Jupyterhub documentation.

Pythons to avoid

Avoid using the system python (/usr/bin/python). It is intended for OS software and may not be up to date.

Also avoid the 'python' module (module load python). This is a minimal install of python intended for embedding in other modules.
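As a safeguard against accidentally running under the system python, a script can check its own interpreter path at startup. This is only a sketch: the list of system directories is an assumption and may need adjusting, and the name is_system_python is hypothetical. It cannot detect the 'python' module case.

```python
import os
import sys

# Directories assumed to hold the OS-managed interpreter; adjust if needed.
SYSTEM_BIN_DIRS = ("/usr/bin", "/bin")


def is_system_python(executable):
    """True if `executable` lives directly in one of the system bin directories."""
    return os.path.dirname(os.path.normpath(executable)) in SYSTEM_BIN_DIRS


if __name__ == "__main__":
    if is_system_python(sys.executable):
        sys.stderr.write(
            "warning: running under the system python; activate a conda environment\n"
        )
```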