title | last_updated | keywords | summary | sidebar | permalink
---|---|---|---|---|---
Python | 28 September 2020 | | Running Python on Merlin | merlin6_sidebar | /merlin6/python.html
PSI provides a variety of ways to execute python code.
- Anaconda - Custom environments for software installation and development
- Jupyterhub - Execute Jupyter notebooks on the cluster
- System Python - Do not use! Only for OS applications.
Anaconda
Anaconda ("conda" for short) is a package manager with excellent python integration. Using it you can create isolated environments for each of your python applications, containing exactly the dependencies needed for that app. It is similar to the virtualenv python package, but can also manage non-python requirements.
Loading conda
Conda is loaded from the module system:
module load anaconda
Using pre-made environments
Loading the module provides the conda command, but does not otherwise change your
environment. First an environment needs to be activated. Available environments can
be seen with conda info --envs and include many specialized environments for
software installs. After activating you should see the environment name in your
prompt:
~ $ conda activate datascience_py37
(datascience_py37) ~ $
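From inside Python, a quick way to confirm which environment is active is to look at the interpreter prefix and at the CONDA_DEFAULT_ENV variable, which conda activate sets in the shell. The following is a minimal sketch; the helper name describe_environment is only illustrative and not part of conda.

```python
import os
import sys


def describe_environment(environ=None):
    """Return the interpreter prefix and the active conda environment name."""
    environ = os.environ if environ is None else environ
    return {
        # Directory the running interpreter was installed into.
        "prefix": sys.prefix,
        # Set by `conda activate`; None if no environment is active.
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
    }


if __name__ == "__main__":
    info = describe_environment()
    print("interpreter prefix:", info["prefix"])
    print("active conda env:  ", info["conda_env"] or "(none)")
```

If the reported prefix does not point into the environment you activated, the shell is probably picking up a different python first on its PATH.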
CondaRC file
Creating a ~/.condarc file is recommended if you want to create new environments on
merlin. Environments can grow quite large, so you will need to change the storage
location from the default (your home directory) to a larger volume (usually
/data/user/$USER).
Save the following as $HOME/.condarc:
always_copy: true
envs_dirs:
- /data/user/$USER/conda/envs
pkgs_dirs:
- /data/user/$USER/conda/pkgs
- $ANACONDA_PREFIX/conda/pkgs
channels:
- conda-forge
- nodefaults
Run conda info to check that the variables are being set correctly.
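Before creating large environments it can also be worth checking that the target volume has room. The sketch below expands a path such as the envs_dirs entry from the example configuration and reports the free space on the volume that holds it, using only the standard library. The function names are hypothetical, and the path is the one assumed on this page; it may differ for your account.

```python
import os
import shutil
from string import Template


def expand_conda_dir(template, environ=None):
    """Expand $NAME placeholders in a path; unknown names are left as-is."""
    environ = os.environ if environ is None else environ
    return Template(template).safe_substitute(environ)


def free_gib(path):
    """Free space in GiB on the volume holding `path`.

    Walks up to the nearest existing parent directory, since the
    environment directory may not have been created yet. Returns None
    if no existing parent can be found.
    """
    probe = path
    while not os.path.exists(probe):
        parent = os.path.dirname(probe)
        if parent == probe:
            return None
        probe = parent
    return shutil.disk_usage(probe).free / 2**30


if __name__ == "__main__":
    envs_dir = expand_conda_dir("/data/user/$USER/conda/envs")
    print(envs_dir, "free GiB:", free_gib(envs_dir))
```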
Creating environments
We will create an environment named myenv1 which uses an older version of numpy, e.g. to test for backwards compatibility of our code (the -q and --yes switches just disable the progress bar and skip the confirmation prompt). The environment will be created in the default location as defined by the .condarc configuration file (see above).
~ $ conda create -q --yes -n 'myenv1' numpy=1.8 scipy ipython
Fetching package metadata: ...
Solving package specifications: .
Package plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:
The following NEW packages will be INSTALLED:
ipython: 2.3.0-py27_0
numpy: 1.8.2-py27_0
openssl: 1.0.1h-1
pip: 1.5.6-py27_0
python: 2.7.8-1
readline: 6.2-2
scipy: 0.14.0-np18py27_0
setuptools: 5.8-py27_0
sqlite: 3.8.4.1-0
system: 5.8-1
tk: 8.5.15-0
zlib: 1.2.7-0
To activate this environment, use:
$ source activate myenv1
To deactivate this environment, use:
$ source deactivate
The created environment contains just the packages that are needed to satisfy the
requirements and it is local to your installation. The python installation is even
independent of the central installation, i.e. your code will still work in such an
environment, even if you are offline or AFS is down. However, you need the central
installation if you want to use the conda command itself.
Packages for your new environment will be either copied from the central one into
your new environment, or if there are newer packages available from anaconda and you
did not specify exactly the version from our central installation, they may get
downloaded from the web. **This will require significant space in the envs_dirs
that you defined in .condarc.** If you create other environments on the same local
disk, they will share the packages using hard links.
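To illustrate what sharing via hard links means on disk, the sketch below creates a file and a hard link to it in a temporary directory, then checks that both names refer to the same inode. This is a generic filesystem demonstration, not conda's own code, and the file names are made up for the example.

```python
import os
import tempfile


def demo_hard_link():
    """Create a file plus a hard link to it; return (same_inode, link_count)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "package_file.bin")
        linked = os.path.join(tmp, "environment_file.bin")
        with open(original, "wb") as handle:
            handle.write(b"x" * 1024)
        # A hard link adds a second directory entry for the same data blocks.
        os.link(original, linked)
        first, second = os.stat(original), os.stat(linked)
        return first.st_ino == second.st_ino, first.st_nlink


if __name__ == "__main__":
    same_inode, link_count = demo_hard_link()
    print("same inode:", same_inode, "link count:", link_count)
```

Because both names point at the same data, the content is stored only once. Hard links cannot span filesystems, which is why the sharing only applies to environments on the same disk.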
We can switch to the newly created environment with the conda activate command.
$ conda activate myenv1
{% include callout.html type="info" content="Note that anaconda's activate/deactivate scripts are compatible with the bash and zsh shells but not with [t]csh." %}
Let's test whether we indeed got the desired numpy version:
$ python -c 'import numpy as np; print(np.version.version)'
1.8.2
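The same check can be kept in a test script so that it fails loudly if the environment drifts. The sketch below uses two hypothetical helpers, version_tuple and check_version, which compare only the leading numeric components of a version string. They take the string as an argument, so they do not depend on numpy being importable, and they assume the first two components are plain integers.

```python
def version_tuple(version, parts=2):
    """Turn a string like '1.8.2' into (1, 8) for simple comparisons."""
    return tuple(int(piece) for piece in version.split(".")[:parts])


def check_version(found, wanted):
    """Raise RuntimeError unless `found` matches the `wanted` major.minor."""
    if version_tuple(found) != version_tuple(wanted):
        raise RuntimeError("expected %s.x, found %s" % (wanted, found))
    return True
```

In the environment created above, this could be called as check_version(np.version.version, "1.8") after importing numpy as np.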
You can install additional packages into the active environment using the conda install command.
$ conda install --yes -q bottle
Fetching package metadata: ...
Solving package specifications: .
Package plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:
The following NEW packages will be INSTALLED:
bottle: 0.12.5-py27_0
Jupyterhub
Jupyterhub is a service for running code notebooks on the cluster, particularly in python. It is a powerful tool for data analysis and prototyping. For more information see the Jupyterhub documentation.
Pythons to avoid
Avoid using the system python (/usr/bin/python). It is intended for OS software and
may not be up to date.
Also avoid the 'python' module (module load python). This is a minimal install of
python intended for embedding in other modules.
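A script can guard against being started with the system interpreter by accident. The following sketch checks whether the running interpreter lives under /usr/bin; the helper name is_system_python is hypothetical, and the check deliberately does not resolve symlinks, so it only looks at the path the interpreter was started from.

```python
import os
import sys

# Location of the system python named on this page.
SYSTEM_PREFIX = "/usr/bin/"


def is_system_python(executable=None):
    """Return True if the given (or running) interpreter is under /usr/bin."""
    executable = sys.executable if executable is None else executable
    return os.path.abspath(executable or "").startswith(SYSTEM_PREFIX)


if __name__ == "__main__":
    if is_system_python():
        print("Warning: running the system python; activate a conda environment instead.")
    else:
        print("Interpreter:", sys.executable)
```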