---
title: Python
#tags:
last_updated: 28 September 2020
keywords: [python, anaconda, conda, jupyter, numpy]
summary: Running Python on Merlin
sidebar: merlin6_sidebar
permalink: /merlin6/python.html
---

PSI provides several ways to run Python code:

1. **psi-python modules** - Central installation with common packages pre-installed
2. **Anaconda** - Custom environments for software installation and development
3. **Jupyterhub** - Execute Jupyter notebooks on the cluster
4. **System Python** - Do not use! Only for OS applications.

## `psi-python` modules

The easiest way to use Python is through the centrally maintained `psi-python` modules:

```
~ $ module avail psi-python
------------------------------------- Programming: ------------------------------

psi-python27/2.3.0 psi-python27/2.2.0 psi-python27/2.4.1
psi-python27/4.4.0 psi-python34/2.1.0 psi-python35/4.2.0
psi-python36/4.4.0

~ $ module load psi-python36/4.4.0
~ $ python --version
Python 3.6.1 :: Anaconda 4.4.0 (64-bit)
```

These include over 250 common packages from the
[Anaconda](https://docs.anaconda.com/anaconda/) software distribution, including
numpy, pandas, requests, flask, hdf5, and more.

{% include callout.html type="warning" content="
**Caution**{: .text-warning}
Do not use `module load python`. These modules are minimal installs intended as
dependencies for other modules that embed python.
"%}

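After loading a `psi-python` module, one way to confirm that the packages your code needs are available is to ask the interpreter itself. The following is a minimal sketch that assumes a Python 3 module is loaded; the helper name `missing_packages` and the package list are illustrative assumptions, not part of the PSI installation.

```python
import importlib.util
import sys


def missing_packages(names):
    """Return the top-level module names in `names` that this interpreter cannot import."""
    # find_spec returns None when a top-level module is not installed.
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Shows which interpreter is actually running after `module load`.
    print("interpreter:", sys.executable)
    print("version:", sys.version.split()[0])
    # Hypothetical selection of packages; adjust it to what your code needs.
    print("missing:", missing_packages(["numpy", "pandas", "requests"]))
```

An empty `missing` list suggests the loaded module covers your dependencies; otherwise a custom conda environment may be the better fit.
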
## Anaconda

[Anaconda](https://www.anaconda.com/) is a software distribution whose package
manager, `conda`, has excellent Python integration. Using it you can create isolated
environments for each of your Python applications, containing exactly the
dependencies needed for that app. It is similar to the
[virtualenv](http://virtualenv.readthedocs.org/) Python package, but can also manage
non-Python requirements.

### Loading conda

Conda is loaded from the module system:

```
module load anaconda
```

### Using pre-made environments

Loading the module provides the `conda` command, but does not otherwise change your
environment. An environment must be activated before it can be used. Available
environments can be listed with `conda info --envs` and include many specialized
environments for software installs. After activating one, you should see the
environment name in your prompt:

```
~ $ conda activate datascience_py37
(datascience_py37) ~ $
```

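Besides the prompt, a script can check which environment it is running in. The sketch below relies on the `CONDA_DEFAULT_ENV` variable that `conda activate` exports; the function name `active_conda_env` is a hypothetical helper introduced here for illustration.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return the active conda environment's name (or path), or None if none is active."""
    environ = os.environ if environ is None else environ
    # `conda activate` exports CONDA_DEFAULT_ENV; it is absent in a plain shell.
    return environ.get("CONDA_DEFAULT_ENV") or None


if __name__ == "__main__":
    print("active environment:", active_conda_env())
    # sys.prefix points at the directory of the interpreter that is running.
    print("interpreter prefix:", sys.prefix)
```

Passing a dictionary instead of reading `os.environ` directly keeps the helper easy to test.
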
### CondaRC file

Creating a `~/.condarc` file is recommended if you want to create new environments on
Merlin. Environments can grow quite large, so you will need to move the storage
location from the default (your home directory) to a larger volume (usually
`/data/user/$USER`).

Save the following as `$HOME/.condarc` (update USERNAME and the module version as
necessary):

```
always_copy: true

envs_dirs:
  - /data/user/USERNAME/conda/envs

pkgs_dirs:
  - /data/user/USERNAME/conda/pkgs
  - /opt/psi/Programming/anaconda/2019.07/conda/pkgs

channels:
  - conda-forge
  - defaults
```

Run `conda info` to check that the variables are being set correctly.

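If you prefer not to edit the placeholders by hand, the same file can be generated. This is a minimal sketch under the assumption that your data directory follows the `/data/user/USERNAME` pattern shown above; `render_condarc` is a hypothetical helper, and the script only prints the result so that an existing `~/.condarc` is never overwritten.

```python
import getpass


def render_condarc(username, anaconda_version="2019.07"):
    """Build the text of a .condarc that stores environments under /data/user."""
    base = "/data/user/{}/conda".format(username)
    central_pkgs = "/opt/psi/Programming/anaconda/{}/conda/pkgs".format(anaconda_version)
    lines = [
        "always_copy: true",
        "",
        "envs_dirs:",
        "  - {}/envs".format(base),
        "",
        "pkgs_dirs:",
        "  - {}/pkgs".format(base),
        "  - {}".format(central_pkgs),
        "",
        "channels:",
        "  - conda-forge",
        "  - defaults",
        "",
    ]
    return "\n".join(lines)


if __name__ == "__main__":
    # Print for review; redirect to ~/.condarc yourself once it looks right.
    print(render_condarc(getpass.getuser()))
```
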
### Creating environments

We will create an environment named `myenv1` which uses an older version of numpy, e.g. to test our code for backwards compatibility (`-q` disables the progress bar and `--yes` skips the confirmation prompt). The environment will be created in the default location defined by the `.condarc` configuration file (see above).

```
~ $ conda create -q --yes -n 'myenv1' numpy=1.8 scipy ipython

Fetching package metadata: ...
Solving package specifications: .
Package plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:

The following NEW packages will be INSTALLED:

    ipython: 2.3.0-py27_0
    numpy: 1.8.2-py27_0
    openssl: 1.0.1h-1
    pip: 1.5.6-py27_0
    python: 2.7.8-1
    readline: 6.2-2
    scipy: 0.14.0-np18py27_0
    setuptools: 5.8-py27_0
    sqlite: 3.8.4.1-0
    system: 5.8-1
    tk: 8.5.15-0
    zlib: 1.2.7-0

To activate this environment, use:
$ source activate myenv1

To deactivate this environment, use:
$ source deactivate
```

The created environment contains **just the packages that are needed to satisfy the
requirements** and it is local to your installation. The Python installation is
independent of the central installation, so your code will still work in such an
environment even if you are offline or AFS is down. However, you need the central
installation if you want to use the `conda` command itself.

Packages for your new environment are either copied from the central installation
or, if newer packages are available from Anaconda and you did not pin the exact
version provided by our central installation, downloaded from the web. **This will
require significant space in the `envs_dirs` that you defined in `.condarc`.** If you
create other environments on the same local disk, they will share the packages using
hard links.

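The reason sharing only works on the same disk is that a hard link is a second directory entry for the same data, which is only possible within one filesystem. The following is a self-contained illustration using temporary files; it does not touch any conda directories, and the file names are made up for the example.

```python
import os
import tempfile


def hard_link_demo():
    """Create a file plus a hard link to it; return (same_inode, link_count)."""
    with tempfile.TemporaryDirectory() as tmp:
        original = os.path.join(tmp, "package.bin")
        linked = os.path.join(tmp, "env_copy.bin")
        with open(original, "wb") as handle:
            handle.write(b"x" * 1024)
        # Both names now refer to the same data; no extra space is used for the content.
        os.link(original, linked)
        first, second = os.stat(original), os.stat(linked)
        return first.st_ino == second.st_ino, first.st_nlink


if __name__ == "__main__":
    same_inode, link_count = hard_link_demo()
    print("same inode:", same_inode, "link count:", link_count)
```
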
We can switch to the newly created environment with the `conda activate` command.

```
$ conda activate myenv1
```

{% include callout.html type="info" content="Note that anaconda's activate/deactivate
scripts are compatible with the bash and zsh shells but not with [t]csh." %}

Let's test whether we indeed got the desired numpy version:

```
$ python -c 'import numpy as np; print(np.version.version)'

1.8.2
```

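For automated compatibility tests it can be handy to assert the release series instead of reading the output by eye. Below is a minimal sketch; `version_tuple` and `matches_series` are hypothetical helpers written for this page, and they only compare the leading numeric components of a version string.

```python
def version_tuple(version):
    """Turn '1.8.2' into (1, 8, 2), stopping at the first non-numeric component."""
    parts = []
    for piece in version.split("."):
        if not piece.isdigit():
            break
        parts.append(int(piece))
    return tuple(parts)


def matches_series(installed, series):
    """Return True if `installed` belongs to the release series, e.g. '1.8.2' in '1.8'."""
    wanted = version_tuple(series)
    return version_tuple(installed)[: len(wanted)] == wanted


if __name__ == "__main__":
    print(matches_series("1.8.2", "1.8"))   # True
    print(matches_series("1.18.0", "1.8"))  # False
```

Inside the activated environment you could pass `numpy.version.version` as the first argument.
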
You can install additional packages into the active environment using the
`conda install` command.

```
$ conda install --yes -q bottle

Fetching package metadata: ...
Solving package specifications: .
Package plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:

The following NEW packages will be INSTALLED:

    bottle: 0.12.5-py27_0
```

## Jupyterhub

Jupyterhub is a service for running code notebooks on the cluster, particularly in
Python. It is a powerful tool for data analysis and prototyping. For more information
see the [Jupyterhub documentation]({{"jupyterhub.html"}}).

## Pythons to avoid

Avoid using the system Python (`/usr/bin/python`). It is intended for OS software and
may not be up to date.

Also avoid the 'python' module (`module load python`). This is a minimal install of
Python intended for embedding in other modules.
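
A script can guard against accidentally running under the system interpreter. This is a rough sketch: treating any interpreter under `/usr/bin/` as the system Python is a heuristic assumed here for illustration, and `is_system_python` is a hypothetical helper.

```python
import sys


def is_system_python(executable=None):
    """Heuristic: report whether the interpreter path lives under /usr/bin/."""
    executable = sys.executable if executable is None else executable
    return bool(executable) and executable.startswith("/usr/bin/")


if __name__ == "__main__":
    if is_system_python():
        # Warn on stderr so the message does not mix with the script's normal output.
        sys.stderr.write("Warning: running under the system Python; load psi-python or a conda environment.\n")
    else:
        print("interpreter:", sys.executable)
```
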