---
title: Python
#tags:
last_updated: 28 September 2020
keywords: [python, anaconda, conda, jupyter, numpy]
summary: Running Python on Merlin
sidebar: merlin6_sidebar
permalink: /merlin6/python.html
---

PSI provides a variety of ways to execute python code.

1. **Anaconda** - Custom environments for software installation and development
2. **Jupyterhub** - Execute Jupyter notebooks on the cluster
3. **System Python** - Do not use! Only for OS applications.

## Anaconda

[Anaconda](https://www.anaconda.com/) ("conda" for short) is a package manager with
excellent python integration. With it you can create an isolated environment for each
of your python applications, containing exactly the dependencies that app needs.
It is similar to the [virtualenv](http://virtualenv.readthedocs.org/) python package,
but can also manage non-python requirements.

### Loading conda

Conda is loaded from the module system:

```
module load anaconda
```

### Using pre-made environments

Loading the module provides the `conda` command, but does not otherwise change your
environment. An environment must be activated first. List the available environments
with `conda info --envs`; they include many specialized environments for software installs.
After activating an environment you should see its name in your prompt:

```
~ $ conda activate datascience_py37
(datascience_py37) ~ $
```

### CondaRC file

Creating a `~/.condarc` file is recommended if you want to create new environments on
merlin. Environments can grow quite large, so you will need to change the storage
location from the default (your home directory) to a larger volume (usually
`/data/user/$USER`).

Save the following as `$HOME/.condarc`:

```
always_copy: true

envs_dirs:
  - /data/user/$USER/conda/envs

pkgs_dirs:
  - /data/user/$USER/conda/pkgs
  - $ANACONDA_PREFIX/conda/pkgs

channels:
  - conda-forge
  - nodefaults
```

Run `conda info` to check that the variables are being set correctly.

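
The same check can be scripted. The following is a minimal sketch, assuming that `conda info --json` reports the configured directories under the `envs_dirs` and `pkgs_dirs` keys. The helper name `first_dirs` and the sample user `alice` are hypothetical and only illustrate the idea.

```python
import json


def first_dirs(conda_info_json):
    """Return the first envs/pkgs directory found in `conda info --json` output.

    Assumes the JSON contains `envs_dirs` and `pkgs_dirs` lists. This is an
    assumption about conda's output, not something this page guarantees.
    """
    info = json.loads(conda_info_json)
    return {
        "envs": (info.get("envs_dirs") or [None])[0],
        "pkgs": (info.get("pkgs_dirs") or [None])[0],
    }


# Hypothetical sample for a user called "alice"; on the cluster you would
# pass the real output of `conda info --json` instead.
sample = json.dumps({
    "envs_dirs": ["/data/user/alice/conda/envs"],
    "pkgs_dirs": ["/data/user/alice/conda/pkgs"],
})
dirs = first_dirs(sample)
print(dirs["envs"])
print(dirs["pkgs"])
```

If the first entry still points into your home directory, the `.condarc` file is probably not being picked up. The function only parses text, so it can be tried without conda installed.
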
### Creating environments

We will create an environment named `myenv1` which uses an older version of numpy, e.g. to test the
backwards compatibility of our code (`-q` disables the progress bar and `--yes` skips the
confirmation prompt). The environment will be created in the default location
defined by the `.condarc` configuration file (see above).

```
~ $ conda create -q --yes -n 'myenv1' numpy=1.8 scipy ipython

Fetching package metadata: ...
Solving package specifications: .
Package plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:

The following NEW packages will be INSTALLED:

    ipython:    2.3.0-py27_0
    numpy:      1.8.2-py27_0
    openssl:    1.0.1h-1
    pip:        1.5.6-py27_0
    python:     2.7.8-1
    readline:   6.2-2
    scipy:      0.14.0-np18py27_0
    setuptools: 5.8-py27_0
    sqlite:     3.8.4.1-0
    system:     5.8-1
    tk:         8.5.15-0
    zlib:       1.2.7-0

To activate this environment, use:
$ source activate myenv1

To deactivate this environment, use:
$ source deactivate
```

The created environment contains **just the packages that are needed to satisfy the
requirements** and it is local to your installation. The python installation is even
independent of the central installation, i.e. your code will still work in such an
environment even if you are offline or AFS is down. However, you need the central
installation if you want to use the `conda` command itself.

Packages for your new environment are either copied from the central installation into
your new environment or, if newer packages are available from anaconda and you did not
pin the exact version provided by our central installation, downloaded from the web.
**This will require significant space in the `envs_dirs` that you defined in
`.condarc`.** If you create other environments on the same local disk, they will share
the packages using hard links.

We can switch to the newly created environment with the `conda activate` command.

```
$ conda activate myenv1
```

{% include callout.html type="info" content="Note that anaconda's activate/deactivate
scripts are compatible with the bash and zsh shells but not with [t]csh." %}

Let's test whether we indeed got the desired numpy version:

```
$ python -c 'import numpy as np; print(np.version.version)'

1.8.2
```

You can install additional packages into the active environment using the `conda
install` command.

```
$ conda install --yes -q bottle

Fetching package metadata: ...
Solving package specifications: .
Package plan for installation in environment /gpfs/home/feichtinger/conda-envs/myenv1:

The following NEW packages will be INSTALLED:

    bottle: 0.12.5-py27_0
```

## Jupyterhub

Jupyterhub is a service for running code notebooks on the cluster, particularly in
python. It is a powerful tool for data analysis and prototyping. For more information
see the [Jupyterhub documentation]({{"jupyterhub.html"}}).

## Pythons to avoid

Avoid using the system python (`/usr/bin/python`). It is intended for OS software and
may not be up to date.

Also avoid the 'python' module (`module load python`). This is a minimal install of
python intended for embedding in other modules.
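
To confirm which interpreter a script actually runs under, a short check can help. This is a minimal sketch: the helper `is_system_python` is a hypothetical name, and treating any interpreter located directly in `/usr/bin` as the system python is an assumption based on the path named above.

```python
import os
import sys


def is_system_python(executable):
    """Return True if the interpreter path lies directly in /usr/bin.

    Assumption: the system python is the one installed under /usr/bin.
    Symlinks are deliberately not resolved, so an environment whose
    `python` links back to /usr/bin is still reported by its own path.
    """
    return os.path.dirname(os.path.abspath(executable)) == "/usr/bin"


# sys.executable can be empty in unusual embedded setups, hence the fallback.
if is_system_python(sys.executable or ""):
    print("Warning: running the system python:", sys.executable)
else:
    print("Interpreter:", sys.executable)
    print("Environment prefix:", sys.prefix)
```

Inside an activated conda environment, the printed prefix should be the environment directory rather than a system location.
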