Files
MX_Pmodule/Programming/anaconda
Spencer Bliven 5233ffd6e6 Release Alphafold 2.0.1
- Improves the build system to install the submission scripts in bin
- Add README
2021-10-08 20:57:18 +02:00
..
2019-11-05 15:48:03 +01:00
2021-10-08 20:57:18 +02:00
2020-10-29 09:30:21 +01:00
2020-10-28 11:22:19 +01:00

Building of an Anaconda release using Pmodules

Important Note about access permissions

The central anacoda installations are located below /afs/psi.ch/sys/psi.merlin/Programming/anaconda. All administrators of these installations must be members of the AFS group sys.modules:psi_python in order to have write permission.

Concepts

  • The anaconda module just provides the conda package management tool together with its directory infrastructure which contains conda environments and a cache of downloaded packages
  • Python and user software is provided in conda environments. These environments are located within the directory tree belonging to the anaconda module, e.g. /afs/psi.ch/sys/psi.merlin/Programming/anaconda/2019.07/conda/envs/
  • The software in these environments can be accessed by users through
    1. loading the anaconda module and then using conda activate somemodule_py36
    2. a seperate pmodule that transparently switches to that installed environment by just setting the correct PATH to the python binary.
    3. jupyter installations running from one of the environments which discover the other environments if they contain the correct packages (nb_conda_kernels)
  • The conda tool has frequent updates, and our experience shows that they should be installed. However, it would be a waste to every time produce a new module, because with the new module would also be associated a new area for environments. So, we prefer to update conda in place, and only make a new anaconda module if their are special incentives
  • Most environments are self sufficient and do not depend on the conda tool at all after the instalation: Conda took care of installing all depending libraries, and the builds that conda provides make consistent use of rpath definitions for executables and libraries, i.e. there is no reason to set LD_LIBRARY_PATH at all.
    • There is one important exception: If your environment needs additional setups (activation hooks), then it will rely on the conda activate call, since these hooks are only run inside of this call.

Building a central conda environment

  • Allways work on the host pmod6: conda is trying to use hardlinks where it can. There is an issue that can appear if you install from a machine that uses Auristor (which provides hardlinks within the same AFS volume). Accessing or modifying from an old OpenAFS client can cause problems. This causes whole environments to become corrupt, so that only a PSI AFS admin can fix the problem. Therefore we only install from pmod6 which runs openAFS.

installation of a pure conda environment

In the simplest case, the environment can be created by conda alone. First load the anaconda module to get access to the conda package installer and the install environment.

module load anaconda/2019.07

Define your installation in a conda YAML file and place it inside the buildblock tree

cd buildblocks/Programming/anaconda/2019.07/conda-env-defs
mkdir datascience_36
vim datascience_36/datascience_36.yml

Create the environment

conda env create -f datascience_36/datascience_36.yml

installation of a conda environment and adding pip packages

Frequently there are packages that are not available as conda packages, even though they may exist as PyPi packages. You have two options

  1. install the dependencies using pip
  2. create a conda package based on the PyPi package

In most cases you will want to go ahead with pip installs. However, after running pip inside of a conda environment, the environment is tainted and conda may warn you that it is inconsistent. Therefore conda packages should always be installed first.

Proceed as above by defining a YAML file and use conda to first install all the conda based packages.

Even though the YAML file also allows for the specification of pip packages, I advise to do this step separately. The pip steps can fail for various reasons, and it is better to do them interactively. Describe what you have to do in a README.md inside of the `conda-env-defs/${myenv}** folder.

Note that if pip triggers compilations, the package may pick up shared libraries from outside the environment. This can lead to problems if the build is done on pmod6.psi.ch which runs SL6, while most of the production environments are now on REHL7!

installation of a conda environment and adding source compiled packages

This is still a DRAFT!!!

This works if the python package has a correct setup.py build

  • If you need to apply changes to the source
    • Clone the relevant git repos on github/gitlab
    • implement your changes in a branch
    • document it in conda-env-defs/${myenv}/README.md
  • downlad and store the sources in the install area under /opt/psi/Programming/anaconda/2019.07/xxxx/mypackage
  • Use pip to install them into the environment (requires that the package comes with a correct setup.py)
    cd /opt/psi/Programming/anaconda/2019.07/xxxx/mypackage
    pip install .
    

Building for use with jupyter

Jupyter can discover conda environments if the environment contains the nb_conda_kernels package. Regrettably the environment activation by this package does not run through the full process as compared to the command line functions. The activation scripts that can be placed in an environment do not get executed (q.v. my bug report https://github.com/Anaconda-Platform/nb_conda_kernels/issues/145).

In order to fix this, we must use a workaround which involves modifying the kernel spec configuration file of the environment to wrap the call to the python kernel like in this example for .../envs/mpi-test/share/jupyter/kernels/python3/kernel.json:

     {
         "argv": [
      	"/opt/psi/Programming/anaconda/2019.07/admintools/kernelwrapper.sh",
      	"mpi-test",
      	"/opt/psi/Programming/anaconda/2019.07/conda/envs/mpi-test/bin/python",
      	"-m",
      	"ipykernel_launcher",
      	"-f",
      	"{connection_file}"
      ],
      "display_name": "Python 3",
      "language": "python"
     }

In order to modify an environment easily I wrote a tool found in the admintools of this buildblock which you can use as follows:

files/admintools/install-kernelwrapper.py -v /opt/psi/Programming/anaconda/2019.07/conda/envs/hep_root

Logging of environment activation information

The logging of information for the usage of central environments is important, because it allows to do lifetime management of the conda environments. We can find out whether a particular environment is used at all and by whom, and can so plan the decommissioning.

Logging is currently implemented by an activation hook script that gets placed in each environment. This is done by running the script .../files/admintools/install-env-loggers.sh which will loop over all central environments of this anaconda pmodule.