public release 3.0.0 - see README and CHANGES for details

This commit is contained in:
muntwiler_m 2021-02-09 12:46:20 +01:00
parent 2b3dbd8bac
commit ef781e2db4
46 changed files with 4390 additions and 1655 deletions

View File

@@ -1,6 +1,7 @@
pages:
stage: deploy
script:
- ~/miniconda3/bin/activate pmsco
- make docs
- mv docs/html/ public/
artifacts:
@@ -10,4 +11,4 @@ pages:
- master
tags:
- doxygen

View File

@@ -1,3 +1,28 @@
Release 3.0.0 (2021-02-01)
==========================
| Hash | Date | Description |
| ---- | ---- | ----------- |
| 72a9f38 | 2021-02-06 | introduce run file based job scheduling |
| 42e12d8 | 2021-02-05 | compatibility with recent conda and singularity versions |
| caf9f43 | 2021-02-03 | installation: include plantuml.jar |
| 574c88a | 2021-02-01 | docs: replace doxypy by doxypypy |
| a5cb831 | 2021-02-05 | redefine output_file property |
| 49dbb89 | 2021-01-27 | documentation of run file interface |
| 940d9ae | 2021-01-07 | introduce run file interface |
| 6950f98 | 2021-02-05 | set legacy fortran for compatibility with recent compiler |
| 28d8bc9 | 2021-01-27 | graphics: fixed color range for modulation functions |
| 1382508 | 2021-01-16 | cluster: build_element accepts symbol or number |
| 53508b7 | 2021-01-06 | graphics: swarm plot |
| 4a24163 | 2021-01-05 | graphics: genetic chart |
| 99e9782 | 2020-12-23 | periodic table: use common binding energies in condensed matter XPS |
| fdfcf90 | 2020-12-23 | periodic table: reformat bindingenergy.json, add more import/export functions |
| 13cf90f | 2020-12-21 | hbnni: parameters for xpd demo with two domains |
| 680edb4 | 2020-12-21 | documentation: update documentation of optimizers |
| d909469 | 2020-12-18 | doc: update top components diagram (pmsco module is entry point) |
| 574993e | 2020-12-09 | spectrum: add plot cross section function |
Release 2.2.0 (2020-09-04)
==========================

View File

@@ -6,7 +6,7 @@ It is a collection of computer programs to calculate photoelectron diffraction patterns,
and to optimize structural models based on measured data.
The actual scattering calculation is done by code developed by other parties.
PMSCO wraps around that program and facilitates parameter handling, cluster building, structural optimization and parallel processing.
PMSCO wraps around those programs and facilitates parameter handling, cluster building, structural optimization and parallel processing.
In the current version, the [EDAC](http://garciadeabajos-group.icfo.es/widgets/edac/) code
developed by F. J. García de Abajo, M. A. Van Hove, and C. S. Fadley (1999) is used for scattering calculations.
Instead of EDAC built-in routines, alternatively,
@@ -20,11 +20,12 @@ Highlights
- various scanning modes including energy, manipulator angle (polar/azimuthal), emission angle.
- averaging over multiple domains and emitters.
- global optimization of multiple scans.
- structural optimization algorithms: particle swarm optimization, grid search, gradient search.
- structural optimization algorithms: particle swarm optimization, genetic algorithm, grid scan, table scan.
- detailed reports and graphs of result files.
- calculation of the modulation function.
- calculation of the weighted R-factor.
- automatic parallel processing using OpenMPI.
- tested on Linux cluster machines.
- compatible with Slurm resource manager on Linux cluster machines.
Installation
@@ -39,13 +40,12 @@ The code requires about 2 GB of RAM per process.
Detailed installation instructions and dependencies can be found in the documentation
(docs/src/installation.dox).
A [Doxygen](http://www.stack.nl/~dimitri/doxygen/index.html) compiler with Doxypy is required to generate the documentation in HTML or LaTeX format.
A [Doxygen](http://www.stack.nl/~dimitri/doxygen/index.html) compiler with Doxypypy is required to generate the documentation in HTML format.
The easiest way to set up an environment with all dependencies and without side-effects on other installed software is to use a [Singularity](https://www.sylabs.io/guides/2.5/user-guide/index.html) container.
A Singularity recipe file is part of the distribution, see the PMSCO documentation for details.
On newer Linux systems (e.g. Ubuntu 18.04), Singularity is available from the package manager.
Installation in a [virtual box](https://www.virtualbox.org/) on Windows or Mac is straightforward using the [Vagrant](https://www.vagrantup.com/) system.
A Vagrant file is included in the distribution.
The easiest way to set up an environment with all dependencies and without side-effects on other installed software is to use a [Singularity](https://www.sylabs.io/guides/3.7/user-guide/index.html) container.
A Singularity recipe file is part of the distribution; see the PMSCO documentation for details. Singularity must be installed separately.
Installation in a [virtual box](https://www.virtualbox.org/) on Windows or Mac is straightforward using pre-compiled images with [Vagrant](https://www.vagrantup.com/).
A Vagrant definition file is included in the distribution.
The public distribution of PMSCO does not contain the [EDAC](http://garciadeabajos-group.icfo.es/widgets/edac/) code.
Please obtain the EDAC source code from the original author, copy it to the pmsco/edac directory, and apply edac_all.patch.
@@ -70,7 +70,7 @@ Matthias Muntwiler, <mailto:matthias.muntwiler@psi.ch>
Copyright
---------
Copyright 2015-2020 by [Paul Scherrer Institut](http://www.psi.ch)
Copyright 2015-2021 by [Paul Scherrer Institut](http://www.psi.ch)
Release Notes
@@ -78,6 +78,22 @@ Release Notes
For a detailed list of changes, see the CHANGES.md file.
3.0.0 (2021-02-08)
------------------
- Run file interface replaces command line arguments:
- Specify all run-time parameters in a JSON-formatted text file.
- Override any public attribute of the project class.
- Only the name of the run file is needed on the command line.
- The command line interface is still available, but some default values and the handling of directory paths have changed.
  Check your code for compatibility.
- Integrated job scheduling with the Slurm resource manager:
- Declare all job arguments in the run file and have PMSCO submit the job.
- Graphics scripts for genetic chart and swarm population (experimental feature).
- Update for compatibility with recent Ubuntu (20.04), Anaconda (4.8) and Singularity (3.7).
- Drop compatibility with Python 2.7, minimum requirement is Python 3.6.
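
A minimal run file for the new interface might look like the sketch below. Every key and value here is hypothetical — it only illustrates the idea of declaring run-time parameters, project attribute overrides, and Slurm job arguments in one JSON file; consult the run file documentation (docs/src/runfile.dox) for the actual parameter names.

```json
{
    "project": {
        "__module__": "projects.demo.demo_project",
        "output_file": "demo-job",
        "mode": "swarm"
    },
    "schedule": {
        "manager": "slurm",
        "nodes": 1,
        "tasks_per_node": 24,
        "walltime": "02:00:00"
    }
}
```

Only the path of such a file would then be passed on the command line.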
2.2.0 (2020-09-04)
------------------

View File

@@ -1,136 +0,0 @@
#!/bin/bash
#
# Slurm script template for PMSCO calculations on the Ra cluster
# based on run_mpi_HPL_nodes-2.sl by V. Markushin 2016-03-01
#
# this version checks out the source code from a git repository
# to a temporary location and compiles the code.
# this is to minimize conflicts between different jobs
# but requires that each job has its own git commit.
#
# Use:
# - enter the appropriate parameters and save as a new file.
# - call the sbatch command to pass the job script.
# request a specific number of nodes and tasks.
# example:
# sbatch --nodes=2 --ntasks-per-node=24 --time=02:00:00 run_pmsco.sl
# the qpmsco script does all this for you.
#
# PMSCO arguments
# copy this template to a new file, and set the arguments
#
# PMSCO_WORK_DIR
# path to be used as working directory.
# contains the script derived from this template
# and a copy of the pmsco code in the 'pmsco' directory.
# receives output and temporary files.
#
# PMSCO_PROJECT_FILE
# python module that declares the project and starts the calculation.
# must include the file path relative to $PMSCO_WORK_DIR.
#
# PMSCO_OUT
# name of output file. should not include a path.
#
# all paths are relative to $PMSCO_WORK_DIR or (better) absolute.
#
#
# Further arguments
#
# PMSCO_JOBNAME (required)
# the job name is the base name for output files.
#
# PMSCO_WALLTIME_HR (integer, required)
# wall time limit in hours. must be integer, minimum 1.
# this value is passed to PMSCO.
# it should specify the same amount of wall time as requested from the scheduler.
#
# PMSCO_PROJECT_ARGS (optional)
# extra arguments that are parsed by the project module.
#
#SBATCH --job-name="_PMSCO_JOBNAME"
#SBATCH --output="_PMSCO_JOBNAME.o.%j"
#SBATCH --error="_PMSCO_JOBNAME.e.%j"
PMSCO_WORK_DIR="_PMSCO_WORK_DIR"
PMSCO_JOBNAME="_PMSCO_JOBNAME"
PMSCO_WALLTIME_HR=_PMSCO_WALLTIME_HR
PMSCO_PROJECT_FILE="_PMSCO_PROJECT_FILE"
PMSCO_OUT="_PMSCO_JOBNAME"
PMSCO_PROJECT_ARGS="_PMSCO_PROJECT_ARGS"
module load psi-python36/4.4.0
module load gcc/4.8.5
module load openmpi/3.1.3
source activate pmsco3
echo '================================================================================'
echo "=== Running $0 at the following time and place:"
date
/bin/hostname
cd $PMSCO_WORK_DIR
pwd
ls -lA
#the intel compiler is currently not compatible with mpi4py. -mm 170131
#echo
#echo '================================================================================'
#echo "=== Setting the environment to use Intel Cluster Studio XE 2016 Update 2 intel/16.2:"
#cmd="source /opt/psi/Programming/intel/16.2/bin/compilervars.sh intel64"
#echo $cmd
#$cmd
echo
echo '================================================================================'
echo "=== The environment is set as following:"
env
echo
echo '================================================================================'
echo "BEGIN test"
which mpirun
cmd="mpirun /bin/hostname"
echo $cmd
$cmd
echo "END test"
echo
echo '================================================================================'
echo "BEGIN mpirun pmsco"
echo
cd "$PMSCO_WORK_DIR"
cd pmsco
echo "code revision"
git log --pretty=tformat:'%h %ai %d' -1
make -C pmsco all
python -m compileall pmsco
python -m compileall projects
echo
cd "$PMSCO_WORK_DIR"
PMSCO_CMD="python pmsco/pmsco $PMSCO_PROJECT_FILE"
PMSCO_ARGS="$PMSCO_PROJECT_ARGS"
if [ -n "$PMSCO_SCAN_FILES" ]; then
PMSCO_ARGS="-s $PMSCO_SCAN_FILES $PMSCO_ARGS"
fi
if [ -n "$PMSCO_OUT" ]; then
PMSCO_ARGS="-o $PMSCO_OUT $PMSCO_ARGS"
fi
if [ "$PMSCO_WALLTIME_HR" -ge 1 ]; then
PMSCO_ARGS="-t $PMSCO_WALLTIME_HR $PMSCO_ARGS"
fi
if [ -n "$PMSCO_LOGLEVEL" ]; then
PMSCO_ARGS="--log-level $PMSCO_LOGLEVEL --log-file $PMSCO_JOBNAME.log $PMSCO_ARGS"
fi
# Do not use the OpenMPI-specific options, like "-x LD_LIBRARY_PATH", with the Intel mpirun.
cmd="mpirun $PMSCO_CMD $PMSCO_ARGS"
echo $cmd
$cmd
echo "END mpirun pmsco"
echo '================================================================================'
cd "$PMSCO_WORK_DIR"
rm -rf pmsco
date
ls -lAtr
echo '================================================================================'
exit 0

View File

@@ -1,157 +0,0 @@
#!/bin/bash
#
# Slurm script template for PMSCO calculations on the Ra cluster
# based on run_mpi_HPL_nodes-2.sl by V. Markushin 2016-03-01
#
# Use:
# - enter the appropriate parameters and save as a new file.
# - call the sbatch command to pass the job script.
# request a specific number of nodes and tasks.
# example:
# sbatch --nodes=2 --ntasks-per-node=24 --time=02:00:00 run_pmsco.sl
#
# PMSCO arguments
# copy this template to a new file, and set the arguments
#
# PMSCO_WORK_DIR
# path to be used as working directory.
# contains the script derived from this template.
# receives output and temporary files.
#
# PMSCO_PROJECT_FILE
# python module that declares the project and starts the calculation.
# must include the file path relative to $PMSCO_WORK_DIR.
#
# PMSCO_SOURCE_DIR
# path to the pmsco source directory
# (the directory which contains the bin, lib, pmsco sub-directories)
#
# PMSCO_SCAN_FILES
# list of scan files.
#
# PMSCO_OUT
# name of output file. should not include a path.
#
# all paths are relative to $PMSCO_WORK_DIR or (better) absolute.
#
#
# Further arguments
#
# PMSCO_JOBNAME (required)
# the job name is the base name for output files.
#
# PMSCO_WALLTIME_HR (integer, required)
# wall time limit in hours. must be integer, minimum 1.
# this value is passed to PMSCO.
# it should specify the same amount of wall time as requested from the scheduler.
#
# PMSCO_MODE (optional)
# calculation mode: single, swarm, grid, gradient
#
# PMSCO_CODE (optional)
# calculation code: edac, msc, test
#
# PMSCO_LOGLEVEL (optional)
# request log level: DEBUG, INFO, WARNING, ERROR
# create a log file based on the job name.
#
# PMSCO_PROJECT_ARGS (optional)
# extra arguments that are parsed by the project module.
#
#SBATCH --job-name="_PMSCO_JOBNAME"
#SBATCH --output="_PMSCO_JOBNAME.o.%j"
#SBATCH --error="_PMSCO_JOBNAME.e.%j"
PMSCO_WORK_DIR="_PMSCO_WORK_DIR"
PMSCO_JOBNAME="_PMSCO_JOBNAME"
PMSCO_WALLTIME_HR=_PMSCO_WALLTIME_HR
PMSCO_PROJECT_FILE="_PMSCO_PROJECT_FILE"
PMSCO_MODE="_PMSCO_MODE"
PMSCO_CODE="_PMSCO_CODE"
PMSCO_SOURCE_DIR="_PMSCO_SOURCE_DIR"
PMSCO_SCAN_FILES="_PMSCO_SCAN_FILES"
PMSCO_OUT="_PMSCO_JOBNAME"
PMSCO_LOGLEVEL="_PMSCO_LOGLEVEL"
PMSCO_PROJECT_ARGS="_PMSCO_PROJECT_ARGS"
module load psi-python36/4.4.0
module load gcc/4.8.5
module load openmpi/3.1.3
source activate pmsco3
echo '================================================================================'
echo "=== Running $0 at the following time and place:"
date
/bin/hostname
cd $PMSCO_WORK_DIR
pwd
ls -lA
#the intel compiler is currently not compatible with mpi4py. -mm 170131
#echo
#echo '================================================================================'
#echo "=== Setting the environment to use Intel Cluster Studio XE 2016 Update 2 intel/16.2:"
#cmd="source /opt/psi/Programming/intel/16.2/bin/compilervars.sh intel64"
#echo $cmd
#$cmd
echo
echo '================================================================================'
echo "=== The environment is set as following:"
env
echo
echo '================================================================================'
echo "BEGIN test"
echo "=== Intel native mpirun will get the number of nodes and the machinefile from Slurm"
which mpirun
cmd="mpirun /bin/hostname"
echo $cmd
$cmd
echo "END test"
echo
echo '================================================================================'
echo "BEGIN mpirun pmsco"
echo "Intel native mpirun will get the number of nodes and the machinefile from Slurm"
echo
echo "code revision"
cd "$PMSCO_SOURCE_DIR"
git log --pretty=tformat:'%h %ai %d' -1
python -m compileall pmsco
python -m compileall projects
cd "$PMSCO_WORK_DIR"
echo
PMSCO_CMD="python $PMSCO_SOURCE_DIR/pmsco $PMSCO_PROJECT_FILE"
PMSCO_ARGS="$PMSCO_PROJECT_ARGS"
if [ -n "$PMSCO_SCAN_FILES" ]; then
PMSCO_ARGS="-s $PMSCO_SCAN_FILES $PMSCO_ARGS"
fi
if [ -n "$PMSCO_CODE" ]; then
PMSCO_ARGS="-c $PMSCO_CODE $PMSCO_ARGS"
fi
if [ -n "$PMSCO_MODE" ]; then
PMSCO_ARGS="-m $PMSCO_MODE $PMSCO_ARGS"
fi
if [ -n "$PMSCO_OUT" ]; then
PMSCO_ARGS="-o $PMSCO_OUT $PMSCO_ARGS"
fi
if [ "$PMSCO_WALLTIME_HR" -ge 1 ]; then
PMSCO_ARGS="-t $PMSCO_WALLTIME_HR $PMSCO_ARGS"
fi
if [ -n "$PMSCO_LOGLEVEL" ]; then
PMSCO_ARGS="--log-level $PMSCO_LOGLEVEL --log-file $PMSCO_JOBNAME.log $PMSCO_ARGS"
fi
which mpirun
ls -l "$PMSCO_SOURCE_DIR"
ls -l "$PMSCO_PROJECT_FILE"
# Do not use the OpenMPI-specific options, like "-x LD_LIBRARY_PATH", with the Intel mpirun.
cmd="mpirun $PMSCO_CMD $PMSCO_ARGS"
echo $cmd
$cmd
echo "END mpirun pmsco"
echo '================================================================================'
date
ls -lAtr
echo '================================================================================'
exit 0

View File

@@ -1,178 +0,0 @@
#!/bin/bash
#
# SGE script template for MSC calculations
#
# This script uses the tight integration of openmpi-1.4.5-gcc-4.6.3 in SGE
# using the parallel environment (PE) "orte".
# This script must be used only with qsub command - do NOT run it as a stand-alone
# shell script because it will start all processes on the local node.
#
# PhD arguments
# copy this template to a new file, and set the arguments
#
# PHD_WORK_DIR
# path to be used as working directory.
# contains the SGE script derived from this template.
# receives output and temporary files.
#
# PHD_PROJECT_FILE
# python module that declares the project and starts the calculation.
# must include the file path relative to $PHD_WORK_DIR.
#
# PHD_SOURCE_DIR
# path to the pmsco source directory
# (the directory which contains the bin, lib, pmsco sub-directories)
#
# PHD_SCAN_FILES
# list of scan files.
#
# PHD_OUT
# name of output file. should not include a path.
#
# all paths are relative to $PHD_WORK_DIR or (better) absolute.
#
#
# Further arguments
#
# PHD_JOBNAME (required)
# the job name is the base name for output files.
#
# PHD_NODES (required)
# number of computing nodes (processes) to allocate for the job.
#
# PHD_WALLTIME_HR (required)
# wall time limit (hours)
#
# PHD_WALLTIME_MIN (required)
# wall time limit (minutes)
#
# PHD_MODE (optional)
# calculation mode: single, swarm, grid, gradient
#
# PHD_CODE (optional)
# calculation code: edac, msc, test
#
# PHD_LOGLEVEL (optional)
# request log level: DEBUG, INFO, WARNING, ERROR
# create a log file based on the job name.
#
# PHD_PROJECT_ARGS (optional)
# extra arguments that are parsed by the project module.
#
PHD_WORK_DIR="_PHD_WORK_DIR"
PHD_JOBNAME="_PHD_JOBNAME"
PHD_NODES=_PHD_NODES
PHD_WALLTIME_HR=_PHD_WALLTIME_HR
PHD_WALLTIME_MIN=_PHD_WALLTIME_MIN
PHD_PROJECT_FILE="_PHD_PROJECT_FILE"
PHD_MODE="_PHD_MODE"
PHD_CODE="_PHD_CODE"
PHD_SOURCE_DIR="_PHD_SOURCE_DIR"
PHD_SCAN_FILES="_PHD_SCAN_FILES"
PHD_OUT="_PHD_JOBNAME"
PHD_LOGLEVEL="_PHD_LOGLEVEL"
PHD_PROJECT_ARGS="_PHD_PROJECT_ARGS"
# Define your job name, parallel environment with the number of slots, and run time:
#$ -cwd
#$ -N _PHD_JOBNAME.job
#$ -pe orte _PHD_NODES
#$ -l ram=2G
#$ -l s_rt=_PHD_WALLTIME_HR:_PHD_WALLTIME_MIN:00
#$ -l h_rt=_PHD_WALLTIME_HR:_PHD_WALLTIME_MIN:30
#$ -V
###################################################
# Fix the SGE environment-handling bug (bash):
source /usr/share/Modules/init/sh
export -n -f module
# Load the environment modules for this job (the order may be important):
module load python/python-2.7.5
module load gcc/gcc-4.6.3
module load mpi/openmpi-1.4.5-gcc-4.6.3
module load blas/blas-20110419-gcc-4.6.3
module load lapack/lapack-3.4.2-gcc-4.6.3
export LD_LIBRARY_PATH=$PHD_SOURCE_DIR/lib/:$LD_LIBRARY_PATH
###################################################
# Set the environment variables:
MPIEXEC=$OPENMPI/bin/mpiexec
# OPENMPI is set by the mpi/openmpi-* module.
export OMP_NUM_THREADS=1
export OMPI_MCA_btl='openib,sm,self'
# export OMPI_MCA_orte_process_binding=core
##############
# BEGIN DEBUG
# Print the SGE environment on master host:
echo "================================================================"
echo "=== SGE job JOB_NAME=$JOB_NAME JOB_ID=$JOB_ID"
echo "================================================================"
echo DATE=`date`
echo HOSTNAME=`hostname`
echo PWD=`pwd`
echo "NSLOTS=$NSLOTS"
echo "PE_HOSTFILE=$PE_HOSTFILE"
cat $PE_HOSTFILE
echo "================================================================"
echo "Running environment:"
env
echo "================================================================"
echo "Loaded environment modules:"
module list 2>&1
echo
# END DEBUG
##############
##############
# Setup
cd "$PHD_SOURCE_DIR"
python -m compileall .
cd "$PHD_WORK_DIR"
ulimit -c 0
###################################################
# The command to run with mpiexec:
CMD="python $PHD_PROJECT_FILE"
ARGS="$PHD_PROJECT_ARGS"
if [ -n "$PHD_SCAN_FILES" ]; then
ARGS="-s $PHD_SCAN_FILES -- $ARGS"
fi
if [ -n "$PHD_CODE" ]; then
ARGS="-c $PHD_CODE $ARGS"
fi
if [ -n "$PHD_MODE" ]; then
ARGS="-m $PHD_MODE $ARGS"
fi
if [ -n "$PHD_OUT" ]; then
ARGS="-o $PHD_OUT $ARGS"
fi
if [ "$PHD_WALLTIME_HR" -ge 1 ]
then
ARGS="-t $PHD_WALLTIME_HR $ARGS"
else
ARGS="-t 0.5 $ARGS"
fi
if [ -n "$PHD_LOGLEVEL" ]; then
ARGS="--log-level $PHD_LOGLEVEL --log-file $PHD_JOBNAME.log $ARGS"
fi
# The MPI command to run:
MPICMD="$MPIEXEC --prefix $OPENMPI -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -x OMPI_MCA_btl -np $NSLOTS $CMD $ARGS"
echo "Command to run:"
echo "$MPICMD"
echo
exec $MPICMD
exit 0

View File

@@ -1,145 +0,0 @@
#!/bin/sh
#
# submission script for PMSCO calculations on the Ra cluster
#
# this version clones the current git repository at HEAD to the work directory.
# thus, version conflicts between jobs are avoided.
#
if [ $# -lt 1 ]; then
echo "Usage: $0 [NOSUB] GIT_TAG DESTDIR JOBNAME NODES TASKS_PER_NODE WALLTIME:HOURS PROJECT [ARGS [ARGS [...]]]"
echo ""
echo " NOSUB (optional): do not submit the script to the queue. default: submit."
echo " GIT_TAG: git tag or branch name of the code. HEAD for current code."
echo " DESTDIR: destination directory. must exist. a sub-dir \$JOBNAME is created."
echo " JOBNAME (text): name of job. use only alphanumeric characters, no spaces."
echo " NODES (integer): number of computing nodes. (1 node = 24 or 32 processors)."
echo " do not specify more than 2."
echo " TASKS_PER_NODE (integer): 1...24, or 32."
echo " 24 or 32 for full-node allocation."
echo " 1...23 for shared node allocation."
echo " WALLTIME:HOURS (integer): requested wall time."
echo " 1...24 for day partition"
echo " 24...192 for week partition"
echo " 1...192 for shared partition"
echo " PROJECT: python module (file path) that declares the project and starts the calculation."
echo " ARGS (optional): any number of further PMSCO or project arguments (except time)."
echo ""
echo "the job script is written to \$DESTDIR/\$JOBNAME which is also the destination of calculation output."
exit 1
fi
# location of the pmsco package is derived from the path of this script
SCRIPTDIR="$(dirname $(readlink -f $0))"
SOURCEDIR="$(readlink -f $SCRIPTDIR/..)"
PMSCO_SOURCE_DIR="$SOURCEDIR"
# read arguments
if [ "$1" == "NOSUB" ]; then
NOSUB="true"
shift
else
NOSUB="false"
fi
if [ "$1" == "HEAD" ]; then
BRANCH_ARG=""
else
BRANCH_ARG="-b $1"
fi
shift
DEST_DIR="$1"
shift
PMSCO_JOBNAME=$1
shift
PMSCO_NODES=$1
PMSCO_TASKS_PER_NODE=$2
PMSCO_TASKS=$(expr $PMSCO_NODES \* $PMSCO_TASKS_PER_NODE)
shift 2
PMSCO_WALLTIME_HR=$1
PMSCO_WALLTIME_MIN=$(expr $PMSCO_WALLTIME_HR \* 60)
shift
# select partition
if [ $PMSCO_WALLTIME_HR -ge 25 ]; then
PMSCO_PARTITION="week"
else
PMSCO_PARTITION="day"
fi
if [ $PMSCO_TASKS_PER_NODE -lt 24 ]; then
PMSCO_PARTITION="shared"
fi
PMSCO_PROJECT_FILE="$(readlink -f $1)"
shift
PMSCO_PROJECT_ARGS="$*"
# set up working directory
cd "$DEST_DIR"
if [ ! -d "$PMSCO_JOBNAME" ]; then
mkdir "$PMSCO_JOBNAME"
fi
cd "$PMSCO_JOBNAME"
WORKDIR="$(pwd)"
PMSCO_WORK_DIR="$WORKDIR"
# copy code
PMSCO_SOURCE_REPO="file://$PMSCO_SOURCE_DIR"
echo "$PMSCO_SOURCE_REPO"
cd "$PMSCO_WORK_DIR"
git clone $BRANCH_ARG --single-branch --depth 1 $PMSCO_SOURCE_REPO pmsco || exit
cd pmsco
PMSCO_REV=$(git log --pretty=format:"%h, %ai" -1) || exit
cd "$WORKDIR"
echo "$PMSCO_REV" > revision.txt
# generate job script from template
sed -e "s:_PMSCO_WORK_DIR:$PMSCO_WORK_DIR:g" \
-e "s:_PMSCO_JOBNAME:$PMSCO_JOBNAME:g" \
-e "s:_PMSCO_NODES:$PMSCO_NODES:g" \
-e "s:_PMSCO_WALLTIME_HR:$PMSCO_WALLTIME_HR:g" \
-e "s:_PMSCO_PROJECT_FILE:$PMSCO_PROJECT_FILE:g" \
-e "s:_PMSCO_PROJECT_ARGS:$PMSCO_PROJECT_ARGS:g" \
"$SCRIPTDIR/pmsco.ra-git.template" > $PMSCO_JOBNAME.job
chmod u+x "$PMSCO_JOBNAME.job" || exit
# request nodes and tasks
#
# The option --ntasks-per-node is meant to be used with the --nodes option.
# (For the --ntasks option, the default is one task per node, use the --cpus-per-task option to change this default.)
#
# sbatch options
# --cores-per-socket=16
# 32 cores per node
# --partition=[shared|day|week]
# --time=8-00:00:00
# override default time limit (2 days in long queue)
# time formats: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds"
# --mail-type=ALL
# --test-only
# check script but do not submit
#
SLURM_ARGS="--nodes=$PMSCO_NODES --ntasks-per-node=$PMSCO_TASKS_PER_NODE"
if [ $PMSCO_TASKS_PER_NODE -gt 24 ]; then
SLURM_ARGS="--cores-per-socket=16 $SLURM_ARGS"
fi
SLURM_ARGS="--partition=$PMSCO_PARTITION $SLURM_ARGS"
SLURM_ARGS="--time=$PMSCO_WALLTIME_HR:00:00 $SLURM_ARGS"
CMD="sbatch $SLURM_ARGS $PMSCO_JOBNAME.job"
echo $CMD
if [ "$NOSUB" != "true" ]; then
$CMD
fi
exit 0

View File

@@ -1,151 +0,0 @@
#!/bin/sh
#
# submission script for PMSCO calculations on the Ra cluster
#
# CAUTION: the job will execute the pmsco code which is present in the directory tree
# of this script _at the time of job execution_, not submission!
# before changing the code, make sure that all pending jobs have started execution,
# otherwise you will experience version conflicts.
# it's better to use the qpmsco.ra-git.sh script which clones the code.
if [ $# -lt 1 ]; then
echo "Usage: $0 [NOSUB] DESTDIR JOBNAME NODES TASKS_PER_NODE WALLTIME:HOURS PROJECT MODE [ARGS [ARGS [...]]]"
echo ""
echo " NOSUB (optional): do not submit the script to the queue. default: submit."
echo " DESTDIR: destination directory. must exist. a sub-dir \$JOBNAME is created."
echo " JOBNAME (text): name of job. use only alphanumeric characters, no spaces."
echo " NODES (integer): number of computing nodes. (1 node = 24 or 32 processors)."
echo " do not specify more than 2."
echo " TASKS_PER_NODE (integer): 1...24, or 32."
echo " 24 or 32 for full-node allocation."
echo " 1...23 for shared node allocation."
echo " WALLTIME:HOURS (integer): requested wall time."
echo " 1...24 for day partition"
echo " 24...192 for week partition"
echo " 1...192 for shared partition"
echo " PROJECT: python module (file path) that declares the project and starts the calculation."
echo " MODE: PMSCO calculation mode (single|swarm|gradient|grid)."
echo " ARGS (optional): any number of further PMSCO or project arguments (except mode and time)."
echo ""
echo "the job script is written to \$DESTDIR/\$JOBNAME which is also the destination of calculation output."
exit 1
fi
# location of the pmsco package is derived from the path of this script
SCRIPTDIR="$(dirname $(readlink -f $0))"
SOURCEDIR="$SCRIPTDIR/.."
PMSCO_SOURCE_DIR="$SOURCEDIR"
# read arguments
if [ "$1" == "NOSUB" ]; then
NOSUB="true"
shift
else
NOSUB="false"
fi
DEST_DIR="$1"
shift
PMSCO_JOBNAME=$1
shift
PMSCO_NODES=$1
PMSCO_TASKS_PER_NODE=$2
PMSCO_TASKS=$(expr $PMSCO_NODES \* $PMSCO_TASKS_PER_NODE)
shift 2
PMSCO_WALLTIME_HR=$1
PMSCO_WALLTIME_MIN=$(expr $PMSCO_WALLTIME_HR \* 60)
shift
# select partition
if [ $PMSCO_WALLTIME_HR -ge 25 ]; then
PMSCO_PARTITION="week"
else
PMSCO_PARTITION="day"
fi
if [ $PMSCO_TASKS_PER_NODE -lt 24 ]; then
PMSCO_PARTITION="shared"
fi
PMSCO_PROJECT_FILE="$(readlink -f $1)"
shift
PMSCO_MODE="$1"
shift
PMSCO_PROJECT_ARGS="$*"
# use defaults, override explicitly in PMSCO_PROJECT_ARGS if necessary
PMSCO_SCAN_FILES=""
PMSCO_LOGLEVEL=""
PMSCO_CODE=""
# set up working directory
cd "$DEST_DIR"
if [ ! -d "$PMSCO_JOBNAME" ]; then
mkdir "$PMSCO_JOBNAME"
fi
cd "$PMSCO_JOBNAME"
WORKDIR="$(pwd)"
PMSCO_WORK_DIR="$WORKDIR"
# provide revision information, requires git repository
cd "$SOURCEDIR"
PMSCO_REV=$(git log --pretty=format:"%h, %ai" -1)
if [ $? -ne 0 ]; then
PMSCO_REV="revision unknown, "$(date +"%F %T %z")
fi
cd "$WORKDIR"
echo "$PMSCO_REV" > revision.txt
# generate job script from template
sed -e "s:_PMSCO_WORK_DIR:$PMSCO_WORK_DIR:g" \
-e "s:_PMSCO_JOBNAME:$PMSCO_JOBNAME:g" \
-e "s:_PMSCO_NODES:$PMSCO_NODES:g" \
-e "s:_PMSCO_WALLTIME_HR:$PMSCO_WALLTIME_HR:g" \
-e "s:_PMSCO_PROJECT_FILE:$PMSCO_PROJECT_FILE:g" \
-e "s:_PMSCO_PROJECT_ARGS:$PMSCO_PROJECT_ARGS:g" \
-e "s:_PMSCO_CODE:$PMSCO_CODE:g" \
-e "s:_PMSCO_MODE:$PMSCO_MODE:g" \
-e "s:_PMSCO_SOURCE_DIR:$PMSCO_SOURCE_DIR:g" \
-e "s:_PMSCO_SCAN_FILES:$PMSCO_SCAN_FILES:g" \
-e "s:_PMSCO_LOGLEVEL:$PMSCO_LOGLEVEL:g" \
"$SCRIPTDIR/pmsco.ra.template" > $PMSCO_JOBNAME.job
chmod u+x "$PMSCO_JOBNAME.job"
# request nodes and tasks
#
# The option --ntasks-per-node is meant to be used with the --nodes option.
# (For the --ntasks option, the default is one task per node, use the --cpus-per-task option to change this default.)
#
# sbatch options
# --cores-per-socket=16
# 32 cores per node
# --partition=[shared|day|week]
# --time=8-00:00:00
# override default time limit (2 days in long queue)
# time formats: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds"
# --mail-type=ALL
# --test-only
# check script but do not submit
#
SLURM_ARGS="--nodes=$PMSCO_NODES --ntasks-per-node=$PMSCO_TASKS_PER_NODE"
if [ $PMSCO_TASKS_PER_NODE -gt 24 ]; then
SLURM_ARGS="--cores-per-socket=16 $SLURM_ARGS"
fi
SLURM_ARGS="--partition=$PMSCO_PARTITION $SLURM_ARGS"
SLURM_ARGS="--time=$PMSCO_WALLTIME_HR:00:00 $SLURM_ARGS"
CMD="sbatch $SLURM_ARGS $PMSCO_JOBNAME.job"
echo $CMD
if [ "$NOSUB" != "true" ]; then
$CMD
fi
exit 0

View File

@@ -1,128 +0,0 @@
#!/bin/sh
#
# submission script for PMSCO calculations on Merlin cluster
#
if [ $# -lt 1 ]; then
echo "Usage: $0 [NOSUB] JOBNAME NODES WALLTIME:HOURS PROJECT MODE [LOG_LEVEL]"
echo ""
echo " NOSUB (optional): do not submit the script to the queue. default: submit."
echo " WALLTIME:HOURS (integer): sets the wall time limits."
echo " soft limit = HOURS:00:00"
echo " hard limit = HOURS:00:30"
echo " for short.q: HOURS = 0 (-> MINUTES=30)"
echo " for all.q: HOURS <= 24"
echo " for long.q: HOURS <= 96"
echo " PROJECT: python module (file path) that declares the project and starts the calculation."
echo " MODE: PMSCO calculation mode (single|swarm|gradient|grid)."
echo " LOG_LEVEL (optional): one of DEBUG, INFO, WARNING, ERROR if log files should be produced."
echo ""
echo "the job script complete with the program code and input/output data is generated in ~/jobs/\$JOBNAME"
exit 1
fi
# location of the pmsco package is derived from the path of this script
SCRIPTDIR="$(dirname $(readlink -f $0))"
SOURCEDIR="$SCRIPTDIR/.."
PHD_SOURCE_DIR="$SOURCEDIR"
PHD_CODE="edac"
# read arguments
if [ "$1" == "NOSUB" ]; then
NOSUB="true"
shift
else
NOSUB="false"
fi
PHD_JOBNAME=$1
shift
PHD_NODES=$1
shift
PHD_WALLTIME_HR=$1
PHD_WALLTIME_MIN=0
shift
PHD_PROJECT_FILE="$(readlink -f $1)"
PHD_PROJECT_ARGS=""
shift
PHD_MODE="$1"
shift
PHD_LOGLEVEL=""
if [ "$1" == "DEBUG" ] || [ "$1" == "INFO" ] || [ "$1" == "WARNING" ] || [ "$1" == "ERROR" ]; then
PHD_LOGLEVEL="$1"
shift
fi
# ignore remaining arguments
PHD_SCAN_FILES=""
# select allowed queues
QUEUE=short.q,all.q,long.q
# for short queue (limit 30 minutes)
if [ "$PHD_WALLTIME_HR" -lt 1 ]; then
PHD_WALLTIME_HR=0
PHD_WALLTIME_MIN=30
fi
# set up working directory
cd ~
if [ ! -d "jobs" ]; then
mkdir jobs
fi
cd jobs
if [ ! -d "$PHD_JOBNAME" ]; then
mkdir "$PHD_JOBNAME"
fi
cd "$PHD_JOBNAME"
WORKDIR="$(pwd)"
PHD_WORK_DIR="$WORKDIR"
# provide revision information, requires git repository
cd "$SOURCEDIR"
PHD_REV=$(git log --pretty=format:"%h, %ad" --date=iso -1)
if [ $? -ne 0 ]; then
PHD_REV="revision unknown, "$(date +"%F %T %z")
fi
cd "$WORKDIR"
echo "$PHD_REV" > revision.txt
# generate job script from template
sed -e "s:_PHD_WORK_DIR:$PHD_WORK_DIR:g" \
-e "s:_PHD_JOBNAME:$PHD_JOBNAME:g" \
-e "s:_PHD_NODES:$PHD_NODES:g" \
-e "s:_PHD_WALLTIME_HR:$PHD_WALLTIME_HR:g" \
-e "s:_PHD_WALLTIME_MIN:$PHD_WALLTIME_MIN:g" \
-e "s:_PHD_PROJECT_FILE:$PHD_PROJECT_FILE:g" \
-e "s:_PHD_PROJECT_ARGS:$PHD_PROJECT_ARGS:g" \
-e "s:_PHD_CODE:$PHD_CODE:g" \
-e "s:_PHD_MODE:$PHD_MODE:g" \
-e "s:_PHD_SOURCE_DIR:$PHD_SOURCE_DIR:g" \
-e "s:_PHD_SCAN_FILES:$PHD_SCAN_FILES:g" \
-e "s:_PHD_LOGLEVEL:$PHD_LOGLEVEL:g" \
"$SCRIPTDIR/pmsco.sge.template" > $PHD_JOBNAME.job
chmod u+x "$PHD_JOBNAME.job"
if [ "$NOSUB" != "true" ]; then
# suppress bash error [stackoverflow.com/questions/10496758]
unset module
# submit the job script
# EMAIL must be defined in the environment
if [ -n "$EMAIL" ]; then
qsub -q $QUEUE -m ae -M $EMAIL $PHD_JOBNAME.job
else
qsub -q $QUEUE $PHD_JOBNAME.job
fi
fi
exit 0

View File

@@ -32,7 +32,7 @@ DOXYFILE_ENCODING = UTF-8
# title of most generated pages and in a few other places.
# The default value is: My Project.
PROJECT_NAME = "PEARL MSCO"
PROJECT_NAME = "PMSCO"
# The PROJECT_NUMBER tag can be used to enter a project or revision number. This
# could be handy for archiving the generated documentation or if some version
@@ -765,8 +765,10 @@ src/concepts-tasks.dox \
src/concepts-emitter.dox \
src/concepts-atomscat.dox \
src/installation.dox \
src/project.dox \
src/execution.dox \
src/commandline.dox \
src/runfile.dox \
src/optimizers.dox \
../pmsco \
../projects \
@ -889,7 +891,7 @@ INPUT_FILTER =
# filters are used. If the FILTER_PATTERNS tag is empty or if none of the
# patterns match the file name, INPUT_FILTER is applied.
FILTER_PATTERNS = *.py=/usr/bin/doxypy
FILTER_PATTERNS = *.py=./py_filter.sh
# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using
# INPUT_FILTER) will also be used to filter the input files that are used for
@ -2083,12 +2085,6 @@ EXTERNAL_GROUPS = YES
EXTERNAL_PAGES = YES
# The PERL_PATH should be the absolute path and name of the perl script
# interpreter (i.e. the result of 'which perl').
# The default file (with absolute path) is: /usr/bin/perl.
PERL_PATH = /usr/bin/perl
#---------------------------------------------------------------------------
# Configuration options related to the dot tool
#---------------------------------------------------------------------------
@ -2102,15 +2098,6 @@ PERL_PATH = /usr/bin/perl
CLASS_DIAGRAMS = YES
# You can define message sequence charts within doxygen comments using the \msc
# command. Doxygen will then run the mscgen tool (see:
# http://www.mcternan.me.uk/mscgen/)) to produce the chart and insert it in the
# documentation. The MSCGEN_PATH tag allows you to specify the directory where
# the mscgen tool resides. If left empty the tool is assumed to be found in the
# default search path.
MSCGEN_PATH =
# You can include diagrams made with dia in doxygen documentation. Doxygen will
# then run dia to produce the diagram and insert it in the documentation. The
# DIA_PATH tag allows you to specify the directory where the dia binary resides.

docs/py_filter.sh Executable file (+2 lines)

@ -0,0 +1,2 @@
#!/bin/bash
python -m doxypypy.doxypypy -a -c "$1"


@ -1,7 +1,17 @@
to compile the source code documentation, you need the following packages (naming according to Debian):
To compile the source code documentation in HTML format,
you need the following packages.
They are available from Linux distributions unless noted otherwise.
GNU make
doxygen
doxygen-gui (optional)
doxypy
python
doxypypy (pip)
graphviz
latex (optional)
java JRE
plantuml (download from plantuml.com)
export the location of plantuml.jar in the PLANTUML_JAR_PATH environment variable.
go to the `docs` directory and execute `make html`.
open `docs/html/index.html` in your browser.
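Before running `make html`, a small helper along these lines can report which prerequisites are still missing (this helper is not part of PMSCO; the tool and variable names follow the list above):

```python
import os
import shutil

def missing_doc_tools(env=None, which=shutil.which):
    """Return the names of missing documentation prerequisites."""
    env = os.environ if env is None else env
    missing = [tool for tool in ("doxygen", "dot", "java") if which(tool) is None]
    # plantuml is run from a jar file, so only the environment variable is checked
    if not env.get("PLANTUML_JAR_PATH"):
        missing.append("PLANTUML_JAR_PATH")
    return missing

if __name__ == "__main__":
    print(missing_doc_tools() or "all documentation tools found")
```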


@ -22,7 +22,7 @@ Do not include the extension <code>.py</code> or a trailing slash.
Common args and project args are described below.
\subsection sec_common_args Common Arguments
\subsection sec_command_common Common Arguments
All common arguments are optional and default to more or less reasonable values if omitted.
They can be added to the command line in arbitrary order.
@ -34,7 +34,7 @@ The following table is ordered by importance.
| -h , --help | | Display a command line summary and exit. |
| -m , --mode | single (default), grid, swarm, genetic | Operation mode. |
| -d, --data-dir | file system path | Directory path for experimental data files (if required by project). Default: current working directory. |
| -o, --output-file | file system path | Base path and/or name for intermediate and output files. Default: pmsco_data |
| -o, --output-file | file system path | Base path and/or name for intermediate and output files. Default: pmsco0 |
| -t, --time-limit | decimal number | Wall time limit in hours. The optimizers try to finish before the limit. Default: 24.0. |
| -k, --keep-files | list of file categories | Output file categories to keep after the calculation. Multiple values can be specified and must be separated by spaces. By default, cluster and model (simulated data) of a limited number of best models are kept. See @ref sec_file_categories below. |
| --log-level | DEBUG, INFO, WARNING (default), ERROR, CRITICAL | Minimum level of messages that should be added to the log. |
@ -45,7 +45,7 @@ The following table is ordered by importance.
| --table-file | file system path | Name of the model table file in table scan mode. |
\subsubsection sec_file_categories File Categories
\subsubsection sec_command_files File Categories
The following category names can be used with the `--keep-files` option.
Multiple names can be specified and must be separated by spaces.
@ -79,7 +79,7 @@ you have to add the file categories that you want to keep, e.g.,
Do not specify `rfac` alone as this will effectively not return any file.
\subsection sec_project_args Project Arguments
\subsection sec_command_project_args Project Arguments
The following table lists a few recommended options that are handled by the project code.
Project options that are not listed here should use the long form to avoid conflicts in future versions.
@ -90,7 +90,7 @@ Project options that are not listed here should use the long form to avoid confl
| -s, --scans | project-dependent | Nick names of scans to use in calculation. The nick name selects the experimental data file and the initial state of the photoelectron. Multiple values can be specified and must be separated by spaces. |
\subsection sec_scanfile Experimental Scan Files
\subsection sec_command_scanfile Experimental Scan Files
The recommended way of specifying experimental scan files is using nick names (dictionary keys) and the @c --scans option.
A dictionary in the module code defines the corresponding file name, chemical species of the emitter and initial state of the photoelectron.
@ -99,7 +99,7 @@ This way, the file names and photoelectron parameters are versioned with the cod
whereas command line arguments may easily get forgotten in the records.
\subsection sec_project_example Argument Handling
\subsection sec_command_example Argument Handling
To handle command line arguments in a project module,
the module must define a <code>parse_project_args</code> and a <code>set_project_args</code> function.
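A minimal sketch of these two functions is shown below. The `--scans` option and the `scans` attribute are illustrative only; the exact signatures and semantics expected by PMSCO are defined by the project interface documentation.

```python
import argparse

def parse_project_args(arg_list):
    # parse only the project-specific options; leave unknown options alone
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-s", "--scans", nargs="*", default=[])
    known, _unknown = parser.parse_known_args(arg_list)
    return known

def set_project_args(project, args):
    # copy the parsed values onto the project object
    # (the attribute name is hypothetical; real projects may use dedicated setters)
    project.scans = list(args.scans)
```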


@ -8,28 +8,30 @@ The code for a PMSCO job consists of the following components.
skinparam componentStyle uml2
component "project" as project
component "PMSCO" as pmsco
component "project" as project
component "scattering code\n(calculator)" as calculator
interface "command line" as cli
interface "input files" as input
interface "output files" as output
interface "experimental data" as data
interface "results" as results
interface "output files" as output
cli --> pmsco
data -> project
project ..> pmsco
pmsco ..> project
pmsco ..> calculator
cli --> project
input -> calculator
calculator -> output
pmsco -> results
@enduml
The main entry point is the _PMSCO_ module.
It implements a task loop to carry out the structural optimization
and provides an interface between calculation programs and project-specific code.
It also provides common utility classes and functions for handling project data.
The _project_ consists of program code, system and experimental parameters
The _project_ consists of program code and parameters
that are specific to a particular experiment and calculation job.
The project code reads experimental data, defines the parameter dictionary of the model,
and contains code to generate the cluster, parameter and phase files for the scattering code.
@ -40,10 +42,6 @@ which accepts detailed input files
(parameters, atomic coordinates, emitter specification, scattering phases)
and outputs an intensity distribution of photoelectrons versus energy and/or angle.
The _PMSCO core_ interfaces between the project and the calculator.
It carries out the structural optimization and manages the calculation tasks.
It generates and sends input files to the calculator and reads back the output.
\section sec_control_flow Control flow


@ -2,10 +2,15 @@
\section sec_run Running PMSCO
To run PMSCO you need the PMSCO code and its dependencies (cf. @ref pag_install),
a code module that contains the project-specific code,
a customized code module that contains the project-specific code,
and one or several files containing the scan parameters and experimental data.
Please check the <code>projects</code> folder for examples of project modules.
For a detailed description of the command line, see @ref pag_command.
The run-time arguments can either be passed on the command line
(@ref pag_command - the older and less flexible way)
or in a JSON-formatted run-file
(@ref pag_runfile - the recommended new and flexible way).
For beginners, it's also possible to hard-code all project parameters in the custom project module.
\subsection sec_run_single Single Process
@ -14,40 +19,28 @@ Run PMSCO from the command prompt:
@code{.sh}
cd work-dir
python pmsco-dir project-dir/project.py [pmsco-arguments] [project-arguments]
python pmsco-dir -r run-file
@endcode
where <code>work-dir</code> is the destination directory for output files,
<code>pmsco-dir</code> is the directory containing the <code>__main__.py</code> file,
<code>project.py</code> is the specific project module,
and <code>project-dir</code> is the directory where the project file is located.
PMSCO is run in one process which handles all calculations sequentially.
<code>run-file</code> is a JSON-formatted configuration file that defines run-time parameters.
The format and content of the run-file are described in a separate section.
The command line arguments are divided into common arguments interpreted by the main pmsco code (pmsco.py),
and project-specific arguments interpreted by the project module.
In this form, PMSCO is run in one process which handles all calculations sequentially.
Example command line for a single EDAC calculation of the two-atom project:
@code{.sh}
cd work/twoatom
python ../../pmsco ../../projects/twoatom/twoatom.py -s ea -o twoatom-demo -m single
python ../../pmsco -r twoatom-hemi.json
@endcode
This command line executes the main pmsco module <code>pmsco.py</code>.
The main module loads the project file <code>twoatom.py</code> as a plug-in
and starts processing the common arguments.
The <code>twoatom.py</code> module contains only project-specific code
with several defined entry-points called from the main module.
The information which project to load is contained in the <code>twoatom-hemi.json</code> file,
along with all common and specific project arguments.
In the command line above, the <code>-o twoatom-demo</code> and <code>-m single</code> arguments
are interpreted by the pmsco module.
<code>-o</code> sets the base name of output files,
and <code>-m</code> selects the operation mode to a single calculation.
The scan argument is interpreted by the project module.
It refers to a dictionary entry that declares the scan file, the emitting atomic species, and the initial state.
In this example, the project looks for the <code>twoatom_energy_alpha.etpai</code> scan file in the project directory,
and calculates the modulation function for a N 1s initial state.
The kinetic energy and emission angles are contained in the scan file.
This example can be run for testing.
All necessary parameters and data files are included in the code repository.
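A run-file is plain JSON. The sketch below writes a hypothetical one; the key names are invented for illustration and do not reflect the documented run-file schema:

```python
import json

# Hypothetical run-file content; the actual key names and structure are
# defined by the run-file documentation, not by this sketch.
run_file = {
    "project": {
        "module": "projects/twoatom/twoatom.py",  # which project module to load
        "mode": "single",                         # operation mode
        "output_file": "twoatom-demo",            # base name of output files
        "scans": ["ea"],                          # scan nick names
    }
}

with open("runfile-sketch.json", "w") as f:
    json.dump(run_file, f, indent=2)
```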
\subsection sec_run_parallel Parallel Processes
@ -61,29 +54,45 @@ The slave processes will run the scattering calculations, while the master coord
and optimizes the model parameters (depending on the operation mode).
For optimum performance, the number of processes should not exceed the number of available processors.
To start a two-hour optimization job with multiple processes on a quad-core workstation with hyperthreading:
To start an optimization job with multiple processes on a quad-core workstation with hyperthreading:
@code{.sh}
cd work/my_project
mpiexec -np 8 pmsco-dir/pmsco project-dir/project.py -o my_job_0001 -t 2 -m swarm
mpiexec -np 8 --use-hwthread-cpus python pmsco-dir -r run-file
@endcode
The `--use-hwthread-cpus` option may be necessary on certain hyperthreading architectures.
\subsection sec_run_hpc High-Performance Cluster
The script @c bin/qpmsco.ra.sh takes care of submitting a PMSCO job to the slurm queue of the Ra cluster at PSI.
The script can be adapted to other machines running the slurm resource manager.
The script generates a job script based on @c pmsco.ra.template,
substituting the necessary environment and parameters,
and submits it to the queue.
PMSCO is ready to run with resource managers on cluster machines.
Code for submitting jobs to the slurm queue of the Ra cluster at PSI is included in the pmsco.schedule module
(see also the PEARL wiki pages in the PSI intranet).
The job parameters are entered in a separate section of the run file, cf. @ref pag_runfile for details.
Other machines can be supported by sub-classing pmsco.schedule.JobSchedule or pmsco.schedule.SlurmSchedule.
Execute @c bin/qpmsco.ra.sh without arguments to see a summary of the arguments.
If a schedule section is present and enabled in the run file,
the following command will submit a job to the cluster machine
rather than starting a calculation directly:
To submit a job to the PSI clusters (see also the PEARL-Wiki page MscCalcRa),
the analogous command to the previous section would be:
@code{.sh}
bin/qpmsco.ra.sh my_job_0001 1 8 2 projects/my_project/project.py swarm
cd ~/pmsco
python pmsco -r run-file.json
@endcode
The command will copy the pmsco and project source trees as well as the run file and job script to a job directory
under the output directory specified in the project section of the run file.
The full path of the job directory is <code>output-dir/job-name</code>.
The directory must be empty or non-existent when you run the above command.
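The job-directory rule can be sketched like this (illustrative only; the actual implementation in pmsco.schedule may differ):

```python
from pathlib import Path

def prepare_job_dir(output_dir, job_name):
    """Create and return output_dir/job_name; refuse a non-empty existing one."""
    job_dir = Path(output_dir) / job_name
    if job_dir.exists() and any(job_dir.iterdir()):
        raise FileExistsError(f"job directory {job_dir} is not empty")
    job_dir.mkdir(parents=True, exist_ok=True)
    return job_dir
```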
Be careful to specify correct project file paths.
The output and data directories should be specified as absolute paths.
The scheduling command will also load the project and scan files.
Many parameter errors can thus be caught and fixed before the job is submitted to the queue.
The run file also offers an option to stop just before submitting the job
so that you can inspect the job files and submit the job manually.
Be sure to consider the resource allocation policy of the cluster
before you decide on the number of processes.
Requesting less resources will prolong the run time but might increase the scheduling priority.


@ -51,6 +51,14 @@ and it's difficult to switch between different Python versions.
On the PSI cluster machines, the environment must be set using the module system and conda (on Ra).
Details are explained in the PEARL Wiki.
The following tools are required to compile the documentation:
- doxygen
- doxypypy
- graphviz
- Java
- [plantUML](https://plantuml.com)
- LaTeX (optional, generally not recommended)
\subsection sec_install_instructions Instructions
@ -66,7 +74,6 @@ sudo apt install \
binutils \
build-essential \
doxygen \
doxypy \
f2c \
g++ \
gcc \
@ -92,12 +99,15 @@ cd /usr/lib
sudo ln -s /usr/lib/libblas/libblas.so.3 libblas.so
@endcode
Install Miniconda according to their [instructions](https://conda.io/docs/user-guide/install/index.html),
Download and install [Miniconda](https://conda.io/),
then configure the Python environment:
@code{.sh}
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh
conda create -q --yes -n pmsco python=3.6
source activate pmsco
conda activate pmsco
conda install -q --yes -n pmsco \
pip \
"numpy>=1.13" \
@ -110,7 +120,7 @@ conda install -q --yes -n pmsco \
statsmodels \
swig \
gitpython
pip install periodictable attrdict fasteners mpi4py
pip install periodictable attrdict commentjson fasteners mpi4py doxypypy
@endcode
@note `mpi4py` should be installed via pip, _not_ conda.
@ -119,16 +129,15 @@ pip install periodictable attrdict fasteners mpi4py
\subsubsection sec_install_singularity Installation in Singularity container
A [Singularity](https://www.sylabs.io/guides/2.5/user-guide/index.html) container
A [Singularity](https://sylabs.io/singularity/) container
contains all OS and Python dependencies for running PMSCO.
Besides the Singularity executable, nothing else needs to be installed in the host system.
This may be the fastest way to get PMSCO running.
For installation of Singularity,
see their [user guide](https://www.sylabs.io/guides/2.5/user-guide/installation.html).
On newer Linux systems (e.g. Ubuntu 18.04), Singularity is available from the package manager.
Installation in a virtual machine on Windows or Mac are straightforward
thanks to the [Vagrant system](https://www.vagrantup.com/).
To get started with Singularity,
download it from [sylabs.io](https://www.sylabs.io/singularity/) and install it according to their instructions.
On Windows, Singularity can be installed in a virtual machine using the [Vagrant](https://www.vagrantup.com/)
script included under `extras/vagrant`.
After installing Singularity,
check out PMSCO as explained in the @ref sec_compile section:
@ -136,6 +145,7 @@ check out PMSCO as explained in the @ref sec_compile section:
@code{.sh}
cd ~
mkdir containers
cd containers
git clone git@git.psi.ch:pearl/pmsco.git pmsco
cd pmsco
git checkout master
@ -143,11 +153,14 @@ git checkout -b my_branch
@endcode
Then, either copy a pre-built container into `~/containers`,
or build one from a script provided by the PMSCO repository:
or build one from the definition file included under extras/singularity.
You may need to customize the definition file to match the host OS
or to install compatible OpenMPI libraries,
cf. [Singularity user guide](https://sylabs.io/guides/3.7/user-guide/mpi.html).
@code{.sh}
cd ~/containers
sudo singularity build pmsco.simg ~/containers/pmsco/extras/singularity/singularity_python3
sudo singularity build pmsco.sif ~/containers/pmsco/extras/singularity/singularity_python3
@endcode
To work with PMSCO, start an interactive shell in the container and switch to the pmsco environment.
@ -155,8 +168,9 @@ Note that the PMSCO code is outside the container and can be edited with the usu
@code{.sh}
cd ~/containers
singularity shell pmsco.simg
source activate pmsco
singularity shell pmsco.sif
. /opt/miniconda/etc/profile.d/conda.sh
conda activate pmsco
cd ~/containers/pmsco
make all
nosetests -w tests/
@ -168,16 +182,17 @@ Or call PMSCO from outside:
cd ~/containers
mkdir output
cd output
singularity run ../pmsco.simg python ~/containers/pmsco/pmsco path/to/your-project.py arg1 arg2 ...
singularity run -e ../pmsco.sif ~/containers/pmsco/pmsco -r path/to/your-runfile
@endcode
For parallel processing, prepend `mpirun -np X` to the singularity command as needed.
Note that this requires "compatible" OpenMPI versions on the host and container to avoid runtime errors.
\subsubsection sec_install_extra Additional Applications
For working with the code and data, some other applications are recommended.
The PyCharm IDE can be installed from the Ubuntu software center.
The PyCharm IDE (community edition) can be installed from the Ubuntu software center.
The following commands install other useful helper applications:
@code{.sh}
@ -187,10 +202,24 @@ gitg \
meld
@endcode
To produce documentation in PDF format (not recommended on virtual machine), install LaTeX:
To compile the documentation, install the following tools.
The basic documentation is in HTML format and can be opened in any internet browser.
If you have a working LaTeX installation, a PDF document can be produced as well.
It is not recommended to install LaTeX just for this documentation, however.
@code{.sh}
sudo apt-get install texlive-latex-recommended
sudo apt install \
doxygen \
graphviz \
default-jre
conda activate pmsco
conda install -q --yes -n pmsco doxypypy
wget -O plantuml.jar https://sourceforge.net/projects/plantuml/files/plantuml.jar/download
sudo mkdir /opt/plantuml/
sudo mv plantuml.jar /opt/plantuml/
echo "export PLANTUML_JAR_PATH=/opt/plantuml/plantuml.jar" | sudo tee /etc/profile.d/pmsco-env.sh
@endcode
@ -250,7 +279,7 @@ mkdir work
cd work
mkdir twoatom
cd twoatom/
nice python ~/pmsco/pmsco ~/pmsco/projects/twoatom/twoatom.py -s ea -o twoatom_energy_alpha -m single
nice python ~/pmsco/pmsco -r ~/pmsco/projects/twoatom/twoatom-energy.json
@endcode
Runtime warnings may appear because the twoatom project does not contain experimental data.


@ -26,13 +26,13 @@ Other programs may be integrated as well.
- various scanning modes including energy, polar angle, azimuthal angle, analyser angle.
- averaging over multiple domains and emitters.
- global optimization of multiple scans.
- structural optimization algorithms: particle swarm optimization, grid search, gradient search.
- structural optimization algorithms: genetic, particle swarm, grid search.
- calculation of the modulation function.
- calculation of the weighted R-factor.
- automatic parallel processing using OpenMPI.
\section sec_project Optimization Projects
\section sec_intro_project Optimization Projects
To set up a new optimization project, you need to:
@ -44,8 +44,7 @@ To set up a new optimization project, you need to:
- add a global function create_project to my_project.py.
- provide experimental data files (intensity or modulation function).
For details, see the documentation of the Project class,
and the example projects.
For details, see @ref pag_project, the documentation of the pmsco.project.Project class and the example projects.
\section sec_intro_start Getting Started
@ -54,8 +53,9 @@ and the example projects.
- @ref pag_concepts_tasks
- @ref pag_concepts_emitter
- @ref pag_install
- @ref pag_project
- @ref pag_run
- @ref pag_command
- @ref pag_opt
\section sec_license License Information
@ -70,6 +70,6 @@ These programs may not be used without an explicit agreement by the respective o
\author Matthias Muntwiler, <mailto:matthias.muntwiler@psi.ch>
\version This documentation is compiled from version $(REVISION).
\copyright 2015-2019 by [Paul Scherrer Institut](http://www.psi.ch)
\copyright 2015-2021 by [Paul Scherrer Institut](http://www.psi.ch)
\copyright Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)
*/


@ -3,28 +3,34 @@
\subsection sec_opt_swarm Particle swarm
\subsection sec_opt_swarm Particle swarm optimization (PSO)
The particle swarm algorithm is adapted from
The particle swarm optimization (PSO) algorithm seeks to find a global optimum in a multi-dimensional model space
by employing the _swarm intelligence_ of a number of particles traversing space,
each at its own velocity and direction,
but adjusting its trajectory based on its own experience and the results of its peers.
The PSO algorithm is adapted from
D. A. Duncan et al., Surface Science 606, 278 (2012).
It is implemented in the @ref pmsco.optimizers.swarm module.
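The trajectory adjustment described above is the canonical PSO velocity update, which looks roughly as follows. The coefficients and the exact rule used by pmsco.optimizers.swarm may differ; the injectable `rng` is only for illustration:

```python
import random

def pso_update(position, velocity, personal_best, global_best,
               inertia=0.7, c_personal=1.5, c_social=1.5, rng=None):
    """One canonical PSO update step for a single particle."""
    rand = rng if rng is not None else random.random
    new_velocity = [
        inertia * v
        + c_personal * rand() * (pb - x)   # pull toward the particle's own best
        + c_social * rand() * (gb - x)     # pull toward the swarm's best
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity
```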
The general parameters of the genetic algorithm are specified in the @ref Project.optimizer_params dictionary.
The general parameters of the algorithm are specified in the @ref Project.optimizer_params dictionary.
Some of them can be changed on the command line.
| Parameter | Command line | Range | Description |
| --- | --- | --- | --- |
| pop_size | --pop-size | &ge; 1 | |
| pop_size | --pop-size | &ge; 1 | Recommended 20..50 |
| position_constrain_mode | | default bounce | Resolution of domain limit violations. |
| seed_file | --seed-file | a file path, default none | |
| seed_limit | --seed-limit | 0..pop_size | |
| rfac_limit | | 0..1, default 0.8 | Accept only seed values that have a lower R-factor. |
| recalc_seed | | True or False, default True | |
The domain parameters have the following meanings:
The model space attributes have the following meaning:
| Parameter | Description |
| --- | --- |
| start | Seed model. The start values are copied into particle 0 of the initial population. |
| start | Start value of particle 0 in the first iteration. |
| min | Lower limit of the parameter range. |
| max | Upper limit of the parameter range. |
| step | Not used. |
@ -32,23 +38,23 @@ The domain parameters have the following meanings:
\subsubsection sec_opt_seed Seeding a population
By default, one particle is initialized with the start value declared in the parameter domain,
and the other are set to random values within the domain.
By default, one particle is initialized with the start value declared with the model space,
and the other ones are initialized at random positions in the model space.
You may initialize more particles of the population with specific values by providing a seed file.
The seed file must have a similar format as the result `.dat` files
with a header line specifying the column names and data rows containing the values for each particle.
A good practice is to use a previous `.dat` file and remove unwanted rows.
To continue an interrupted optimization,
the `.dat` file from the previous optimization can be used as is.
The `.dat` file from a previous optimization job can be used as is to continue the optimization,
even in a different optimization mode.
The seeding procedure can be tweaked by several optimizer parameters (see above).
PMSCO normally loads the first rows up to population size - 1 or up to the `seed_limit` parameter,
whichever is lower.
If an `_rfac` column is present, the file is first sorted by R-factor and only the best models are loaded.
Models that resulted in an R-factor above the `rfac_limit` parameter are always ignored.
Models that resulted in an R-factor above the `rfac_limit` parameter are ignored in any case.
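The selection rules just described can be sketched as follows (illustrative, not pmsco's actual implementation; `rows` stands for the parsed data rows of the seed file):

```python
def select_seed_models(rows, pop_size, seed_limit=None, rfac_limit=0.8):
    """Apply the seed-file selection rules: load at most pop_size - 1 rows
    (or seed_limit, whichever is lower); if an _rfac column is present,
    sort by R-factor and drop models above rfac_limit."""
    limit = pop_size - 1
    if seed_limit is not None:
        limit = min(limit, seed_limit)
    if rows and "_rfac" in rows[0]:
        rows = sorted(rows, key=lambda r: r["_rfac"])
        rows = [r for r in rows if r["_rfac"] <= rfac_limit]
    return rows[:limit]
```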
During the optimization process, all models loaded from the seed file are normally re-calculated.
In the first iteration of the optimization run, the models loaded from the seed file are re-calculated by default.
This may waste CPU time if the calculation is run under the same conditions
and would result in exactly the same R-factor,
as is the case if the seed is used to continue a previous optimization, for example.
@ -58,25 +64,26 @@ and PMSCO will use the R-factor value from the seed file rather than calculating
\subsubsection sec_opt_patch Patching a running optimization
While an optimization process is running, the user can manually patch the population with arbitrary values,
While an optimization job is running, the user can manually patch the population with arbitrary values,
for instance, to kick the population out of a local optimum or to drive it to a less sampled parameter region.
To patch a running population, prepare a population file named `pmsco_patch.pop` and copy it to the work directory.
The file must have a similar format as the result `.dat` files
The patch file must have the same format as the result `.dat` files
with a header line specifying the column names and data rows containing the values.
It should contain as many rows as particles to be patched but not more than the size of the population.
The columns must include a `_particle` column which specifies the particle to patch
as well as the model parameters to be changed.
The columns must include a `_particle` column and the model parameters to be changed.
The `_particle` column specifies the index of the particle that is patched (ranging from 0 to population size - 1).
Parameters that should remain unaffected can be left out,
extra columns including `_gen`, `_rfac` etc. are ignored.
PMSCO checks the file for syntax errors and ignores it if errors are present.
Parameter values that lie outside the domain boundary are ignored.
Individual parameter values that lie outside the domain boundary are silently ignored.
Successful or failed patching is logged at warning level.
The patch file is re-applied whenever its time stamp has changed.
PMSCO keeps track of the time stamp of the file and re-applies the patch whenever the time stamp has changed.
\attention Do not edit the patch file in the working directory
to prevent it from being read in an unfinished state or multiple times.
\attention Since each change of time stamp may trigger patching,
do not edit the patch file in the working directory
to prevent it from being read in an unfinished state or multiple times!
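A sketch of these patching rules (illustrative; pmsco's actual implementation may differ; `space_min`/`space_max` stand for the model space limits):

```python
def apply_patch(population, patch_rows, space_min, space_max):
    """Apply patch-file rows to a population of model dicts, per the rules above."""
    for row in patch_rows:
        if "_particle" not in row:
            continue                      # the _particle column is required
        particle = int(row["_particle"])
        if not 0 <= particle < len(population):
            continue
        for name, value in row.items():
            if name.startswith("_"):
                continue                  # _gen, _rfac etc. are ignored
            # individual values outside the limits are silently ignored
            if space_min[name] <= value <= space_max[name]:
                population[particle][name] = value
    return population
```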
\subsection sec_opt_genetic Genetic optimization
@ -103,7 +110,7 @@ Some of them can be changed on the command line.
| Parameter | Command line | Range | Description |
| --- | --- | --- | --- |
| pop_size | --pop-size | &ge; 1 | |
| pop_size | --pop-size | &ge; 1 | Recommended 10..40 |
| mating_factor | | 1..pop_size, default 4 | |
| strong_mutation_probability | | 0..1, default 0.01 | Probability that a parameter undergoes a strong mutation. |
| weak_mutation_probability | | 0..1, default 1 | Probability that a parameter undergoes a weak mutation. This parameters should be left at 1. Lower values tend to produce discrete parameter values. Weak mutations can be tuned by the step domain parameters. |
@ -113,7 +120,7 @@ Some of them can be changed on the command line.
| rfac_limit | | 0..1, default 0.8 | Accept only seed values that have a lower R-factor. |
| recalc_seed | | True or False, default True | |
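The two mutation kinds in the table can be sketched as follows (illustrative; pmsco.optimizers.genetic may implement them differently; `space` maps each parameter to its (min, max, step) attributes):

```python
import random

def mutate(model, space, strong_p=0.01, weak_p=1.0, rng=random):
    """Strong mutation: redraw uniformly in [min, max].
    Weak mutation: add Gaussian noise scaled by the step attribute."""
    new = {}
    for name, value in model.items():
        lo, hi, step = space[name]
        if rng.random() < strong_p:
            value = lo + rng.random() * (hi - lo)
        elif rng.random() < weak_p:
            value = min(hi, max(lo, value + rng.gauss(0.0, step)))
        new[name] = value
    return new
```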
The domain parameters have the following meanings:
The model space attributes have the following meaning:
| Parameter | Description |
| --- | --- |
@ -129,7 +136,11 @@ cf. sections @ref sec_opt_seed and @ref sec_opt_swarm.
\subsection sec_opt_grid Grid search
The grid search algorithm samples the parameter space at equidistant steps.
The order of calculations is randomized so that distant parts of the parameter space are sampled at an early stage.
It is implemented in the @ref pmsco.optimizers.grid module.
The model space attributes have the following meaning.
The order of calculations is random so that results from different parts of the model space become available early.
| Parameter | Description |
| --- | --- |
@ -149,15 +160,19 @@ The table scan calculates models from an explicit table of model parameters.
It can be used to recalculate models from a previous optimization run on other experimental data,
as an interface to external optimizers,
or as a simple input of manually edited model parameters.
It is implemented in the @ref pmsco.optimizers.table module.
The table can be stored in an external file that is specified on the command line,
or supplied in one of several forms by the custom project class.
The table can be left unchanged during the calculations,
or new models can be added on the go.
Duplicate models are ignored.
@attention Because it is not easily possible to know when and which models have been read from the table file, if you do modify the table file during processing, pay attention to the following hints:
1. The file on disk must not be locked for more than a second. Do not keep the file open unnecessarily.
2. _Append_ new models to the end of the table rather than overwriting previous ones. Otherwise, some models may be lost before they have been calculated.
@attention Because it is not easily possible to know when the table file is read,
observe the following rules if you modify the table file while calculations are running:
1. Do not keep the file locked for longer than a second.
2. Append new models to the end of the table rather than overwriting previous ones.
3. Delete lines only if you're sure that they are not needed any more.
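Following these rules, new models can be appended with a short-lived file handle in append mode (illustrative; the actual table file format is defined by pmsco.optimizers.table):

```python
def append_models(table_path, rows, names):
    """Append model rows to the table file without touching existing lines."""
    # append mode keeps the file open only briefly and never overwrites
    with open(table_path, "a") as f:
        for row in rows:
            f.write(" ".join(str(row[n]) for n in names) + "\n")
```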
The general parameters of the table scan are specified in the @ref Project.optimizer_params dictionary.
Some of them can be changed on the command line or in the project class (depending on how the project class is implemented).
@ -167,7 +182,7 @@ Some of them can be changed on the command line or in the project class (dependi
| pop_size | --pop-size | &ge; 1 | Number of models in a generation (calculated in parallel). In table mode, this parameter is not so important and can be left at the default. It has nothing to do with table size. |
| table_file | --table-file | a file path, default none | |
The domain parameters have the following meanings.
The model space attributes have the following meaning.
Models that violate the parameter range are not calculated.
| Parameter | Description |

docs/src/project.dox Normal file (+454 lines)

@ -0,0 +1,454 @@
/*! @page pag_project Setting up a new project
\section sec_project Setting Up a New Project
This topic guides you through the setup of a new project.
Be sure to check out the examples in the projects folder
and the code documentation as well.
The basic steps are:
1. Create a new folder under `projects`.
2. In the new folder, create a Python module for the project (subsequently called _the project module_).
3. In the project module, define a cluster generator class which derives from pmsco.cluster.ClusterGenerator.
4. In the project module, define a project class which derives from pmsco.project.Project.
5. In the same folder as the project module, create a JSON run-file.
\subsection sec_project_module Project Module
A skeleton of the project module file (with some common imports) may look like this:
~~~~~~{.py}
import logging
import math
import numpy as np
import periodictable as pt
from pathlib import Path
import pmsco.cluster
import pmsco.data
import pmsco.dispatch
import pmsco.elements.bindingenergy
import pmsco.project
logger = logging.getLogger(__name__)
class MyClusterGenerator(pmsco.cluster.ClusterGenerator):
def create_cluster(self, model, index):
clu = pmsco.cluster.Cluster()
# ...
return clu
def count_emitters(self, model, index):
# ...
return 1
class MyProject(pmsco.project.Project):
def __init__(self):
super().__init__()
# ...
self.cluster_generator = MyClusterGenerator(self)
def create_model_space(self):
spa = pmsco.project.ModelSpace()
# ...
return spa
def create_params(self, model, index):
par = pmsco.project.CalculatorParams()
# ...
return par
~~~~~~
The main purpose of the `MyProject` class is to bundle the project-specific calculation parameters and code.
The purpose of the `MyClusterGenerator` class is to produce atomic clusters as a function of a number of model parameters.
For the project to be useful, some of the methods in the skeleton above need to be implemented.
The individual methods are discussed in the following.
Further descriptions can be found in the documentation of the code.
\subsection sec_project_cluster Cluster Generator
The cluster generator is a project-specific Python object that produces a cluster, i.e., a list of atomic coordinates,
based on a small number of model parameters whenever PMSCO requires it.
The most important member of a cluster generator is its `create_cluster` method.
At least this method must be implemented for a functional cluster generator.
A generic `count_emitters` method is implemented in the base class.
It needs to be overridden if you want to use parallel calculation of multiple emitters.
\subsubsection sec_project_cluster_create Cluster Definition
The `create_cluster` method takes the model parameters (a dictionary)
and the task index (a pmsco.dispatch.CalcID, cf. @ref pag_concepts_tasks) as arguments.
Given these arguments, it must create and fill a pmsco.cluster.Cluster object.
See pmsco.cluster.ClusterGenerator.create_cluster for details on the method contract.
As an example, have a look at the following simplified excerpt from the twoatom demo project.
~~~~~~{.py}
def create_cluster(self, model, index):
# access model parameters
# dAB - distance between atoms in Angstroms
# th - polar angle in degrees
# ph - azimuthal angle in degrees
r = model['dAB']
th = math.radians(model['th'])
ph = math.radians(model['ph'])
# prepare a cluster object
clu = pmsco.cluster.Cluster()
# the comment line is optional but can be useful
clu.comment = "{0} {1}".format(self.__class__, index)
# set the maximum radius of the cluster (outliers will be ignored)
clu.set_rmax(r * 2.0)
# calculate atomic vectors
dx = r * math.sin(th) * math.cos(ph)
dy = r * math.sin(th) * math.sin(ph)
dz = r * math.cos(th)
a_top = np.array((0.0, 0.0, 0.0))
a_bot = np.array((-dx, -dy, -dz))
# add an oxygen atom at a_top position and mark it as emitter
clu.add_atom('O', a_top, 1)
# add a copper atom at a_bot position
clu.add_atom('Cu', a_bot, 0)
# pass the created cluster to the calculator
return clu
~~~~~~
In this example, two atoms are added to the cluster.
The pmsco.cluster.Cluster class provides several methods to simplify the task,
such as adding layers or bulk regions, rotation, translation, trim, emitter selection, etc.
Please refer to the documentation of its code for details.
It may also be instructive to have a look at the demo projects.
The main purposes of the cluster object are to store an array of atoms and to read/write cluster files in a variety of formats.
For each atom, the following properties are stored:
- sequential atom index (1-based, maintained by cluster code)
- atom type (chemical element number)
- chemical element symbol from periodic table
- x coordinate of the atom position
- y coordinate of the atom position
- z coordinate of the atom position
- emitter flag (0 = scatterer, 1 = emitter, default 0)
- charge/ionicity (units of elementary charge, default 0)
- scatterer class (default 0)
All of these properties except the scatterer class can be set by the add methods of the cluster.
The scatterer class is used internally by the atomic scattering factor calculators.
Whether the charge/ionicity is used depends on the particular calculator; EDAC, for instance, does not use it.
Note: You do not need to take care how many emitters a calculator allows,
or whether the emitter needs to be at the origin or the first place of the array.
These technical aspects are handled by PMSCO code transparently.
\subsubsection sec_project_cluster_domains Domains
Domains refer to regions of inequivalent structure in the probing region.
This may include regions of different orientation, different lattice constant, or even different structure.
The cluster methods can read the selected domain from the `index.domain` argument.
This is an index into the pmsco.project.Project.domains list where each item is a dictionary
that holds additional, invariable structural parameters.
A common case are rotational domains.
In this case, the list of domains may look like `[{"zrot": 0.0}, {"zrot": 60.0}]`, for example,
and the `create_cluster` method would include additional code to rotate the cluster:
~~~~~~{.py}
def create_cluster(self, model, index):
# filling atoms here
# ...
dom = self.domains[index.domain]
try:
z_rot = dom['zrot']
except KeyError:
z_rot = 0.0
if z_rot:
clu.rotate_z(z_rot)
# selecting emitters
# ...
return clu
~~~~~~
Depending on the complexity of the system, it may, however, be necessary to write a specific sub-routine for each domain.
The pmsco.project.Project class includes generic code to add intensities of domains incoherently (cf. pmsco.project.Project.combine_domains).
If the model space contains parameters 'wdom0', 'wdom1', etc.,
these parameters are interpreted as weights of domains 0, 1, etc.
One domain must have a fixed weight to avoid correlated parameters.
Typically, 'wdom0' is left undefined and defaults to 1.
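The weighted, incoherent sum performed by `combine_domains` can be pictured as in the following sketch. This is an illustration of the weighting scheme only, not the actual PMSCO implementation (which operates on intensity arrays and handles further bookkeeping):

```python
def combine_domains(model, domain_intensities):
    """incoherently sum domain intensities, weighted by 'wdom<i>' parameters.

    model: dict of model parameters, may contain 'wdom0', 'wdom1', ...
    domain_intensities: list of per-domain intensity lists of equal length.
    """
    total = None
    for i, intensity in enumerate(domain_intensities):
        # a missing weight defaults to 1; typically 'wdom0' is left undefined
        weight = model.get("wdom{}".format(i), 1.0)
        weighted = [weight * value for value in intensity]
        if total is None:
            total = weighted
        else:
            total = [t + w for t, w in zip(total, weighted)]
    return total
```

With `model = {'wdom1': 0.5}`, for example, domain 0 enters with weight 1 and domain 1 with weight 0.5.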
\subsubsection sec_project_cluster_emitters Emitter Configurations
If your project has a large cluster and/or many emitters, have a look at @ref pag_concepts_emitter.
In this case, you should override the `count_emitters` method and return the number of emitter configurations.
In the simplest case, this is the number of inequivalent emitters, and the implementation would be:
~~~~~~{.py}
def count_emitters(self, model, index):
index = index._replace(emit=-1)
clu = self.create_cluster(model, index)
return clu.get_emitter_count()
~~~~~~
Next, modify the `create_cluster` method to check the emitter index (`index.emit`).
If it is -1, the method must return the full cluster with all inequivalent emitters marked.
If it is positive, only the corresponding emitter must be marked.
The code could be similar to this example:
~~~~~~{.py}
def create_cluster(self, model, index):
# filling atoms here
# ...
# select all possible emitters (atoms of a specific element) in a cylindrical volume
# idx_emit is an array of atom numbers (0-based atom index)
idx_emit = clu.find_index_cylinder(origin, r_xy, r_z, self.project.scans[index.scan].emitter)
# if a specific emitter should be marked, restrict the array index.
if index.emit >= 0:
idx_emit = idx_emit[index.emit]
# mark the selected emitters
# if index.emit was < 0, all emitters are marked
clu.data['e'][idx_emit] = 1
return clu
~~~~~~
Now, the individual emitter configurations will be calculated in separate tasks
which can be run in parallel in a multi-process environment.
Note that the processing time of EDAC scales linearly with the number of emitters.
Thus, parallel execution is beneficial.
Advanced programmers may exploit more of the flexibility of emitter configurations, cf. @ref pag_concepts_emitter.
\subsection sec_project_project Project Class
Most commonly, a project class overrides the `__init__`, `create_model_space` and `create_params` methods.
Most other inherited methods can be overridden optionally,
for instance `validate`, `setup`, `calc_modulation`, `rfactor`,
as well as the combine methods `combine_rfactors`, `combine_domains`, `combine_emitters`, etc.
In this introduction, we focus on the three most basic methods.
\subsubsection sec_project_project_init Initialization and Defaults
In the `__init__` method, you define and initialize (with default values) additional project properties.
You may also redefine properties of the base class.
The following code is just an example to give you some ideas.
~~~~~~{.py}
class MyProject(pmsco.project.Project):
def __init__(self):
# call the inherited method first
super().__init__()
# re-define an inherited property
self.directories["data"] = Path("/home/pmsco/data")
# define a scan dictionary
self.scan_dict = {}
# fill the scan dictionary
self.build_scan_dict()
# create the cluster generator
self.cluster_generator = MyClusterGenerator(self)
# declare the list of domains (at least one is required)
self.domains = [{"zrot": 0.}]
def build_scan_dict(self):
self.scan_dict["empty"] = {"filename": "{pmsco}/projects/common/empty-hemiscan.etpi",
"emitter": "Si", "initial_state": "2p3/2"}
self.scan_dict["Si2p"] = {"filename": "{data}/xpd-Si2p.etpis",
"emitter": "Si", "initial_state": "2p3/2"}
~~~~~~
The scan dictionary can come in handy if you want to select scans by a shortcut on the command line or in a run file.
Note that most of the properties can be assigned from a run file.
This happens after the `__init__` method.
The values set by `__init__` serve as default values.
\subsubsection sec_project_project_space Model Space
The model space defines the keys and value ranges of the model parameters.
There are three ways to declare the model space in order of priority:
1. Declare the model space in the run-file.
2. Assign a ModelSpace to the self.model_space property directly in the `__init__` method.
3. Implement the `create_model_space` method.
We begin with the third way:
~~~~~~{.py}
# under class MyProject(pmsco.project.Project):
def create_model_space(self):
# create an empty model space
spa = pmsco.project.ModelSpace()
# add parameters
spa.add_param('dAB', 2.10, 2.00, 2.25, 0.05)
spa.add_param('th', 15.00, 0.00, 30.00, 1.00)
spa.add_param('ph', 90.00)
spa.add_param('V0', 21.96, 15.00, 25.00, 1.00)
spa.add_param('Zsurf', 1.50)
spa.add_param('wdom1', 0.5, 0.10, 10.00, 0.10)
# return the model space
return spa
~~~~~~
This code declares six model parameters: `dAB`, `th`, `ph`, `V0`, `Zsurf` and `wdom1`.
Three of them are structural parameters (used by the cluster generator above),
two are used by the `create_params` method (see below),
and `wdom1` is used in pmsco.project.Project.combine_domains while summing up contributions from different domains.
The values in the arguments list correspond to the start value (initial guess),
the lower and upper boundaries of the value range,
and the step size for optimizers that require it.
If just one value is given, like for `ph` and `Zsurf`, the parameter is held constant during the optimization.
The equivalent declaration in the run-file would look like (parameters after `th` omitted):
~~~~~~{.py}
{
"project": {
// ...
"model_space": {
"dAB": {
"start": 2.109,
"min": 2.0,
"max": 2.25,
"step": 0.05
},
"th": {
"start": 15.0,
"min": 0.0,
"max": 30.0,
"step": 1.0
},
// ...
}
}
}
~~~~~~
\subsubsection sec_project_project_params Calculation Parameters
Non-structural parameters that are needed for the input files of the calculators are passed
in a pmsco.project.CalculatorParams object.
This object should be created and filled in the `create_params` method of the project class.
The following example is from the twoatoms demo project:
~~~~~~{.py}
# under class MyProject(pmsco.project.Project):
def create_params(self, model, index):
params = pmsco.project.CalculatorParams()
# meta data
params.title = "two-atom demo"
params.comment = "{0} {1}".format(self.__class__, index)
# initial state and binding energy
initial_state = self.scans[index.scan].initial_state
params.initial_state = initial_state
emitter = self.scans[index.scan].emitter
params.binding_energy = pt.elements.symbol(emitter).binding_energy[initial_state]
# experimental setup
params.polarization = "H"
params.polar_incidence_angle = 60.0
params.azimuthal_incidence_angle = 0.0
params.experiment_temperature = 300.0
# material parameters
params.z_surface = model['Zsurf']
params.work_function = 4.5
params.inner_potential = model['V0']
params.debye_temperature = 356.0
# multiple-scattering parameters (EDAC)
params.emitters = []
params.lmax = 15
params.dmax = 5.0
params.orders = [25]
return params
~~~~~~
Most of the code is generic and can be copied to other projects.
Only the experimental and material parameters need to be adjusted.
Other properties can be changed as needed, see the documentation of pmsco.project.CalculatorParams for details.
\subsection sec_project_args Passing Runtime Parameters
Runtime parameters can be passed in one of three ways:
1. hard-coded in the project module,
2. on the command line, or
3. in a JSON run-file.
In the first way, all parameters are hard-coded in the `create_project` function of the project module.
This is the simplest way for a quick start to a small project.
However, as the project code grows, it's easy to lose track of revisions.
In programming it is usually best practice to separate code and data.
The command line is another option for passing parameters to a process.
It requires extra code for parsing the command line and is not very flexible.
It is difficult to pass complex data types.
Using the command line is no longer recommended and may become deprecated in a future version.
The recommended way of passing parameters is via run-files.
Run-files allow for complete separation of code and data in a generic and flexible way.
For example, run-files can be stored along with the results.
However, the semantics of the run-file may look intimidating at first.
\subsubsection sec_project_args_runfile Setting Up a Run-File
The usage and format of run-files is described in detail under @ref pag_runfile.
\subsubsection sec_project_args_code Hard-Coded Arguments
Hard-coded parameters are usually set in a `create_project` function of the project module.
Placed at the end of the module, this function can easily be found.
The function has two purposes: to create the project object and to set parameters.
The parameters can be any attributes of the project class and its ancestors.
See the parent pmsco.project.Project class for a list of common attributes.
The `create_project` function may look like in the following example.
It must return a project object, i.e. an object instance of a class that inherits from pmsco.project.Project.
~~~~~~{.py}
def create_project():
project = MyProject()
project.optimizer_params["pop_size"] = 20
project_dir = Path(__file__).parent
scan_file = Path(project_dir, "hbnni_e156_int.etpi")
project.add_scan(filename=scan_file, emitter="N", initial_state="1s")
project.add_domain({"zrot": 0.0})
project.add_domain({"zrot": 60.0})
return project
~~~~~~
To have PMSCO call this function,
pass the file path of the containing module as the first command line argument of PMSCO, cf. @ref pag_command.
PMSCO calls this function in the absence of a run-file.
\subsubsection sec_project_args_cmd Command Line
Since it is not recommended to pass calculation parameters on the command line,
this mechanism is not described in detail here.
It is, however, still available.
If you really need to use it,
have a look at the code of the pmsco.pmsco.main function
and how it calls the `create_project`, `parse_project_args` and `set_project_args` of the project module.
*/

docs/src/runfile.dox Normal file
@ -0,0 +1,333 @@
/*! @page pag_runfile Run File
\section sec_runfile Run File
This section describes the format of a run-file.
Run-files are a new way of passing arguments to a PMSCO process which avoids cluttering up the command line.
It is more flexible than the command line
because run-files can assign a value to any property of the project object in an abstract way.
Moreover, there is no necessity for the project code to parse the command line.
\subsection sec_runfile_how How It Works
Run-files are text files in [JSON](https://en.wikipedia.org/wiki/JSON) format
which shares most syntax elements with Python.
JSON files contain nested dictionaries, lists, strings and numbers.
In PMSCO, run-files contain a dictionary of parameters for the project object
which is the main container for calculation parameters, model objects and links to data files.
An abstract run-file parser reads the run-file,
constructs the specified project object based on the custom project class
and assigns the attributes of the project object.
It's important to note that the parser does not recognize specific data types or classes.
All specific data handling is done by the instantiated objects, mainly the project class.
The parser can handle the following situations:
- Strings, numbers as well as dictionaries and lists of simple objects can be assigned directly to project attributes.
- If the project class defines an attribute as a _property_,
the class can execute custom code to import or validate data.
- The parser can instantiate an object from a class in the namespace of the project module
and assign its properties.
\subsection sec_runfile_general General File Format
Run-files must adhere to the [JSON](https://en.wikipedia.org/wiki/JSON) format,
which shares most syntax elements with Python.
Specifically, a JSON file can declare dictionaries, lists and simple objects
such as strings, numbers and `null`.
As one extension to plain JSON, PMSCO ignores line comments starting with a hash `#` or double-slash `//`.
This can be used to temporarily hide a parameter from the parser.
For example run-files, have a look at the twoatom demo project.
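The comment handling can be pictured as a pre-processing step before the JSON parser runs. The following is an illustration only (the actual PMSCO parser may differ in detail); it drops full-line `#` and `//` comments and hands the rest to `json.loads`:

```python
import json

def load_runfile_text(text):
    """parse run-file text, ignoring lines that start with '#' or '//'."""
    kept = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#") or stripped.startswith("//"):
            continue  # drop the comment line
        kept.append(line)
    return json.loads("\n".join(kept))
```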
\subsection sec_runfile_project Project Specification
The following minimum run-file demonstrates how to specify the project at the top level:
~~~~~~{.py}
{
"project": {
"__module__": "projects.twoatom.twoatom",
"__class__": "TwoatomProject",
"mode": "single",
"output_file": "twoatom0001"
}
}
~~~~~~
Here, the `project` keyword denotes the dictionary that is used to construct the project object.
Within the project dictionary, the `__module__` key selects the Python module file that contains the project code,
and `__class__` refers to the name of the actual project class.
Further dictionary items correspond to attributes of the project class.
The module name is the same as would be used in a Python import statement.
It must be findable on the Python path.
PMSCO ensures that the directory containing the `pmsco` and `projects` sub-directories is on the Python path.
The class name must be in the namespace of the loaded module.
As PMSCO starts, it imports the specified module,
constructs an object of the specified project class,
and assigns any further items to project attributes.
In the example above, `twoatom0001` is assigned to the `output_file` property.
Any attributes not specified in the run-file will remain at their default values
that were set by the `__init__` method of the project class.
Note that parameter names must start with an alphabetic character, else they are ignored.
This provides another way to temporarily ignore an item from the file besides line comments.
Also note that PMSCO does not spell-check parameter names.
The parameter values are just written to the corresponding object attribute.
If a name is misspelled, the value will be written under the wrong name and missed by the code eventually.
PMSCO carries out only the most important checks on the given parameter values.
Incorrect values may lead to improper operation or exceptions later in the calculations.
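In essence, the assignment of run-file items behaves like the following sketch (not the actual parser code): keys that do not start with an alphabetic character are skipped, and all other items are written to the project object verbatim, without checking whether the attribute name is known.

```python
def assign_project_attributes(project, params):
    """assign run-file items to attributes of a project object (sketch)."""
    for key, value in params.items():
        if not key[:1].isalpha():
            continue  # e.g. '__module__' or '_disabled_param' are not assigned
        # no spell check: a misspelled key creates a new, unused attribute
        setattr(project, key, value)
```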
\subsection sec_runfile_common Common Arguments
The following table lists some important parameters controlling the calculations.
They are declared in the pmsco.project.Project class.
| Key | Values | Description |
| --- | --- | --- |
| mode | `single` (default), `grid`, `swarm`, `genetic`, `table`, `test`, `validate` | Operation mode. `validate` can be used to check the syntax of the run-file, the process exits before starting calculations. |
| directories | dictionary | This dictionary lists common file paths used in the project. It contains keys such as `home`, `project`, `output` (see documentation of Project class in pmsco.project). Enclosed in curly braces, the keys can be used as placeholders in filenames. |
| output_dir | path | Shortcut for directories["output"] |
| data_dir | path | Shortcut for directories["data"] |
| job_name | string, must be a valid file name | Base name for all produced output files. It is recommended to set a unique name for each calculation run. Do not include a path. The path can be set in _output_dir_. |
| cluster_generator | dictionary | Class name and attributes of the cluster generator. See below. |
| atomic_scattering_factory | string<br>Default: InternalAtomicCalculator from pmsco.calculators.calculator | Class name of the atomic scattering calculator. This name must be in the namespace of the project module. |
| multiple_scattering_factory | string<br>Default: EdacCalculator from pmsco.calculators.edac | Class name of the multiple scattering calculator. This name must be in the namespace of the project module. |
| model_space | dictionary | See @ref sec_runfile_space below. |
| domains | list of dictionaries | See @ref sec_runfile_domains below. |
| scans | list of dictionaries | See @ref sec_runfile_scans below. |
| optimizer_params | dictionary | See @ref sec_runfile_optimizer below. |
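The curly-brace placeholders of the `directories` dictionary can be expanded with Python's `str.format` mechanism, roughly as follows (a sketch of the described behaviour, not the exact PMSCO code; the paths are made up for the example):

```python
def expand_path(filename, directories):
    """replace {key} placeholders in a file name by directory paths (sketch)."""
    return filename.format(**directories)

# hypothetical directory entries for illustration
directories = {"project": "/home/pmsco/projects/twoatom",
               "data": "/home/pmsco/data"}
path = expand_path("{data}/xpd-Si2p.etpis", directories)
# path is now "/home/pmsco/data/xpd-Si2p.etpis"
```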
The following table lists some common control parameters and metadata
that affect the behaviour of the program but do not affect the calculation results.
The job metadata is used to identify and describe a job in the results database if requested.
| Key | Values | Description |
| --- | --- | --- |
| job_tags | list of strings | User-specified job tags (metadata). |
| description | string | Description of the calculation job (metadata) |
| time_limit | decimal number<br>Default: 24. | Wall time limit in hours. The optimizers try to finish before the limit. This cannot be guaranteed, however. |
| keep_files | list of file categories | Output file categories to keep after the calculation. Multiple values can be specified and must be separated by spaces. By default, cluster and model (simulated data) of a limited number of best models are kept. See @ref sec_runfile_files below. |
| keep_best | integer number<br>Default: 10 | number of best models for which result files should be kept. |
| keep_level | integer number<br>Default: 1 | numeric task level down to which files are kept. 1 = scan level, 2 = domain level, etc. |
| log_level | DEBUG, INFO, WARNING, ERROR, CRITICAL | Minimum level of messages that should be added to the log. Empty string turns off logging. |
| log_file | file system path<br>Default: job_name + ".log". | Name of the main log file. Under MPI, the rank of the process is inserted before the extension. The log name is created in the working directory. |
\subsection sec_runfile_space Model Space
The `model_space` parameter is a dictionary of model parameters.
The key is the name of the parameter as used by the cluster and input-formatting code,
the value is a dictionary holding the `start`, `min`, `max`, `step` values to be used by the optimizer.
~~~~~~{.py}
{
"project": {
// ...
"model_space": {
"dAB": {
"start": 2.109,
"min": 2.0,
"max": 2.25,
"step": 0.05
},
"pAB": {
"start": 15.0,
"min": 0.0,
"max": 30.0,
"step": 1.0
},
// ...
}
}
}
~~~~~~
\subsection sec_runfile_domains Domains
Domains is a list of dictionaries.
Each dictionary holds keys describing the domain to the cluster and input-formatting code.
The meaning of these keys is up to the project.
~~~~~~{.py}
{
"project": {
// ...
"domains": [
{"surface": "Te", "doping": null, "zrot": 0.0},
{"surface": "Te", "doping": null, "zrot": 60.0}
],
}
}
~~~~~~
\subsection sec_runfile_scans Experimental Scan Files
The pmsco.project.Scan objects used in the calculation cannot be instantiated from the run-file directly.
Instead, the scans object is a list of scan creators/loaders which specify what to do to create a Scan object.
The pmsco.project module defines three scan creators: ScanLoader, ScanCreator and ScanKey.
The following code block shows an example of each of the three:
~~~~~~{.py}
{
"project": {
// ...
"scans": [
{
"__class__": "pmsco.project.ScanCreator",
"filename": "twoatom_energy_alpha.etpai",
"emitter": "N",
"initial_state": "1s",
"positions": {
"e": "np.arange(10, 400, 5)",
"t": "0",
"p": "0",
"a": "np.linspace(-30, 30, 31)"
}
},
{
"__class__": "pmsco.project.ScanLoader",
"filename": "{project}/twoatom_hemi_250e.etpi",
"emitter": "N",
"initial_state": "1s",
"is_modf": false
},
{
"__class__": "pmsco_project.ScanKey",
"key": "Ge3s113tp"
}
]
}
}
~~~~~~
The class name must be specified as it would be called in the custom project module.
`pmsco.project` must, thus, be imported in the custom project module.
The *ScanCreator* object creates a scan using Numpy array constructors in `positions`.
In the example above, a two-dimensional rectangular energy-alpha scan grid is created.
The values of the positions axes are passed to Python's `eval` function
and must return a one-dimensional Numpy `ndarray`.
The `emitter` and `initial_state` keys define the probed core level.
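The evaluation of the `positions` strings can be sketched as follows. This is an illustration under the assumption that `np` is the only name made available to the expressions; the actual ScanCreator may set up the evaluation context differently:

```python
import numpy as np

def eval_positions(positions):
    """evaluate the position expressions of a ScanCreator entry (sketch).

    each value is passed to eval with numpy available as 'np' and
    coerced to a one-dimensional ndarray.
    """
    axes = {}
    for axis, expression in positions.items():
        axes[axis] = np.atleast_1d(eval(expression, {"np": np}))
    return axes

axes = eval_positions({"e": "np.arange(10, 400, 5)",
                       "a": "np.linspace(-30, 30, 31)"})
```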
The *ScanLoader* object loads a data file, specified under `filename`.
The filename can include a placeholder which is replaced by the corresponding item from Project.directories.
Note that some of the directories (including `project`) are pre-set by PMSCO.
It is recommended to add a `data` key under `directories` in the run-file
if the data files are outside of the PMSCO directory tree.
The `is_modf` key indicates whether the file contains a modulation function (`true`) or intensity (`false`).
In the latter case, the modulation function is calculated after loading.
The *ScanKey* is the shortest scan specification in the run-file.
It is a shortcut to a complete scan description in the `scan_dict` dictionary of the project object.
The `scan_dict` must be set up in the `__init__` method of the project class.
The `key` item specifies which key of `scan_dict` should be used to create the Scan object.
Each item of `scan_dict` holds a dictionary
that in turn holds the attributes for either a `ScanCreator` or a `ScanLoader`.
If it contains a `positions` key, it represents a `ScanCreator`, else a `ScanLoader`.
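The dispatch rule stated above can be sketched in a few lines (illustration only, not the actual PMSCO code; the dictionary content is adapted from the project-setup example):

```python
def resolve_scan_key(scan_dict, key):
    """return the scan creator type implied by a scan_dict entry (sketch)."""
    entry = scan_dict[key]
    # an entry with a 'positions' key acts as a ScanCreator,
    # any other entry as a ScanLoader
    kind = "ScanCreator" if "positions" in entry else "ScanLoader"
    return kind, entry

scan_dict = {
    "Si2p": {"filename": "{data}/xpd-Si2p.etpis",
             "emitter": "Si", "initial_state": "2p3/2"},
    "grid": {"positions": {"e": "np.arange(10, 400, 5)"}},
}
```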
\subsection sec_runfile_optimizer Optimizer Parameters
The `optimizer_params` is a dictionary holding one or more of the following items.
| Key | Values | Description |
| --- | --- | --- |
| pop_size | integer<br>The default value is the greater of 4 or the number of parallel calculation processes. | Population size (number of particles) in swarm and genetic optimization mode. |
| seed_file | file system path | Name of the population seed file. Population data of previous optimizations can be used to seed a new optimization. The file must have the same structure as the .pop or .dat files. See @ref pmsco.project.Project.seed_file. |
| table_file | file system path | Name of the model table file in table scan mode. |
\subsubsection sec_runfile_files File Categories
The following category names can be used with the `keep_files` option.
Multiple names can be specified as a list.
| Category | Description | Default Action |
| --- | --- | --- |
| all | shortcut to include all categories | |
| input | raw input files for calculator, including cluster and phase files in custom format | delete |
| output | raw output files from calculator | delete |
| atomic | atomic scattering and emission files in portable format | delete |
| cluster | cluster files in portable XYZ format for report | keep |
| debug | debug files | delete |
| model | output files in ETPAI format: complete simulation (a_-1_-1_-1_-1) | keep |
| scan | output files in ETPAI format: scan (a_b_-1_-1_-1) | keep |
| domain | output files in ETPAI format: domain (a_b_c_-1_-1) | delete |
| emitter | output files in ETPAI format: emitter (a_b_c_d_-1) | delete |
| region | output files in ETPAI format: region (a_b_c_d_e) | delete |
| report | final report of results | keep (always) |
| population | final state of particle population | keep |
| rfac | files related to models which give bad r-factors, see warning below | delete |
\note
The `report` category is always kept and cannot be turned off.
The `model` category is always kept in single calculation mode.
\warning
If you want to specify `rfac` with the `keep_files` option,
you have to add the file categories that you want to keep, e.g.,
`"keep_files": ["rfac", "cluster", "model", "scan", "population"]`
(to return the default categories for all calculated models).
Do not specify `rfac` alone as this will effectively not return any file.
\subsection sec_runfile_schedule Job Scheduling
To submit a job to a resource manager such as Slurm, add a `schedule` section to the run file
(section ordering is not important):
~~~~~~{.py}
{
"schedule": {
"__module__": "pmsco.schedule",
"__class__": "PsiRaSchedule",
"nodes": 1,
"tasks_per_node": 24,
"walltime": "2:00",
"manual_run": true,
"enabled": true
},
"project": {
"__module__": "projects.twoatom.twoatom",
"__class__": "TwoatomProject",
"mode": "single",
"output_file": "{home}/pmsco/twoatom0001",
...
}
}
~~~~~~
In the same way as for the project, the `__module__` and `__class__` keys select the class that handles the job submission.
In this example, it is pmsco.schedule.PsiRaSchedule which is tied to the Ra cluster at PSI.
For other machines, you can sub-class one of the classes in the pmsco.schedule module and include it in your project module.
The parameters of pmsco.schedule.PsiRaSchedule are as follows.
Some of them are also used in other schedule classes or may have different types or ranges.
| Key | Values | Description |
| --- | --- | --- |
| nodes | integer: 1..2 | Number of compute nodes (main boards on Ra). The maximum number available for PEARL is 2. |
| tasks_per_node | integer: 1..24, 32 | Number of tasks (CPU cores on Ra) per node. Jobs with less than 24 tasks are assigned to the shared partition. |
| wall_time | string: [days-]hours[:minutes[:seconds]] <br> dict: with any combination of days, hours, minutes, seconds | Maximum run time (wall time) of the job. |
| manual | bool | Manual submission (true) or automatic submission (false). Manual submission allows you to inspect the job files before submission. |
| enabled | bool | Enable scheduling (true). Otherwise, the calculation is started directly (false).
@note The calculation job may run in a different working directory than the current one.
It is important to specify absolute data and output directories in the run file (project/directories section).
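A wall-time string in the `[days-]hours[:minutes[:seconds]]` format can be converted to seconds roughly as in the following sketch (assuming the format given in the table; this is not the actual PMSCO code):

```python
def walltime_to_seconds(walltime):
    """convert a '[days-]hours[:minutes[:seconds]]' string to seconds."""
    days = 0
    if "-" in walltime:
        day_part, walltime = walltime.split("-", 1)
        days = int(day_part)
    parts = [int(part) for part in walltime.split(":")]
    while len(parts) < 3:
        parts.append(0)  # pad missing minutes and seconds
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Per this format, `walltime_to_seconds("2:00")` yields two hours; note that batch systems such as Slurm may interpret short time strings differently.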
*/

@ -2,21 +2,19 @@
skinparam componentStyle uml2

component "PMSCO" as pmsco
component "project" as project
component "scattering code\n(calculator)" as calculator

interface "command line" as cli
interface "input files" as input
interface "experimental data" as data
interface "results" as results
interface "output files" as output

cli --> pmsco
data -> project
pmsco ..> project
pmsco ..> calculator
input -> calculator
calculator -> output
pmsco -> results

View File

@@ -1,117 +0,0 @@
BootStrap: debootstrap
OSVersion: bionic
MirrorURL: http://ch.archive.ubuntu.com/ubuntu/
%help
a singularity container for PMSCO.
git clone requires an ssh key for git.psi.ch.
try agent forwarding (-A option to ssh).
#%setup
# executed on the host system outside of the container before %post
#
# this will be inside the container
# touch ${SINGULARITY_ROOTFS}/tacos.txt
# this will be on the host
# touch avocados.txt
#%files
# files are copied before %post
#
# this copies to root
# avocados.txt
# this copies to /opt
# avocados.txt /opt
#
# this does not work
# ~/.ssh/known_hosts /etc/ssh/ssh_known_hosts
# ~/.ssh/id_rsa /etc/ssh/id_rsa
%labels
Maintainer Matthias Muntwiler
Maintainer_Email matthias.muntwiler@psi.ch
Python_Version 2.7
%environment
export PATH="/usr/local/miniconda3/bin:$PATH"
export PYTHON_VERSION=2.7
export SINGULAR_BRANCH="singular"
export LC_ALL=C
%post
export PYTHON_VERSION=2.7
export LC_ALL=C
sed -i 's/$/ universe/' /etc/apt/sources.list
apt-get update
apt-get -y install \
binutils \
build-essential \
doxygen \
doxypy \
f2c \
g++ \
gcc \
gfortran \
git \
graphviz \
libblas-dev \
liblapack-dev \
libopenmpi-dev \
make \
nano \
openmpi-bin \
openmpi-common \
sqlite3 \
wget
apt-get clean
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p /usr/local/miniconda3
export PATH="/usr/local/miniconda3/bin:$PATH"
conda create -q --yes -n pmsco python=${PYTHON_VERSION}
. /usr/local/miniconda3/bin/activate pmsco
conda install -q --yes -n pmsco \
pip \
"numpy>=1.13" \
scipy \
ipython \
matplotlib \
nose \
mock \
future \
statsmodels \
swig
conda clean --all -y
/usr/local/miniconda3/envs/pmsco/bin/pip install periodictable attrdict fasteners mpi4py
#%test
# test the image after build
%runscript
# executes command from command line
. /usr/local/miniconda3/bin/activate pmsco
exec echo "$@"
%apprun install
. /usr/local/miniconda3/bin/activate pmsco
cd ~
git clone https://git.psi.ch/pearl/pmsco.git pmsco
cd pmsco
git checkout develop
git checkout -b ${SINGULAR_BRANCH}
make all
nosetests
%apprun python
. /usr/local/miniconda3/bin/activate pmsco
exec python "${@}"
%apprun conda
. /usr/local/miniconda3/bin/activate pmsco
exec conda "${@}"

View File

@@ -3,10 +3,11 @@ OSVersion: bionic
MirrorURL: http://ch.archive.ubuntu.com/ubuntu/
%help
a singularity container for PMSCO.
A singularity container for PMSCO.
git clone requires an ssh key for git.psi.ch.
try agent forwarding (-A option to ssh).
singularity run -e pmsco.sif path/to/pmsco -r path/to/your-runfile
path/to/pmsco must point to the directory that contains the __main__.py file.
#%setup
# executed on the host system outside of the container before %post
@@ -34,22 +35,25 @@ try agent forwarding (-A option to ssh).
Python_Version 3
%environment
export PATH="/usr/local/miniconda3/bin:$PATH"
export PYTHON_VERSION=3
export SINGULAR_BRANCH="singular"
export LC_ALL=C
export PYTHON_VERSION=3
export CONDA_ROOT=/opt/miniconda
export PLANTUML_JAR_PATH=/opt/plantuml/plantuml.jar
export SINGULAR_BRANCH="singular"
%post
export PYTHON_VERSION=3
export LC_ALL=C
export PYTHON_VERSION=3
export CONDA_ROOT=/opt/miniconda
export PLANTUML_ROOT=/opt/plantuml
sed -i 's/$/ universe/' /etc/apt/sources.list
apt-get update
apt-get -y install \
binutils \
build-essential \
default-jre \
doxygen \
doxypy \
f2c \
g++ \
gcc \
@@ -67,11 +71,11 @@ try agent forwarding (-A option to ssh).
apt-get clean
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p /usr/local/miniconda3
export PATH="/usr/local/miniconda3/bin:$PATH"
bash ~/miniconda.sh -b -p ${CONDA_ROOT}
. ${CONDA_ROOT}/bin/activate
conda create -q --yes -n pmsco python=${PYTHON_VERSION}
. /usr/local/miniconda3/bin/activate pmsco
conda activate pmsco
conda install -q --yes -n pmsco \
pip \
"numpy>=1.13" \
@@ -82,35 +86,36 @@ try agent forwarding (-A option to ssh).
mock \
future \
statsmodels \
swig
swig \
gitpython
conda clean --all -y
/usr/local/miniconda3/envs/pmsco/bin/pip install periodictable attrdict fasteners mpi4py
pip install periodictable attrdict commentjson fasteners mpi4py doxypypy
mkdir ${PLANTUML_ROOT}
wget -O ${PLANTUML_ROOT}/plantuml.jar https://sourceforge.net/projects/plantuml/files/plantuml.jar/download
#%test
# test the image after build
%runscript
# executes command from command line
source /usr/local/miniconda3/bin/activate pmsco
exec echo "$@"
. ${CONDA_ROOT}/etc/profile.d/conda.sh
conda activate pmsco
exec python "$@"
%apprun install
source /usr/local/miniconda3/bin/activate pmsco
. ${CONDA_ROOT}/etc/profile.d/conda.sh
conda activate pmsco
cd ~
git clone https://git.psi.ch/pearl/pmsco.git pmsco
cd pmsco
git checkout develop
git checkout master
git checkout -b ${SINGULAR_BRANCH}
make all
nosetests -w tests/
%apprun compile
. ${CONDA_ROOT}/etc/profile.d/conda.sh
conda activate pmsco
make all
nosetests
%apprun python
source /usr/local/miniconda3/bin/activate pmsco
exec python "${@}"
%apprun conda
source /usr/local/miniconda3/bin/activate pmsco
exec conda "${@}"

View File

@@ -12,8 +12,8 @@ Vagrant.configure("2") do |config|
# Every Vagrant development environment requires a box. You can search for
# boxes at https://vagrantcloud.com/search.
config.vm.box = "singularityware/singularity-2.4"
config.vm.box_version = "2.4"
config.vm.box = "sylabs/singularity-3.7-ubuntu-bionic64"
config.vm.box_version = "3.7"
# Disable automatic box update checking. If you disable this, then
# boxes will only be checked for updates when the user runs

View File

@@ -8,16 +8,13 @@ python pmsco [pmsco-arguments]
@endverbatim
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from pathlib import Path
import sys
import os.path
file_dir = os.path.dirname(__file__) or '.'
root_dir = os.path.join(file_dir, '..')
root_dir = os.path.abspath(root_dir)
sys.path[0] = root_dir
pmsco_root = Path(__file__).resolve().parent.parent
if str(pmsco_root) not in sys.path:
sys.path.insert(0, str(pmsco_root))
if __name__ == '__main__':
import pmsco.pmsco

View File

@@ -13,8 +13,9 @@ SHELL=/bin/sh
.PHONY: all clean phagen
FC?=gfortran
FCOPTS?=-std=legacy
F2PY?=f2py
F2PYOPTS?=
F2PYOPTS?=--f77flags=-std=legacy --f90flags=-std=legacy
CC?=gcc
CCOPTS?=
SWIG?=swig

View File

@@ -17,22 +17,20 @@ pip install --user periodictable
@author Matthias Muntwiler
@copyright (c) 2015-20 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import periodictable as pt
import sys
import pmsco.config as config
## default file format identifier
FMT_DEFAULT = 0
## MSC file format identifier
@@ -227,13 +225,13 @@ class Cluster(object):
"""
self.rmax = r
def build_element(self, index, element_number, x, y, z, emitter, charge=0., scatterer_class=0):
def build_element(self, index, element, x, y, z, emitter, charge=0., scatterer_class=0):
"""
build a tuple in the format of the internal data array.
@param index: (int) index
@param element_number: (int) chemical element number
@param element: chemical element number (int) or symbol (str)
@param x, y, z: (float) atom coordinates in the cluster
@@ -243,7 +241,13 @@
@param scatterer_class: (int) scatterer class. default = 0.
"""
symbol = pt.elements[element_number].symbol
try:
element_number = int(element)
symbol = pt.elements[element_number].symbol
except ValueError:
symbol = element
element_number = pt.elements.symbol(symbol.strip()).number
element = (index, element_number, symbol, scatterer_class, x, y, z, int(emitter), charge)
return element
@@ -251,7 +255,7 @@
"""
add a single atom to the cluster.
@param atomtype: (int) chemical element number
@param atomtype: chemical element number (int) or symbol (str)
@param v_pos: (numpy.ndarray, shape = (3)) position vector
@@ -274,7 +278,7 @@
self.rmax (maximum distance from the origin).
all atoms are non-emitters.
@param atomtype: (int) chemical element number
@param atomtype: chemical element number (int) or symbol (str)
@param v_pos: (numpy.ndarray, shape = (3))
position vector of the first atom (basis vector)
@@ -307,7 +311,7 @@
and z_surf (position of the surface).
all atoms are non-emitters.
@param atomtype: (int) chemical element number
@param atomtype: chemical element number (int) or symbol (str)
@param v_pos: (numpy.ndarray, shape = (3))
position vector of the first atom (basis vector)
@@ -1133,7 +1137,7 @@ class Cluster(object):
np.savetxt(f, data, fmt=file_format, header=header, comments="")
class ClusterGenerator(object):
class ClusterGenerator(config.ConfigurableObject):
"""
cluster generator class.
@@ -1151,6 +1155,7 @@ class ClusterGenerator(object):
@param project: reference to the project object.
cluster generators may need to look up project parameters.
"""
super().__init__()
self.project = project
def count_emitters(self, model, index):
@@ -1258,7 +1263,7 @@ class LegacyClusterGenerator(ClusterGenerator):
"""
def __init__(self, project):
super(LegacyClusterGenerator, self).__init__(project)
super().__init__(project)
def count_emitters(self, model, index):
"""

120
pmsco/config.py Normal file
View File

@@ -0,0 +1,120 @@
"""
@package pmsco.config
infrastructure for configurable objects
@author Matthias Muntwiler
@copyright (c) 2021 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
import collections.abc
import functools
import inspect
import logging
from pathlib import Path
logger = logging.getLogger(__name__)
def resolve_path(path, dirs):
"""
resolve a file path by replacing placeholders
placeholders are enclosed in curly braces.
values for all possible placeholders are provided in a dictionary.
@param path: str, Path or other path-like.
example: '{work}/test/testfile.dat'.
@param dirs: dictionary mapping placeholders to project paths.
the paths can be str, Path or other path-like
example: {'work': '/home/user/work'}
@return: pathlib.Path object
"""
return Path(*(p.format(**dirs) for p in Path(path).parts))
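For illustration, a self-contained sketch of the placeholder expansion; the function body repeated here mirrors resolve_path above, and the directory names are made up:

```python
from pathlib import Path

def resolve_path(path, dirs):
    # split the path into components and expand {placeholder} tokens in each one
    return Path(*(p.format(**dirs) for p in Path(path).parts))

# '{work}' is replaced by the value from the dirs dictionary
p = resolve_path('{work}/test/testfile.dat', {'work': '/home/user/work'})
print(p)
```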
class ConfigurableObject(object):
"""
Parent class for objects that can be configured by a run file
the run file is a JSON file that contains object data in a nested dictionary structure.
in the dictionary structure the keys are property or attribute names of the object to be initialized.
keys starting with a non-alphabetic character (except for some special keys like __class__) are ignored.
these can be used as comments, or they protect private attributes.
the values can be numeric values, strings, lists or dictionaries.
simple values are simply assigned using setattr.
this may call a property setter if defined.
lists are iterated. each item is appended to the attribute.
the attribute must implement an append method in this case.
if an item is a dictionary and contains the special key '__class__',
an object of that class is instantiated and recursively initialized with the dictionary elements.
this requires that the class can be found in the module scope passed to the parser methods,
and that the class inherits from this class.
cases that can't be covered easily using this mechanism
should be implemented in a property setter.
value-checking should also be done in a property setter (or the append method in sequence-like objects).
"""
def __init__(self):
pass
def set_properties(self, module, data_dict, project):
"""
set properties of this class.
@param module: module reference that should be used to resolve class names.
this is usually the project module.
@param data_dict: dictionary of properties to set.
see the class description for details.
@param project: reference to the project object.
@return: None
"""
for key in data_dict:
if key[0].isalpha():
self.set_property(module, key, data_dict[key], project)
def set_property(self, module, key, value, project):
obj = self.parse_object(module, value, project)
if hasattr(self, key):
if obj is not None:
if isinstance(obj, collections.abc.MutableSequence):
attr = getattr(self, key)
for item in obj:
attr.append(item)
elif isinstance(obj, collections.abc.Mapping):
d = getattr(self, key)
if d is not None and isinstance(d, collections.abc.MutableMapping):
d.update(obj)
else:
setattr(self, key, obj)
else:
setattr(self, key, obj)
else:
setattr(self, key, obj)
else:
logger.warning(f"class {self.__class__.__name__} does not have attribute {key}.")
def parse_object(self, module, value, project):
if isinstance(value, collections.abc.MutableMapping) and "__class__" in value:
cn = value["__class__"].split('.')
c = functools.reduce(getattr, cn, module)
s = inspect.signature(c)
if 'project' in s.parameters:
o = c(project=project)
else:
o = c()
o.set_properties(module, value, project)
elif isinstance(value, collections.abc.MutableSequence):
o = [self.parse_object(module, i, project) for i in value]
else:
o = value
return o
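As an illustration of the mechanism, a run file fragment that this parser would turn into a nested object tree. The attribute name and class path are hypothetical; the class named in `__class__` must be reachable from the module scope passed to the parser:
~~~~~~
"cluster_generator": {
    "__class__": "pmsco.cluster.LegacyClusterGenerator",
    "_comment": "keys starting with a non-alphabetic character are ignored"
}
~~~~~~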

View File

@@ -4,16 +4,13 @@ calculation dispatcher.
@author Matthias Muntwiler
@copyright (c) 2015 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path
import datetime
@@ -21,10 +18,20 @@ import signal
import collections
import copy
import logging
import math
from attrdict import AttrDict
from mpi4py import MPI
try:
from mpi4py import MPI
mpi_comm = MPI.COMM_WORLD
mpi_size = mpi_comm.Get_size()
mpi_rank = mpi_comm.Get_rank()
except ImportError:
MPI = None
mpi_comm = None
mpi_size = 1
mpi_rank = 0
from pmsco.helpers import BraceMessage as BMsg
logger = logging.getLogger(__name__)
@@ -521,8 +528,7 @@ class MscoProcess(object):
#
# the default is 2 days after start.
def __init__(self, comm):
self._comm = comm
def __init__(self):
self._project = None
self._atomic_scattering = None
self._multiple_scattering = None
@@ -829,12 +835,12 @@ class MscoMaster(MscoProcess):
# the values are handlers.TaskHandler objects.
# the objects can be accessed in attribute or dictionary notation.
def __init__(self, comm):
super(MscoMaster, self).__init__(comm)
def __init__(self):
super().__init__()
self._pending_tasks = collections.OrderedDict()
self._running_tasks = collections.OrderedDict()
self._complete_tasks = collections.OrderedDict()
self._slaves = self._comm.Get_size() - 1
self._slaves = mpi_size - 1
self._idle_ranks = []
self.max_calculations = 1000000
self._calculations = 0
@@ -879,8 +885,8 @@
self._idle_ranks = list(range(1, self._running_slaves + 1))
self._root_task = CalculationTask()
self._root_task.file_root = project.output_file
self._root_task.model = project.create_model_space().start
self._root_task.file_root = str(project.output_file)
self._root_task.model = project.model_space.start
for level in self.task_levels:
self.task_handlers[level] = project.handler_classes[level]()
@@ -1033,7 +1039,7 @@
else:
logger.debug("assigning task %s to rank %u", str(task.id), rank)
self._running_tasks[task.id] = task
self._comm.send(task.get_mpi_message(), dest=rank, tag=TAG_NEW_TASK)
mpi_comm.send(task.get_mpi_message(), dest=rank, tag=TAG_NEW_TASK)
self._calculations += 1
else:
if not self._finishing:
@@ -1055,7 +1061,7 @@
while self._idle_ranks:
rank = self._idle_ranks.pop()
logger.debug("send finish tag to rank %u", rank)
self._comm.send(None, dest=rank, tag=TAG_FINISH)
mpi_comm.send(None, dest=rank, tag=TAG_FINISH)
self._running_slaves -= 1
def _receive_result(self):
@@ -1065,7 +1071,7 @@
if self._running_slaves > 0:
logger.debug("waiting for calculation result")
s = MPI.Status()
data = self._comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=s)
data = mpi_comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=s)
if s.tag == TAG_NEW_RESULT:
task_id = self._accept_task_done(data)
@@ -1185,8 +1191,8 @@
#
# typically, a task is aborted when an exception is encountered.
def __init__(self, comm):
super(MscoSlave, self).__init__(comm)
def __init__(self):
super().__init__()
self._errors = 0
self._max_errors = 5
@@ -1199,7 +1205,7 @@
self._running = True
while self._running:
logger.debug("waiting for message")
data = self._comm.recv(source=0, tag=MPI.ANY_TAG, status=s)
data = mpi_comm.recv(source=0, tag=MPI.ANY_TAG, status=s)
if s.tag == TAG_NEW_TASK:
logger.debug("received new task")
self.accept_task(data)
@@ -1229,17 +1235,17 @@
logger.exception(BMsg("unhandled exception in calculation task {0}", task.id))
self._errors += 1
if self._errors <= self._max_errors:
self._comm.send(data, dest=0, tag=TAG_INVALID_RESULT)
mpi_comm.send(data, dest=0, tag=TAG_INVALID_RESULT)
else:
logger.error("too many exceptions, aborting")
self._running = False
self._comm.send(data, dest=0, tag=TAG_ERROR_ABORTING)
mpi_comm.send(data, dest=0, tag=TAG_ERROR_ABORTING)
else:
logger.debug(BMsg("sending result of task {0} to master", result.id))
self._comm.send(result.get_mpi_message(), dest=0, tag=TAG_NEW_RESULT)
mpi_comm.send(result.get_mpi_message(), dest=0, tag=TAG_NEW_RESULT)
def run_master(mpi_comm, project):
def run_master(project):
"""
initialize and run the master calculation loop.
@@ -1251,25 +1257,25 @@ def run_master(mpi_comm, project):
if an unhandled exception occurs, this function aborts the MPI communicator, killing all MPI processes.
the caller will not have a chance to handle the exception.
@param mpi_comm: MPI communicator (mpi4py.MPI.COMM_WORLD).
@param project: project instance (sub-class of project.Project).
"""
try:
master = MscoMaster(mpi_comm)
master = MscoMaster()
master.setup(project)
master.run()
master.cleanup()
except (SystemExit, KeyboardInterrupt):
mpi_comm.Abort()
if mpi_comm:
mpi_comm.Abort()
raise
except Exception:
logger.exception("unhandled exception in master calculation loop.")
mpi_comm.Abort()
if mpi_comm:
mpi_comm.Abort()
raise
def run_slave(mpi_comm, project):
def run_slave(project):
"""
initialize and run the slave calculation loop.
@@ -1282,12 +1288,10 @@ def run_slave(mpi_comm, project):
unless it is a SystemExit or KeyboardInterrupt (where we expect that the master also receives the signal),
the MPI communicator is aborted, killing all MPI processes.
@param mpi_comm: MPI communicator (mpi4py.MPI.COMM_WORLD).
@param project: project instance (sub-class of project.Project).
"""
try:
slave = MscoSlave(mpi_comm)
slave = MscoSlave()
slave.setup(project)
slave.run()
slave.cleanup()
@@ -1295,7 +1299,8 @@
raise
except Exception:
logger.exception("unhandled exception in slave calculation loop.")
mpi_comm.Abort()
if mpi_comm:
mpi_comm.Abort()
raise
@@ -1307,12 +1312,9 @@
@param project: project instance (sub-class of project.Project).
"""
mpi_comm = MPI.COMM_WORLD
mpi_rank = mpi_comm.Get_rank()
if mpi_rank == 0:
logger.debug("MPI rank %u setting up master loop", mpi_rank)
run_master(mpi_comm, project)
run_master(project)
else:
logger.debug("MPI rank %u setting up slave loop", mpi_rank)
run_slave(mpi_comm, project)
run_slave(project)

View File

@@ -1,7 +0,0 @@
/* EDAC interface for other programs */
%module edac
%{
extern int run_script(char *scriptfile);
%}
extern int run_script(char *scriptfile);

File diff suppressed because it is too large

View File

@@ -10,6 +10,8 @@ the binding energies are compiled from Gwyn Williams' web page
(https://userweb.jlab.org/~gwyn/ebindene.html).
please refer to the original web page or the x-ray data booklet
for original sources, definitions and remarks.
binding energies of gases are replaced by respective values of a common compound
from the 'handbook of x-ray photoelectron spectroscopy' (physical electronics, inc., 1995).
usage
-----
@@ -52,15 +54,47 @@ from pmsco.compat import open
index_energy = np.zeros(0)
index_number = np.zeros(0)
index_term = []
default_data_path = os.path.join(os.path.dirname(__file__), "bindingenergy.json")
def load_data():
data_path = os.path.join(os.path.dirname(__file__), "bindingenergy.json")
def load_data(data_path=None):
"""
load binding energy data from json file
the data file must be in the same format as generated by save_data.
@param data_path: file path of the data file. default: "bindingenergy.json" next to this module file
@return dictionary
"""
if data_path is None:
data_path = default_data_path
with open(data_path) as fp:
data = json.load(fp)
return data
def save_data(data_path=None):
"""
save binding energy data to json file
@param data_path: file path of the data file. default: "bindingenergy.json" next to this module file
@return None
"""
if data_path is None:
data_path = default_data_path
data = {}
for element in pt.elements:
element_data = {}
for term, energy in element.binding_energy.items():
element_data[term] = energy
if element_data:
data[element.number] = element_data
with open(data_path, 'w', 'utf8') as fp:
json.dump(data, fp, sort_keys=True, indent='\t')
def init(table, reload=False):
if 'binding_energy' in table.properties and not reload:
return
@@ -142,6 +176,9 @@ def export_flat_text(f):
"""
export the binding energies to a flat general text file.
the file has four space-separated columns `number`, `symbol`, `term`, `energy`.
column names are included in the first row.
@param f: file path or open file object
@return: None
"""
@@ -153,3 +190,23 @@
else:
with open(f, "w") as fi:
export_flat_text(fi)
def import_flat_text(f):
"""
import binding energies from a flat general text file.
data is in space-separated columns.
the first row contains column names.
at least the columns `number`, `term`, `energy` must be present.
the function updates existing entries and appends entries of non-existing terms.
existing terms that are not listed in the file remain unchanged.
@param f: file path or open file object
@return: None
"""
data = np.atleast_1d(np.genfromtxt(f, names=True, dtype=None, encoding="utf8"))
for d in data:
pt.elements[d['number']].binding_energy[d['term']] = d['energy']
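To make the expected input format concrete, a minimal sketch with made-up rows of the flat text table that import_flat_text parses via numpy.genfromtxt:

```python
import io
import numpy as np

# hypothetical flat-text table: space-separated columns, column names in the first row
text = """number symbol term energy
1 H 1s 13.6
6 C 1s 284.2
"""
# names=True reads the header row; dtype=None infers int/str/float per column
data = np.atleast_1d(np.genfromtxt(io.StringIO(text), names=True, dtype=None, encoding="utf8"))
# each row maps (number, term) to a binding energy
for d in data:
    print(d['number'], d['term'], d['energy'])
```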

View File

@@ -92,6 +92,8 @@ def get_cross_section(photon_energy, element, nlj):
@return: (float) cross section in Mb.
"""
nl = nlj[0:2]
if not hasattr(element, "photoionization"):
element = get_element(element)
try:
pet, cst = element.photoionization.cross_section[nl]
except KeyError:
@@ -196,3 +198,11 @@ def plot_spectrum(photon_energy, elements, binding_energy=False, work_function=4
ax.set_ylabel('intensity')
ax.set_title(elements)
return fig, ax
def plot_cross_section(el, nlj):
"""
plot the photoionization cross section of one level versus photon energy.
@param el: chemical element (symbol, number or periodictable element).
@param nlj: (str) spectroscopic term of the level, e.g. '2p3/2'.
"""
energy = np.arange(100, 1500, 140)
cs = get_cross_section(energy, el, nlj)
fig, ax = plt.subplots()
ax.set_yscale("log")
ax.plot(energy, cs)
return fig, ax

View File

@@ -0,0 +1,443 @@
"""
@package pmsco.graphics.population
graphics rendering module for population dynamics.
the main function is render_genetic_chart().
this module is experimental.
interface and implementation are subject to change.
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2021 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
import logging
import numpy as np
import os
from pmsco.database import regular_params, special_params
logger = logging.getLogger(__name__)
try:
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
# from matplotlib.backends.backend_pdf import FigureCanvasPdf
# from matplotlib.backends.backend_svg import FigureCanvasSVG
except ImportError:
Figure = None
FigureCanvas = None
logger.warning("error importing matplotlib. graphics rendering disabled.")
def _default_range(pos):
"""
determine a default range from actual values.
@param pos: (numpy.ndarray) 1-dimensional structured array of parameter values.
@return: range_min, range_max are dictionaries of the minimum and maximum values of each parameter.
"""
names = regular_params(pos.dtype.names)
range_min = {}
range_max = {}
for name in names:
range_min[name] = pos[name].min()
range_max[name] = pos[name].max()
return range_min, range_max
def _prune_constant_params(pnames, range_min, range_max):
"""
remove constant parameters from the list and range
@param pnames: (list)
@param range_min: (dict)
@param range_max: (dict)
@return:
"""
del_names = [name for name in pnames if range_max[name] <= range_min[name]]
for name in del_names:
pnames.remove(name)
del range_min[name]
del range_max[name]
def render_genetic_chart(output_file, input_data_or_file, model_space=None, generations=None, title=None, cmap=None,
canvas=None):
"""
produce a genetic chart from a given population.
a genetic chart is a pseudo-colour representation of the coordinates of each individual in the model space.
the axes are the particle number and the model parameter.
the colour is mapped from the relative position of a parameter value within the parameter range.
the chart should illustrate the diversity in the population.
converged parameters will show similar colours.
by comparing charts of different generations, the effect of the optimization algorithm can be examined.
though the chart type is designed for the genetic algorithm, it may be useful for other algorithms as well.
the function requires input in one of the following forms:
- a result (.dat) file or numpy structured array.
the array must contain regular parameters, as well as the _particle and _gen columns.
the function generates one chart per generation unless the generation argument is specified.
- a population (.pop) file or numpy structured array.
the array must contain regular parameters, as well as the _particle columns.
- a pmsco.optimizers.population.Population object with valid data.
the graphics file format can be changed by providing a specific canvas. default is PNG.
this function requires the matplotlib module.
if it is not available, the function raises an error.
@param output_file: path and base name of the output file without extension.
a generation index and the file extension according to the file format are appended.
@param input_data_or_file: a numpy structured ndarray of a population or result list from an optimization run.
alternatively, the file path of a result file (.dat) or population file (.pop) can be given.
file can be any object that numpy.genfromtxt() can handle.
@param model_space: model space can be a pmsco.project.ModelSpace object,
any object that contains the same min and max attributes as pmsco.project.ModelSpace,
or a dictionary with the keys 'min' and 'max' that provides the corresponding ModelSpace dictionaries.
by default, the model space boundaries are derived from the input data.
if a model_space is specified, only the parameters listed in it are plotted.
@param generations: (int or sequence) generation index or list of indices.
this index is used in the output file name and for filtering input data by generation.
if the input data does not contain the generation, no filtering is applied.
by default, no filtering is applied, and one graph for each generation is produced.
@param title: (str) title of the chart.
the title is a {}-style format string, where {base} is the output file name and {gen} is the generation.
default: derived from file name.
@param cmap: (str) name of colour map supported by matplotlib.
default is 'jet'.
other good-looking options are 'PiYG', 'RdBu', 'RdYlGn', 'coolwarm'.
@param canvas: a FigureCanvas class reference from a matplotlib backend.
if None, the default FigureCanvasAgg is used which produces a bitmap file in PNG format.
some other options are:
matplotlib.backends.backend_pdf.FigureCanvasPdf or
matplotlib.backends.backend_svg.FigureCanvasSVG.
@return (str) path and name of the generated graphics file.
empty string if an error occurred.
@raise TypeError if matplotlib is not available.
"""
try:
pos = np.copy(input_data_or_file.pos)
range_min = input_data_or_file.model_min
range_max = input_data_or_file.model_max
generations = [input_data_or_file.generation]
except AttributeError:
try:
pos = np.atleast_1d(np.genfromtxt(input_data_or_file, names=True))
except TypeError:
pos = np.copy(input_data_or_file)
range_min, range_max = _default_range(pos)
pnames = regular_params(pos.dtype.names)
if model_space is not None:
try:
# a ModelSpace-like object
range_min = model_space.min
range_max = model_space.max
except AttributeError:
# a dictionary-like object
range_min = model_space['min']
range_max = model_space['max']
try:
pnames = range_min.keys()
except AttributeError:
pnames = range_min.dtype.names
pnames = list(pnames)
_prune_constant_params(pnames, range_min, range_max)
if generations is None:
try:
generations = np.unique(pos['_gen'])
except ValueError:
pass
files = []
path, base = os.path.split(output_file)
if generations is not None and len(generations):
if title is None:
title = "{base} gen {gen}"
for generation in generations:
idx = np.where(pos['_gen'] == generation)
gpos = pos[idx]
gtitle = title.format(base=base, gen=int(generation))
out_filename = "{base}-{gen}".format(base=os.fspath(output_file), gen=int(generation))
out_filename = _render_genetic_chart_2(out_filename, gpos, pnames, range_min, range_max,
gtitle, cmap, canvas)
files.append(out_filename)
else:
if title is None:
title = "{base}"
gtitle = title.format(base=base, gen="")
out_filename = "{base}".format(base=os.fspath(output_file))
out_filename = _render_genetic_chart_2(out_filename, pos, pnames, range_min, range_max, gtitle, cmap, canvas)
files.append(out_filename)
return files
def _render_genetic_chart_2(out_filename, pos, pnames, range_min, range_max, title, cmap, canvas):
"""
internal part of render_genetic_chart()
this function calculates the relative position in the model space,
sorts the positions array by particle index,
and calls plot_genetic_chart().
@param out_filename:
@param pos:
@param pnames:
@param range_max:
@param range_min:
@param cmap:
@param canvas:
@return: out_filename
"""
spos = np.sort(pos, order='_particle')
rpos2d = np.zeros((spos.shape[0], len(pnames)))
for index, pname in enumerate(pnames):
rpos2d[:, index] = (spos[pname] - range_min[pname]) / (range_max[pname] - range_min[pname])
out_filename = plot_genetic_chart(out_filename, rpos2d, pnames, title=title, cmap=cmap, canvas=canvas)
return out_filename
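The normalization step can be sketched in isolation; parameter names, ranges and particle positions below are made up for the example:

```python
import numpy as np

# structured array of 3 particles with 2 model parameters
pos = np.array([(0.0, 5.0), (1.0, 7.5), (2.0, 10.0)],
               dtype=[('dx', float), ('dz', float)])
range_min = {'dx': 0.0, 'dz': 5.0}
range_max = {'dx': 2.0, 'dz': 10.0}
pnames = ['dx', 'dz']
# map each parameter value to its relative position within the parameter range
rpos2d = np.zeros((pos.shape[0], len(pnames)))
for index, pname in enumerate(pnames):
    rpos2d[:, index] = (pos[pname] - range_min[pname]) / (range_max[pname] - range_min[pname])
# every entry of rpos2d now lies between 0 and 1
```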
def plot_genetic_chart(filename, rpos2d, param_labels, title=None, cmap=None, canvas=None):
"""
produce a genetic chart from the given data.
a genetic chart is a pseudo-colour representation of the coordinates of each individual in the model space.
the chart should highlight the amount of diversity in the population
and - by comparing charts of different generations - the changes due to mutation.
the axes are the model parameter (x) and particle number (y).
the colour is mapped from the relative position of a parameter value within the parameter range.
in contrast to render_genetic_chart() this function contains only the drawing code.
it requires input in the final form and does not do any checks, conversion or processing.
the graphics file format can be changed by providing a specific canvas. default is PNG.
this function requires the matplotlib module.
if it is not available, the function raises an error.
@param filename: path and name of the output file without extension.
@param rpos2d: (two-dimensional numpy array of numeric type)
relative positions of the particles in the model space.
dimension 0 (y-axis) is the particle index,
dimension 1 (x-axis) is the parameter index (in the order given by param_labels).
all values must be between 0 and 1.
@param param_labels: (sequence) list or tuple of parameter names.
@param title: (str) string to be printed as chart title. default is 'genetic chart'.
@param cmap: (str) name of colour map supported by matplotlib.
default is 'jet'.
other good-looking options are 'PiYG', 'RdBu', 'RdYlGn', 'coolwarm'.
@param canvas: a FigureCanvas class reference from a matplotlib backend.
if None, the default FigureCanvasAgg is used which produces a bitmap file in PNG format.
some other options are:
matplotlib.backends.backend_pdf.FigureCanvasPdf or
matplotlib.backends.backend_svg.FigureCanvasSVG.
@raise TypeError if matplotlib is not available.
"""
if canvas is None:
canvas = FigureCanvas
if cmap is None:
cmap = 'jet'
if title is None:
title = 'genetic chart'
fig = Figure()
canvas(fig)
ax = fig.add_subplot(111)
im = ax.imshow(rpos2d, aspect='auto', cmap=cmap, origin='lower')
im.set_clim((0.0, 1.0))
ax.set_xticks(np.arange(len(param_labels)))
ax.set_xticklabels(param_labels, rotation=45, ha="right", rotation_mode="anchor")
ax.set_ylabel('particle')
ax.set_title(title)
cb = ax.figure.colorbar(im, ax=ax)
cb.ax.set_ylabel("relative value", rotation=-90, va="bottom")
out_filename = "{base}.{ext}".format(base=filename, ext=canvas.get_default_filetype())
fig.savefig(out_filename)
return out_filename
def render_swarm(output_file, input_data, model_space=None, title=None, cmap=None, canvas=None):
"""
render a two-dimensional particle swarm population.
this function generates a schematic rendering of a particle swarm in two dimensions.
particles are represented by their position and velocity, indicated by an arrow.
the model space is projected on the first two (or selected two) variable parameters.
in the background, a scatter plot of results (dots with pseudocolor representing the R-factor) can be plotted.
the chart type is designed for the particle swarm optimization algorithm.
the function requires input in one of the following forms:
- position (.pos), velocity (.vel) and result (.dat) files or the respective numpy structured arrays.
the arrays must contain regular parameters, as well as the `_particle` column.
the result file must also contain an `_rfac` column.
- a pmsco.optimizers.population.Population object with valid data.
the graphics file format can be changed by providing a specific canvas. default is PNG.
this function requires the matplotlib module.
if it is not available, the function raises an error.
@param output_file: path and base name of the output file without extension.
a generation index and the file extension according to the file format are appended.
@param input_data: a pmsco.optimizers.population.Population object with valid data,
or a sequence of position, velocity and result arrays.
the arrays must be structured ndarrays corresponding to the respective Population members.
alternatively, the arrays can be referenced as file paths
in any format that numpy.genfromtxt() can handle.
@param model_space: model space can be a pmsco.project.ModelSpace object,
any object that contains the same min and max attributes as pmsco.project.ModelSpace,
or a dictionary with the two keys 'min' and 'max' that provide the corresponding ModelSpace dictionaries.
by default, the model space boundaries are derived from the input data.
if a model_space is specified, only the parameters listed in it are plotted.
@param title: (str) title of the chart.
the title is a {}-style format string, where {base} is the output file name and {gen} is the generation.
default: derived from file name.
@param cmap: (str) name of colour map supported by matplotlib.
default is 'plasma'.
other good-looking options are 'viridis', 'plasma', 'inferno', 'magma', 'cividis'.
@param canvas: a FigureCanvas class reference from a matplotlib backend.
if None, the default FigureCanvasAgg is used which produces a bitmap file in PNG format.
some other options are:
matplotlib.backends.backend_pdf.FigureCanvasPdf or
matplotlib.backends.backend_svg.FigureCanvasSVG.
@return (str) path and name of the generated graphics file.
empty string if an error occurred.
@raise TypeError if matplotlib is not available.
"""
try:
range_min = input_data.model_min
range_max = input_data.model_max
pos = np.copy(input_data.pos)
vel = np.copy(input_data.vel)
rfac = np.copy(input_data.results)
generation = input_data.generation
except AttributeError:
try:
pos = np.atleast_1d(np.genfromtxt(input_data[0], names=True))
vel = np.atleast_1d(np.genfromtxt(input_data[1], names=True))
rfac = np.atleast_1d(np.genfromtxt(input_data[2], names=True))
except TypeError:
pos = np.copy(input_data[0])
vel = np.copy(input_data[1])
rfac = np.copy(input_data[2])
range_min, range_max = _default_range(rfac)
pnames = regular_params(pos.dtype.names)
if model_space is not None:
try:
# a ModelSpace-like object
range_min = model_space.min
range_max = model_space.max
except AttributeError:
# a dictionary-like object
range_min = model_space['min']
range_max = model_space['max']
try:
pnames = range_min.keys()
except AttributeError:
pnames = range_min.dtype.names
pnames = list(pnames)
_prune_constant_params(pnames, range_min, range_max)
pnames = pnames[0:2]
files = []
if len(pnames) == 2:
params = {pnames[0]: [range_min[pnames[0]], range_max[pnames[0]]],
pnames[1]: [range_min[pnames[1]], range_max[pnames[1]]]}
out_filename = plot_swarm(output_file, pos, vel, rfac, params, title=title, cmap=cmap, canvas=canvas)
files.append(out_filename)
else:
logging.warning("model space must be two-dimensional and non-degenerate.")
return files
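The file-based input form accepted by render_swarm() can be sketched with an in-memory table; the column names besides the bookkeeping columns are hypothetical:

```python
import io
import numpy as np

# a minimal .pos-style table such as render_swarm() reads via np.genfromtxt
pos_text = """_gen _particle dlat dz
0 0 2.5 1.2
0 1 3.1 1.8
"""
pos = np.atleast_1d(np.genfromtxt(io.StringIO(pos_text), names=True))

# regular parameters are the columns that do not start with an underscore
pnames = [n for n in pos.dtype.names if not n.startswith('_')]
```

np.atleast_1d guards against the zero-dimensional array that genfromtxt returns for a single data row.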
def plot_swarm(filename, pos, vel, rfac, params, title=None, cmap=None, canvas=None):
"""
plot a two-dimensional particle swarm population.
this is a sub-function of render_swarm() containing just the plotting commands.
the graphics file format can be changed by providing a specific canvas. default is PNG.
this function requires the matplotlib module.
if it is not available, the function raises an error.
@param filename: path and base name of the output file without extension.
a generation index and the file extension according to the file format are appended.
@param pos: structured ndarray containing the positions of the particles.
@param vel: structured ndarray containing the velocities of the particles.
@param rfac: structured ndarray containing positions and R-factor values.
this array is independent of pos and vel.
it can also be set to None if results should be suppressed.
@param params: dictionary of two parameters to be plotted.
the keys correspond to columns of the pos, vel and rfac arrays.
the values are lists [minimum, maximum] that define the axis range.
@param title: (str) title of the chart.
the title is a {}-style format string, where {base} is the output file name and {gen} is the generation.
default: derived from file name.
@param cmap: (str) name of colour map supported by matplotlib.
default is 'plasma'.
other good-looking options are 'viridis', 'plasma', 'inferno', 'magma', 'cividis'.
@param canvas: a FigureCanvas class reference from a matplotlib backend.
if None, the default FigureCanvasAgg is used which produces a bitmap file in PNG format.
some other options are:
matplotlib.backends.backend_pdf.FigureCanvasPdf or
matplotlib.backends.backend_svg.FigureCanvasSVG.
@return (str) path and name of the generated graphics file.
empty string if an error occurred.
@raise TypeError if matplotlib is not available.
"""
if canvas is None:
canvas = FigureCanvas
if cmap is None:
cmap = 'plasma'
if title is None:
title = 'swarm map'
pnames = list(params.keys())
fig = Figure()
canvas(fig)
ax = fig.add_subplot(111)
if rfac is not None:
try:
s = ax.scatter(rfac[pnames[0]], rfac[pnames[1]], s=5, c=rfac['_rfac'], cmap=cmap, vmin=0, vmax=1)
except (KeyError, ValueError):
# _rfac or parameter column missing
pass
else:
cb = ax.figure.colorbar(s, ax=ax)
cb.ax.set_ylabel("R-factor", rotation=-90, va="bottom")
p = ax.plot(pos[pnames[0]], pos[pnames[1]], 'co')
q = ax.quiver(pos[pnames[0]], pos[pnames[1]], vel[pnames[0]], vel[pnames[1]], color='c')
ax.set_xlim(params[pnames[0]])
ax.set_ylim(params[pnames[1]])
ax.set_xlabel(pnames[0])
ax.set_ylabel(pnames[1])
ax.set_title(title)
out_filename = "{base}.{ext}".format(base=filename, ext=canvas.get_default_filetype())
fig.savefig(out_filename)
return out_filename

View File

@ -7,16 +7,13 @@ interface and implementation are subject to change.
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2018 by Paul Scherrer Institut @n
@copyright (c) 2018-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import math
import numpy as np
@ -135,9 +132,8 @@ def render_ea_scan(filename, data, scan_mode, canvas=None, is_modf=False):
im.set_cmap("RdBu_r")
dhi = max(abs(dlo), abs(dhi))
dlo = -dhi
im.set_clim((dlo, dhi))
im.set_clim((-1., 1.))
try:
# requires matplotlib 2.1.0
ti = cb.get_ticks()
ti = [min(ti), 0., max(ti)]
cb.set_ticks(ti)
@ -213,9 +209,8 @@ def render_tp_scan(filename, data, canvas=None, is_modf=False):
# im.set_cmap("coolwarm")
dhi = max(abs(dlo), abs(dhi))
dlo = -dhi
pc.set_clim((dlo, dhi))
pc.set_clim((-1., 1.))
try:
# requires matplotlib 2.1.0
ti = cb.get_ticks()
ti = [min(ti), 0., max(ti)]
cb.set_ticks(ti)
@ -226,9 +221,12 @@ def render_tp_scan(filename, data, canvas=None, is_modf=False):
# im.set_cmap("inferno")
# im.set_cmap("viridis")
pc.set_clim((dlo, dhi))
ti = cb.get_ticks()
ti = [min(ti), max(ti)]
cb.set_ticks(ti)
try:
ti = cb.get_ticks()
ti = [min(ti), max(ti)]
cb.set_ticks(ti)
except AttributeError:
pass
out_filename = "{0}.{1}".format(filename, canvas.get_default_filetype())
fig.savefig(out_filename)

View File

@ -40,23 +40,20 @@ the scan and domain handlers call methods of the project class to invoke project
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015-18 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import datetime
from functools import reduce
import logging
import math
import numpy as np
import os
from pathlib import Path
from pmsco.compat import open
import pmsco.data as md
@ -377,7 +374,7 @@ class SingleModelHandler(ModelHandler):
keys = [key for key in self.result]
keys.sort(key=lambda t: t[0].lower())
vals = (str(self.result[key]) for key in keys)
filename = self._project.output_file + ".dat"
filename = Path(self._project.output_file).with_suffix(".dat")
with open(filename, "w") as outfile:
outfile.write("# ")
outfile.write(" ".join(keys))
@ -437,11 +434,11 @@ class ScanHandler(TaskHandler):
if project.combined_scan is not None:
ext = md.format_extension(project.combined_scan)
filename = project.output_file + ext
filename = Path(project.output_file).with_suffix(ext)
md.save_data(filename, project.combined_scan)
if project.combined_modf is not None:
ext = md.format_extension(project.combined_modf)
filename = project.output_file + ".modf" + ext
filename = Path(project.output_file).with_suffix(".modf" + ext)
md.save_data(filename, project.combined_modf)
return len(self._project.scans)
@ -695,7 +692,7 @@ class EmitterHandler(TaskHandler):
the estimate is based on the start parameters, scan 0 and domain 0.
"""
super(EmitterHandler, self).setup(project, slots)
mock_model = self._project.create_model_space().start
mock_model = self._project.model_space.start
mock_index = dispatch.CalcID(-1, 0, 0, -1, -1)
n_emitters = project.cluster_generator.count_emitters(mock_model, mock_index)
return n_emitters

View File

@ -304,7 +304,7 @@ class GridSearchHandler(handlers.ModelHandler):
super(GridSearchHandler, self).setup(project, slots)
self._pop = GridPopulation()
self._pop.setup(self._project.create_model_space())
self._pop.setup(self._project.model_space)
self._invalid_limit = max(slots, self._invalid_limit)
self._outfile = open(self._project.output_file + ".dat", "w")

View File

@ -554,7 +554,7 @@ class Population(object):
however, the patch is applied only upon the next execution of advance_population().
an info or warning message is printed to the log
depending on whether the filed contained a complete dataset or not.
depending on whether the file contained a complete dataset or not.
@attention patching a live population is a potentially dangerous operation.
it may cause an optimization to abort because of an error in the file.
@ -1209,7 +1209,7 @@ class PopulationHandler(handlers.ModelHandler):
return self._pop_size
def setup_population(self):
self._pop.setup(self._pop_size, self._project.create_model_space(), **self._project.optimizer_params)
self._pop.setup(self._pop_size, self._project.model_space, **self._project.optimizer_params)
def cleanup(self):
super(PopulationHandler, self).cleanup()

View File

@ -6,12 +6,12 @@ PEARL Multiple-Scattering Calculation and Structural Optimization
this is the top-level interface of the PMSCO package.
all calculations (any mode, any project) start by calling the run_project() function of this module.
the module also provides a command line parser for common options.
the module also provides a command line and a run-file/run-dict interface.
for parallel execution, prefix the command line with mpi_exec -np NN, where NN is the number of processes to use.
note that in parallel mode, one process takes the role of the coordinator (master).
the master does not run calculations and is idle most of the time.
to benefit from parallel execution on a work station, NN should be the number of processors plus one.
to benefit from parallel execution on a work station, NN should be the number of processors.
on a cluster, the number of processes is chosen according to the available resources.
all calculations can also be run in a single process.
@ -25,26 +25,35 @@ refer to the projects folder for examples.
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015-18 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from builtins import range
import datetime
import logging
import importlib
import os.path
import commentjson as json
from pathlib import Path
import sys
from mpi4py import MPI
try:
from mpi4py import MPI
mpi_comm = MPI.COMM_WORLD
mpi_size = mpi_comm.Get_size()
mpi_rank = mpi_comm.Get_rank()
except ImportError:
MPI = None
mpi_comm = None
mpi_size = 1
mpi_rank = 0
pmsco_root = Path(__file__).resolve().parent.parent
if str(pmsco_root) not in sys.path:
sys.path.insert(0, str(pmsco_root))
import pmsco.dispatch as dispatch
import pmsco.files as files
@ -71,40 +80,36 @@ def setup_logging(enable=False, filename="pmsco.log", level="WARNING"):
@param enable: (bool) True=enable logging to the specified file,
False=do not generate a log (null handler).
@param filename: (string) path and name of the log file.
@param filename: (Path-like) path and name of the log file.
if this process is part of an MPI communicator,
the function inserts a dot and the MPI rank of this process before the extension.
if the filename is empty, logging is disabled.
@param level: (string) name of the log level.
must be the name of one of "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL".
if empty or invalid, the function raises a ValueError.
if empty, logging is disabled.
if not a valid level, defaults to "WARNING".
@return None
"""
numeric_level = getattr(logging, level.upper(), None)
if not isinstance(numeric_level, int):
raise ValueError('Invalid log level: %s' % level)
logger = logging.getLogger("")
logger.setLevel(numeric_level)
logformat = '%(asctime)s (%(name)s) %(levelname)s: %(message)s'
formatter = logging.Formatter(logformat)
enable = enable and str(filename) and level
numeric_level = getattr(logging, level.upper(), logging.WARNING)
root_logger = logging.getLogger()
root_logger.setLevel(numeric_level)
if enable:
mpi_comm = MPI.COMM_WORLD
mpi_size = mpi_comm.Get_size()
if mpi_size > 1:
mpi_rank = mpi_comm.Get_rank()
root, ext = os.path.splitext(filename)
filename = root + "." + str(mpi_rank) + ext
p = Path(filename)
filename = p.with_suffix(f".{mpi_rank}" + p.suffix)
log_format = '%(asctime)s (%(name)s) %(levelname)s: %(message)s'
formatter = logging.Formatter(log_format)
handler = logging.FileHandler(filename, mode="w", delay=True)
handler.setLevel(numeric_level)
handler.setFormatter(formatter)
else:
handler = logging.NullHandler()
logger.addHandler(handler)
root_logger.addHandler(handler)
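The rank-tag insertion used by setup_logging() can be sketched on its own; the file name is arbitrary:

```python
from pathlib import Path

# insert the MPI rank before the file extension: pmsco.log -> pmsco.3.log
mpi_rank = 3
p = Path("output/pmsco.log")
filename = p.with_suffix(f".{mpi_rank}{p.suffix}")
```

with_suffix() replaces the last suffix, so prepending the rank to the original suffix yields the desired two-part extension.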
def set_common_args(project, args):
@ -124,67 +129,58 @@ def set_common_args(project, args):
@return: None
"""
log_file = "pmsco.log"
if args.data_dir:
project.data_dir = args.data_dir
if args.output_file:
project.set_output(args.output_file)
log_file = args.output_file + ".log"
project.output_file = args.output_file
if args.db_file:
project.db_file = args.db_file
if args.log_file:
log_file = args.log_file
setup_logging(enable=args.log_enable, filename=log_file, level=args.log_level)
logger.debug("creating project")
mode = args.mode.lower()
if mode in {'single', 'grid', 'swarm', 'genetic', 'table'}:
project.mode = mode
else:
logger.error("invalid optimization mode '%s'.", mode)
if args.pop_size:
project.optimizer_params['pop_size'] = args.pop_size
if args.seed_file:
project.optimizer_params['seed_file'] = args.seed_file
if args.seed_limit:
project.optimizer_params['seed_limit'] = args.seed_limit
if args.table_file:
project.optimizer_params['table_file'] = args.table_file
project.log_file = args.log_file
if args.log_level:
project.log_level = args.log_level
if not args.log_enable:
project.log_file = ""
project.log_level = ""
if args.mode:
project.mode = args.mode.lower()
if args.time_limit:
project.set_timedelta_limit(datetime.timedelta(hours=args.time_limit))
project.time_limit = args.time_limit
if args.keep_files:
if "all" in args.keep_files:
cats = set([])
else:
cats = files.FILE_CATEGORIES - set(args.keep_files)
cats -= {'report'}
if mode == 'single':
cats -= {'model'}
project.files.categories_to_delete = cats
if args.keep_levels > project.keep_levels:
project.keep_levels = args.keep_levels
if args.keep_best > project.keep_best:
project.keep_best = args.keep_best
project.keep_files = args.keep_files
if args.keep_levels:
project.keep_levels = max(args.keep_levels, project.keep_levels)
if args.keep_best:
project.keep_best = max(args.keep_best, project.keep_best)
def run_project(project):
"""
run a calculation project.
@param project:
@return:
the function sets up logging, validates the project, chooses the handler classes,
and passes control to the pmsco.dispatch module to run the calculations.
@param project: fully initialized project object.
the validate method is called as part of this function after setting up the logger.
@return: None
"""
# log project arguments only in rank 0
mpi_comm = MPI.COMM_WORLD
mpi_rank = mpi_comm.Get_rank()
log_file = Path(project.log_file)
if not log_file.name:
log_file = Path(project.job_name).with_suffix(".log")
if log_file.name:
log_file.parent.mkdir(exist_ok=True)
log_level = project.log_level
else:
log_level = ""
setup_logging(enable=bool(log_level), filename=log_file, level=log_level)
if mpi_rank == 0:
project.log_project_args()
project.validate()
optimizer_class = None
if project.mode == 'single':
optimizer_class = handlers.SingleModelHandler
@ -221,6 +217,34 @@ def run_project(project):
logger.error("undefined project, optimizer, or calculator.")
def schedule_project(project, run_dict):
"""
schedule a calculation project.
the function validates the project and submits a job to the scheduler.
@param project: fully initialized project object.
the validate method is called as part of this function.
@param run_dict: dictionary holding the contents of the run file.
@return: None
"""
assert mpi_rank == 0
setup_logging(enable=False)
project.validate()
schedule_dict = run_dict['schedule']
module = importlib.import_module(schedule_dict['__module__'])
schedule_class = getattr(module, schedule_dict['__class__'])
schedule = schedule_class(project)
schedule.set_properties(module, schedule_dict, project)
schedule.run_dict = run_dict
schedule.validate()
schedule.submit()
class Args(object):
"""
arguments of the main function.
@ -233,7 +257,7 @@ class Args(object):
values as the command line parser.
"""
def __init__(self, mode="single", output_file="pmsco_data"):
def __init__(self):
"""
constructor.
@ -242,12 +266,8 @@ class Args(object):
other parameters may be required depending on the project
and/or the calculation mode.
"""
self.mode = mode
self.pop_size = 0
self.seed_file = ""
self.seed_limit = 0
self.data_dir = ""
self.output_file = output_file
self.output_file = ""
self.db_file = ""
self.time_limit = 24.0
self.keep_files = files.FILE_CATEGORIES_TO_KEEP
@ -256,13 +276,9 @@ class Args(object):
self.log_level = "WARNING"
self.log_file = ""
self.log_enable = True
self.table_file = ""
def get_cli_parser(default_args=None):
if not default_args:
default_args = Args()
def get_cli_parser():
KEEP_FILES_CHOICES = files.FILE_CATEGORIES | {'all'}
parser = argparse.ArgumentParser(
@ -290,56 +306,45 @@ def get_cli_parser(default_args=None):
# for simplicity, the parser does not check these requirements.
# all parameters are optional and accepted regardless of mode.
# errors may occur if implicit requirements are not met.
parser.add_argument('project_module',
parser.add_argument('project_module', nargs='?',
help="path to custom module that defines the calculation project")
parser.add_argument('-m', '--mode', default=default_args.mode,
parser.add_argument('-r', '--run-file',
help="path to run-time parameters file which contains all program arguments. " +
"must be in JSON format.")
parser.add_argument('-m', '--mode',
choices=['single', 'grid', 'swarm', 'genetic', 'table'],
help='calculation mode')
parser.add_argument('--pop-size', type=int, default=default_args.pop_size,
help='population size (number of particles) in swarm or genetic optimization mode. ' +
'default is the greater of 4 or the number of calculation processes.')
parser.add_argument('--seed-file',
help='path and name of population seed file. ' +
'population data of previous optimizations can be used to seed a new optimization. ' +
'the file must have the same structure as the .pop or .dat files.')
parser.add_argument('--seed-limit', type=int, default=default_args.seed_limit,
help='maximum number of models to use from the seed file. ' +
'the models with the best R-factors are selected.')
parser.add_argument('-d', '--data-dir', default=default_args.data_dir,
parser.add_argument('-d', '--data-dir',
help='directory path for experimental data files (if required by project). ' +
'default: working directory')
parser.add_argument('-o', '--output-file', default=default_args.output_file,
parser.add_argument('-o', '--output-file',
help='base path for intermediate and output files.')
parser.add_argument('-b', '--db-file', default=default_args.db_file,
parser.add_argument('-b', '--db-file',
help='name of an sqlite3 database file where the results should be stored.')
parser.add_argument('--table-file',
help='path and name of population table file for table optimization mode. ' +
'the file must have the same structure as the .pop or .dat files.')
parser.add_argument('-k', '--keep-files', nargs='*', default=default_args.keep_files,
parser.add_argument('-k', '--keep-files', nargs='*',
choices=KEEP_FILES_CHOICES,
help='output file categories to keep after the calculation. '
'by default, cluster and model (simulated data) '
'of a limited number of best models are kept.')
parser.add_argument('--keep-best', type=int, default=default_args.keep_best,
parser.add_argument('--keep-best', type=int,
help='number of best models for which to keep result files '
'(at each node from root down to keep-levels).')
parser.add_argument('--keep-levels', type=int, choices=range(5),
default=default_args.keep_levels,
help='task level down to which result files of best models are kept. '
'0 = model, 1 = scan, 2 = domain, 3 = emitter, 4 = region.')
parser.add_argument('-t', '--time-limit', type=float, default=default_args.time_limit,
parser.add_argument('-t', '--time-limit', type=float,
help='wall time limit in hours. the optimizers try to finish before the limit.')
parser.add_argument('--log-file', default=default_args.log_file,
parser.add_argument('--log-file',
help='name of the main log file. ' +
'under MPI, the rank of the process is inserted before the extension.')
parser.add_argument('--log-level', default=default_args.log_level,
parser.add_argument('--log-level',
help='minimum level of log messages. DEBUG, INFO, WARNING, ERROR, CRITICAL.')
feature_parser = parser.add_mutually_exclusive_group(required=False)
feature_parser.add_argument('--log-enable', dest='log_enable', action="store_true",
help="enable logging. by default, logging is on.")
feature_parser.add_argument('--log-disable', dest='log_enable', action='store_false',
help="disable logging. by default, logging is on.")
parser.set_defaults(log_enable=default_args.log_enable)
parser.set_defaults(log_enable=True)
return parser
@ -350,52 +355,135 @@ def parse_cli():
@return: Namespace object created by the argument parser.
"""
default_args = Args()
parser = get_cli_parser(default_args)
parser = get_cli_parser()
args, unknown_args = parser.parse_known_args()
return args, unknown_args
def import_project_module(path):
def import_module(module_name):
"""
import the custom project module.
import a custom module by name.
imports the project module given its file path.
the path is expanded to its absolute form and appended to the python path.
import a module given its file path or module name (like in an import statement).
@param path: path and name of the module to be loaded.
path is optional and defaults to the python path.
if the name includes an extension, it is stripped off.
preferably, the module name should be given as in an import statement.
as the top-level pmsco directory is on the python path,
the module name will begin with `projects` for a custom project module or `pmsco` for a core pmsco module.
in this case, the function just calls importlib.import_module.
if a file path is given, i.e., `module_name` links to an existing file and has a `.py` extension,
the function extracts the directory path,
inserts it into the python path,
and calls importlib.import_module on the stem of the file name.
@note the file path remains in the python path.
this option should be used carefully to avoid breaking file name resolution.
@param module_name: file path or module name.
file path is interpreted relative to the working directory.
@return: the loaded module as a python object
"""
path, name = os.path.split(path)
name, __ = os.path.splitext(name)
path = os.path.abspath(path)
sys.path.append(path)
project_module = importlib.import_module(name)
return project_module
p = Path(module_name)
if p.is_file() and p.suffix == ".py":
path = str(p.parent.resolve())
module_name = p.stem
if path not in sys.path:
sys.path.insert(0, path)
module = importlib.import_module(module_name)
return module
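The file-path branch can be exercised with a throw-away module; the file name and attribute are invented for the demo:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# write a throw-away module file to demonstrate the file-path branch
tmpdir = Path(tempfile.mkdtemp())
(tmpdir / "demo_project.py").write_text("ANSWER = 42\n")

# same logic as import_module(): strip the .py file into a path and a module name
module_name = str(tmpdir / "demo_project.py")
p = Path(module_name)
if p.is_file() and p.suffix == ".py":
    path = str(p.parent.resolve())
    module_name = p.stem
    if path not in sys.path:
        sys.path.insert(0, path)
module = importlib.import_module(module_name)
```

note that the temporary directory stays on sys.path afterwards, which is the caveat the docstring warns about.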
def main_dict(run_params):
"""
main function with dictionary run-time parameters
this starts the whole process with all direct parameters.
the command line is not parsed.
no run-file is loaded (just the project module).
@param run_params: dictionary with the same structure as the JSON run-file.
@return: None
"""
project_params = run_params['project']
module = importlib.import_module(project_params['__module__'])
try:
project_class = getattr(module, project_params['__class__'])
except KeyError:
project = module.create_project()
else:
project = project_class()
project._module = module
project.directories['pmsco'] = Path(__file__).parent
project.directories['project'] = Path(module.__file__).parent
project.set_properties(module, project_params, project)
run_project(project)
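A minimal run_params dictionary might look like this; the module and class names are hypothetical and only illustrate the structure main_dict() expects:

```python
# hypothetical run parameters mirroring the JSON run-file structure;
# '__module__' and '__class__' are looked up by main_dict()
run_params = {
    'project': {
        '__module__': 'projects.demo.demo_project',  # invented module path
        '__class__': 'DemoProject',                  # optional; otherwise create_project() is called
        'mode': 'single',
        'output_file': 'work/demo/demo_run',
    }
}
```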
def main():
"""
main function with command line parsing
this function starts the whole process with parameters from the command line.
if the command line contains a run-file parameter, it determines the module to load and the project parameters.
otherwise, the command line parameters apply.
the project class can be specified either in the run-file or the project module.
if the run-file specifies a class name, that class is looked up in the project module and instantiated.
otherwise, the module's create_project is called.
@return: None
"""
args, unknown_args = parse_cli()
if args:
module = import_project_module(args.project_module)
try:
project_args = module.parse_project_args(unknown_args)
except NameError:
project_args = None
try:
with open(args.run_file, 'r') as f:
rf = json.load(f)
except (AttributeError, TypeError):
# no run file given; fall back to command line arguments
rf = {}
rfp = {'__module__': args.project_module}
else:
rfp = rf['project']
module = import_module(rfp['__module__'])
try:
project_args = module.parse_project_args(unknown_args)
except AttributeError:
project_args = None
try:
project_class = getattr(module, rfp['__class__'])
except (AttributeError, KeyError):
project = module.create_project()
set_common_args(project, args)
try:
module.set_project_args(project, project_args)
except NameError:
pass
else:
project = project_class()
project_args = None
project._module = module
project.directories['pmsco'] = Path(__file__).parent
project.directories['project'] = Path(module.__file__).parent
project.set_properties(module, rfp, project)
set_common_args(project, args)
try:
if project_args:
module.set_project_args(project, project_args)
except AttributeError:
pass
try:
schedule_enabled = rf['schedule']['enabled']
except KeyError:
schedule_enabled = False
if schedule_enabled:
schedule_project(project, rf)
else:
run_project(project)
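The run-file branch of main() expects a JSON document along these lines; the module path is hypothetical, and the standard json module stands in for commentjson here:

```python
import io
import json

# hypothetical run file contents as main() would parse them
run_file_text = """
{
  "project": {
    "__module__": "projects.demo.demo_project",
    "mode": "single"
  },
  "schedule": {
    "enabled": false
  }
}
"""
rf = json.load(io.StringIO(run_file_text))

# the schedule section is optional; a missing or disabled section means run locally
schedule_enabled = rf.get('schedule', {}).get('enabled', False)
```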

View File

@ -19,36 +19,32 @@ the ModelSpace and CalculatorParams classes are typically used unchanged.
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import copy
import datetime
import git
import logging
import numpy as np
import os.path
from pathlib import Path
import socket
import sys
from pmsco.calculators.calculator import InternalAtomicCalculator
from pmsco.calculators.edac import EdacCalculator
import pmsco.cluster as mc
import pmsco.cluster
import pmsco.config as config
from pmsco.compat import open
import pmsco.data as md
import pmsco.database as database
import pmsco.dispatch as dispatch
import pmsco.files as files
import pmsco.handlers as handlers
import pmsco.database
import pmsco.dispatch
import pmsco.files
import pmsco.handlers
from pmsco.helpers import BraceMessage as BMsg
logger = logging.getLogger(__name__)
@ -157,6 +153,34 @@ class ModelSpace(object):
"""
return ParamSpace(self.start[name], self.min[name], self.max[name], self.step[name])
def set_param_dict(self, d):
"""
initialize model space from dictionary.
@param d: dictionary with two levels:
the top level are parameter names,
the second level the space descriptors 'start', 'min', 'max', 'step' and 'width'.
see add_param() for possible combinations.
@return: None
"""
self.__init__()
for k, v in d.items():
self.add_param(k, **v)
def get_param_dict(self):
"""
return model space parameters in dictionary form
the top level are parameter names,
the second level the space descriptors 'start', 'min', 'max' and 'step'.
@return: dict
"""
d = {}
for name in self.start:
d[name] = {'start': self.start[name], 'min': self.min[name], 'max': self.max[name], 'step': self.step[name]}
return d
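The round trip between the two dictionary forms can be sketched with a minimal stand-in that mirrors the class' four per-parameter dicts (the parameter names `dAB` and `pol` are made up for illustration; the real class lives in pmsco.project):

```python
# two-level dictionary as accepted by ModelSpace.set_param_dict()
model_space_dict = {
    "dAB": {"start": 2.1, "min": 1.8, "max": 2.5, "step": 0.05},
    "pol": {"start": 0.0, "min": -15.0, "max": 15.0, "step": 1.0},
}

class MiniModelSpace:
    """stand-in mirroring the four per-parameter dicts of ModelSpace"""
    def __init__(self):
        self.start, self.min, self.max, self.step = {}, {}, {}, {}

    def set_param_dict(self, d):
        for name, spec in d.items():
            self.start[name] = spec["start"]
            self.min[name] = spec["min"]
            self.max[name] = spec["max"]
            self.step[name] = spec["step"]

    def get_param_dict(self):
        return {name: {"start": self.start[name], "min": self.min[name],
                       "max": self.max[name], "step": self.step[name]}
                for name in self.start}

ms = MiniModelSpace()
ms.set_param_dict(model_space_dict)
assert ms.get_param_dict() == model_space_dict
```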
class CalculatorParams(object):
"""
@ -568,9 +592,166 @@ class Scan(object):
self.raw_data[dim] = grid[i].reshape(-1)
self.raw_data['i'] = 1
def load(self):
return self
class ScanKey(config.ConfigurableObject):
"""
create a Scan object based on a project-supplied dictionary
this class can be used in a run file to create a scan object based on the scan_dict attribute of the project.
this may be convenient if your project selectively uses scans out of a long list of data files
and you don't want to clutter up the run file with parameters that don't change.
to do so, set the key property to match an item of scan_dict.
the load method will look up the corresponding scan_dict item and construct the final Scan object.
"""
def __init__(self, project=None):
super().__init__()
self.key = ""
self.project = project
def load(self, dirs=None):
"""
load the selected scan as specified in the project's scan dictionary
the method uses ScanLoader or ScanCreator as an intermediate.
@return a new Scan object which contains the loaded data.
"""
scan_spec = self.project.scan_dict[self.key]
if 'positions' in scan_spec:
loader = ScanCreator()
else:
loader = ScanLoader()
for k, v in scan_spec.items():
setattr(loader, k, v)
scan = loader.load(dirs=dirs)
return scan
class ScanLoader(config.ConfigurableObject):
"""
create a Scan object from a data file reference
this class can be used in a run file to create a scan object from an experimental data file.
to do so, fill the properties with values as documented.
the load() method is called when the project is run.
"""
## @var filename (string)
# file name from which the scan should be loaded.
# the file name can contain a format specifier like {project} to include the base path.
## @var emitter (string)
# chemical symbol and, optionally following, further specification (chemical state, environment, ...)
# of photo-emitting atoms.
# the interpretation of this string is up to the project and its cluster generator.
# it should, however, always start with a chemical element symbol.
#
# examples: 'Ca' (calcium), 'CA' (carbon A), 'C a' (carbon a), 'C 1' (carbon one), 'N=O', 'FeIII'.
## @var initial_state (string)
# nl term of initial state
#
# in the form expected by EDAC, for example: '2p1/2'
## @var is_modf (bool)
# declares whether the data file contains the modulation function rather than intensity values
#
# if false, the project will calculate a modulation function from the raw data
def __init__(self):
super().__init__()
self.filename = ""
self.emitter = ""
self.initial_state = "1s"
self.is_modf = False
def load(self, dirs=None):
"""
load the scan according to specification
create a new Scan object and load the file by calling Scan.import_scan_file().
@return a new Scan object which contains the loaded data file.
"""
scan = Scan()
filename = config.resolve_path(self.filename, dirs)
scan.import_scan_file(filename, self.emitter, self.initial_state)
if self.is_modf:
scan.modulation = scan.raw_data
return scan
class ScanCreator(config.ConfigurableObject):
"""
create a Scan object from string expressions
this class can be used in a run file to create a scan object from python expressions,
such as lists, ranges or numpy functions.
to do so, fill the properties with values as documented.
the load() method is called when the project is run.
@note the raw_data property of the scan cannot be filled this way.
thus, the class is useful in `single` calculation mode only.
"""
## @var filename (string)
# name of the file which should receive the scan data.
# the file name can contain a format specifier like {project} to include the base path.
## @var positions (dict)
# dictionary specifying the scan positions
#
# the dictionary must contain four keys: 'e', 't', 'p', 'a' representing the four scan axes.
# each key holds a string that contains a python expression.
# the string is evaluated using python's built-in eval() function.
# the expression must evaluate to an iterable object or numpy ndarray of the scan positions.
# the `np` namespace can be used to access numpy functions.
#
# example:
# the following dictionary generates a hemispherical scan
# self.positions = {'e': '100', 't': 'np.linspace(0, 90, 91)', 'p': 'range(0, 360, 2)', 'a': '0'}
## @var emitter (string)
# chemical symbol and, optionally following, further specification (chemical state, environment, ...)
# of photo-emitting atoms.
# the interpretation of this string is up to the project and its cluster generator.
# it should, however, always start with a chemical element symbol.
#
# examples: 'Ca' (calcium), 'CA' (carbon A), 'C a' (carbon a), 'C 1' (carbon one), 'N=O', 'FeIII'.
## @var initial_state (string)
# nl term of initial state
#
# in the form expected by EDAC, for example: '2p1/2'
def __init__(self):
super().__init__()
self.filename = ""
self.positions = {'e': None, 't': None, 'p': None, 'a': None}
self.emitter = ""
self.initial_state = "1s"
def load(self, dirs=None):
"""
create the scan according to specification
@return a new Scan object which contains the created scan array.
"""
scan = Scan()
positions = {}
for axis in self.positions.keys():
positions[axis] = np.atleast_1d(np.asarray(eval(self.positions[axis])))
scan.define_scan(positions, self.emitter, self.initial_state)
scan.filename = config.resolve_path(self.filename, dirs)
return scan
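The expansion performed by `load()` can be reproduced standalone; the example positions dict is the hemispherical scan from the class documentation:

```python
import numpy as np

# positions dict as it would appear in a run file or project code
positions_spec = {'e': '100', 't': 'np.linspace(0, 90, 91)',
                  'p': 'range(0, 360, 2)', 'a': '0'}

# the same expansion ScanCreator.load() performs:
# eval each expression and promote scalars to 1-d arrays
positions = {axis: np.atleast_1d(np.asarray(eval(expr)))
             for axis, expr in positions_spec.items()}

assert positions['e'].shape == (1,)    # scalar energy becomes a 1-element axis
assert positions['t'].shape == (91,)   # 91 theta steps
assert positions['p'].shape == (180,)  # range(0, 360, 2) has 180 steps
```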
# noinspection PyMethodMayBeStatic
class Project(object):
class Project(config.ConfigurableObject):
"""
base class of a calculation project.
@ -609,17 +790,18 @@ class Project(object):
#
## @var scans (list of Scan objects)
# list of experimental or scan files for which calculations are to be run.
# list of experimental scans for which calculations are to be run.
#
# the list must be populated by calling the add_scan() method.
# this should be done in the create_project() function, or through the command line arguments.
# during project initialization, this list must be populated with Scan, ScanLoader or ScanCreator objects.
# while Scan objects contain all scan data, the latter two classes contain only scan specifications
# which are expanded (i.e. files are loaded or arrays are calculated) just before the calculations start.
# the Project.add_scan() method is a short-cut to create the respective scan object from few arguments.
# before the calculation starts, all objects are converted into fully specified Scan objects
# and scan data is loaded or calculated.
#
# the modulation function is calculated internally.
# if your scan files contain the modulation function (as opposed to intensity),
# you must add the files in the create_project() function.
# the command line does not support loading modulation functions.
#
# @c scans must be considered read-only. use project methods to change it.
# there are two ways to fill this list:
# either the project code fills it as a part of its initialization (create_project),
# or the list is populated via the run-file.
## @var domains (list of arbitrary objects)
# list of domains for which calculations are to be run.
@ -661,28 +843,22 @@ class Project(object):
# set this argument to False only if the calculation is a continuation of a previous one
# without any changes to the code.
## @var data_dir
# directory path to experimental data.
## @var directories
# dictionary for various directory paths.
#
# the project should load experimental data (scan files) from this path.
# this attribute receives the --data-dir argument from the command line
# if the project parses the common arguments (pmsco.set_common_args).
#
# it is up to the project to define where to load scan files from.
# if the location of the files may depend on the machine or user account,
# the user may want to specify the data path on the command line.
## @var output_dir (string)
# directory path for data files produced during the calculation, including intermediate files.
# home: user's home directory.
# data: where to load experimental data (scan files) from.
# project: directory of the project module.
# output: where to write output and intermediate files.
# temp: for temporary files.
#
# output_dir and output_file are set at once by @ref set_output.
## @var output_file (string)
## @var output_file (Path)
# file name root for data files produced during the calculation, including intermediate files.
#
# the file name should include the path. the path must also be set in @ref output_dir.
#
# output_dir and output_file are set at once by @ref set_output.
# this is the concatenation of self.directories['output'] and self.job_name.
# assignment to this property will update the two basic attributes.
## @var db_file (string)
# name of an sqlite3 database file where the calculation results should be stored.
@ -694,14 +870,17 @@ class Project(object):
#
# the actual wall time may be longer by the remaining time of running calculations.
# running calculations will not be aborted.
#
# the time_limit property is an alternative representation as hours.
# reading and writing accesses timedelta_limit.
## @var combined_scan
# combined raw data from scans.
# updated by add_scan().
# updated by self.load_scans().
## @var combined_modf
# combined modulation function from scans.
# updated by add_scan().
# updated by self.load_scans().
## @var files
# list of all generated data files with metadata.
@ -741,14 +920,17 @@ class Project(object):
#
def __init__(self):
super().__init__()
self._module = None
self.mode = "single"
self.job_name = ""
self.job_name = "pmsco0"
self.job_tags = {}
self.git_hash = ""
self.description = ""
self.features = {}
self.cluster_format = mc.FMT_EDAC
self.cluster_generator = mc.LegacyClusterGenerator(self)
self.cluster_format = pmsco.cluster.FMT_EDAC
self.cluster_generator = pmsco.cluster.LegacyClusterGenerator(self)
self._model_space = None
self.scans = []
self.domains = []
self.optimizer_params = {
@ -758,39 +940,170 @@ class Project(object):
'recalc_seed': True,
'table_file': ""
}
self.data_dir = ""
self.output_dir = ""
self.output_file = "pmsco_data"
self.directories = {
"home": Path.home(),
"work": Path.cwd(),
"data": "",
"project": "",
"output": "",
"temp": ""}
self.log_file = ""
self.log_level = "WARNING"
self.db_file = ':memory:'
self.timedelta_limit = datetime.timedelta(days=1)
self.combined_scan = None
self.combined_modf = None
self.files = files.FileTracker()
self.files = pmsco.files.FileTracker()
self.keep_files = list(pmsco.files.FILE_CATEGORIES_TO_KEEP)
self.keep_levels = 1
self.keep_best = 10
self.handler_classes = {
'model': handlers.SingleModelHandler,
'scan': handlers.ScanHandler,
'domain': handlers.DomainHandler,
'emit': handlers.EmitterHandler,
'region': handlers.SingleRegionHandler
'model': pmsco.handlers.SingleModelHandler,
'scan': pmsco.handlers.ScanHandler,
'domain': pmsco.handlers.DomainHandler,
'emit': pmsco.handlers.EmitterHandler,
'region': pmsco.handlers.SingleRegionHandler
}
self.atomic_scattering_factory = InternalAtomicCalculator
self.multiple_scattering_factory = EdacCalculator
self._tasks_fields = []
self._db = database.ResultsDatabase()
self._db = pmsco.database.ResultsDatabase()
def validate(self):
"""
validate the project parameters before starting the calculations
the method checks and fixes attributes that may cause trouble or go unnoticed if they are wrong.
in addition, it fixes attributes which may be incomplete after loading a run-file.
failed critical checks raise an exception (AssertionError, AttributeError, KeyError, ValueError).
checks that cause an attribute to revert to its default are logged as warnings.
the following attributes are fixed silently:
- scattering factories that are declared as string are looked up in the project module.
- place holders in the directories attribute are resolved.
- place holders in the output_file attribute are resolved.
- output_file and output_dir are made consistent (so that output_file includes output_dir).
- the create_model_space() method is called if the model_space attribute is undefined.
- scan data are loaded.
@note to check the syntax of a run-file, set the calculation mode to 'validate' and run pmsco.
this will pass the validate method but will stop execution before calculations are started.
@raise AssertionError if a parameter is not correct.
@raise AttributeError if a class name cannot be resolved.
"""
assert self.mode in {"single", "swarm", "genetic", "grid", "table", "test", "validate"}
if isinstance(self.atomic_scattering_factory, str):
self.atomic_scattering_factory = getattr(self._module, self.atomic_scattering_factory)
if isinstance(self.multiple_scattering_factory, str):
self.multiple_scattering_factory = getattr(self._module, self.multiple_scattering_factory)
self.directories = {k: config.resolve_path(Path(v), self.directories) for k, v in self.directories.items()}
assert len(str(self.output_file))
d = config.resolve_path(self.directories['output'], self.directories)
f = config.resolve_path(self.output_file, self.directories)
self.output_file = Path(d, f)
self.directories['output'] = self.output_file.parent
if self._model_space is None or not self._model_space.start:
logger.warning("undefined model_space attribute, trying project's create_model_space")
self._model_space = self.create_model_space()
self.load_scans()
@property
def data_dir(self):
return self.directories['data']
@data_dir.setter
def data_dir(self, path):
self.directories['data'] = Path(path)
@property
def output_dir(self):
return self.directories['output']
@output_dir.setter
def output_dir(self, path):
self.directories['output'] = Path(path)
@property
def output_file(self):
return Path(self.directories['output'], self.job_name)
@output_file.setter
def output_file(self, filename):
"""
set path and base name of output file.
path is copied to the output_dir attribute.
the file stem is copied to the job_name attribute.
@param filename: (PathLike)
"""
p = Path(filename)
s = str(p.parent)
if s and s != ".":
self.directories['output'] = p.parent
s = str(p.stem)
if s:
self.job_name = s
else:
raise ValueError("invalid output file name")
@property
def time_limit(self):
return self.timedelta_limit.total_seconds() / 3600
@time_limit.setter
def time_limit(self, hours):
self.timedelta_limit = datetime.timedelta(hours=hours)
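A minimal stand-in illustrating the documented hours semantics of `time_limit` backed by `timedelta_limit` (sketch only; the real property lives on Project):

```python
import datetime

class TimeLimited:
    """stand-in mirroring Project.time_limit (hours) <-> timedelta_limit"""
    def __init__(self):
        self.timedelta_limit = datetime.timedelta(days=1)

    @property
    def time_limit(self):
        return self.timedelta_limit.total_seconds() / 3600

    @time_limit.setter
    def time_limit(self, hours):
        self.timedelta_limit = datetime.timedelta(hours=hours)

t = TimeLimited()
t.time_limit = 36
assert t.timedelta_limit == datetime.timedelta(days=1, hours=12)
assert t.time_limit == 36.0
```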
def create_model_space(self):
"""
create a project.ModelSpace object which defines the allowed range for model parameters.
this method must be implemented by the actual project class.
the ModelSpace object must declare all model parameters used in the project.
there are three ways for a project to declare the model space:
1. implement the @ref create_model_space method.
this is the older way and may become deprecated in a future version.
2. assign a ModelSpace to the self.model_space property directly
(in the @ref validate method).
3. declare the model space in the run-file.
this method is called by the validate method only if self._model_space is undefined.
@return ModelSpace object
"""
return None
@property
def model_space(self):
"""
ModelSpace object that defines the allowed range for model parameters.
there are three ways for a project to declare the model space:
1. implement the @ref create_model_space method.
this is the older way and may become deprecated in a future version.
2. assign a ModelSpace to the self.model_space property directly
(in the @ref validate method).
3. declare the model space in the run-file.
initially, this property is None.
"""
return self._model_space
@model_space.setter
def model_space(self, value):
if isinstance(value, ModelSpace):
self._model_space = value
elif hasattr(value, 'items'):
self._model_space = ModelSpace()
self._model_space.set_param_dict(value)
else:
raise ValueError("incompatible object type")
def create_params(self, model, index):
"""
create a CalculatorParams object given the model parameters and calculation index.
@ -816,11 +1129,15 @@ class Project(object):
self.combined_scan = None
self.combined_modf = None
def add_scan(self, filename, emitter, initial_state, is_modf=False, modf_model=None, positions=None):
def add_scan(self, filename, emitter, initial_state, is_modf=False, positions=None):
"""
add the file name of reference experiment and load it.
the extension must be one of msc_data.DATATYPES (case insensitive)
add a scan specification to the scans list.
this is a shortcut for adding a ScanCreator or ScanLoader object to the self.scans list.
the creator or loader are converted into full Scan objects just before the calculation starts
(in the self.setup() method).
the extension must be one of pmsco.data.DATATYPES (case insensitive)
corresponding to the meaning of the columns in the file.
caution: EDAC can only calculate equidistant, rectangular scans.
@ -831,9 +1148,6 @@ class Project(object):
* intensity vs theta, phi, or alpha
* intensity vs theta and phi (hemisphere or hologram scan)
the method calculates the modulation function if @c is_modf is @c False.
it also updates @c combined_scan and @c combined_modf which may be used as R-factor comparison targets.
@param filename: (string) file name of the experimental data, possibly including a path.
the file is not loaded when the optional positions argument is present,
but the filename may serve as basename for output files (e.g. modulation function).
@ -852,57 +1166,64 @@ class Project(object):
@param is_modf: (bool) declares whether the file contains the modulation function (True),
or intensity (False, default). In the latter case, the modulation function is calculated internally.
@param modf_model: (dict) model parameters to be passed to the modulation function.
@return (Scan) the new scan object (which is also a member of self.scans).
"""
scan = Scan()
if positions is not None:
scan.define_scan(positions, emitter, initial_state)
scan.filename = filename
scan = ScanCreator()
scan.positions = positions
else:
scan.import_scan_file(filename, emitter, initial_state)
scan = ScanLoader()
scan.is_modf = is_modf
scan.filename = filename
scan.emitter = emitter
scan.initial_state = initial_state
self.scans.append(scan)
if modf_model is None:
modf_model = {}
return scan
if scan.raw_data is not None:
if is_modf:
scan.modulation = scan.raw_data
else:
def load_scans(self):
"""
load all scan data.
initially, the self.scans list may contain objects of different classes (Scan, ScanLoader, ScanCreator)
depending on the project initialization.
this method loads all data, so that the scans list contains only Scan objects.
also, the self.combined_scan and self.combined_modf fields are calculated from the scans.
"""
has_raw_data = True
has_mod_func = True
loaded_scans = []
for idx, scan in enumerate(self.scans):
scan = scan.load(dirs=self.directories)
loaded_scans.append(scan)
if scan.modulation is None:
try:
scan.modulation = self.calc_modulation(scan.raw_data, modf_model)
scan.modulation = self.calc_modulation(scan.raw_data, self.model_space.start)
except ValueError:
logger.error("error calculating the modulation function of experimental data.")
scan.modulation = None
else:
scan.modulation = None
logger.error(f"error calculating the modulation function of scan {idx}.")
has_raw_data = has_raw_data and scan.raw_data is not None
has_mod_func = has_mod_func and scan.modulation is not None
self.scans = loaded_scans
if scan.raw_data is not None:
if self.combined_scan is not None:
dt = md.common_dtype((self.combined_scan, scan.raw_data))
d1 = md.restructure_data(self.combined_scan, dt)
d2 = md.restructure_data(scan.raw_data, dt)
self.combined_scan = np.hstack((d1, d2))
else:
self.combined_scan = scan.raw_data.copy()
if has_raw_data:
stack1 = [scan.raw_data for scan in self.scans]
dtype = md.common_dtype(stack1)
stack2 = [md.restructure_data(data, dtype) for data in stack1]
self.combined_scan = np.hstack(tuple(stack2))
else:
self.combined_scan = None
if scan.modulation is not None:
if self.combined_modf is not None:
dt = md.common_dtype((self.combined_modf, scan.modulation))
d1 = md.restructure_data(self.combined_modf, dt)
d2 = md.restructure_data(scan.modulation, dt)
self.combined_modf = np.hstack((d1, d2))
else:
self.combined_modf = scan.modulation.copy()
if has_mod_func:
stack1 = [scan.modulation for scan in self.scans]
dtype = md.common_dtype(stack1)
stack2 = [md.restructure_data(data, dtype) for data in stack1]
self.combined_modf = np.hstack(tuple(stack2))
else:
self.combined_modf = None
return scan
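The stacking of scans into `combined_scan` can be sketched as follows; treating `md.common_dtype` and `md.restructure_data` as reducing the arrays to their shared fields before `np.hstack` is an assumption about those helpers, and the field names are made up:

```python
import numpy as np

# two structured arrays with overlapping but unequal fields
a = np.zeros(3, dtype=[('e', 'f4'), ('i', 'f4')])
b = np.zeros(2, dtype=[('e', 'f4'), ('t', 'f4'), ('i', 'f4')])

# reduce both to the common fields (assumed role of common_dtype /
# restructure_data), then stack into one combined array
common = [n for n in a.dtype.names if n in b.dtype.names]
dt = [(n, 'f4') for n in common]
a2 = np.zeros(a.shape, dtype=dt)
b2 = np.zeros(b.shape, dtype=dt)
for n in common:
    a2[n] = a[n]
    b2[n] = b[n]
combined = np.hstack((a2, b2))

assert combined.shape == (5,)
assert combined.dtype.names == ('e', 'i')
```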
def clear_domains(self):
"""
clear domains.
@ -933,42 +1254,6 @@ class Project(object):
"""
self.domains.append(domain)
def set_output(self, filename):
"""
set path and base name of output file.
path and name are copied to the output_file attribute.
path is copied to the output_dir attribute.
if the path is missing, the destination is the current working directory.
"""
self.output_file = filename
path, name = os.path.split(filename)
self.output_dir = path
self.job_name = name
def set_timedelta_limit(self, timedelta, margin_minutes=10):
"""
set the walltime limit with a safety margin.
this method sets the internal self.timedelta_limit attribute.
by default, a safety margin of 10 minutes is subtracted from the main argument
in order to increase the probability that the process ends in time.
if this is not wanted, the project class may override the method and provide its own margin.
the method is typically called with the command line time limit from the main module.
@note the safety margin could be applied at various levels.
it is done here because it can easily be overridden by the project subclass.
to keep run scripts simple, the command line can be given the same time limit
as the job scheduler of the computing cluster.
@param timedelta: (datetime.timedelta) max. duration of the calculation process (wall time).
@param margin_minutes: (int) safety margin in minutes to subtract from timedelta.
"""
self.timedelta_limit = timedelta - datetime.timedelta(minutes=margin_minutes)
def log_project_args(self):
"""
send some common project attributes to the log.
@ -981,6 +1266,14 @@ class Project(object):
@return: None
"""
try:
for key in self.directories:
val = self.directories[key]
lev = logging.WARNING if val else logging.DEBUG
logger.log(lev, f"directories['{key}']: {val}")
logger.warning("output file: {0}".format(self.output_file))
logger.warning("database: {0}".format(self.db_file))
logger.warning("atomic scattering: {0}".format(self.atomic_scattering_factory))
logger.warning("multiple scattering: {0}".format(self.multiple_scattering_factory))
logger.warning("optimization mode: {0}".format(self.mode))
@ -990,15 +1283,11 @@ class Project(object):
lev = logging.WARNING if val else logging.DEBUG
logger.log(lev, "optimizer_params['{k}']: {v}".format(k=key, v=val))
logger.warning("data directory: {0}".format(self.data_dir))
logger.warning("output file: {0}".format(self.output_file))
logger.warning("database: {0}".format(self.db_file))
_files_to_keep = files.FILE_CATEGORIES - self.files.categories_to_delete
_files_to_keep = pmsco.files.FILE_CATEGORIES - self.files.categories_to_delete
logger.warning("intermediate files to keep: {0}".format(", ".join(_files_to_keep)))
for idx, scan in enumerate(self.scans):
logger.warning(f"scan {idx}: {scan.filename} ({scan.emitter} {scan.initial_state}")
logger.warning(f"scan {idx}: {scan.filename} ({scan.emitter} {scan.initial_state})")
for idx, dom in enumerate(self.domains):
logger.warning(f"domain {idx}: {dom}")
@ -1247,16 +1536,26 @@ class Project(object):
"""
self.git_hash = self.get_git_hash()
fields = ["rfac"]
fields.extend(dispatch.CalcID._fields)
fields.extend(pmsco.dispatch.CalcID._fields)
fields.append("secs")
fields = ["_" + f for f in fields]
mspace = self.create_model_space()
model_fields = list(mspace.start.keys())
model_fields = list(self.model_space.start.keys())
model_fields.sort(key=lambda name: name.lower())
fields.extend(model_fields)
self._tasks_fields = fields
with open(self.output_file + ".tasks.dat", "w") as outfile:
if 'all' in self.keep_files:
cats = set([])
else:
cats = pmsco.files.FILE_CATEGORIES - set(self.keep_files)
cats -= {'report'}
if self.mode == 'single':
cats -= {'model'}
self.files.categories_to_delete = cats
Path(self.output_file).parent.mkdir(parents=True, exist_ok=True)
tasks_file = Path(self.output_file).with_suffix(".tasks.dat")
with open(tasks_file, "w") as outfile:
outfile.write("# ")
outfile.write(" ".join(fields))
outfile.write("\n")
@ -1311,7 +1610,8 @@ class Project(object):
values_dict['_rfac'] = parent_task.rfac
values_dict['_secs'] = parent_task.time.total_seconds()
values_list = [values_dict[field] for field in self._tasks_fields]
with open(self.output_file + ".tasks.dat", "a") as outfile:
tasks_file = Path(self.output_file).with_suffix(".tasks.dat")
with open(tasks_file, "a") as outfile:
outfile.write(" ".join(format(value) for value in values_list) + "\n")
db_id = self._db.insert_result(parent_task.id, values_dict)
@ -1548,11 +1848,11 @@ class Project(object):
"""
_files = {}
xyz_filename = filename + ".xyz"
cluster.save_to_file(xyz_filename, fmt=mc.FMT_XYZ)
cluster.save_to_file(xyz_filename, fmt=pmsco.cluster.FMT_XYZ)
_files[xyz_filename] = 'cluster'
xyz_filename = filename + ".emit.xyz"
cluster.save_to_file(xyz_filename, fmt=mc.FMT_XYZ, emitters_only=True)
cluster.save_to_file(xyz_filename, fmt=pmsco.cluster.FMT_XYZ, emitters_only=True)
_files[xyz_filename] = 'cluster'
return _files

pmsco/schedule.py (new file, 309 lines)

@ -0,0 +1,309 @@
"""
@package pmsco.schedule
job schedule interface
this module defines common infrastructure to submit a pmsco calculation job to a job scheduler such as slurm.
the schedule can be defined as part of the run-file (see pmsco module).
users may derive sub-classes in a separate module to adapt to their own computing cluster.
the basic call sequence is:
1. create a schedule object.
2. initialize its properties with job parameters.
3. validate()
4. submit()
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
import collections.abc
import commentjson as json
import datetime
import logging
from pathlib import Path
import shutil
import subprocess
import pmsco.config
logger = logging.getLogger(__name__)
class JobSchedule(pmsco.config.ConfigurableObject):
"""
base class for job schedule
this class defines the abstract interface and some utilities.
derived classes may override any method, but should call the inherited method.
usage:
1. create object, assigning a project instance.
2. assign run_file.
3. call validate.
4. call submit.
this class' properties should not be listed in the run file - they will be overwritten.
"""
## @var enabled (bool)
#
# this parameter signals whether pmsco should schedule a job or run the calculation.
# it is not directly used by the schedule classes but by the pmsco module.
# it must be defined in the run file and set to true to submit the job to a scheduler.
# it is set to false in the run file copied to the job directory so that the job script starts the calculation.
def __init__(self, project):
super(JobSchedule, self).__init__()
self.project = project
self.enabled = False
self.run_dict = {}
self.job_dir = Path()
self.job_file = Path()
self.run_file = Path()
# directory that contains the pmsco and projects directories
self.pmsco_root = Path(__file__).parent.parent
def validate(self):
"""
validate the job parameters.
make sure all object attributes are correct for submission.
@return: None
"""
self.pmsco_root = Path(self.project.directories['pmsco']).parent
output_dir = Path(self.project.directories['output'])
assert self.pmsco_root.is_dir()
assert (self.pmsco_root / "pmsco").is_dir()
assert (self.pmsco_root / "projects").is_dir()
assert output_dir.is_dir()
assert self.project.job_name
self.job_dir = output_dir / self.project.job_name
self.job_dir.mkdir(parents=True, exist_ok=True)
self.job_file = (self.job_dir / self.project.job_name).with_suffix(".sh")
self.run_file = (self.job_dir / self.project.job_name).with_suffix(".json")
def submit(self):
"""
submit the job to the scheduler.
as of this class, the method does the following:
1. copy source files
2. copy a patched version of the run file.
3. write the job file (_write_job_file must be implemented by a derived class).
@return: None
"""
self._copy_source()
self._fix_run_file()
self._write_run_file()
self._write_job_file()
def _copy_source(self):
"""
copy the source files to the job directory.
the source_dir and job_dir attributes must be correct.
the job_dir directory must not exist and will be created.
this is a utility method used internally by derived classes.
job_dir/pmsco/pmsco/**
job_dir/pmsco/projects/**
job_dir/job.sh
job_dir/job.json
@return: None
"""
source = self.pmsco_root
dest = self.job_dir / "pmsco"
ignore = shutil.ignore_patterns(".*", "~*", "*~")
shutil.copytree(source / "pmsco", dest / "pmsco", ignore=ignore)
shutil.copytree(source / "projects", dest / "projects", ignore=ignore)
def _fix_run_file(self):
"""
fix the run file.
patch some entries of self.run_dict so that it can be used as run file.
the following changes are made:
1. set schedule.enabled to false so that the calculation is run.
2. set the output directory to the job directory.
3. set the log file to the job directory.
@return: None
"""
self.run_dict['schedule']['enabled'] = False
self.run_dict['project']['directories']['output'] = str(self.job_dir)
self.run_dict['project']['log_file'] = str((self.job_dir / self.project.job_name).with_suffix(".log"))
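The three patches can be sketched on a plain dictionary (the paths and job name below are hypothetical):

```python
from pathlib import Path

# run dict as loaded from the original run file
run_dict = {
    "schedule": {"enabled": True},
    "project": {"directories": {"output": "/tmp/out"}, "log_file": ""},
}
job_dir = Path("/tmp/out/job1")
job_name = "job1"

# the three changes made by _fix_run_file()
run_dict['schedule']['enabled'] = False
run_dict['project']['directories']['output'] = str(job_dir)
run_dict['project']['log_file'] = str((job_dir / job_name).with_suffix(".log"))

assert run_dict['schedule']['enabled'] is False
assert Path(run_dict['project']['log_file']).name == "job1.log"
```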
def _write_run_file(self):
"""
copy the run file.
this is a JSON dump of self.run_dict to the self.run_file file.
@return: None
"""
with open(self.run_file, "wt") as f:
json.dump(self.run_dict, f, indent=2)
def _write_job_file(self):
"""
create the job script.
this method must be implemented by a derived class.
the script must be written to the self.job_file file.
don't forget to make the file executable.
@return: None
"""
pass
class SlurmSchedule(JobSchedule):
"""
job schedule for a slurm scheduler.
this class implements commonly used features of the slurm scheduler.
host-specific features and the creation of the job file should be done in a derived class.
derived classes must, in particular, implement the _write_job_file method.
they can override other methods, too, but should call the inherited method first.
1. copy the source trees (pmsco and projects) to the job directory
2. copy a patched version of the run file.
3. call the submission command
the public properties of this class should be assigned from the run file.
"""
def __init__(self, project):
super(SlurmSchedule, self).__init__(project)
self.host = ""
self.nodes = 1
self.tasks_per_node = 8
self.wall_time = datetime.timedelta(hours=1)
self.signal_time = 600
self.manual = True
@staticmethod
def parse_timedelta(td):
"""
parse time delta input formats
converts a string or dictionary from run-file into datetime.timedelta.
@param td:
str: [days-]hours[:minutes[:seconds]]
dict: days, hours, minutes, seconds - at least one needs to be defined. values must be numeric.
datetime.timedelta - native type
@return: datetime.timedelta
"""
if isinstance(td, str):
dt = {}
d = td.split("-")
if len(d) > 1:
dt['days'] = float(d.pop(0))
t = d[0].split(":")
try:
dt['hours'] = float(t.pop(0))
dt['minutes'] = float(t.pop(0))
dt['seconds'] = float(t.pop(0))
except (IndexError, ValueError):
pass
td = datetime.timedelta(**dt)
elif isinstance(td, collections.abc.Mapping):
td = datetime.timedelta(**td)
return td
def validate(self):
super(SlurmSchedule, self).validate()
self.wall_time = self.parse_timedelta(self.wall_time)
assert self.job_dir.is_absolute()
def submit(self):
"""
call the sbatch command
if manual is true, the job files are generated but the job is not submitted.
@return: None
"""
super(SlurmSchedule, self).submit()
args = ['sbatch', str(self.job_file)]
print(" ".join(args))
if self.manual:
print("manual run - job files created but not submitted")
else:
cp = subprocess.run(args)
cp.check_returncode()
class PsiRaSchedule(SlurmSchedule):
"""
job schedule for the Ra cluster at PSI.
this class selects specific features of the Ra cluster,
such as the partition and node type (24 or 32 cores).
it also implements the _write_job_file method.
"""
## @var partition (str)
#
# the partition is selected based on wall time and number of tasks by the validate() method.
# it should not be listed in the run file.
def __init__(self, project):
super(PsiRaSchedule, self).__init__(project)
self.partition = "shared"
def validate(self):
super(PsiRaSchedule, self).validate()
assert self.nodes <= 2
assert self.tasks_per_node <= 24 or self.tasks_per_node == 32
assert self.wall_time.total_seconds() >= 60
if self.wall_time.total_seconds() > 24 * 60 * 60:
self.partition = "week"
elif self.tasks_per_node < 24:
self.partition = "shared"
else:
self.partition = "day"
assert self.partition in ["day", "week", "shared"]
def _write_job_file(self):
lines = []
lines.append('#!/bin/bash')
lines.append('#SBATCH --export=NONE')
lines.append(f'#SBATCH --job-name="{self.project.job_name}"')
lines.append(f'#SBATCH --partition={self.partition}')
lines.append(f'#SBATCH --time={int(self.wall_time.total_seconds() / 60)}')
lines.append(f'#SBATCH --nodes={self.nodes}')
lines.append(f'#SBATCH --ntasks-per-node={self.tasks_per_node}')
if self.tasks_per_node > 24:
lines.append('#SBATCH --cores-per-socket=16')
# 0 - 65535 seconds
# currently, PMSCO does not react to signals properly
# lines.append(f'#SBATCH --signal=TERM@{self.signal_time}')
lines.append(f'#SBATCH --output="{self.project.job_name}.o.%j"')
lines.append(f'#SBATCH --error="{self.project.job_name}.e.%j"')
lines.append('module load psi-python36/4.4.0')
lines.append('module load gcc/4.8.5')
lines.append('module load openmpi/3.1.3')
lines.append('source activate pmsco')
lines.append(f'cd "{self.job_dir}"')
lines.append(f'mpirun python pmsco/pmsco -r {self.run_file.name}')
lines.append(f'cd "{self.job_dir}"')
lines.append('rm -rf pmsco')
lines.append('exit 0')
self.job_file.write_text("\n".join(lines))
self.job_file.chmod(0o755)
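The wall-time formats accepted by SlurmSchedule.parse_timedelta can be illustrated with a standalone sketch. This is a re-implementation for demonstration only; the method above is authoritative:

```python
import datetime

# demonstration sketch of the parse_timedelta input formats:
# "[days-]hours[:minutes[:seconds]]" strings, mappings, or native timedelta
def parse_timedelta(td):
    if isinstance(td, str):
        dt = {}
        d = td.split("-")
        if len(d) > 1:
            dt['days'] = float(d.pop(0))
        t = d[0].split(":")
        try:
            dt['hours'] = float(t.pop(0))
            dt['minutes'] = float(t.pop(0))
            dt['seconds'] = float(t.pop(0))
        except (IndexError, ValueError):
            pass  # missing fields are simply skipped
        td = datetime.timedelta(**dt)
    elif isinstance(td, dict):
        td = datetime.timedelta(**td)
    return td

print(parse_timedelta("1-12:30"))     # 1 day, 12:30:00
print(parse_timedelta({"hours": 2}))  # 2:00:00
```

Note that partial strings such as "0:30" also parse, because fields that are absent are silently left at zero.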



@ -0,0 +1,93 @@
{
// line comments using // or # prefix are allowed as an extension of JSON syntax
"project": {
"__module__": "projects.twoatom.twoatom",
"__class__": "TwoatomProject",
"job_name": "twoatom0002",
"job_tags": [],
"description": "",
"mode": "single",
"directories": {
"data": "",
"output": ""
},
"keep_files": [
"cluster",
"model",
"scan",
"report",
"population"
],
"keep_best": 10,
"keep_levels": 1,
"time_limit": 24,
"log_file": "",
"log_level": "WARNING",
"cluster_generator": {
"__class__": "TwoatomCluster",
"atom_types": {
"A": "N",
"B": "Ni"
},
"model_dict": {
"dAB": "dNNi",
"th": "pNNi",
"ph": "aNNi"
}
},
"atomic_scattering_factory": "InternalAtomicCalculator",
"multiple_scattering_factory": "EdacCalculator",
"model_space": {
"dNNi": {
"start": 2.109,
"min": 2.0,
"max": 2.25,
"step": 0.05
},
"pNNi": {
"start": 15.0,
"min": 0.0,
"max": 30.0,
"step": 1.0
},
"V0": {
"start": 21.966,
"min": 15.0,
"max": 25.0,
"step": 1.0
},
"Zsurf": {
"start": 1.449,
"min": 0.5,
"max": 2.0,
"step": 0.25
}
},
"domains": [
{
"default": 0.0
}
],
"scans": [
{
"__class__": "mp.ScanCreator",
"filename": "twoatom_energy_alpha.etpai",
"emitter": "N",
"initial_state": "1s",
"positions": {
"e": "np.arange(10, 400, 5)",
"t": "0",
"p": "0",
"a": "np.linspace(-30, 30, 31)"
}
}
],
"optimizer_params": {
"pop_size": 0,
"seed_file": "",
"seed_limit": 0,
"recalc_seed": true,
"table_file": ""
}
}
}
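Run files like the one above allow // and # line comments as an extension of JSON syntax; PMSCO parses them with the commentjson package, which this release adds to the requirements. A minimal stdlib-only sketch of the same idea, handling full-line comments only (commentjson itself is more complete):

```python
import json

# hedged sketch: drop full-line // and # comments, then parse with stdlib json
def loads_with_comments(text):
    kept = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("//") or stripped.startswith("#"):
            continue
        kept.append(line)
    return json.loads("\n".join(kept))

run = loads_with_comments("""
{
  // line comments are allowed in run files
  "project": {"job_name": "twoatom0002"}
}
""")
print(run["project"]["job_name"])  # twoatom0002
```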


@ -0,0 +1,90 @@
{
// line comments using // or # prefix are allowed as an extension of JSON syntax
"project": {
"__module__": "projects.twoatom.twoatom",
"__class__": "TwoatomProject",
"job_name": "twoatom0001",
"job_tags": [],
"description": "",
"mode": "single",
"directories": {
"data": "",
"output": ""
},
"keep_files": [
"cluster",
"model",
"scan",
"report",
"population"
],
"keep_best": 10,
"keep_levels": 1,
"time_limit": 24,
"log_file": "",
"log_level": "WARNING",
"cluster_generator": {
"__class__": "TwoatomCluster",
"atom_types": {
"A": "N",
"B": "Ni"
},
"model_dict": {
"dAB": "dNNi",
"th": "pNNi",
"ph": "aNNi"
}
},
"atomic_scattering_factory": "InternalAtomicCalculator",
"multiple_scattering_factory": "EdacCalculator",
"model_space": {
"dNNi": {
"start": 2.109,
"min": 2.0,
"max": 2.25,
"step": 0.05
},
"pNNi": {
"start": 15.0,
"min": 0.0,
"max": 30.0,
"step": 1.0
},
"V0": {
"start": 21.966,
"min": 15.0,
"max": 25.0,
"step": 1.0
},
"Zsurf": {
"start": 1.449,
"min": 0.5,
"max": 2.0,
"step": 0.25
}
},
"domains": [
{
"default": 0.0
}
],
"scans": [
{
// class name as it would be used in the project module
"__class__": "mp.ScanLoader",
// any placeholder key from project.directories can be used
"filename": "{project}/twoatom_hemi_250e.etpi",
"emitter": "N",
"initial_state": "1s",
"is_modf": false
}
],
"optimizer_params": {
"pop_size": 0,
"seed_file": "",
"seed_limit": 0,
"recalc_seed": true,
"table_file": ""
}
}
}


@ -308,14 +308,12 @@ def set_project_args(project, project_args):
@param project_args: (Namespace object) project arguments.
"""
scans = ['tp250e']
scans = []
try:
if project_args.scans:
scans = project_args.scans
else:
logger.warning(BMsg("missing scan argument, using {0}", scans[0]))
except AttributeError:
logger.warning(BMsg("missing scan argument, using {0}", scans[0]))
pass
for scan_key in scans:
scan_spec = project.scan_dict[scan_key]
@ -337,7 +335,7 @@ def parse_project_args(_args):
parser = argparse.ArgumentParser()
# main arguments
parser.add_argument('-s', '--scans', nargs="*", default=['tp250e'],
parser.add_argument('-s', '--scans', nargs="*",
help="nick names of scans to use in calculation (see create_project function)")
parsed_args = parser.parse_args(_args)


@ -1,3 +1,4 @@
python >= 3.6
attrdict
fasteners
numpy >= 1.13
@ -11,3 +12,4 @@ matplotlib
future
swig
gitpython
commentjson


@ -10,20 +10,17 @@ to run the tests, change to the directory which contains the tests directory, an
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015-18 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import mock
import numpy as np
import os
from pathlib import Path
import unittest
import pmsco.data as data
@ -31,6 +28,103 @@ import pmsco.dispatch as dispatch
import pmsco.project as project
class TestModelSpace(unittest.TestCase):
def setUp(self):
self.d1 = {
"A": {"start": 2.1, "min": 2.0, "max": 3.0, "step": 0.05},
"B": {"start": 15.0, "min": 0.0, "max": 30.0, "step": 1.0}}
self.d2 = {
"C": {"start": 22.0, "min": 15.0, "max": 25.0, "step": 1.0},
"D": {"start": 1.5, "min": 0.5, "max": 2.0, "step": 0.25}}
def test_add_param(self):
ms = project.ModelSpace()
ms.start['A'] = 2.1
ms.min['A'] = 2.0
ms.max['A'] = 3.0
ms.step['A'] = 0.05
ms.add_param("E", 5.0, 1.0, 9.0, 0.2)
ms.add_param("F", 8.0, width=6.0, step=0.5)
d_start = {'A': 2.1, 'E': 5.0, 'F': 8.0}
d_min = {'A': 2.0, 'E': 1.0, 'F': 5.0}
d_max = {'A': 3.0, 'E': 9.0, 'F': 11.0}
d_step = {'A': 0.05, 'E': 0.2, 'F': 0.5}
self.assertDictEqual(ms.start, d_start)
self.assertDictEqual(ms.min, d_min)
self.assertDictEqual(ms.max, d_max)
self.assertDictEqual(ms.step, d_step)
def test_get_param(self):
ms = project.ModelSpace()
ms.add_param("A", **self.d1['A'])
ms.add_param("B", **self.d1['B'])
result = ms.get_param('B')
expected = {'start': 15.0, 'min': 0.0, 'max': 30.0, 'step': 1.0}
self.assertIsInstance(result, project.ParamSpace)
self.assertEqual(result.start, expected['start'])
self.assertEqual(result.min, expected['min'])
self.assertEqual(result.max, expected['max'])
self.assertEqual(result.step, expected['step'])
def test_set_param_dict(self):
ms = project.ModelSpace()
ms.set_param_dict(self.d1)
ms.set_param_dict(self.d2)
d_start = {'C': 22.0, 'D': 1.5}
d_min = {'C': 15.0, 'D': 0.5}
d_max = {'C': 25.0, 'D': 2.0}
d_step = {'C': 1.0, 'D': 0.25}
self.assertDictEqual(ms.start, d_start)
self.assertDictEqual(ms.min, d_min)
self.assertDictEqual(ms.max, d_max)
self.assertDictEqual(ms.step, d_step)
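test_add_param above exercises a width shorthand of ModelSpace.add_param: add_param("F", 8.0, width=6.0, step=0.5) yields min 5.0 and max 11.0, so the bounds appear to be centered on start. A one-line sketch of that relation, inferred from the expected dictionaries rather than copied from the implementation:

```python
# inferred from the test expectations: bounds are start +/- width / 2
def param_bounds(start, width):
    return start - width / 2, start + width / 2

print(param_bounds(8.0, 6.0))  # (5.0, 11.0)
```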
class TestScanCreator(unittest.TestCase):
"""
test case for @ref pmsco.project.ScanCreator class
"""
def test_load_1(self):
"""
test the load method, case 1
test for:
- correct array expansion of an ['e', 'a'] scan.
- correct file name expansion with place holders and pathlib.Path objects.
"""
sc = project.ScanCreator()
sc.filename = Path("{test_p}", "twoatom_energy_alpha.etpai")
sc.positions = {
"e": "np.arange(10, 400, 5)",
"t": "0",
"p": "0",
"a": "np.linspace(-30, 30, 31)"
}
sc.emitter = "Cu"
sc.initial_state = "2p3/2"
p = Path(__file__).parent / ".." / "projects" / "twoatom"
dirs = {"test_p": p,
"test_s": str(p)}
result = sc.load(dirs=dirs)
self.assertEqual(result.mode, ['e', 'a'])
self.assertEqual(result.emitter, sc.emitter)
self.assertEqual(result.initial_state, sc.initial_state)
e = np.arange(10, 400, 5)
a = np.linspace(-30, 30, 31)
t = p = np.asarray([0])
np.testing.assert_array_equal(result.energies, e)
np.testing.assert_array_equal(result.thetas, t)
np.testing.assert_array_equal(result.phis, p)
np.testing.assert_array_equal(result.alphas, a)
self.assertTrue(Path(result.filename).is_file(), msg=f"file {result.filename} not found")
class TestScan(unittest.TestCase):
"""
test case for @ref pmsco.project.Scan class