public release 3.0.0 - see README and CHANGES for details

This commit is contained in:
muntwiler_m 2021-02-09 12:46:20 +01:00
parent 2b3dbd8bac
commit ef781e2db4
46 changed files with 4390 additions and 1655 deletions

View File

@@ -1,6 +1,7 @@
pages:
stage: deploy
script:
- ~/miniconda3/bin/activate pmsco
- make docs
- mv docs/html/ public/
artifacts:
@@ -10,4 +11,4 @@ pages:
- master
tags:
- doxygen

View File

@@ -1,3 +1,28 @@
Release 3.0.0 (2021-02-01)
==========================
| Hash | Date | Description |
| ---- | ---- | ----------- |
| 72a9f38 | 2021-02-06 | introduce run file based job scheduling |
| 42e12d8 | 2021-02-05 | compatibility with recent conda and singularity versions |
| caf9f43 | 2021-02-03 | installation: include plantuml.jar |
| 574c88a | 2021-02-01 | docs: replace doxypy by doxypypy |
| a5cb831 | 2021-02-05 | redefine output_file property |
| 49dbb89 | 2021-01-27 | documentation of run file interface |
| 940d9ae | 2021-01-07 | introduce run file interface |
| 6950f98 | 2021-02-05 | set legacy fortran for compatibility with recent compiler |
| 28d8bc9 | 2021-01-27 | graphics: fixed color range for modulation functions |
| 1382508 | 2021-01-16 | cluster: build_element accepts symbol or number |
| 53508b7 | 2021-01-06 | graphics: swarm plot |
| 4a24163 | 2021-01-05 | graphics: genetic chart |
| 99e9782 | 2020-12-23 | periodic table: use common binding energies in condensed matter XPS |
| fdfcf90 | 2020-12-23 | periodic table: reformat bindingenergy.json, add more import/export functions |
| 13cf90f | 2020-12-21 | hbnni: parameters for xpd demo with two domains |
| 680edb4 | 2020-12-21 | documentation: update documentation of optimizers |
| d909469 | 2020-12-18 | doc: update top components diagram (pmsco module is entry point) |
| 574993e | 2020-12-09 | spectrum: add plot cross section function |
Release 2.2.0 (2020-09-04)
==========================

View File

@@ -6,7 +6,7 @@ It is a collection of computer programs to calculate photoelectron diffraction patterns,
and to optimize structural models based on measured data.
The actual scattering calculation is done by code developed by other parties.
PMSCO wraps around that program and facilitates parameter handling, cluster building, structural optimization and parallel processing.
PMSCO wraps around those programs and facilitates parameter handling, cluster building, structural optimization and parallel processing.
In the current version, the [EDAC](http://garciadeabajos-group.icfo.es/widgets/edac/) code
developed by F. J. García de Abajo, M. A. Van Hove, and C. S. Fadley (1999) is used for scattering calculations.
Instead of EDAC built-in routines, alternatively,
@@ -20,11 +20,12 @@ Highlights
- various scanning modes including energy, manipulator angle (polar/azimuthal), emission angle.
- averaging over multiple domains and emitters.
- global optimization of multiple scans.
- structural optimization algorithms: particle swarm optimization, grid search, gradient search.
- structural optimization algorithms: particle swarm optimization, genetic algorithm, grid scan, table scan.
- detailed reports and graphs of result files.
- calculation of the modulation function.
- calculation of the weighted R-factor.
- automatic parallel processing using OpenMPI.
- tested on Linux cluster machines.
- compatible with Slurm resource manager on Linux cluster machines.
Installation
@@ -39,13 +40,12 @@ The code requires about 2 GB of RAM per process.
Detailed installation instructions and dependencies can be found in the documentation
(docs/src/installation.dox).
A [Doxygen](http://www.stack.nl/~dimitri/doxygen/index.html) compiler with Doxypy is required to generate the documentation in HTML or LaTeX format.
A [Doxygen](http://www.stack.nl/~dimitri/doxygen/index.html) compiler with Doxypypy is required to generate the documentation in HTML format.
The easiest way to set up an environment with all dependencies and without side-effects on other installed software is to use a [Singularity](https://www.sylabs.io/guides/2.5/user-guide/index.html) container.
A Singularity recipe file is part of the distribution, see the PMSCO documentation for details.
On newer Linux systems (e.g. Ubuntu 18.04), Singularity is available from the package manager.
Installation in a [virtual box](https://www.virtualbox.org/) on Windows or Mac is straightforward using the [Vagrant](https://www.vagrantup.com/) system.
A Vagrant file is included in the distribution.
The easiest way to set up an environment with all dependencies and without side-effects on other installed software is to use a [Singularity](https://www.sylabs.io/guides/3.7/user-guide/index.html) container.
A Singularity recipe file is part of the distribution; see the PMSCO documentation for details. Singularity must be installed separately.
Installation in a [virtual box](https://www.virtualbox.org/) on Windows or Mac is straightforward using pre-compiled images with [Vagrant](https://www.vagrantup.com/).
A Vagrant definition file is included in the distribution.
The public distribution of PMSCO does not contain the [EDAC](http://garciadeabajos-group.icfo.es/widgets/edac/) code.
Please obtain the EDAC source code from the original author, copy it to the pmsco/edac directory, and apply edac_all.patch.
@@ -70,7 +70,7 @@ Matthias Muntwiler, <mailto:matthias.muntwiler@psi.ch>
Copyright
---------
Copyright 2015-2020 by [Paul Scherrer Institut](http://www.psi.ch)
Copyright 2015-2021 by [Paul Scherrer Institut](http://www.psi.ch)
Release Notes
@@ -78,6 +78,22 @@ Release Notes
For a detailed list of changes, see the CHANGES.md file.
3.0.0 (2021-02-08)
------------------
- Run file interface replaces command line arguments:
- Specify all run-time parameters in a JSON-formatted text file.
- Override any public attribute of the project class.
- Only the name of the run file is needed on the command line.
- The command line interface is still available, but some default values and the handling of directory paths have changed.
  Check your code for compatibility.
- Integrated job scheduling with the Slurm resource manager:
- Declare all job arguments in the run file and have PMSCO submit the job.
- Graphics scripts for genetic chart and swarm population (experimental feature).
- Update for compatibility with recent Ubuntu (20.04), Anaconda (4.8) and Singularity (3.7).
- Drop compatibility with Python 2.7, minimum requirement is Python 3.6.
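
A minimal run file for the new interface might look like the sketch below. Every key and value here is hypothetical — it only illustrates the idea of declaring run-time parameters, project attribute overrides, and Slurm job arguments in one JSON file; consult the run file documentation (docs/src/runfile.dox) for the actual parameter names.

```json
{
    "project": {
        "__module__": "projects.demo.demo_project",
        "output_file": "demo-job",
        "mode": "swarm"
    },
    "schedule": {
        "manager": "slurm",
        "nodes": 1,
        "tasks_per_node": 24,
        "walltime": "02:00:00"
    }
}
```

Only the path of such a file would then be passed on the command line.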
2.2.0 (2020-09-04)
------------------

View File

@@ -1,136 +0,0 @@
#!/bin/bash
#
# Slurm script template for PMSCO calculations on the Ra cluster
# based on run_mpi_HPL_nodes-2.sl by V. Markushin 2016-03-01
#
# this version checks out the source code from a git repository
# to a temporary location and compiles the code.
# this is to minimize conflicts between different jobs
# but requires that each job has its own git commit.
#
# Use:
# - enter the appropriate parameters and save as a new file.
# - call the sbatch command to pass the job script.
# request a specific number of nodes and tasks.
# example:
# sbatch --nodes=2 --ntasks-per-node=24 --time=02:00:00 run_pmsco.sl
# the qpmsco script does all this for you.
#
# PMSCO arguments
# copy this template to a new file, and set the arguments
#
# PMSCO_WORK_DIR
# path to be used as working directory.
# contains the script derived from this template
# and a copy of the pmsco code in the 'pmsco' directory.
# receives output and temporary files.
#
# PMSCO_PROJECT_FILE
# python module that declares the project and starts the calculation.
# must include the file path relative to $PMSCO_WORK_DIR.
#
# PMSCO_OUT
# name of output file. should not include a path.
#
# all paths are relative to $PMSCO_WORK_DIR or (better) absolute.
#
#
# Further arguments
#
# PMSCO_JOBNAME (required)
# the job name is the base name for output files.
#
# PMSCO_WALLTIME_HR (integer, required)
# wall time limit in hours. must be integer, minimum 1.
# this value is passed to PMSCO.
# it should specify the same amount of wall time as requested from the scheduler.
#
# PMSCO_PROJECT_ARGS (optional)
# extra arguments that are parsed by the project module.
#
#SBATCH --job-name="_PMSCO_JOBNAME"
#SBATCH --output="_PMSCO_JOBNAME.o.%j"
#SBATCH --error="_PMSCO_JOBNAME.e.%j"
PMSCO_WORK_DIR="_PMSCO_WORK_DIR"
PMSCO_JOBNAME="_PMSCO_JOBNAME"
PMSCO_WALLTIME_HR=_PMSCO_WALLTIME_HR
PMSCO_PROJECT_FILE="_PMSCO_PROJECT_FILE"
PMSCO_OUT="_PMSCO_JOBNAME"
PMSCO_PROJECT_ARGS="_PMSCO_PROJECT_ARGS"
module load psi-python36/4.4.0
module load gcc/4.8.5
module load openmpi/3.1.3
source activate pmsco3
echo '================================================================================'
echo "=== Running $0 at the following time and place:"
date
/bin/hostname
cd $PMSCO_WORK_DIR
pwd
ls -lA
#the intel compiler is currently not compatible with mpi4py. -mm 170131
#echo
#echo '================================================================================'
#echo "=== Setting the environment to use Intel Cluster Studio XE 2016 Update 2 intel/16.2:"
#cmd="source /opt/psi/Programming/intel/16.2/bin/compilervars.sh intel64"
#echo $cmd
#$cmd
echo
echo '================================================================================'
echo "=== The environment is set as following:"
env
echo
echo '================================================================================'
echo "BEGIN test"
which mpirun
cmd="mpirun /bin/hostname"
echo $cmd
$cmd
echo "END test"
echo
echo '================================================================================'
echo "BEGIN mpirun pmsco"
echo
cd "$PMSCO_WORK_DIR"
cd pmsco
echo "code revision"
git log --pretty=tformat:'%h %ai %d' -1
make -C pmsco all
python -m compileall pmsco
python -m compileall projects
echo
cd "$PMSCO_WORK_DIR"
PMSCO_CMD="python pmsco/pmsco $PMSCO_PROJECT_FILE"
PMSCO_ARGS="$PMSCO_PROJECT_ARGS"
if [ -n "$PMSCO_SCAN_FILES" ]; then
PMSCO_ARGS="-s $PMSCO_SCAN_FILES $PMSCO_ARGS"
fi
if [ -n "$PMSCO_OUT" ]; then
PMSCO_ARGS="-o $PMSCO_OUT $PMSCO_ARGS"
fi
if [ "$PMSCO_WALLTIME_HR" -ge 1 ]; then
PMSCO_ARGS="-t $PMSCO_WALLTIME_HR $PMSCO_ARGS"
fi
if [ -n "$PMSCO_LOGLEVEL" ]; then
PMSCO_ARGS="--log-level $PMSCO_LOGLEVEL --log-file $PMSCO_JOBNAME.log $PMSCO_ARGS"
fi
# Do not use the OpenMPI-specific options, like "-x LD_LIBRARY_PATH", with the Intel mpirun.
cmd="mpirun $PMSCO_CMD $PMSCO_ARGS"
echo $cmd
$cmd
echo "END mpirun pmsco"
echo '================================================================================'
cd "$PMSCO_WORK_DIR"
rm -rf pmsco
date
ls -lAtr
echo '================================================================================'
exit 0

View File

@@ -1,157 +0,0 @@
#!/bin/bash
#
# Slurm script template for PMSCO calculations on the Ra cluster
# based on run_mpi_HPL_nodes-2.sl by V. Markushin 2016-03-01
#
# Use:
# - enter the appropriate parameters and save as a new file.
# - call the sbatch command to pass the job script.
# request a specific number of nodes and tasks.
# example:
# sbatch --nodes=2 --ntasks-per-node=24 --time=02:00:00 run_pmsco.sl
#
# PMSCO arguments
# copy this template to a new file, and set the arguments
#
# PMSCO_WORK_DIR
# path to be used as working directory.
# contains the script derived from this template.
# receives output and temporary files.
#
# PMSCO_PROJECT_FILE
# python module that declares the project and starts the calculation.
# must include the file path relative to $PMSCO_WORK_DIR.
#
# PMSCO_SOURCE_DIR
# path to the pmsco source directory
# (the directory which contains the bin, lib, pmsco sub-directories)
#
# PMSCO_SCAN_FILES
# list of scan files.
#
# PMSCO_OUT
# name of output file. should not include a path.
#
# all paths are relative to $PMSCO_WORK_DIR or (better) absolute.
#
#
# Further arguments
#
# PMSCO_JOBNAME (required)
# the job name is the base name for output files.
#
# PMSCO_WALLTIME_HR (integer, required)
# wall time limit in hours. must be integer, minimum 1.
# this value is passed to PMSCO.
# it should specify the same amount of wall time as requested from the scheduler.
#
# PMSCO_MODE (optional)
# calculation mode: single, swarm, grid, gradient
#
# PMSCO_CODE (optional)
# calculation code: edac, msc, test
#
# PMSCO_LOGLEVEL (optional)
# request log level: DEBUG, INFO, WARNING, ERROR
# create a log file based on the job name.
#
# PMSCO_PROJECT_ARGS (optional)
# extra arguments that are parsed by the project module.
#
#SBATCH --job-name="_PMSCO_JOBNAME"
#SBATCH --output="_PMSCO_JOBNAME.o.%j"
#SBATCH --error="_PMSCO_JOBNAME.e.%j"
PMSCO_WORK_DIR="_PMSCO_WORK_DIR"
PMSCO_JOBNAME="_PMSCO_JOBNAME"
PMSCO_WALLTIME_HR=_PMSCO_WALLTIME_HR
PMSCO_PROJECT_FILE="_PMSCO_PROJECT_FILE"
PMSCO_MODE="_PMSCO_MODE"
PMSCO_CODE="_PMSCO_CODE"
PMSCO_SOURCE_DIR="_PMSCO_SOURCE_DIR"
PMSCO_SCAN_FILES="_PMSCO_SCAN_FILES"
PMSCO_OUT="_PMSCO_JOBNAME"
PMSCO_LOGLEVEL="_PMSCO_LOGLEVEL"
PMSCO_PROJECT_ARGS="_PMSCO_PROJECT_ARGS"
module load psi-python36/4.4.0
module load gcc/4.8.5
module load openmpi/3.1.3
source activate pmsco3
echo '================================================================================'
echo "=== Running $0 at the following time and place:"
date
/bin/hostname
cd $PMSCO_WORK_DIR
pwd
ls -lA
#the intel compiler is currently not compatible with mpi4py. -mm 170131
#echo
#echo '================================================================================'
#echo "=== Setting the environment to use Intel Cluster Studio XE 2016 Update 2 intel/16.2:"
#cmd="source /opt/psi/Programming/intel/16.2/bin/compilervars.sh intel64"
#echo $cmd
#$cmd
echo
echo '================================================================================'
echo "=== The environment is set as following:"
env
echo
echo '================================================================================'
echo "BEGIN test"
echo "=== Intel native mpirun will get the number of nodes and the machinefile from Slurm"
which mpirun
cmd="mpirun /bin/hostname"
echo $cmd
$cmd
echo "END test"
echo
echo '================================================================================'
echo "BEGIN mpirun pmsco"
echo "Intel native mpirun will get the number of nodes and the machinefile from Slurm"
echo
echo "code revision"
cd "$PMSCO_SOURCE_DIR"
git log --pretty=tformat:'%h %ai %d' -1
python -m compileall pmsco
python -m compileall projects
cd "$PMSCO_WORK_DIR"
echo
PMSCO_CMD="python $PMSCO_SOURCE_DIR/pmsco $PMSCO_PROJECT_FILE"
PMSCO_ARGS="$PMSCO_PROJECT_ARGS"
if [ -n "$PMSCO_SCAN_FILES" ]; then
PMSCO_ARGS="-s $PMSCO_SCAN_FILES $PMSCO_ARGS"
fi
if [ -n "$PMSCO_CODE" ]; then
PMSCO_ARGS="-c $PMSCO_CODE $PMSCO_ARGS"
fi
if [ -n "$PMSCO_MODE" ]; then
PMSCO_ARGS="-m $PMSCO_MODE $PMSCO_ARGS"
fi
if [ -n "$PMSCO_OUT" ]; then
PMSCO_ARGS="-o $PMSCO_OUT $PMSCO_ARGS"
fi
if [ "$PMSCO_WALLTIME_HR" -ge 1 ]; then
PMSCO_ARGS="-t $PMSCO_WALLTIME_HR $PMSCO_ARGS"
fi
if [ -n "$PMSCO_LOGLEVEL" ]; then
PMSCO_ARGS="--log-level $PMSCO_LOGLEVEL --log-file $PMSCO_JOBNAME.log $PMSCO_ARGS"
fi
which mpirun
ls -l "$PMSCO_SOURCE_DIR"
ls -l "$PMSCO_PROJECT_FILE"
# Do not use the OpenMPI-specific options, like "-x LD_LIBRARY_PATH", with the Intel mpirun.
cmd="mpirun $PMSCO_CMD $PMSCO_ARGS"
echo $cmd
$cmd
echo "END mpirun pmsco"
echo '================================================================================'
date
ls -lAtr
echo '================================================================================'
exit 0

View File

@@ -1,178 +0,0 @@
#!/bin/bash
#
# SGE script template for MSC calculations
#
# This script uses the tight integration of openmpi-1.4.5-gcc-4.6.3 in SGE
# using the parallel environment (PE) "orte".
# This script must be used only with qsub command - do NOT run it as a stand-alone
# shell script because it will start all processes on the local node.
#
# PhD arguments
# copy this template to a new file, and set the arguments
#
# PHD_WORK_DIR
# path to be used as working directory.
# contains the SGE script derived from this template.
# receives output and temporary files.
#
# PHD_PROJECT_FILE
# python module that declares the project and starts the calculation.
# must include the file path relative to $PHD_WORK_DIR.
#
# PHD_SOURCE_DIR
# path to the pmsco source directory
# (the directory which contains the bin, lib, pmsco sub-directories)
#
# PHD_SCAN_FILES
# list of scan files.
#
# PHD_OUT
# name of output file. should not include a path.
#
# all paths are relative to $PHD_WORK_DIR or (better) absolute.
#
#
# Further arguments
#
# PHD_JOBNAME (required)
# the job name is the base name for output files.
#
# PHD_NODES (required)
# number of computing nodes (processes) to allocate for the job.
#
# PHD_WALLTIME_HR (required)
# wall time limit (hours)
#
# PHD_WALLTIME_MIN (required)
# wall time limit (minutes)
#
# PHD_MODE (optional)
# calculation mode: single, swarm, grid, gradient
#
# PHD_CODE (optional)
# calculation code: edac, msc, test
#
# PHD_LOGLEVEL (optional)
# request log level: DEBUG, INFO, WARNING, ERROR
# create a log file based on the job name.
#
# PHD_PROJECT_ARGS (optional)
# extra arguments that are parsed by the project module.
#
PHD_WORK_DIR="_PHD_WORK_DIR"
PHD_JOBNAME="_PHD_JOBNAME"
PHD_NODES=_PHD_NODES
PHD_WALLTIME_HR=_PHD_WALLTIME_HR
PHD_WALLTIME_MIN=_PHD_WALLTIME_MIN
PHD_PROJECT_FILE="_PHD_PROJECT_FILE"
PHD_MODE="_PHD_MODE"
PHD_CODE="_PHD_CODE"
PHD_SOURCE_DIR="_PHD_SOURCE_DIR"
PHD_SCAN_FILES="_PHD_SCAN_FILES"
PHD_OUT="_PHD_JOBNAME"
PHD_LOGLEVEL="_PHD_LOGLEVEL"
PHD_PROJECT_ARGS="_PHD_PROJECT_ARGS"
# Define your job name, parallel environment with the number of slots, and run time:
#$ -cwd
#$ -N _PHD_JOBNAME.job
#$ -pe orte _PHD_NODES
#$ -l ram=2G
#$ -l s_rt=_PHD_WALLTIME_HR:_PHD_WALLTIME_MIN:00
#$ -l h_rt=_PHD_WALLTIME_HR:_PHD_WALLTIME_MIN:30
#$ -V
###################################################
# Fix the SGE environment-handling bug (bash):
source /usr/share/Modules/init/sh
export -n -f module
# Load the environment modules for this job (the order may be important):
module load python/python-2.7.5
module load gcc/gcc-4.6.3
module load mpi/openmpi-1.4.5-gcc-4.6.3
module load blas/blas-20110419-gcc-4.6.3
module load lapack/lapack-3.4.2-gcc-4.6.3
export LD_LIBRARY_PATH=$PHD_SOURCE_DIR/lib/:$LD_LIBRARY_PATH
###################################################
# Set the environment variables:
MPIEXEC=$OPENMPI/bin/mpiexec
# OPENMPI is set by the mpi/openmpi-* module.
export OMP_NUM_THREADS=1
export OMPI_MCA_btl='openib,sm,self'
# export OMPI_MCA_orte_process_binding=core
##############
# BEGIN DEBUG
# Print the SGE environment on master host:
echo "================================================================"
echo "=== SGE job JOB_NAME=$JOB_NAME JOB_ID=$JOB_ID"
echo "================================================================"
echo DATE=`date`
echo HOSTNAME=`hostname`
echo PWD=`pwd`
echo "NSLOTS=$NSLOTS"
echo "PE_HOSTFILE=$PE_HOSTFILE"
cat $PE_HOSTFILE
echo "================================================================"
echo "Running environment:"
env
echo "================================================================"
echo "Loaded environment modules:"
module list 2>&1
echo
# END DEBUG
##############
##############
# Setup
cd "$PHD_SOURCE_DIR"
python -m compileall .
cd "$PHD_WORK_DIR"
ulimit -c 0
###################################################
# The command to run with mpiexec:
CMD="python $PHD_PROJECT_FILE"
ARGS="$PHD_PROJECT_ARGS"
if [ -n "$PHD_SCAN_FILES" ]; then
ARGS="-s $PHD_SCAN_FILES -- $ARGS"
fi
if [ -n "$PHD_CODE" ]; then
ARGS="-c $PHD_CODE $ARGS"
fi
if [ -n "$PHD_MODE" ]; then
ARGS="-m $PHD_MODE $ARGS"
fi
if [ -n "$PHD_OUT" ]; then
ARGS="-o $PHD_OUT $ARGS"
fi
if [ "$PHD_WALLTIME_HR" -ge 1 ]
then
ARGS="-t $PHD_WALLTIME_HR $ARGS"
else
ARGS="-t 0.5 $ARGS"
fi
if [ -n "$PHD_LOGLEVEL" ]; then
ARGS="--log-level $PHD_LOGLEVEL --log-file $PHD_JOBNAME.log $ARGS"
fi
# The MPI command to run:
MPICMD="$MPIEXEC --prefix $OPENMPI -x PATH -x LD_LIBRARY_PATH -x OMP_NUM_THREADS -x OMPI_MCA_btl -np $NSLOTS $CMD $ARGS"
echo "Command to run:"
echo "$MPICMD"
echo
exec $MPICMD
exit 0

View File

@@ -1,145 +0,0 @@
#!/bin/sh
#
# submission script for PMSCO calculations on the Ra cluster
#
# this version clones the current git repository at HEAD to the work directory.
# thus, version conflicts between jobs are avoided.
#
if [ $# -lt 1 ]; then
echo "Usage: $0 [NOSUB] GIT_TAG DESTDIR JOBNAME NODES TASKS_PER_NODE WALLTIME:HOURS PROJECT [ARGS [ARGS [...]]]"
echo ""
echo " NOSUB (optional): do not submit the script to the queue. default: submit."
echo " GIT_TAG: git tag or branch name of the code. HEAD for current code."
echo " DESTDIR: destination directory. must exist. a sub-dir \$JOBNAME is created."
echo " JOBNAME (text): name of job. use only alphanumeric characters, no spaces."
echo " NODES (integer): number of computing nodes. (1 node = 24 or 32 processors)."
echo " do not specify more than 2."
echo " TASKS_PER_NODE (integer): 1...24, or 32."
echo " 24 or 32 for full-node allocation."
echo " 1...23 for shared node allocation."
echo " WALLTIME:HOURS (integer): requested wall time."
echo " 1...24 for day partition"
echo " 24...192 for week partition"
echo " 1...192 for shared partition"
echo " PROJECT: python module (file path) that declares the project and starts the calculation."
echo " ARGS (optional): any number of further PMSCO or project arguments (except time)."
echo ""
echo "the job script is written to \$DESTDIR/\$JOBNAME which is also the destination of calculation output."
exit 1
fi
# location of the pmsco package is derived from the path of this script
SCRIPTDIR="$(dirname $(readlink -f $0))"
SOURCEDIR="$(readlink -f $SCRIPTDIR/..)"
PMSCO_SOURCE_DIR="$SOURCEDIR"
# read arguments
if [ "$1" == "NOSUB" ]; then
NOSUB="true"
shift
else
NOSUB="false"
fi
if [ "$1" == "HEAD" ]; then
BRANCH_ARG=""
else
BRANCH_ARG="-b $1"
fi
shift
DEST_DIR="$1"
shift
PMSCO_JOBNAME=$1
shift
PMSCO_NODES=$1
PMSCO_TASKS_PER_NODE=$2
PMSCO_TASKS=$(expr $PMSCO_NODES \* $PMSCO_TASKS_PER_NODE)
shift 2
PMSCO_WALLTIME_HR=$1
PMSCO_WALLTIME_MIN=$(expr $PMSCO_WALLTIME_HR \* 60)
shift
# select partition
if [ $PMSCO_WALLTIME_HR -ge 25 ]; then
PMSCO_PARTITION="week"
else
PMSCO_PARTITION="day"
fi
if [ $PMSCO_TASKS_PER_NODE -lt 24 ]; then
PMSCO_PARTITION="shared"
fi
PMSCO_PROJECT_FILE="$(readlink -f $1)"
shift
PMSCO_PROJECT_ARGS="$*"
# set up working directory
cd "$DEST_DIR"
if [ ! -d "$PMSCO_JOBNAME" ]; then
mkdir "$PMSCO_JOBNAME"
fi
cd "$PMSCO_JOBNAME"
WORKDIR="$(pwd)"
PMSCO_WORK_DIR="$WORKDIR"
# copy code
PMSCO_SOURCE_REPO="file://$PMSCO_SOURCE_DIR"
echo "$PMSCO_SOURCE_REPO"
cd "$PMSCO_WORK_DIR"
git clone $BRANCH_ARG --single-branch --depth 1 $PMSCO_SOURCE_REPO pmsco || exit
cd pmsco
PMSCO_REV=$(git log --pretty=format:"%h, %ai" -1) || exit
cd "$WORKDIR"
echo "$PMSCO_REV" > revision.txt
# generate job script from template
sed -e "s:_PMSCO_WORK_DIR:$PMSCO_WORK_DIR:g" \
-e "s:_PMSCO_JOBNAME:$PMSCO_JOBNAME:g" \
-e "s:_PMSCO_NODES:$PMSCO_NODES:g" \
-e "s:_PMSCO_WALLTIME_HR:$PMSCO_WALLTIME_HR:g" \
-e "s:_PMSCO_PROJECT_FILE:$PMSCO_PROJECT_FILE:g" \
-e "s:_PMSCO_PROJECT_ARGS:$PMSCO_PROJECT_ARGS:g" \
"$SCRIPTDIR/pmsco.ra-git.template" > $PMSCO_JOBNAME.job
chmod u+x "$PMSCO_JOBNAME.job" || exit
# request nodes and tasks
#
# The option --ntasks-per-node is meant to be used with the --nodes option.
# (For the --ntasks option, the default is one task per node, use the --cpus-per-task option to change this default.)
#
# sbatch options
# --cores-per-socket=16
# 32 cores per node
# --partition=[shared|day|week]
# --time=8-00:00:00
# override default time limit (2 days in long queue)
# time formats: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds"
# --mail-type=ALL
# --test-only
# check script but do not submit
#
SLURM_ARGS="--nodes=$PMSCO_NODES --ntasks-per-node=$PMSCO_TASKS_PER_NODE"
if [ $PMSCO_TASKS_PER_NODE -gt 24 ]; then
SLURM_ARGS="--cores-per-socket=16 $SLURM_ARGS"
fi
SLURM_ARGS="--partition=$PMSCO_PARTITION $SLURM_ARGS"
SLURM_ARGS="--time=$PMSCO_WALLTIME_HR:00:00 $SLURM_ARGS"
CMD="sbatch $SLURM_ARGS $PMSCO_JOBNAME.job"
echo $CMD
if [ "$NOSUB" != "true" ]; then
$CMD
fi
exit 0

View File

@@ -1,151 +0,0 @@
#!/bin/sh
#
# submission script for PMSCO calculations on the Ra cluster
#
# CAUTION: the job will execute the pmsco code which is present in the directory tree
# of this script _at the time of job execution_, not submission!
# before changing the code, make sure that all pending jobs have started execution,
# otherwise you will experience version conflicts.
# it's better to use the qpmsco.ra-git.sh script which clones the code.
if [ $# -lt 1 ]; then
echo "Usage: $0 [NOSUB] DESTDIR JOBNAME NODES TASKS_PER_NODE WALLTIME:HOURS PROJECT MODE [ARGS [ARGS [...]]]"
echo ""
echo " NOSUB (optional): do not submit the script to the queue. default: submit."
echo " DESTDIR: destination directory. must exist. a sub-dir \$JOBNAME is created."
echo " JOBNAME (text): name of job. use only alphanumeric characters, no spaces."
echo " NODES (integer): number of computing nodes. (1 node = 24 or 32 processors)."
echo " do not specify more than 2."
echo " TASKS_PER_NODE (integer): 1...24, or 32."
echo " 24 or 32 for full-node allocation."
echo " 1...23 for shared node allocation."
echo " WALLTIME:HOURS (integer): requested wall time."
echo " 1...24 for day partition"
echo " 24...192 for week partition"
echo " 1...192 for shared partition"
echo " PROJECT: python module (file path) that declares the project and starts the calculation."
echo " MODE: PMSCO calculation mode (single|swarm|gradient|grid)."
echo " ARGS (optional): any number of further PMSCO or project arguments (except mode and time)."
echo ""
echo "the job script is written to \$DESTDIR/\$JOBNAME which is also the destination of calculation output."
exit 1
fi
# location of the pmsco package is derived from the path of this script
SCRIPTDIR="$(dirname $(readlink -f $0))"
SOURCEDIR="$SCRIPTDIR/.."
PMSCO_SOURCE_DIR="$SOURCEDIR"
# read arguments
if [ "$1" == "NOSUB" ]; then
NOSUB="true"
shift
else
NOSUB="false"
fi
DEST_DIR="$1"
shift
PMSCO_JOBNAME=$1
shift
PMSCO_NODES=$1
PMSCO_TASKS_PER_NODE=$2
PMSCO_TASKS=$(expr $PMSCO_NODES \* $PMSCO_TASKS_PER_NODE)
shift 2
PMSCO_WALLTIME_HR=$1
PMSCO_WALLTIME_MIN=$(expr $PMSCO_WALLTIME_HR \* 60)
shift
# select partition
if [ $PMSCO_WALLTIME_HR -ge 25 ]; then
PMSCO_PARTITION="week"
else
PMSCO_PARTITION="day"
fi
if [ $PMSCO_TASKS_PER_NODE -lt 24 ]; then
PMSCO_PARTITION="shared"
fi
PMSCO_PROJECT_FILE="$(readlink -f $1)"
shift
PMSCO_MODE="$1"
shift
PMSCO_PROJECT_ARGS="$*"
# use defaults, override explicitly in PMSCO_PROJECT_ARGS if necessary
PMSCO_SCAN_FILES=""
PMSCO_LOGLEVEL=""
PMSCO_CODE=""
# set up working directory
cd "$DEST_DIR"
if [ ! -d "$PMSCO_JOBNAME" ]; then
mkdir "$PMSCO_JOBNAME"
fi
cd "$PMSCO_JOBNAME"
WORKDIR="$(pwd)"
PMSCO_WORK_DIR="$WORKDIR"
# provide revision information, requires git repository
cd "$SOURCEDIR"
PMSCO_REV=$(git log --pretty=format:"%h, %ai" -1)
if [ $? -ne 0 ]; then
PMSCO_REV="revision unknown, "$(date +"%F %T %z")
fi
cd "$WORKDIR"
echo "$PMSCO_REV" > revision.txt
# generate job script from template
sed -e "s:_PMSCO_WORK_DIR:$PMSCO_WORK_DIR:g" \
-e "s:_PMSCO_JOBNAME:$PMSCO_JOBNAME:g" \
-e "s:_PMSCO_NODES:$PMSCO_NODES:g" \
-e "s:_PMSCO_WALLTIME_HR:$PMSCO_WALLTIME_HR:g" \
-e "s:_PMSCO_PROJECT_FILE:$PMSCO_PROJECT_FILE:g" \
-e "s:_PMSCO_PROJECT_ARGS:$PMSCO_PROJECT_ARGS:g" \
-e "s:_PMSCO_CODE:$PMSCO_CODE:g" \
-e "s:_PMSCO_MODE:$PMSCO_MODE:g" \
-e "s:_PMSCO_SOURCE_DIR:$PMSCO_SOURCE_DIR:g" \
-e "s:_PMSCO_SCAN_FILES:$PMSCO_SCAN_FILES:g" \
-e "s:_PMSCO_LOGLEVEL:$PMSCO_LOGLEVEL:g" \
"$SCRIPTDIR/pmsco.ra.template" > $PMSCO_JOBNAME.job
chmod u+x "$PMSCO_JOBNAME.job"
# request nodes and tasks
#
# The option --ntasks-per-node is meant to be used with the --nodes option.
# (For the --ntasks option, the default is one task per node, use the --cpus-per-task option to change this default.)
#
# sbatch options
# --cores-per-socket=16
# 32 cores per node
# --partition=[shared|day|week]
# --time=8-00:00:00
# override default time limit (2 days in long queue)
# time formats: "minutes", "minutes:seconds", "hours:minutes:seconds", "days-hours", "days-hours:minutes", "days-hours:minutes:seconds"
# --mail-type=ALL
# --test-only
# check script but do not submit
#
SLURM_ARGS="--nodes=$PMSCO_NODES --ntasks-per-node=$PMSCO_TASKS_PER_NODE"
if [ $PMSCO_TASKS_PER_NODE -gt 24 ]; then
SLURM_ARGS="--cores-per-socket=16 $SLURM_ARGS"
fi
SLURM_ARGS="--partition=$PMSCO_PARTITION $SLURM_ARGS"
SLURM_ARGS="--time=$PMSCO_WALLTIME_HR:00:00 $SLURM_ARGS"
CMD="sbatch $SLURM_ARGS $PMSCO_JOBNAME.job"
echo $CMD
if [ "$NOSUB" != "true" ]; then
$CMD
fi
exit 0

View File

@@ -1,128 +0,0 @@
#!/bin/sh
#
# submission script for PMSCO calculations on Merlin cluster
#
if [ $# -lt 1 ]; then
echo "Usage: $0 [NOSUB] JOBNAME NODES WALLTIME:HOURS PROJECT MODE [LOG_LEVEL]"
echo ""
echo " NOSUB (optional): do not submit the script to the queue. default: submit."
echo " WALLTIME:HOURS (integer): sets the wall time limits."
echo " soft limit = HOURS:00:00"
echo " hard limit = HOURS:00:30"
echo " for short.q: HOURS = 0 (-> MINUTES=30)"
echo " for all.q: HOURS <= 24"
echo " for long.q: HOURS <= 96"
echo " PROJECT: python module (file path) that declares the project and starts the calculation."
echo " MODE: PMSCO calculation mode (single|swarm|gradient|grid)."
echo " LOG_LEVEL (optional): one of DEBUG, INFO, WARNING, ERROR if log files should be produced."
echo ""
echo "the job script complete with the program code and input/output data is generated in ~/jobs/\$JOBNAME"
exit 1
fi
# location of the pmsco package is derived from the path of this script
SCRIPTDIR="$(dirname $(readlink -f $0))"
SOURCEDIR="$SCRIPTDIR/.."
PHD_SOURCE_DIR="$SOURCEDIR"
PHD_CODE="edac"
# read arguments
if [ "$1" == "NOSUB" ]; then
NOSUB="true"
shift
else
NOSUB="false"
fi
PHD_JOBNAME=$1
shift
PHD_NODES=$1
shift
PHD_WALLTIME_HR=$1
PHD_WALLTIME_MIN=0
shift
PHD_PROJECT_FILE="$(readlink -f $1)"
PHD_PROJECT_ARGS=""
shift
PHD_MODE="$1"
shift
PHD_LOGLEVEL=""
if [ "$1" == "DEBUG" ] || [ "$1" == "INFO" ] || [ "$1" == "WARNING" ] || [ "$1" == "ERROR" ]; then
PHD_LOGLEVEL="$1"
shift
fi
# ignore remaining arguments
PHD_SCAN_FILES=""
# select allowed queues
QUEUE=short.q,all.q,long.q
# for short queue (limit 30 minutes)
if [ "$PHD_WALLTIME_HR" -lt 1 ]; then
PHD_WALLTIME_HR=0
PHD_WALLTIME_MIN=30
fi
# set up working directory
cd ~
if [ ! -d "jobs" ]; then
mkdir jobs
fi
cd jobs
if [ ! -d "$PHD_JOBNAME" ]; then
mkdir "$PHD_JOBNAME"
fi
cd "$PHD_JOBNAME"
WORKDIR="$(pwd)"
PHD_WORK_DIR="$WORKDIR"
# provide revision information, requires git repository
cd "$SOURCEDIR"
PHD_REV=$(git log --pretty=format:"%h, %ad" --date=iso -1)
if [ $? -ne 0 ]; then
PHD_REV="revision unknown, "$(date +"%F %T %z")
fi
cd "$WORKDIR"
echo "$PHD_REV" > revision.txt
# generate job script from template
sed -e "s:_PHD_WORK_DIR:$PHD_WORK_DIR:g" \
-e "s:_PHD_JOBNAME:$PHD_JOBNAME:g" \
-e "s:_PHD_NODES:$PHD_NODES:g" \
-e "s:_PHD_WALLTIME_HR:$PHD_WALLTIME_HR:g" \
-e "s:_PHD_WALLTIME_MIN:$PHD_WALLTIME_MIN:g" \
-e "s:_PHD_PROJECT_FILE:$PHD_PROJECT_FILE:g" \
-e "s:_PHD_PROJECT_ARGS:$PHD_PROJECT_ARGS:g" \
-e "s:_PHD_CODE:$PHD_CODE:g" \
-e "s:_PHD_MODE:$PHD_MODE:g" \
-e "s:_PHD_SOURCE_DIR:$PHD_SOURCE_DIR:g" \
-e "s:_PHD_SCAN_FILES:$PHD_SCAN_FILES:g" \
-e "s:_PHD_LOGLEVEL:$PHD_LOGLEVEL:g" \
"$SCRIPTDIR/pmsco.sge.template" > $PHD_JOBNAME.job
chmod u+x "$PHD_JOBNAME.job"
if [ "$NOSUB" != "true" ]; then
# suppress bash error [stackoverflow.com/questions/10496758]
unset module
# submit the job script
# EMAIL must be defined in the environment
if [ -n "$EMAIL" ]; then
qsub -q $QUEUE -m ae -M $EMAIL $PHD_JOBNAME.job
else
qsub -q $QUEUE $PHD_JOBNAME.job
fi
fi
exit 0

View File

@@ -32,7 +32,7 @@ DOXYFILE_ENCODING = UTF-8
# title of most generated pages and in a few other places.
# The default value is: My Project.
PROJECT_NAME = "PEARL MSCO"
PROJECT_NAME = "PMSCO"
# The PROJECT_NUMBER tag can be used to enter a project or revision number. This
# could be handy for archiving the generated documentation or if some version
@@ -765,8 +765,10 @@ src/concepts-tasks.dox \
src/concepts-emitter.dox \
src/concepts-atomscat.dox \
src/installation.dox \
src/project.dox \
src/execution.dox \
src/commandline.dox \
src/runfile.dox \
src/optimizers.dox \
../pmsco \
../projects \
@ -889,7 +891,7 @@ INPUT_FILTER =
# filters are used. If the FILTER_PATTERNS tag is empty or if none of the
# patterns match the file name, INPUT_FILTER is applied.
FILTER_PATTERNS = *.py=/usr/bin/doxypy
FILTER_PATTERNS = *.py=./py_filter.sh
# If the FILTER_SOURCE_FILES tag is set to YES, the input filter (if set using
# INPUT_FILTER) will also be used to filter the input files that are used for
@ -2083,12 +2085,6 @@ EXTERNAL_GROUPS = YES
EXTERNAL_PAGES = YES
# The PERL_PATH should be the absolute path and name of the perl script
# interpreter (i.e. the result of 'which perl').
# The default file (with absolute path) is: /usr/bin/perl.
PERL_PATH = /usr/bin/perl
#---------------------------------------------------------------------------
# Configuration options related to the dot tool
#---------------------------------------------------------------------------
@ -2102,15 +2098,6 @@ PERL_PATH = /usr/bin/perl
CLASS_DIAGRAMS = YES
# You can define message sequence charts within doxygen comments using the \msc
# command. Doxygen will then run the mscgen tool (see:
# http://www.mcternan.me.uk/mscgen/)) to produce the chart and insert it in the
# documentation. The MSCGEN_PATH tag allows you to specify the directory where
# the mscgen tool resides. If left empty the tool is assumed to be found in the
# default search path.
MSCGEN_PATH =
# You can include diagrams made with dia in doxygen documentation. Doxygen will
# then run dia to produce the diagram and insert it in the documentation. The
# DIA_PATH tag allows you to specify the directory where the dia binary resides.

docs/py_filter.sh Executable file (+2 lines)

@ -0,0 +1,2 @@
#!/bin/bash
python -m doxypypy.doxypypy -a -c "$1"


@ -1,7 +1,17 @@
to compile the source code documentation, you need the following packages (naming according to Debian):
To compile the source code documentation in HTML format,
you need the following packages.
They are available from Linux distributions unless noted otherwise.
GNU make
doxygen
doxygen-gui (optional)
doxypy
python
doxypypy (pip)
graphviz
latex (optional)
java JRE
plantuml (download from plantuml.com)
export the location of plantuml.jar in the PLANTUML_JAR_PATH environment variable.
go to the `docs` directory and execute `make html`.
open `docs/html/index.html` in your browser.
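Before running `make html`, a small helper along these lines can report which prerequisites are still missing (this helper is not part of PMSCO; the tool and variable names follow the list above):

```python
import os
import shutil

def missing_doc_tools(env=None, which=shutil.which):
    """Return the names of missing documentation prerequisites."""
    env = os.environ if env is None else env
    missing = [tool for tool in ("doxygen", "dot", "java") if which(tool) is None]
    # plantuml is run from a jar file, so only the environment variable is checked
    if not env.get("PLANTUML_JAR_PATH"):
        missing.append("PLANTUML_JAR_PATH")
    return missing

if __name__ == "__main__":
    print(missing_doc_tools() or "all documentation tools found")
```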


@ -22,7 +22,7 @@ Do not include the extension <code>.py</code> or a trailing slash.
Common args and project args are described below.
\subsection sec_common_args Common Arguments
\subsection sec_command_common Common Arguments
All common arguments are optional and default to more or less reasonable values if omitted.
They can be added to the command line in arbitrary order.
@ -34,7 +34,7 @@ The following table is ordered by importance.
| -h , --help | | Display a command line summary and exit. |
| -m , --mode | single (default), grid, swarm, genetic | Operation mode. |
| -d, --data-dir | file system path | Directory path for experimental data files (if required by project). Default: current working directory. |
| -o, --output-file | file system path | Base path and/or name for intermediate and output files. Default: pmsco_data |
| -o, --output-file | file system path | Base path and/or name for intermediate and output files. Default: pmsco0 |
| -t, --time-limit | decimal number | Wall time limit in hours. The optimizers try to finish before the limit. Default: 24.0. |
| -k, --keep-files | list of file categories | Output file categories to keep after the calculation. Multiple values can be specified and must be separated by spaces. By default, cluster and model (simulated data) of a limited number of best models are kept. See @ref sec_file_categories below. |
| --log-level | DEBUG, INFO, WARNING (default), ERROR, CRITICAL | Minimum level of messages that should be added to the log. |
@ -45,7 +45,7 @@ The following table is ordered by importance.
| --table-file | file system path | Name of the model table file in table scan mode. |
\subsubsection sec_file_categories File Categories
\subsubsection sec_command_files File Categories
The following category names can be used with the `--keep-files` option.
Multiple names can be specified and must be separated by spaces.
@ -79,7 +79,7 @@ you have to add the file categories that you want to keep, e.g.,
Do not specify `rfac` alone as this will effectively not return any file.
\subsection sec_project_args Project Arguments
\subsection sec_command_project_args Project Arguments
The following table lists a few recommended options that are handled by the project code.
Project options that are not listed here should use the long form to avoid conflicts in future versions.
@ -90,7 +90,7 @@ Project options that are not listed here should use the long form to avoid confl
| -s, --scans | project-dependent | Nick names of scans to use in calculation. The nick name selects the experimental data file and the initial state of the photoelectron. Multiple values can be specified and must be separated by spaces. |
\subsection sec_scanfile Experimental Scan Files
\subsection sec_command_scanfile Experimental Scan Files
The recommended way of specifying experimental scan files is using nick names (dictionary keys) and the @c --scans option.
A dictionary in the module code defines the corresponding file name, chemical species of the emitter and initial state of the photoelectron.
@ -99,7 +99,7 @@ This way, the file names and photoelectron parameters are versioned with the cod
whereas command line arguments may easily get forgotten in the records.
\subsection sec_project_example Argument Handling
\subsection sec_command_example Argument Handling
To handle command line arguments in a project module,
the module must define a <code>parse_project_args</code> and a <code>set_project_args</code> function.
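A minimal sketch of these two functions is shown below. The `--scans` option and the `scans` attribute are illustrative only; the exact signatures and semantics expected by PMSCO are defined by the project interface documentation.

```python
import argparse

def parse_project_args(arg_list):
    # parse only the project-specific options; leave unknown options alone
    parser = argparse.ArgumentParser(add_help=False)
    parser.add_argument("-s", "--scans", nargs="*", default=[])
    known, _unknown = parser.parse_known_args(arg_list)
    return known

def set_project_args(project, args):
    # copy the parsed values onto the project object
    # (the attribute name is hypothetical; real projects may use dedicated setters)
    project.scans = list(args.scans)
```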


@ -8,28 +8,30 @@ The code for a PMSCO job consists of the following components.
skinparam componentStyle uml2
component "project" as project
component "PMSCO" as pmsco
component "project" as project
component "scattering code\n(calculator)" as calculator
interface "command line" as cli
interface "input files" as input
interface "output files" as output
interface "experimental data" as data
interface "results" as results
interface "output files" as output
cli --> pmsco
data -> project
project ..> pmsco
pmsco ..> project
pmsco ..> calculator
cli --> project
input -> calculator
calculator -> output
pmsco -> results
@enduml
The main entry point is the _PMSCO_ module.
It implements a task loop to carry out the structural optimization
and provides an interface between calculation programs and project-specific code.
It also provides common utility classes and functions for handling project data.
The _project_ consists of program code, system and experimental parameters
The _project_ consists of program code and parameters
that are specific to a particular experiment and calculation job.
The project code reads experimental data, defines the parameter dictionary of the model,
and contains code to generate the cluster, parameter and phase files for the scattering code.
@ -40,10 +42,6 @@ which accepts detailed input files
(parameters, atomic coordinates, emitter specification, scattering phases)
and outputs an intensity distribution of photoelectrons versus energy and/or angle.
The _PMSCO core_ interfaces between the project and the calculator.
It carries out the structural optimization and manages the calculation tasks.
It generates and sends input files to the calculator and reads back the output.
\section sec_control_flow Control flow


@ -2,10 +2,15 @@
\section sec_run Running PMSCO
To run PMSCO you need the PMSCO code and its dependencies (cf. @ref pag_install),
a code module that contains the project-specific code,
a customized code module that contains the project-specific code,
and one or several files containing the scan parameters and experimental data.
Please check the <code>projects</code> folder for examples of project modules.
For a detailed description of the command line, see @ref pag_command.
The run-time arguments can either be passed on the command line
(@ref pag_command - the older and less flexible way)
or in a JSON-formatted run-file
(@ref pag_runfile - the recommended new and flexible way).
For beginners, it's also possible to hard-code all project parameters in the custom project module.
\subsection sec_run_single Single Process
@ -14,40 +19,28 @@ Run PMSCO from the command prompt:
@code{.sh}
cd work-dir
python pmsco-dir project-dir/project.py [pmsco-arguments] [project-arguments]
python pmsco-dir -r run-file
@endcode
where <code>work-dir</code> is the destination directory for output files,
<code>pmsco-dir</code> is the directory containing the <code>__main__.py</code> file,
<code>project.py</code> is the specific project module,
and <code>project-dir</code> is the directory where the project file is located.
PMSCO is run in one process which handles all calculations sequentially.
<code>run-file</code> is a JSON-formatted configuration file that defines run-time parameters.
The format and content of the run-file are described in a separate section.
The command line arguments are divided into common arguments interpreted by the main pmsco code (pmsco.py),
and project-specific arguments interpreted by the project module.
In this form, PMSCO is run in one process which handles all calculations sequentially.
Example command line for a single EDAC calculation of the two-atom project:
@code{.sh}
cd work/twoatom
python ../../pmsco ../../projects/twoatom/twoatom.py -s ea -o twoatom-demo -m single
python ../../pmsco -r twoatom-hemi.json
@endcode
This command line executes the main pmsco module <code>pmsco.py</code>.
The main module loads the project file <code>twoatom.py</code> as a plug-in
and starts processing the common arguments.
The <code>twoatom.py</code> module contains only project-specific code
with several defined entry-points called from the main module.
The information which project to load is contained in the <code>twoatom-hemi.json</code> file,
along with all common and specific project arguments.
In the command line above, the <code>-o twoatom-demo</code> and <code>-m single</code> arguments
are interpreted by the pmsco module.
<code>-o</code> sets the base name of output files,
and <code>-m</code> selects the operation mode to a single calculation.
The scan argument is interpreted by the project module.
It refers to a dictionary entry that declares the scan file, the emitting atomic species, and the initial state.
In this example, the project looks for the <code>twoatom_energy_alpha.etpai</code> scan file in the project directory,
and calculates the modulation function for a N 1s initial state.
The kinetic energy and emission angles are contained in the scan file.
This example can be run for testing.
All necessary parameters and data files are included in the code repository.
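A run-file is plain JSON. The sketch below writes a hypothetical one; the key names are invented for illustration and do not reflect the documented run-file schema:

```python
import json

# Hypothetical run-file content; the actual key names and structure are
# defined by the run-file documentation, not by this sketch.
run_file = {
    "project": {
        "module": "projects/twoatom/twoatom.py",  # which project module to load
        "mode": "single",                         # operation mode
        "output_file": "twoatom-demo",            # base name of output files
        "scans": ["ea"],                          # scan nick names
    }
}

with open("runfile-sketch.json", "w") as f:
    json.dump(run_file, f, indent=2)
```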
\subsection sec_run_parallel Parallel Processes
@ -61,29 +54,45 @@ The slave processes will run the scattering calculations, while the master coord
and optimizes the model parameters (depending on the operation mode).
For optimum performance, the number of processes should not exceed the number of available processors.
To start a two-hour optimization job with multiple processes on a quad-core workstation with hyperthreading:
To start an optimization job with multiple processes on a quad-core workstation with hyperthreading:
@code{.sh}
cd work/my_project
mpiexec -np 8 pmsco-dir/pmsco project-dir/project.py -o my_job_0001 -t 2 -m swarm
mpiexec -np 8 --use-hwthread-cpus python pmsco-dir -r run-file
@endcode
The `--use-hwthread-cpus` option may be necessary on certain hyperthreading architectures.
\subsection sec_run_hpc High-Performance Cluster
The script @c bin/qpmsco.ra.sh takes care of submitting a PMSCO job to the slurm queue of the Ra cluster at PSI.
The script can be adapted to other machines running the slurm resource manager.
The script generates a job script based on @c pmsco.ra.template,
substituting the necessary environment and parameters,
and submits it to the queue.
PMSCO is ready to run with resource managers on cluster machines.
Code for submitting jobs to the slurm queue of the Ra cluster at PSI is included in the pmsco.schedule module
(see also the PEARL wiki pages in the PSI intranet).
The job parameters are entered in a separate section of the run file, cf. @ref pag_runfile for details.
Other machines can be supported by sub-classing pmsco.schedule.JobSchedule or pmsco.schedule.SlurmSchedule.
Execute @c bin/qpmsco.ra.sh without arguments to see a summary of the arguments.
If a schedule section is present and enabled in the run file,
the following command will submit a job to the cluster machine
rather than starting a calculation directly:
To submit a job to the PSI clusters (see also the PEARL-Wiki page MscCalcRa),
the analogous command to the previous section would be:
@code{.sh}
bin/qpmsco.ra.sh my_job_0001 1 8 2 projects/my_project/project.py swarm
cd ~/pmsco
python pmsco -r run-file.json
@endcode
The command will copy the pmsco and project source trees as well as the run file and job script to a job directory
under the output directory specified in the project section of the run file.
The full path of the job directory is <code>output-dir/job-name</code>.
The directory must be empty or non-existent when you run the above command.
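The job-directory rule can be sketched like this (illustrative only; the actual implementation in pmsco.schedule may differ):

```python
from pathlib import Path

def prepare_job_dir(output_dir, job_name):
    """Create and return output_dir/job_name; refuse a non-empty existing one."""
    job_dir = Path(output_dir) / job_name
    if job_dir.exists() and any(job_dir.iterdir()):
        raise FileExistsError(f"job directory {job_dir} is not empty")
    job_dir.mkdir(parents=True, exist_ok=True)
    return job_dir
```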
Be careful to specify correct project file paths.
The output and data directories should be specified as absolute paths.
The scheduling command will also load the project and scan files.
Many parameter errors can thus be caught and fixed before the job is submitted to the queue.
The run file also offers an option to stop just before submitting the job
so that you can inspect the job files and submit the job manually.
Be sure to consider the resource allocation policy of the cluster
before you decide on the number of processes.
Requesting less resources will prolong the run time but might increase the scheduling priority.


@ -51,6 +51,14 @@ and it's difficult to switch between different Python versions.
On the PSI cluster machines, the environment must be set using the module system and conda (on Ra).
Details are explained in the PEARL Wiki.
The following tools are required to compile the documentation:
- doxygen
- doxypypy
- graphviz
- Java
- [plantUML](https://plantuml.com)
- LaTeX (optional, generally not recommended)
\subsection sec_install_instructions Instructions
@ -66,7 +74,6 @@ sudo apt install \
binutils \
build-essential \
doxygen \
doxypy \
f2c \
g++ \
gcc \
@ -92,12 +99,15 @@ cd /usr/lib
sudo ln -s /usr/lib/libblas/libblas.so.3 libblas.so
@endcode
Install Miniconda according to their [instructions](https://conda.io/docs/user-guide/install/index.html),
Download and install [Miniconda](https://conda.io/),
then configure the Python environment:
@code{.sh}
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh
conda create -q --yes -n pmsco python=3.6
source activate pmsco
conda activate pmsco
conda install -q --yes -n pmsco \
pip \
"numpy>=1.13" \
@ -110,7 +120,7 @@ conda install -q --yes -n pmsco \
statsmodels \
swig \
gitpython
pip install periodictable attrdict fasteners mpi4py
pip install periodictable attrdict commentjson fasteners mpi4py doxypypy
@endcode
@note `mpi4py` should be installed via pip, _not_ conda.
@ -119,16 +129,15 @@ pip install periodictable attrdict fasteners mpi4py
\subsubsection sec_install_singularity Installation in Singularity container
A [Singularity](https://www.sylabs.io/guides/2.5/user-guide/index.html) container
A [Singularity](https://sylabs.io/singularity/) container
contains all OS and Python dependencies for running PMSCO.
Besides the Singularity executable, nothing else needs to be installed in the host system.
This may be the fastest way to get PMSCO running.
For installation of Singularity,
see their [user guide](https://www.sylabs.io/guides/2.5/user-guide/installation.html).
On newer Linux systems (e.g. Ubuntu 18.04), Singularity is available from the package manager.
Installation in a virtual machine on Windows or Mac are straightforward
thanks to the [Vagrant system](https://www.vagrantup.com/).
To get started with Singularity,
download it from [sylabs.io](https://www.sylabs.io/singularity/) and install it according to their instructions.
On Windows, Singularity can be installed in a virtual machine using the [Vagrant](https://www.vagrantup.com/)
script included under `extras/vagrant`.
After installing Singularity,
check out PMSCO as explained in the @ref sec_compile section:
@ -136,6 +145,7 @@ check out PMSCO as explained in the @ref sec_compile section:
@code{.sh}
cd ~
mkdir containers
cd containers
git clone git@git.psi.ch:pearl/pmsco.git pmsco
cd pmsco
git checkout master
@ -143,11 +153,14 @@ git checkout -b my_branch
@endcode
Then, either copy a pre-built container into `~/containers`,
or build one from a script provided by the PMSCO repository:
or build one from the definition file included under extras/singularity.
You may need to customize the definition file to match the host OS
or to install compatible OpenMPI libraries,
cf. [Singularity user guide](https://sylabs.io/guides/3.7/user-guide/mpi.html).
@code{.sh}
cd ~/containers
sudo singularity build pmsco.simg ~/containers/pmsco/extras/singularity/singularity_python3
sudo singularity build pmsco.sif ~/containers/pmsco/extras/singularity/singularity_python3
@endcode
To work with PMSCO, start an interactive shell in the container and switch to the pmsco environment.
@ -155,8 +168,9 @@ Note that the PMSCO code is outside the container and can be edited with the usu
@code{.sh}
cd ~/containers
singularity shell pmsco.simg
source activate pmsco
singularity shell pmsco.sif
. /opt/miniconda/etc/profile.d/conda.sh
conda activate pmsco
cd ~/containers/pmsco
make all
nosetests -w tests/
@ -168,16 +182,17 @@ Or call PMSCO from outside:
cd ~/containers
mkdir output
cd output
singularity run ../pmsco.simg python ~/containers/pmsco/pmsco path/to/your-project.py arg1 arg2 ...
singularity run -e ../pmsco.sif ~/containers/pmsco/pmsco -r path/to/your-runfile
@endcode
For parallel processing, prepend `mpirun -np X` to the singularity command as needed.
Note that this requires "compatible" OpenMPI versions on the host and container to avoid runtime errors.
\subsubsection sec_install_extra Additional Applications
For working with the code and data, some other applications are recommended.
The PyCharm IDE can be installed from the Ubuntu software center.
The PyCharm IDE (community edition) can be installed from the Ubuntu software center.
The following commands install other useful helper applications:
@code{.sh}
@ -187,10 +202,24 @@ gitg \
meld
@endcode
To produce documentation in PDF format (not recommended on virtual machine), install LaTeX:
To compile the documentation, install the following tools.
The basic documentation is in HTML format and can be opened in any internet browser.
If you have a working LaTeX installation, a PDF document can be produced as well.
It is not recommended to install LaTeX just for this documentation, however.
@code{.sh}
sudo apt-get install texlive-latex-recommended
sudo apt install \
doxygen \
graphviz \
default-jre
conda activate pmsco
conda install -q --yes -n pmsco doxypypy
wget -O plantuml.jar https://sourceforge.net/projects/plantuml/files/plantuml.jar/download
sudo mkdir /opt/plantuml/
sudo mv plantuml.jar /opt/plantuml/
echo "export PLANTUML_JAR_PATH=/opt/plantuml/plantuml.jar" | sudo tee /etc/profile.d/pmsco-env.sh
@endcode
@ -250,7 +279,7 @@ mkdir work
cd work
mkdir twoatom
cd twoatom/
nice python ~/pmsco/pmsco ~/pmsco/projects/twoatom/twoatom.py -s ea -o twoatom_energy_alpha -m single
nice python ~/pmsco/pmsco -r ~/pmsco/projects/twoatom/twoatom-energy.json
@endcode
Runtime warnings may appear because the twoatom project does not contain experimental data.


@ -26,13 +26,13 @@ Other programs may be integrated as well.
- various scanning modes including energy, polar angle, azimuthal angle, analyser angle.
- averaging over multiple domains and emitters.
- global optimization of multiple scans.
- structural optimization algorithms: particle swarm optimization, grid search, gradient search.
- structural optimization algorithms: genetic, particle swarm, grid search.
- calculation of the modulation function.
- calculation of the weighted R-factor.
- automatic parallel processing using OpenMPI.
\section sec_project Optimization Projects
\section sec_intro_project Optimization Projects
To set up a new optimization project, you need to:
@ -44,8 +44,7 @@ To set up a new optimization project, you need to:
- add a global function create_project to my_project.py.
- provide experimental data files (intensity or modulation function).
For details, see the documentation of the Project class,
and the example projects.
For details, see @ref pag_project, the documentation of the pmsco.project.Project class and the example projects.
\section sec_intro_start Getting Started
@ -54,8 +53,9 @@ and the example projects.
- @ref pag_concepts_tasks
- @ref pag_concepts_emitter
- @ref pag_install
- @ref pag_project
- @ref pag_run
- @ref pag_command
- @ref pag_opt
\section sec_license License Information
@ -70,6 +70,6 @@ These programs may not be used without an explicit agreement by the respective o
\author Matthias Muntwiler, <mailto:matthias.muntwiler@psi.ch>
\version This documentation is compiled from version $(REVISION).
\copyright 2015-2019 by [Paul Scherrer Institut](http://www.psi.ch)
\copyright 2015-2021 by [Paul Scherrer Institut](http://www.psi.ch)
\copyright Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0)
*/


@ -3,28 +3,34 @@
\subsection sec_opt_swarm Particle swarm
\subsection sec_opt_swarm Particle swarm optimization (PSO)
The particle swarm algorithm is adapted from
The particle swarm optimization (PSO) algorithm seeks to find a global optimum in a multi-dimensional model space
by employing the _swarm intelligence_ of a number of particles traversing space,
each at its own velocity and direction,
but adjusting its trajectory based on its own experience and the results of its peers.
The PSO algorithm is adapted from
D. A. Duncan et al., Surface Science 606, 278 (2012).
It is implemented in the @ref pmsco.optimizers.swarm module.
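The trajectory adjustment described above is the canonical PSO velocity update, which looks roughly as follows. The coefficients and the exact rule used by pmsco.optimizers.swarm may differ; the injectable `rng` is only for illustration:

```python
import random

def pso_update(position, velocity, personal_best, global_best,
               inertia=0.7, c_personal=1.5, c_social=1.5, rng=None):
    """One canonical PSO update step for a single particle."""
    rand = rng if rng is not None else random.random
    new_velocity = [
        inertia * v
        + c_personal * rand() * (pb - x)   # pull toward the particle's own best
        + c_social * rand() * (gb - x)     # pull toward the swarm's best
        for x, v, pb, gb in zip(position, velocity, personal_best, global_best)
    ]
    new_position = [x + v for x, v in zip(position, new_velocity)]
    return new_position, new_velocity
```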
The general parameters of the genetic algorithm are specified in the @ref Project.optimizer_params dictionary.
The general parameters of the algorithm are specified in the @ref Project.optimizer_params dictionary.
Some of them can be changed on the command line.
| Parameter | Command line | Range | Description |
| --- | --- | --- | --- |
| pop_size | --pop-size | &ge; 1 | |
| pop_size | --pop-size | &ge; 1 | Recommended 20..50 |
| position_constrain_mode | | default bounce | Resolution of domain limit violations. |
| seed_file | --seed-file | a file path, default none | |
| seed_limit | --seed-limit | 0..pop_size | |
| rfac_limit | | 0..1, default 0.8 | Accept only seed values that have a lower R-factor. |
| recalc_seed | | True or False, default True | |
The domain parameters have the following meanings:
The model space attributes have the following meaning:
| Parameter | Description |
| --- | --- |
| start | Seed model. The start values are copied into particle 0 of the initial population. |
| start | Start value of particle 0 in the first iteration. |
| min | Lower limit of the parameter range. |
| max | Upper limit of the parameter range. |
| step | Not used. |
@ -32,23 +38,23 @@ The domain parameters have the following meanings:
\subsubsection sec_opt_seed Seeding a population
By default, one particle is initialized with the start value declared in the parameter domain,
and the other are set to random values within the domain.
By default, one particle is initialized with the start value declared with the model space,
and the other ones are initialized at random positions in the model space.
You may initialize more particles of the population with specific values by providing a seed file.
The seed file must have a similar format as the result `.dat` files
with a header line specifying the column names and data rows containing the values for each particle.
A good practice is to use a previous `.dat` file and remove unwanted rows.
To continue an interrupted optimization,
the `.dat` file from the previous optimization can be used as is.
The `.dat` file from a previous optimization job can be used as is to continue the optimization,
even in a different optimization mode.
The seeding procedure can be tweaked by several optimizer parameters (see above).
PMSCO normally loads the first rows up to population size - 1 or up to the `seed_limit` parameter,
whichever is lower.
If an `_rfac` column is present, the file is first sorted by R-factor and only the best models are loaded.
Models that resulted in an R-factor above the `rfac_limit` parameter are always ignored.
Models that resulted in an R-factor above the `rfac_limit` parameter are ignored in any case.
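The selection rules just described can be sketched as follows (illustrative, not pmsco's actual implementation; `rows` stands for the parsed data rows of the seed file):

```python
def select_seed_models(rows, pop_size, seed_limit=None, rfac_limit=0.8):
    """Apply the seed-file selection rules: load at most pop_size - 1 rows
    (or seed_limit, whichever is lower); if an _rfac column is present,
    sort by R-factor and drop models above rfac_limit."""
    limit = pop_size - 1
    if seed_limit is not None:
        limit = min(limit, seed_limit)
    if rows and "_rfac" in rows[0]:
        rows = sorted(rows, key=lambda r: r["_rfac"])
        rows = [r for r in rows if r["_rfac"] <= rfac_limit]
    return rows[:limit]
```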
During the optimization process, all models loaded from the seed file are normally re-calculated.
In the first iteration of the optimization run, the models loaded from the seed file are re-calculated by default.
This may waste CPU time if the calculation is run under the same conditions
and would result in exactly the same R-factor,
as is the case if the seed is used to continue a previous optimization, for example.
@ -58,25 +64,26 @@ and PMSCO will use the R-factor value from the seed file rather than calculating
\subsubsection sec_opt_patch Patching a running optimization
While an optimization process is running, the user can manually patch the population with arbitrary values,
While an optimization job is running, the user can manually patch the population with arbitrary values,
for instance, to kick the population out of a local optimum or to drive it to a less sampled parameter region.
To patch a running population, prepare a population file named `pmsco_patch.pop` and copy it to the work directory.
The file must have a similar format as the result `.dat` files
The patch file must have the same format as the result `.dat` files
with a header line specifying the column names and data rows containing the values.
It should contain as many rows as particles to be patched but not more than the size of the population.
The columns must include a `_particle` column which specifies the particle to patch
as well as the model parameters to be changed.
The columns must include a `_particle` column and the model parameters to be changed.
The `_particle` column specifies the index of the particle that is patched (ranging from 0 to population size - 1).
Parameters that should remain unaffected can be left out,
extra columns including `_gen`, `_rfac` etc. are ignored.
PMSCO checks the file for syntax errors and ignores it if errors are present.
Parameter values that lie outside the domain boundary are ignored.
Individual parameter values that lie outside the domain boundary are silently ignored.
Successful or failed patching is logged at warning level.
The patch file is re-applied whenever its time stamp has changed.
PMSCO keeps track of the time stamp of the file and re-applies the patch whenever the time stamp has changed.
\attention Do not edit the patch file in the working directory
to prevent it from being read in an unfinished state or multiple times.
\attention Since each change of time stamp may trigger patching,
do not edit the patch file in the working directory
to prevent it from being read in an unfinished state or multiple times!
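A sketch of these patching rules (illustrative; pmsco's actual implementation may differ; `space_min`/`space_max` stand for the model space limits):

```python
def apply_patch(population, patch_rows, space_min, space_max):
    """Apply patch-file rows to a population of model dicts, per the rules above."""
    for row in patch_rows:
        if "_particle" not in row:
            continue                      # the _particle column is required
        particle = int(row["_particle"])
        if not 0 <= particle < len(population):
            continue
        for name, value in row.items():
            if name.startswith("_"):
                continue                  # _gen, _rfac etc. are ignored
            # individual values outside the limits are silently ignored
            if space_min[name] <= value <= space_max[name]:
                population[particle][name] = value
    return population
```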
\subsection sec_opt_genetic Genetic optimization
@ -103,7 +110,7 @@ Some of them can be changed on the command line.
| Parameter | Command line | Range | Description |
| --- | --- | --- | --- |
| pop_size | --pop-size | &ge; 1 | |
| pop_size | --pop-size | &ge; 1 | Recommended 10..40 |
| mating_factor | | 1..pop_size, default 4 | |
| strong_mutation_probability | | 0..1, default 0.01 | Probability that a parameter undergoes a strong mutation. |
| weak_mutation_probability | | 0..1, default 1 | Probability that a parameter undergoes a weak mutation. This parameters should be left at 1. Lower values tend to produce discrete parameter values. Weak mutations can be tuned by the step domain parameters. |
@ -113,7 +120,7 @@ Some of them can be changed on the command line.
| rfac_limit | | 0..1, default 0.8 | Accept only seed values that have a lower R-factor. |
| recalc_seed | | True or False, default True | |
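The two mutation kinds in the table can be sketched as follows (illustrative; pmsco.optimizers.genetic may implement them differently; `space` maps each parameter to its (min, max, step) attributes):

```python
import random

def mutate(model, space, strong_p=0.01, weak_p=1.0, rng=random):
    """Strong mutation: redraw uniformly in [min, max].
    Weak mutation: add Gaussian noise scaled by the step attribute."""
    new = {}
    for name, value in model.items():
        lo, hi, step = space[name]
        if rng.random() < strong_p:
            value = lo + rng.random() * (hi - lo)
        elif rng.random() < weak_p:
            value = min(hi, max(lo, value + rng.gauss(0.0, step)))
        new[name] = value
    return new
```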
The domain parameters have the following meanings:
The model space attributes have the following meaning:
| Parameter | Description |
| --- | --- |
@ -129,7 +136,11 @@ cf. sections @ref sec_opt_seed and @ref sec_opt_swarm.
\subsection sec_opt_grid Grid search
The grid search algorithm samples the parameter space at equidistant steps.
The order of calculations is randomized so that distant parts of the parameter space are sampled at an early stage.
It is implemented in the @ref pmsco.optimizers.grid module.
The model space attributes have the following meaning.
The order of calculations is random so that results from different parts of the model space become available early.
| Parameter | Description |
| --- | --- |
@ -149,15 +160,19 @@ The table scan calculates models from an explicit table of model parameters.
It can be used to recalculate models from a previous optimization run on other experimental data,
as an interface to external optimizers,
or as a simple input of manually edited model parameters.
It is implemented in the @ref pmsco.optimizers.table module.
The table can be stored in an external file that is specified on the command line,
or supplied in one of several forms by the custom project class.
The table can be left unchanged during the calculations,
or new models can be added on the go.
Duplicate models are ignored.
@attention Because it is not easily possible to know when and which models have been read from the table file, if you do modify the table file during processing, pay attention to the following hints:
1. The file on disk must not be locked for more than a second. Do not keep the file open unnecessarily.
2. _Append_ new models to the end of the table rather than overwriting previous ones. Otherwise, some models may be lost before they have been calculated.
@attention Because it is not easily possible to know when the table file is read,
observe the following rules if you modify the table file while calculations are running:
1. Do not keep the file locked for longer than a second.
2. Append new models to the end of the table rather than overwriting previous ones.
3. Delete lines only if you're sure that they are not needed any more.
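Following these rules, new models can be appended with a short-lived file handle in append mode (illustrative; the actual table file format is defined by pmsco.optimizers.table):

```python
def append_models(table_path, rows, names):
    """Append model rows to the table file without touching existing lines."""
    # append mode keeps the file open only briefly and never overwrites
    with open(table_path, "a") as f:
        for row in rows:
            f.write(" ".join(str(row[n]) for n in names) + "\n")
```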
The general parameters of the table scan are specified in the @ref Project.optimizer_params dictionary.
Some of them can be changed on the command line or in the project class (depending on how the project class is implemented).
@ -167,7 +182,7 @@ Some of them can be changed on the command line or in the project class (dependi
| pop_size | --pop-size | &ge; 1 | Number of models in a generation (calculated in parallel). In table mode, this parameter is not so important and can be left at the default. It has nothing to do with table size. |
| table_file | --table-file | a file path, default none | |
The domain parameters have the following meanings.
The model space attributes have the following meaning.
Models that violate the parameter range are not calculated.
| Parameter | Description |

docs/src/project.dox Normal file (+454 lines)

@ -0,0 +1,454 @@
/*! @page pag_project Setting up a new project
\section sec_project Setting Up a New Project
This topic guides you through the setup of a new project.
Be sure to check out the examples in the projects folder
and the code documentation as well.
The basic steps are:
1. Create a new folder under `projects`.
2. In the new folder, create a Python module for the project (subsequently called _the project module_).
3. In the project module, define a cluster generator class which derives from pmsco.cluster.ClusterGenerator.
4. In the project module, define a project class which derives from pmsco.project.Project.
5. In the same folder as the project module, create a JSON run-file.
\subsection sec_project_module Project Module
A skeleton of the project module file (with some common imports) may look like this:
~~~~~~{.py}
import logging
import math
import numpy as np
import periodictable as pt
from pathlib import Path
import pmsco.cluster
import pmsco.data
import pmsco.dispatch
import pmsco.elements.bindingenergy
import pmsco.project
logger = logging.getLogger(__name__)
class MyClusterGenerator(pmsco.cluster.ClusterGenerator):
def create_cluster(self, model, index):
clu = pmsco.cluster.Cluster()
# ...
return clu
def count_emitters(self, model, index):
# ...
return 1
class MyProject(pmsco.project.Project):
def __init__(self):
super().__init__()
# ...
self.cluster_generator = MyClusterGenerator(self)
def create_model_space(self):
spa = pmsco.project.ModelSpace()
# ...
return spa
def create_params(self, model, index):
par = pmsco.project.CalculatorParams()
# ...
return par
~~~~~~
The main purpose of the `MyProject` class is to bundle the project-specific calculation parameters and code.
The purpose of the `MyClusterGenerator` class is to produce atomic clusters as a function of a number of model parameters.
For the project to be useful, some of the methods in the skeleton above need to be implemented.
The individual methods are discussed in the following.
Further descriptions can be found in the documentation of the code.
\subsection sec_project_cluster Cluster Generator
The cluster generator is a project-specific Python object that produces a cluster, i.e., a list of atomic coordinates,
based on a small number of model parameters whenever PMSCO requires it.
The most important member of a cluster generator is its `create_cluster` method.
At least this method must be implemented for a functional cluster generator.
A generic `count_emitters` method is implemented in the base class.
It needs to be overridden if you want to use parallel calculation of multiple emitters.
\subsubsection sec_project_cluster_create Cluster Definition
The `create_cluster` method takes the model parameters (a dictionary)
and the task index (a pmsco.dispatch.CalcID, cf. @ref pag_concepts_tasks) as arguments.
Given these arguments, it must create and fill a pmsco.cluster.Cluster object.
See pmsco.cluster.ClusterGenerator.create_cluster for details on the method contract.
As an example, have a look at the following simplified excerpt from the twoatom demo project.
~~~~~~{.py}
def create_cluster(self, model, index):
# access model parameters
# dAB - distance between atoms in Angstroms
# th - polar angle in degrees
# ph - azimuthal angle in degrees
r = model['dAB']
th = math.radians(model['th'])
ph = math.radians(model['ph'])
# prepare a cluster object
clu = pmsco.cluster.Cluster()
# the comment line is optional but can be useful
clu.comment = "{0} {1}".format(self.__class__, index)
# set the maximum radius of the cluster (outliers will be ignored)
clu.set_rmax(r * 2.0)
# calculate atomic vectors
dx = r * math.sin(th) * math.cos(ph)
dy = r * math.sin(th) * math.sin(ph)
dz = r * math.cos(th)
a_top = np.array((0.0, 0.0, 0.0))
a_bot = np.array((-dx, -dy, -dz))
# add an oxygen atom at a_top position and mark it as emitter
clu.add_atom('O', a_top, 1)
# add a copper atom at a_bot position
clu.add_atom('Cu', a_bot, 0)
# pass the created cluster to the calculator
return clu
~~~~~~
In this example, two atoms are added to the cluster.
The pmsco.cluster.Cluster class provides several methods to simplify the task,
such as adding layers or bulk regions, rotation, translation, trim, emitter selection, etc.
Please refer to the documentation of its code for details.
It may also be instructive to have a look at the demo projects.
The main purposes of the cluster object are to store an array of atoms and to read/write cluster files in a variety of formats.
For each atom, the following properties are stored:
- sequential atom index (1-based, maintained by cluster code)
- atom type (chemical element number)
- chemical element symbol from periodic table
- x coordinate of the atom position
- y coordinate of the atom position
- z coordinate of the atom position
- emitter flag (0 = scatterer, 1 = emitter, default 0)
- charge/ionicity (units of elementary charge, default 0)
- scatterer class (default 0)
All of these properties except the scatterer class can be set by the add methods of the cluster.
The scatterer class is used internally by the atomic scattering factor calculators.
Whether the charge/ionicity is used depends on the particular calculator; EDAC, for instance, does not use it.
Note: You do not need to take care how many emitters a calculator allows,
or whether the emitter needs to be at the origin or the first place of the array.
These technical aspects are handled by PMSCO code transparently.
\subsubsection sec_project_cluster_domains Domains
Domains refer to regions of inequivalent structure in the probing region.
This may include regions of different orientation, different lattice constant, or even different structure.
The cluster methods can read the selected domain from the `index.domain` argument.
This is an index into the pmsco.project.Project.domains list where each item is a dictionary
that holds additional, invariable structural parameters.
A common case are rotational domains.
In this case, the list of domains may look like `[{"zrot": 0.0}, {"zrot": 60.0}]`, for example,
and the `create_cluster` method would include additional code to rotate the cluster:
~~~~~~{.py}
def create_cluster(self, model, index):
# filling atoms here
# ...
dom = self.domains[index.domain]
try:
z_rot = dom['zrot']
except KeyError:
z_rot = 0.0
if z_rot:
clu.rotate_z(z_rot)
# selecting emitters
# ...
return clu
~~~~~~
Depending on the complexity of the system, it may, however, be necessary to write a specific sub-routine for each domain.
The pmsco.project.Project class includes generic code to add intensities of domains incoherently (cf. pmsco.project.Project.combine_domains).
If the model space contains parameters 'wdom0', 'wdom1', etc.,
these parameters are interpreted as weights of domains 0, 1, etc.
One domain must have a fixed weight to avoid correlated parameters.
Typically, 'wdom0' is left undefined and defaults to 1.
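The weighted, incoherent sum performed by `combine_domains` can be pictured as in the following sketch. This is an illustration of the weighting scheme only, not the actual PMSCO implementation (which operates on intensity arrays and handles further bookkeeping):

```python
def combine_domains(model, domain_intensities):
    """incoherently sum domain intensities, weighted by 'wdom<i>' parameters.

    model: dict of model parameters, may contain 'wdom0', 'wdom1', ...
    domain_intensities: list of per-domain intensity lists of equal length.
    """
    total = None
    for i, intensity in enumerate(domain_intensities):
        # a missing weight defaults to 1; typically 'wdom0' is left undefined
        weight = model.get("wdom{}".format(i), 1.0)
        weighted = [weight * value for value in intensity]
        if total is None:
            total = weighted
        else:
            total = [t + w for t, w in zip(total, weighted)]
    return total
```

With `model = {'wdom1': 0.5}`, for example, domain 0 enters with weight 1 and domain 1 with weight 0.5.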
\subsubsection sec_project_cluster_emitters Emitter Configurations
If your project has a large cluster and/or many emitters, have a look at @ref pag_concepts_emitter.
In this case, you should override the `count_emitters` method and return the number of emitter configurations.
In the simplest case, this is the number of inequivalent emitters, and the implementation would be:
~~~~~~{.py}
def count_emitters(self, model, index):
index = index._replace(emit=-1)
clu = self.create_cluster(model, index)
return clu.get_emitter_count()
~~~~~~
Next, modify the `create_cluster` method to check the emitter index (`index.emit`).
If it is -1, the method must return the full cluster with all inequivalent emitters marked.
If it is positive, only the corresponding emitter must be marked.
The code could be similar to this example:
~~~~~~{.py}
def create_cluster(self, model, index):
# filling atoms here
# ...
# select all possible emitters (atoms of a specific element) in a cylindrical volume
# idx_emit is an array of atom numbers (0-based atom index)
idx_emit = clu.find_index_cylinder(origin, r_xy, r_z, self.project.scans[index.scan].emitter)
# if a specific emitter should be marked, restrict the array index.
if index.emit >= 0:
idx_emit = idx_emit[index.emit]
# mark the selected emitters
# if index.emit was < 0, all emitters are marked
clu.data['e'][idx_emit] = 1
return clu
~~~~~~
Now, the individual emitter configurations will be calculated in separate tasks
which can be run in parallel in a multi-process environment.
Note that the processing time of EDAC scales linearly with the number of emitters.
Thus, parallel execution is beneficial.
Advanced programmers may exploit more of the flexibility of emitter configurations, cf. @ref pag_concepts_emitter.
\subsection sec_project_project Project Class
Most commonly, a project class overrides the `__init__`, `create_model_space` and `create_params` methods.
Most other inherited methods can be overridden optionally,
for instance `validate`, `setup`, `calc_modulation`, `rfactor`,
as well as the combine methods `combine_rfactors`, `combine_domains`, `combine_emitters`, etc.
In this introduction, we focus on the three most basic methods.
\subsubsection sec_project_project_init Initialization and Defaults
In the `__init__` method, you define and initialize (with default values) additional project properties.
You may also redefine properties of the base class.
The following code is just an example to give you some ideas.
~~~~~~{.py}
class MyProject(pmsco.project.Project):
def __init__(self):
# call the inherited method first
super().__init__()
# re-define an inherited property
self.directories["data"] = Path("/home/pmsco/data")
# define a scan dictionary
self.scan_dict = {}
# fill the scan dictionary
self.build_scan_dict()
# create the cluster generator
self.cluster_generator = MyClusterGenerator(self)
# declare the list of domains (at least one is required)
self.domains = [{"zrot": 0.}]
def build_scan_dict(self):
self.scan_dict["empty"] = {"filename": "{pmsco}/projects/common/empty-hemiscan.etpi",
"emitter": "Si", "initial_state": "2p3/2"}
self.scan_dict["Si2p"] = {"filename": "{data}/xpd-Si2p.etpis",
"emitter": "Si", "initial_state": "2p3/2"}
~~~~~~
The scan dictionary can come in handy if you want to select scans by a shortcut on the command line or in a run file.
Note that most of the properties can be assigned from a run file.
This happens after the `__init__` method.
The values set by `__init__` serve as default values.
\subsubsection sec_project_project_space Model Space
The model space defines the keys and value ranges of the model parameters.
There are three ways to declare the model space in order of priority:
1. Declare the model space in the run-file.
2. Assign a ModelSpace to the self.model_space property directly in the `__init__` method.
3. Implement the `create_model_space` method.
We begin with the third way:
~~~~~~{.py}
# under class MyProject(pmsco.project.Project):
def create_model_space(self):
# create an empty model space
spa = pmsco.project.ModelSpace()
# add parameters
spa.add_param('dAB', 2.10, 2.00, 2.25, 0.05)
spa.add_param('th', 15.00, 0.00, 30.00, 1.00)
spa.add_param('ph', 90.00)
spa.add_param('V0', 21.96, 15.00, 25.00, 1.00)
spa.add_param('Zsurf', 1.50)
spa.add_param('wdom1', 0.5, 0.10, 10.00, 0.10)
# return the model space
return spa
~~~~~~
This code declares six model parameters: `dAB`, `th`, `ph`, `V0`, `Zsurf` and `wdom1`.
Three of them are structural parameters (used by the cluster generator above),
two are used by the `create_params` method (see below),
and `wdom1` is used in pmsco.project.Project.combine_domains while summing up contributions from different domains.
The values in the arguments list correspond to the start value (initial guess),
the lower and upper boundaries of the value range,
and the step size for optimizers that require it.
If just one value is given, like for `ph` and `Zsurf`, the parameter is held constant during the optimization.
The equivalent declaration in the run-file would look like (parameters after `th` omitted):
~~~~~~{.py}
{
"project": {
// ...
"model_space": {
"dAB": {
"start": 2.109,
"min": 2.0,
"max": 2.25,
"step": 0.05
},
"th": {
"start": 15.0,
"min": 0.0,
"max": 30.0,
"step": 1.0
},
// ...
}
}
}
~~~~~~
\subsubsection sec_project_project_params Calculation Parameters
Non-structural parameters that are needed for the input files of the calculators are passed
in a pmsco.project.CalculatorParams object.
This object should be created and filled in the `create_params` method of the project class.
The following example is from the twoatoms demo project:
~~~~~~{.py}
# under class MyProject(pmsco.project.Project):
def create_params(self, model, index):
params = pmsco.project.CalculatorParams()
# meta data
params.title = "two-atom demo"
params.comment = "{0} {1}".format(self.__class__, index)
# initial state and binding energy
initial_state = self.scans[index.scan].initial_state
params.initial_state = initial_state
emitter = self.scans[index.scan].emitter
params.binding_energy = pt.elements.symbol(emitter).binding_energy[initial_state]
# experimental setup
params.polarization = "H"
params.polar_incidence_angle = 60.0
params.azimuthal_incidence_angle = 0.0
params.experiment_temperature = 300.0
# material parameters
params.z_surface = model['Zsurf']
params.work_function = 4.5
params.inner_potential = model['V0']
params.debye_temperature = 356.0
# multiple-scattering parameters (EDAC)
params.emitters = []
params.lmax = 15
params.dmax = 5.0
params.orders = [25]
return params
~~~~~~
Most of the code is generic and can be copied to other projects.
Only the experimental and material parameters need to be adjusted.
Other properties can be changed as needed, see the documentation of pmsco.project.CalculatorParams for details.
\subsection sec_project_args Passing Runtime Parameters
Runtime parameters can be passed in one of three ways:
1. hard-coded in the project module,
2. on the command line, or
3. in a JSON run-file.
In the first way, all parameters are hard-coded in the `create_project` function of the project module.
This is the simplest way for a quick start to a small project.
However, as the project code grows, it's easy to lose track of revisions.
In programming it is usually best practice to separate code and data.
The command line is another option for passing parameters to a process.
It requires extra code for parsing the command line and is not very flexible.
It is difficult to pass complex data types.
Using the command line is no longer recommended and may become deprecated in a future version.
The recommended way of passing parameters is via run-files.
Run-files allow for complete separation of code and data in a generic and flexible way.
For example, run-files can be stored along with the results.
However, the semantics of the run-file may look intimidating at first.
\subsubsection sec_project_args_runfile Setting Up a Run-File
The usage and format of run-files is described in detail under @ref pag_runfile.
\subsubsection sec_project_args_code Hard-Coded Arguments
Hard-coded parameters are usually set in a `create_project` function of the project module.
Placed at the end of the module, this function can easily be found.
The function has two purposes: to create the project object and to set parameters.
The parameters can be any attributes of the project class and its ancestors.
See the parent pmsco.project.Project class for a list of common attributes.
The `create_project` function may look like in the following example.
It must return a project object, i.e. an object instance of a class that inherits from pmsco.project.Project.
~~~~~~{.py}
def create_project():
project = MyProject()
project.optimizer_params["pop_size"] = 20
project_dir = Path(__file__).parent
scan_file = Path(project_dir, "hbnni_e156_int.etpi")
project.add_scan(filename=scan_file, emitter="N", initial_state="1s")
project.add_domain({"zrot": 0.0})
project.add_domain({"zrot": 60.0})
return project
~~~~~~
To have PMSCO call this function,
pass the file path of the containing module as the first command line argument of PMSCO, cf. @ref pag_command.
PMSCO calls this function in the absence of a run-file.
\subsubsection sec_project_args_cmd Command Line
Since it is not recommended to pass calculation parameters on the command line,
this mechanism is not described in detail here.
It is, however, still available.
If you really need to use it,
have a look at the code of the pmsco.pmsco.main function
and how it calls the `create_project`, `parse_project_args` and `set_project_args` of the project module.
*/

docs/src/runfile.dox Normal file
@ -0,0 +1,333 @@
/*! @page pag_runfile Run File
\section sec_runfile Run File
This section describes the format of a run-file.
Run-files are a new way of passing arguments to a PMSCO process which avoids cluttering up the command line.
It is more flexible than the command line
because run-files can assign a value to any property of the project object in an abstract way.
Moreover, there is no necessity for the project code to parse the command line.
\subsection sec_runfile_how How It Works
Run-files are text files in [JSON](https://en.wikipedia.org/wiki/JSON) format
which shares most syntax elements with Python.
JSON files contain nested dictionaries, lists, strings and numbers.
In PMSCO, run-files contain a dictionary of parameters for the project object
which is the main container for calculation parameters, model objects and links to data files.
An abstract run-file parser reads the run-file,
constructs the specified project object based on the custom project class
and assigns the attributes of the project object.
It's important to note that the parser does not recognize specific data types or classes.
All specific data handling is done by the instantiated objects, mainly the project class.
The parser can handle the following situations:
- Strings, numbers as well as dictionaries and lists of simple objects can be assigned directly to project attributes.
- If the project class defines an attribute as a _property_,
the class can execute custom code to import or validate data.
- The parser can instantiate an object from a class in the namespace of the project module
and assign its properties.
\subsection sec_runfile_general General File Format
Run-files must adhere to the [JSON](https://en.wikipedia.org/wiki/JSON) format,
which shares most syntax elements with Python.
Specifically, a JSON file can declare dictionaries, lists and simple objects
such as strings, numbers and `null`.
As one extension to plain JSON, PMSCO ignores line comments starting with a hash `#` or double-slash `//`.
This can be used to temporarily hide a parameter from the parser.
For example run-files, have a look at the twoatom demo project.
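The comment handling can be pictured as a pre-processing step before the JSON parser runs. The following is an illustration only (the actual PMSCO parser may differ in detail); it drops full-line `#` and `//` comments and hands the rest to `json.loads`:

```python
import json

def load_runfile_text(text):
    """parse run-file text, ignoring lines that start with '#' or '//'."""
    kept = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("#") or stripped.startswith("//"):
            continue  # drop the comment line
        kept.append(line)
    return json.loads("\n".join(kept))
```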
\subsection sec_runfile_project Project Specification
The following minimum run-file demonstrates how to specify the project at the top level:
~~~~~~{.py}
{
"project": {
"__module__": "projects.twoatom.twoatom",
"__class__": "TwoatomProject",
"mode": "single",
"output_file": "twoatom0001"
}
}
~~~~~~
Here, the `project` keyword denotes the dictionary that is used to construct the project object.
Within the project dictionary, the `__module__` key selects the Python module file that contains the project code,
and `__class__` refers to the name of the actual project class.
Further dictionary items correspond to attributes of the project class.
The module name is the same as would be used in a Python import statement.
It must be findable on the Python path.
PMSCO ensures that the directory containing the `pmsco` and `projects` sub-directories is on the Python path.
The class name must be in the namespace of the loaded module.
As PMSCO starts, it imports the specified module,
constructs an object of the specified project class,
and assigns any further items to project attributes.
In the example above, `twoatom0001` is assigned to the `output_file` property.
Any attributes not specified in the run-file will remain at their default values
that were set by the `__init__` method of the project class.
Note that parameter names must start with an alphabetic character, else they are ignored.
This provides another way to temporarily ignore an item from the file besides line comments.
Also note that PMSCO does not spell-check parameter names.
The parameter values are just written to the corresponding object attribute.
If a name is misspelled, the value will be written under the wrong name and missed by the code eventually.
PMSCO carries out only the most important checks on the given parameter values.
Incorrect values may lead to improper operation or exceptions later in the calculations.
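In essence, the assignment of run-file items behaves like the following sketch (not the actual parser code): keys that do not start with an alphabetic character are skipped, and all other items are written to the project object verbatim, without checking whether the attribute name is known.

```python
def assign_project_attributes(project, params):
    """assign run-file items to attributes of a project object (sketch)."""
    for key, value in params.items():
        if not key[:1].isalpha():
            continue  # e.g. '__module__' or '_disabled_param' are not assigned
        # no spell check: a misspelled key creates a new, unused attribute
        setattr(project, key, value)
```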
\subsection sec_runfile_common Common Arguments
The following table lists some important parameters controlling the calculations.
They are declared in the pmsco.project.Project class.
| Key | Values | Description |
| --- | --- | --- |
| mode | `single` (default), `grid`, `swarm`, `genetic`, `table`, `test`, `validate` | Operation mode. `validate` can be used to check the syntax of the run-file, the process exits before starting calculations. |
| directories | dictionary | This dictionary lists common file paths used in the project. It contains keys such as `home`, `project`, `output` (see documentation of Project class in pmsco.project). Enclosed in curly braces, the keys can be used as placeholders in filenames. |
| output_dir | path | Shortcut for directories["output"] |
| data_dir | path | Shortcut for directories["data"] |
| job_name | string, must be a valid file name | Base name for all produced output files. It is recommended to set a unique name for each calculation run. Do not include a path. The path can be set in _output_dir_. |
| cluster_generator | dictionary | Class name and attributes of the cluster generator. See below. |
| atomic_scattering_factory | string<br>Default: InternalAtomicCalculator from pmsco.calculators.calculator | Class name of the atomic scattering calculator. This name must be in the namespace of the project module. |
| multiple_scattering_factory | string<br>Default: EdacCalculator from pmsco.calculators.edac | Class name of the multiple scattering calculator. This name must be in the namespace of the project module. |
| model_space | dictionary | See @ref sec_runfile_space below. |
| domains | list of dictionaries | See @ref sec_runfile_domains below. |
| scans | list of dictionaries | See @ref sec_runfile_scans below. |
| optimizer_params | dictionary | See @ref sec_runfile_optimizer below. |
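The curly-brace placeholders of the `directories` dictionary can be expanded with Python's `str.format` mechanism, roughly as follows (a sketch of the described behaviour, not the exact PMSCO code; the paths are made up for the example):

```python
def expand_path(filename, directories):
    """replace {key} placeholders in a file name by directory paths (sketch)."""
    return filename.format(**directories)

# hypothetical directory entries for illustration
directories = {"project": "/home/pmsco/projects/twoatom",
               "data": "/home/pmsco/data"}
path = expand_path("{data}/xpd-Si2p.etpis", directories)
# path is now "/home/pmsco/data/xpd-Si2p.etpis"
```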
The following table lists some common control parameters and metadata
that affect the behaviour of the program but do not affect the calculation results.
The job metadata is used to identify and describe a job in the results database if requested.
| Key | Values | Description |
| --- | --- | --- |
| job_tags | list of strings | User-specified job tags (metadata). |
| description | string | Description of the calculation job (metadata) |
| time_limit | decimal number<br>Default: 24. | Wall time limit in hours. The optimizers try to finish before the limit. This cannot be guaranteed, however. |
| keep_files | list of file categories | Output file categories to keep after the calculation. Multiple values can be specified and must be separated by spaces. By default, cluster and model (simulated data) of a limited number of best models are kept. See @ref sec_runfile_files below. |
| keep_best | integer number<br>Default: 10 | number of best models for which result files should be kept. |
| keep_level | integer number<br>Default: 1 | numeric task level down to which files are kept. 1 = scan level, 2 = domain level, etc. |
| log_level | DEBUG, INFO, WARNING, ERROR, CRITICAL | Minimum level of messages that should be added to the log. Empty string turns off logging. |
| log_file | file system path<br>Default: job_name + ".log". | Name of the main log file. Under MPI, the rank of the process is inserted before the extension. The log name is created in the working directory. |
\subsection sec_runfile_space Model Space
The `model_space` parameter is a dictionary of model parameters.
The key is the name of the parameter as used by the cluster and input-formatting code,
the value is a dictionary holding the `start`, `min`, `max`, `step` values to be used by the optimizer.
~~~~~~{.py}
{
"project": {
// ...
"model_space": {
"dAB": {
"start": 2.109,
"min": 2.0,
"max": 2.25,
"step": 0.05
},
"pAB": {
"start": 15.0,
"min": 0.0,
"max": 30.0,
"step": 1.0
},
// ...
}
}
}
~~~~~~
\subsection sec_runfile_domains Domains
Domains is a list of dictionaries.
Each dictionary holds keys describing the domain to the cluster and input-formatting code.
The meaning of these keys is up to the project.
~~~~~~{.py}
{
"project": {
// ...
"domains": [
{"surface": "Te", "doping": null, "zrot": 0.0},
{"surface": "Te", "doping": null, "zrot": 60.0}
],
}
}
~~~~~~
\subsection sec_runfile_scans Experimental Scan Files
The pmsco.project.Scan objects used in the calculation cannot be instantiated from the run-file directly.
Instead, the scans object is a list of scan creators/loaders which specify what to do to create a Scan object.
The pmsco.project module defines three scan creators: ScanLoader, ScanCreator and ScanKey.
The following code block shows an example of each of the three:
~~~~~~{.py}
{
"project": {
// ...
"scans": [
{
"__class__": "pmsco.project.ScanCreator",
"filename": "twoatom_energy_alpha.etpai",
"emitter": "N",
"initial_state": "1s",
"positions": {
"e": "np.arange(10, 400, 5)",
"t": "0",
"p": "0",
"a": "np.linspace(-30, 30, 31)"
}
},
{
"__class__": "pmsco.project.ScanLoader",
"filename": "{project}/twoatom_hemi_250e.etpi",
"emitter": "N",
"initial_state": "1s",
"is_modf": false
},
{
"__class__": "pmsco_project.ScanKey",
"key": "Ge3s113tp"
}
]
}
}
~~~~~~
The class name must be specified as it would be called in the custom project module.
`pmsco.project` must, thus, be imported in the custom project module.
The *ScanCreator* object creates a scan using Numpy array constructors in `positions`.
In the example above, a two-dimensional rectangular energy-alpha scan grid is created.
The values of the positions axes are passed to Python's `eval` function
and must return a one-dimensional Numpy `ndarray`.
The `emitter` and `initial_state` keys define the probed core level.
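The evaluation of the `positions` strings can be sketched as follows. This is an illustration under the assumption that `np` is the only name made available to the expressions; the actual ScanCreator may set up the evaluation context differently:

```python
import numpy as np

def eval_positions(positions):
    """evaluate the position expressions of a ScanCreator entry (sketch).

    each value is passed to eval with numpy available as 'np' and
    coerced to a one-dimensional ndarray.
    """
    axes = {}
    for axis, expression in positions.items():
        axes[axis] = np.atleast_1d(eval(expression, {"np": np}))
    return axes

axes = eval_positions({"e": "np.arange(10, 400, 5)",
                       "a": "np.linspace(-30, 30, 31)"})
```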
The *ScanLoader* object loads a data file, specified under `filename`.
The filename can include a placeholder which is replaced by the corresponding item from Project.directories.
Note that some of the directories (including `project`) are pre-set by PMSCO.
It is recommended to add a `data` key under `directories` in the run-file
if the data files are outside of the PMSCO directory tree.
The `is_modf` key indicates whether the file contains a modulation function (`true`) or intensity (`false`).
In the latter case, the modulation function is calculated after loading.
The *ScanKey* is the shortest scan specification in the run-file.
It is a shortcut to a complete scan description in the `scan_dict` dictionary of the project object.
The `scan_dict` must be set up in the `__init__` method of the project class.
The `key` item specifies which key of `scan_dict` should be used to create the Scan object.
Each item of `scan_dict` holds a dictionary
that in turn holds the attributes for either a `ScanCreator` or a `ScanLoader`.
If it contains a `positions` key, it represents a `ScanCreator`, else a `ScanLoader`.
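The dispatch rule stated above can be sketched in a few lines (illustration only, not the actual PMSCO code; the dictionary content is adapted from the project-setup example):

```python
def resolve_scan_key(scan_dict, key):
    """return the scan creator type implied by a scan_dict entry (sketch)."""
    entry = scan_dict[key]
    # an entry with a 'positions' key acts as a ScanCreator,
    # any other entry as a ScanLoader
    kind = "ScanCreator" if "positions" in entry else "ScanLoader"
    return kind, entry

scan_dict = {
    "Si2p": {"filename": "{data}/xpd-Si2p.etpis",
             "emitter": "Si", "initial_state": "2p3/2"},
    "grid": {"positions": {"e": "np.arange(10, 400, 5)"}},
}
```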
\subsection sec_runfile_optimizer Optimizer Parameters
The `optimizer_params` is a dictionary holding one or more of the following items.
| Key | Values | Description |
| --- | --- | --- |
| pop_size | integer<br>The default value is the greater of 4 or the number of parallel calculation processes. | Population size (number of particles) in swarm and genetic optimization mode. |
| seed_file | file system path | Name of the population seed file. Population data of previous optimizations can be used to seed a new optimization. The file must have the same structure as the .pop or .dat files. See @ref pmsco.project.Project.seed_file. |
| table_file | file system path | Name of the model table file in table scan mode. |
\subsubsection sec_runfile_files File Categories
The following category names can be used with the `keep_files` option.
Multiple names can be specified as a list.
| Category | Description | Default Action |
| --- | --- | --- |
| all | shortcut to include all categories | |
| input | raw input files for calculator, including cluster and phase files in custom format | delete |
| output | raw output files from calculator | delete |
| atomic | atomic scattering and emission files in portable format | delete |
| cluster | cluster files in portable XYZ format for report | keep |
| debug | debug files | delete |
| model | output files in ETPAI format: complete simulation (a_-1_-1_-1_-1) | keep |
| scan | output files in ETPAI format: scan (a_b_-1_-1_-1) | keep |
| domain | output files in ETPAI format: domain (a_b_c_-1_-1) | delete |
| emitter | output files in ETPAI format: emitter (a_b_c_d_-1) | delete |
| region | output files in ETPAI format: region (a_b_c_d_e) | delete |
| report | final report of results | keep (always) |
| population | final state of particle population | keep |
| rfac | files related to models which give bad r-factors, see warning below | delete |
\note
The `report` category is always kept and cannot be turned off.
The `model` category is always kept in single calculation mode.
\warning
If you want to specify `rfac` with the `keep_files` option,
you have to add the file categories that you want to keep, e.g.,
`"keep_files": ["rfac", "cluster", "model", "scan", "population"]`
(to return the default categories for all calculated models).
Do not specify `rfac` alone as this will effectively not return any file.
\subsection sec_runfile_schedule Job Scheduling
To submit a job to a resource manager such as Slurm, add a `schedule` section to the run file
(section ordering is not important):
~~~~~~{.py}
{
"schedule": {
"__module__": "pmsco.schedule",
"__class__": "PsiRaSchedule",
"nodes": 1,
"tasks_per_node": 24,
"walltime": "2:00",
"manual_run": true,
"enabled": true
},
"project": {
"__module__": "projects.twoatom.twoatom",
"__class__": "TwoatomProject",
"mode": "single",
"output_file": "{home}/pmsco/twoatom0001",
...
}
}
~~~~~~
In the same way as for the project, the `__module__` and `__class__` keys select the class that handles the job submission.
In this example, it is pmsco.schedule.PsiRaSchedule which is tied to the Ra cluster at PSI.
For other machines, you can sub-class one of the classes in the pmsco.schedule module and include it in your project module.
The parameters of pmsco.schedule.PsiRaSchedule are as follows.
Some of them are also used in other schedule classes or may have different types or ranges.
| Key | Values | Description |
| --- | --- | --- |
| nodes | integer: 1..2 | Number of compute nodes (main boards on Ra). The maximum number available for PEARL is 2. |
| tasks_per_node | integer: 1..24, 32 | Number of tasks (CPU cores on Ra) per node. Jobs with less than 24 tasks are assigned to the shared partition. |
| wall_time | string: [days-]hours[:minutes[:seconds]] <br> dict: with any combination of days, hours, minutes, seconds | Maximum run time (wall time) of the job. |
| manual | bool | Manual submission (true) or automatic submission (false). Manual submission allows you to inspect the job files before submission. |
| enabled | bool | Enable scheduling (true). Otherwise, the calculation is started directly (false).
@note The calculation job may run in a different working directory than the current one.
It is important to specify absolute data and output directories in the run file (project/directories section).
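A wall-time string in the `[days-]hours[:minutes[:seconds]]` format can be converted to seconds roughly as in the following sketch (assuming the format given in the table; this is not the actual PMSCO code):

```python
def walltime_to_seconds(walltime):
    """convert a '[days-]hours[:minutes[:seconds]]' string to seconds."""
    days = 0
    if "-" in walltime:
        day_part, walltime = walltime.split("-", 1)
        days = int(day_part)
    parts = [int(part) for part in walltime.split(":")]
    while len(parts) < 3:
        parts.append(0)  # pad missing minutes and seconds
    hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds
```

Per this format, `walltime_to_seconds("2:00")` yields two hours; note that batch systems such as Slurm may interpret short time strings differently.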
*/

@ -2,21 +2,19 @@
skinparam componentStyle uml2

component "PMSCO" as pmsco
component "project" as project
component "scattering code\n(calculator)" as calculator

interface "command line" as cli
interface "input files" as input
interface "experimental data" as data
interface "results" as results
interface "output files" as output

cli --> pmsco
data -> project
pmsco ..> project
pmsco ..> calculator
input -> calculator
calculator -> output
pmsco -> results

View File

@@ -1,117 +0,0 @@
BootStrap: debootstrap
OSVersion: bionic
MirrorURL: http://ch.archive.ubuntu.com/ubuntu/
%help
a singularity container for PMSCO.
git clone requires an ssh key for git.psi.ch.
try agent forwarding (-A option to ssh).
#%setup
# executed on the host system outside of the container before %post
#
# this will be inside the container
# touch ${SINGULARITY_ROOTFS}/tacos.txt
# this will be on the host
# touch avocados.txt
#%files
# files are copied before %post
#
# this copies to root
# avocados.txt
# this copies to /opt
# avocados.txt /opt
#
# this does not work
# ~/.ssh/known_hosts /etc/ssh/ssh_known_hosts
# ~/.ssh/id_rsa /etc/ssh/id_rsa
%labels
Maintainer Matthias Muntwiler
Maintainer_Email matthias.muntwiler@psi.ch
Python_Version 2.7
%environment
export PATH="/usr/local/miniconda3/bin:$PATH"
export PYTHON_VERSION=2.7
export SINGULAR_BRANCH="singular"
export LC_ALL=C
%post
export PYTHON_VERSION=2.7
export LC_ALL=C
sed -i 's/$/ universe/' /etc/apt/sources.list
apt-get update
apt-get -y install \
binutils \
build-essential \
doxygen \
doxypy \
f2c \
g++ \
gcc \
gfortran \
git \
graphviz \
libblas-dev \
liblapack-dev \
libopenmpi-dev \
make \
nano \
openmpi-bin \
openmpi-common \
sqlite3 \
wget
apt-get clean
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p /usr/local/miniconda3
export PATH="/usr/local/miniconda3/bin:$PATH"
conda create -q --yes -n pmsco python=${PYTHON_VERSION}
. /usr/local/miniconda3/bin/activate pmsco
conda install -q --yes -n pmsco \
pip \
"numpy>=1.13" \
scipy \
ipython \
matplotlib \
nose \
mock \
future \
statsmodels \
swig
conda clean --all -y
/usr/local/miniconda3/envs/pmsco/bin/pip install periodictable attrdict fasteners mpi4py
#%test
# test the image after build
%runscript
# executes command from command line
. /usr/local/miniconda3/bin/activate pmsco
exec echo "$@"
%apprun install
. /usr/local/miniconda3/bin/activate pmsco
cd ~
git clone https://git.psi.ch/pearl/pmsco.git pmsco
cd pmsco
git checkout develop
git checkout -b ${SINGULAR_BRANCH}
make all
nosetests
%apprun python
. /usr/local/miniconda3/bin/activate pmsco
exec python "${@}"
%apprun conda
. /usr/local/miniconda3/bin/activate pmsco
exec conda "${@}"

View File

@@ -3,10 +3,11 @@ OSVersion: bionic
MirrorURL: http://ch.archive.ubuntu.com/ubuntu/
%help
a singularity container for PMSCO.
A singularity container for PMSCO.
git clone requires an ssh key for git.psi.ch.
try agent forwarding (-A option to ssh).
singularity run -e pmsco.sif path/to/pmsco -r path/to/your-runfile
path/to/pmsco must point to the directory that contains the __main__.py file.
#%setup
# executed on the host system outside of the container before %post
@@ -34,22 +35,25 @@ try agent forwarding (-A option to ssh).
Python_Version 3
%environment
export PATH="/usr/local/miniconda3/bin:$PATH"
export PYTHON_VERSION=3
export SINGULAR_BRANCH="singular"
export LC_ALL=C
export PYTHON_VERSION=3
export CONDA_ROOT=/opt/miniconda
export PLANTUML_JAR_PATH=/opt/plantuml/plantuml.jar
export SINGULAR_BRANCH="singular"
%post
export PYTHON_VERSION=3
export LC_ALL=C
export PYTHON_VERSION=3
export CONDA_ROOT=/opt/miniconda
export PLANTUML_ROOT=/opt/plantuml
sed -i 's/$/ universe/' /etc/apt/sources.list
apt-get update
apt-get -y install \
binutils \
build-essential \
default-jre \
doxygen \
doxypy \
f2c \
g++ \
gcc \
@@ -67,11 +71,11 @@ try agent forwarding (-A option to ssh).
apt-get clean
wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O ~/miniconda.sh
bash ~/miniconda.sh -b -p /usr/local/miniconda3
export PATH="/usr/local/miniconda3/bin:$PATH"
bash ~/miniconda.sh -b -p ${CONDA_ROOT}
. ${CONDA_ROOT}/bin/activate
conda create -q --yes -n pmsco python=${PYTHON_VERSION}
. /usr/local/miniconda3/bin/activate pmsco
conda activate pmsco
conda install -q --yes -n pmsco \
pip \
"numpy>=1.13" \
@@ -82,35 +86,36 @@ try agent forwarding (-A option to ssh).
mock \
future \
statsmodels \
swig
swig \
gitpython
conda clean --all -y
/usr/local/miniconda3/envs/pmsco/bin/pip install periodictable attrdict fasteners mpi4py
pip install periodictable attrdict commentjson fasteners mpi4py doxypypy
mkdir ${PLANTUML_ROOT}
wget -O ${PLANTUML_ROOT}/plantuml.jar https://sourceforge.net/projects/plantuml/files/plantuml.jar/download
#%test
# test the image after build
%runscript
# executes command from command line
source /usr/local/miniconda3/bin/activate pmsco
exec echo "$@"
. ${CONDA_ROOT}/etc/profile.d/conda.sh
conda activate pmsco
exec python "$@"
%apprun install
source /usr/local/miniconda3/bin/activate pmsco
. ${CONDA_ROOT}/etc/profile.d/conda.sh
conda activate pmsco
cd ~
git clone https://git.psi.ch/pearl/pmsco.git pmsco
cd pmsco
git checkout develop
git checkout master
git checkout -b ${SINGULAR_BRANCH}
make all
nosetests -w tests/
%apprun compile
. ${CONDA_ROOT}/etc/profile.d/conda.sh
conda activate pmsco
make all
nosetests
%apprun python
source /usr/local/miniconda3/bin/activate pmsco
exec python "${@}"
%apprun conda
source /usr/local/miniconda3/bin/activate pmsco
exec conda "${@}"

View File

@@ -12,8 +12,8 @@ Vagrant.configure("2") do |config|
# Every Vagrant development environment requires a box. You can search for
# boxes at https://vagrantcloud.com/search.
config.vm.box = "singularityware/singularity-2.4"
config.vm.box_version = "2.4"
config.vm.box = "sylabs/singularity-3.7-ubuntu-bionic64"
config.vm.box_version = "3.7"
# Disable automatic box update checking. If you disable this, then
# boxes will only be checked for updates when the user runs

View File

@@ -8,16 +8,13 @@ python pmsco [pmsco-arguments]
@endverbatim
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
from pathlib import Path
import sys
import os.path
file_dir = os.path.dirname(__file__) or '.'
root_dir = os.path.join(file_dir, '..')
root_dir = os.path.abspath(root_dir)
sys.path[0] = root_dir
pmsco_root = Path(__file__).resolve().parent.parent
if str(pmsco_root) not in sys.path:
sys.path.insert(0, str(pmsco_root))
if __name__ == '__main__':
import pmsco.pmsco

View File

@@ -13,8 +13,9 @@ SHELL=/bin/sh
.PHONY: all clean phagen
FC?=gfortran
FCOPTS?=-std=legacy
F2PY?=f2py
F2PYOPTS?=
F2PYOPTS?=--f77flags=-std=legacy --f90flags=-std=legacy
CC?=gcc
CCOPTS?=
SWIG?=swig

View File

@@ -17,22 +17,20 @@ pip install --user periodictable
@author Matthias Muntwiler
@copyright (c) 2015-20 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import math
import numpy as np
import periodictable as pt
import sys
import pmsco.config as config
## default file format identifier
FMT_DEFAULT = 0
## MSC file format identifier
@@ -227,13 +225,13 @@ class Cluster(object):
"""
self.rmax = r
def build_element(self, index, element_number, x, y, z, emitter, charge=0., scatterer_class=0):
def build_element(self, index, element, x, y, z, emitter, charge=0., scatterer_class=0):
"""
build a tuple in the format of the internal data array.
@param index: (int) index
@param element_number: (int) chemical element number
@param element: chemical element number (int) or symbol (str)
@param x, y, z: (float) atom coordinates in the cluster
@@ -243,7 +241,13 @@
@param scatterer_class: (int) scatterer class. default = 0.
"""
symbol = pt.elements[element_number].symbol
try:
element_number = int(element)
symbol = pt.elements[element_number].symbol
except ValueError:
symbol = element
element_number = pt.elements.symbol(symbol.strip()).number
element = (index, element_number, symbol, scatterer_class, x, y, z, int(emitter), charge)
return element
@@ -251,7 +255,7 @@
"""
add a single atom to the cluster.
@param atomtype: (int) chemical element number
@param atomtype: chemical element number (int) or symbol (str)
@param v_pos: (numpy.ndarray, shape = (3)) position vector
@@ -274,7 +278,7 @@
self.rmax (maximum distance from the origin).
all atoms are non-emitters.
@param atomtype: (int) chemical element number
@param atomtype: chemical element number (int) or symbol (str)
@param v_pos: (numpy.ndarray, shape = (3))
position vector of the first atom (basis vector)
@@ -307,7 +311,7 @@
and z_surf (position of the surface).
all atoms are non-emitters.
@param atomtype: (int) chemical element number
@param atomtype: chemical element number (int) or symbol (str)
@param v_pos: (numpy.ndarray, shape = (3))
position vector of the first atom (basis vector)
@@ -1133,7 +1137,7 @@ class Cluster(object):
np.savetxt(f, data, fmt=file_format, header=header, comments="")
class ClusterGenerator(object):
class ClusterGenerator(config.ConfigurableObject):
"""
cluster generator class.
@@ -1151,6 +1155,7 @@ class ClusterGenerator(object):
@param project: reference to the project object.
cluster generators may need to look up project parameters.
"""
super().__init__()
self.project = project
def count_emitters(self, model, index):
@@ -1258,7 +1263,7 @@ class LegacyClusterGenerator(ClusterGenerator):
"""
def __init__(self, project):
super(LegacyClusterGenerator, self).__init__(project)
super().__init__(project)
def count_emitters(self, model, index):
"""

120
pmsco/config.py Normal file
View File

@@ -0,0 +1,120 @@
"""
@package pmsco.config
infrastructure for configurable objects
@author Matthias Muntwiler
@copyright (c) 2021 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
import collections.abc
import functools
import inspect
import logging
from pathlib import Path
logger = logging.getLogger(__name__)
def resolve_path(path, dirs):
"""
resolve a file path by replacing placeholders
placeholders are enclosed in curly braces.
values for all possible placeholders are provided in a dictionary.
@param path: str, Path or other path-like.
example: '{work}/test/testfile.dat'.
@param dirs: dictionary mapping placeholders to project paths.
the paths can be str, Path or other path-like
example: {'work': '/home/user/work'}
@return: pathlib.Path object
"""
return Path(*(p.format(**dirs) for p in Path(path).parts))
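For illustration, a self-contained sketch of the placeholder expansion; the function body repeated here mirrors resolve_path above, and the directory names are made up:

```python
from pathlib import Path

def resolve_path(path, dirs):
    # split the path into components and expand {placeholder} tokens in each one
    return Path(*(p.format(**dirs) for p in Path(path).parts))

# '{work}' is replaced by the value from the dirs dictionary
p = resolve_path('{work}/test/testfile.dat', {'work': '/home/user/work'})
print(p)
```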
class ConfigurableObject(object):
"""
Parent class for objects that can be configured by a run file
the run file is a JSON file that contains object data in a nested dictionary structure.
in the dictionary structure the keys are property or attribute names of the object to be initialized.
keys starting with a non-alphabetic character (except for some special keys like __class__) are ignored.
these can be used as comments, or they protect private attributes.
the values can be numeric values, strings, lists or dictionaries.
simple values are simply assigned using setattr.
this may call a property setter if defined.
lists are iterated. each item is appended to the attribute.
the attribute must implement an append method in this case.
if an item is a dictionary and contains the special key '__class__',
an object of that class is instantiated and recursively initialized with the dictionary elements.
this requires that the class can be found in the module scope passed to the parser methods,
and that the class inherits from this class.
cases that can't be covered easily using this mechanism
should be implemented in a property setter.
value-checking should also be done in a property setter (or the append method in sequence-like objects).
"""
def __init__(self):
pass
def set_properties(self, module, data_dict, project):
"""
set properties of this class.
@param module: module reference that should be used to resolve class names.
this is usually the project module.
@param data_dict: dictionary of properties to set.
see the class description for details.
@param project: reference to the project object.
@return: None
"""
for key in data_dict:
if key[0].isalpha():
self.set_property(module, key, data_dict[key], project)
def set_property(self, module, key, value, project):
obj = self.parse_object(module, value, project)
if hasattr(self, key):
if obj is not None:
if isinstance(obj, collections.abc.MutableSequence):
attr = getattr(self, key)
for item in obj:
attr.append(item)
elif isinstance(obj, collections.abc.Mapping):
d = getattr(self, key)
if d is not None and isinstance(d, collections.abc.MutableMapping):
d.update(obj)
else:
setattr(self, key, obj)
else:
setattr(self, key, obj)
else:
setattr(self, key, obj)
else:
logger.warning(f"class {self.__class__.__name__} does not have attribute {key}.")
def parse_object(self, module, value, project):
if isinstance(value, collections.abc.MutableMapping) and "__class__" in value:
cn = value["__class__"].split('.')
c = functools.reduce(getattr, cn, module)
s = inspect.signature(c)
if 'project' in s.parameters:
o = c(project=project)
else:
o = c()
o.set_properties(module, value, project)
elif isinstance(value, collections.abc.MutableSequence):
o = [self.parse_object(module, i, project) for i in value]
else:
o = value
return o
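As an illustration of the mechanism, a run file fragment that this parser would turn into a nested object tree. The attribute name and class path are hypothetical; the class named in `__class__` must be reachable from the module scope passed to the parser:
~~~~~~
"cluster_generator": {
    "__class__": "pmsco.cluster.LegacyClusterGenerator",
    "_comment": "keys starting with a non-alphabetic character are ignored"
}
~~~~~~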

View File

@@ -4,16 +4,13 @@ calculation dispatcher.
@author Matthias Muntwiler
@copyright (c) 2015 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import os
import os.path
import datetime
@@ -21,10 +18,20 @@ import signal
import collections
import copy
import logging
import math
from attrdict import AttrDict
from mpi4py import MPI
try:
from mpi4py import MPI
mpi_comm = MPI.COMM_WORLD
mpi_size = mpi_comm.Get_size()
mpi_rank = mpi_comm.Get_rank()
except ImportError:
MPI = None
mpi_comm = None
mpi_size = 1
mpi_rank = 0
from pmsco.helpers import BraceMessage as BMsg
logger = logging.getLogger(__name__)
@@ -521,8 +528,7 @@ class MscoProcess(object):
#
# the default is 2 days after start.
def __init__(self, comm):
self._comm = comm
def __init__(self):
self._project = None
self._atomic_scattering = None
self._multiple_scattering = None
@@ -829,12 +835,12 @@ class MscoMaster(MscoProcess):
# the values are handlers.TaskHandler objects.
# the objects can be accessed in attribute or dictionary notation.
def __init__(self, comm):
super(MscoMaster, self).__init__(comm)
def __init__(self):
super().__init__()
self._pending_tasks = collections.OrderedDict()
self._running_tasks = collections.OrderedDict()
self._complete_tasks = collections.OrderedDict()
self._slaves = self._comm.Get_size() - 1
self._slaves = mpi_size - 1
self._idle_ranks = []
self.max_calculations = 1000000
self._calculations = 0
@@ -879,8 +885,8 @@
self._idle_ranks = list(range(1, self._running_slaves + 1))
self._root_task = CalculationTask()
self._root_task.file_root = project.output_file
self._root_task.model = project.create_model_space().start
self._root_task.file_root = str(project.output_file)
self._root_task.model = project.model_space.start
for level in self.task_levels:
self.task_handlers[level] = project.handler_classes[level]()
@@ -1033,7 +1039,7 @@
else:
logger.debug("assigning task %s to rank %u", str(task.id), rank)
self._running_tasks[task.id] = task
self._comm.send(task.get_mpi_message(), dest=rank, tag=TAG_NEW_TASK)
mpi_comm.send(task.get_mpi_message(), dest=rank, tag=TAG_NEW_TASK)
self._calculations += 1
else:
if not self._finishing:
@@ -1055,7 +1061,7 @@
while self._idle_ranks:
rank = self._idle_ranks.pop()
logger.debug("send finish tag to rank %u", rank)
self._comm.send(None, dest=rank, tag=TAG_FINISH)
mpi_comm.send(None, dest=rank, tag=TAG_FINISH)
self._running_slaves -= 1
def _receive_result(self):
@@ -1065,7 +1071,7 @@
if self._running_slaves > 0:
logger.debug("waiting for calculation result")
s = MPI.Status()
data = self._comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=s)
data = mpi_comm.recv(source=MPI.ANY_SOURCE, tag=MPI.ANY_TAG, status=s)
if s.tag == TAG_NEW_RESULT:
task_id = self._accept_task_done(data)
@@ -1185,8 +1191,8 @@
#
# typically, a task is aborted when an exception is encountered.
def __init__(self, comm):
super(MscoSlave, self).__init__(comm)
def __init__(self):
super().__init__()
self._errors = 0
self._max_errors = 5
@@ -1199,7 +1205,7 @@
self._running = True
while self._running:
logger.debug("waiting for message")
data = self._comm.recv(source=0, tag=MPI.ANY_TAG, status=s)
data = mpi_comm.recv(source=0, tag=MPI.ANY_TAG, status=s)
if s.tag == TAG_NEW_TASK:
logger.debug("received new task")
self.accept_task(data)
@@ -1229,17 +1235,17 @@
logger.exception(BMsg("unhandled exception in calculation task {0}", task.id))
self._errors += 1
if self._errors <= self._max_errors:
self._comm.send(data, dest=0, tag=TAG_INVALID_RESULT)
mpi_comm.send(data, dest=0, tag=TAG_INVALID_RESULT)
else:
logger.error("too many exceptions, aborting")
self._running = False
self._comm.send(data, dest=0, tag=TAG_ERROR_ABORTING)
mpi_comm.send(data, dest=0, tag=TAG_ERROR_ABORTING)
else:
logger.debug(BMsg("sending result of task {0} to master", result.id))
self._comm.send(result.get_mpi_message(), dest=0, tag=TAG_NEW_RESULT)
mpi_comm.send(result.get_mpi_message(), dest=0, tag=TAG_NEW_RESULT)
def run_master(mpi_comm, project):
def run_master(project):
"""
initialize and run the master calculation loop.
@@ -1251,25 +1257,25 @@ def run_master(mpi_comm, project):
if an unhandled exception occurs, this function aborts the MPI communicator, killing all MPI processes.
the caller will not have a chance to handle the exception.
@param mpi_comm: MPI communicator (mpi4py.MPI.COMM_WORLD).
@param project: project instance (sub-class of project.Project).
"""
try:
master = MscoMaster(mpi_comm)
master = MscoMaster()
master.setup(project)
master.run()
master.cleanup()
except (SystemExit, KeyboardInterrupt):
mpi_comm.Abort()
if mpi_comm:
mpi_comm.Abort()
raise
except Exception:
logger.exception("unhandled exception in master calculation loop.")
mpi_comm.Abort()
if mpi_comm:
mpi_comm.Abort()
raise
def run_slave(mpi_comm, project):
def run_slave(project):
"""
initialize and run the slave calculation loop.
@@ -1282,12 +1288,10 @@ def run_slave(mpi_comm, project):
unless it is a SystemExit or KeyboardInterrupt (where we expect that the master also receives the signal),
the MPI communicator is aborted, killing all MPI processes.
@param mpi_comm: MPI communicator (mpi4py.MPI.COMM_WORLD).
@param project: project instance (sub-class of project.Project).
"""
try:
slave = MscoSlave(mpi_comm)
slave = MscoSlave()
slave.setup(project)
slave.run()
slave.cleanup()
@@ -1295,7 +1299,8 @@
raise
except Exception:
logger.exception("unhandled exception in slave calculation loop.")
mpi_comm.Abort()
if mpi_comm:
mpi_comm.Abort()
raise
@@ -1307,12 +1312,9 @@
@param project: project instance (sub-class of project.Project).
"""
mpi_comm = MPI.COMM_WORLD
mpi_rank = mpi_comm.Get_rank()
if mpi_rank == 0:
logger.debug("MPI rank %u setting up master loop", mpi_rank)
run_master(mpi_comm, project)
run_master(project)
else:
logger.debug("MPI rank %u setting up slave loop", mpi_rank)
run_slave(mpi_comm, project)
run_slave(project)

View File

@@ -1,7 +0,0 @@
/* EDAC interface for other programs */
%module edac
%{
extern int run_script(char *scriptfile);
%}
extern int run_script(char *scriptfile);

File diff suppressed because it is too large

View File

@@ -10,6 +10,8 @@ the binding energies are compiled from Gwyn Williams' web page
(https://userweb.jlab.org/~gwyn/ebindene.html).
please refer to the original web page or the x-ray data booklet
for original sources, definitions and remarks.
binding energies of gases are replaced by respective values of a common compound
from the 'handbook of x-ray photoelectron spectroscopy' (physical electronics, inc., 1995).
usage
-----
@@ -52,15 +54,47 @@ from pmsco.compat import open
index_energy = np.zeros(0)
index_number = np.zeros(0)
index_term = []
default_data_path = os.path.join(os.path.dirname(__file__), "bindingenergy.json")
def load_data():
data_path = os.path.join(os.path.dirname(__file__), "bindingenergy.json")
def load_data(data_path=None):
"""
load binding energy data from json file
the data file must be in the same format as generated by save_data.
@param data_path: file path of the data file. default: "bindingenergy.json" next to this module file
@return dictionary
"""
if data_path is None:
data_path = default_data_path
with open(data_path) as fp:
data = json.load(fp)
return data
def save_data(data_path=None):
"""
save binding energy data to json file
@param data_path: file path of the data file. default: "bindingenergy.json" next to this module file
@return None
"""
if data_path is None:
data_path = default_data_path
data = {}
for element in pt.elements:
element_data = {}
for term, energy in element.binding_energy.items():
element_data[term] = energy
if element_data:
data[element.number] = element_data
with open(data_path, 'w', 'utf8') as fp:
json.dump(data, fp, sort_keys=True, indent='\t')
def init(table, reload=False):
if 'binding_energy' in table.properties and not reload:
return
@@ -142,6 +176,9 @@ def export_flat_text(f):
"""
export the binding energies to a flat general text file.
the file has four space-separated columns `number`, `symbol`, `term`, `energy`.
column names are included in the first row.
@param f: file path or open file object
@return: None
"""
@@ -153,3 +190,23 @@
else:
with open(f, "w") as fi:
export_flat_text(fi)
def import_flat_text(f):
"""
import binding energies from a flat general text file.
data is in space-separated columns.
the first row contains column names.
at least the columns `number`, `term`, `energy` must be present.
the function updates existing entries and appends entries of non-existing terms.
existing terms that are not listed in the file remain unchanged.
@param f: file path or open file object
@return: None
"""
data = np.atleast_1d(np.genfromtxt(f, names=True, dtype=None, encoding="utf8"))
for d in data:
pt.elements[d['number']].binding_energy[d['term']] = d['energy']
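To make the expected input format concrete, a minimal sketch with made-up rows of the flat text table that import_flat_text parses via numpy.genfromtxt:

```python
import io
import numpy as np

# hypothetical flat-text table: space-separated columns, column names in the first row
text = """number symbol term energy
1 H 1s 13.6
6 C 1s 284.2
"""
# names=True reads the header row; dtype=None infers int/str/float per column
data = np.atleast_1d(np.genfromtxt(io.StringIO(text), names=True, dtype=None, encoding="utf8"))
# each row maps (number, term) to a binding energy
for d in data:
    print(d['number'], d['term'], d['energy'])
```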

View File

@@ -92,6 +92,8 @@ def get_cross_section(photon_energy, element, nlj):
@return: (float) cross section in Mb.
"""
nl = nlj[0:2]
if not hasattr(element, "photoionization"):
element = get_element(element)
try:
pet, cst = element.photoionization.cross_section[nl]
except KeyError:
@@ -196,3 +198,11 @@ def plot_spectrum(photon_energy, elements, binding_energy=False, work_function=4
ax.set_ylabel('intensity')
ax.set_title(elements)
return fig, ax
def plot_cross_section(el, nlj):
"""
plot the photoionization cross section of one level versus photon energy.
@param el: chemical element (symbol, number or periodictable element).
@param nlj: (str) spectroscopic term of the level, e.g. '2p3/2'.
"""
energy = np.arange(100, 1500, 140)
cs = get_cross_section(energy, el, nlj)
fig, ax = plt.subplots()
ax.set_yscale("log")
ax.plot(energy, cs)
return fig, ax

View File

@@ -0,0 +1,443 @@
"""
@package pmsco.graphics.population
graphics rendering module for population dynamics.
the main function is render_genetic_chart().
this module is experimental.
interface and implementation are subject to change.
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2021 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
import logging
import numpy as np
import os
from pmsco.database import regular_params, special_params
logger = logging.getLogger(__name__)
try:
from matplotlib.figure import Figure
from matplotlib.backends.backend_agg import FigureCanvasAgg as FigureCanvas
# from matplotlib.backends.backend_pdf import FigureCanvasPdf
# from matplotlib.backends.backend_svg import FigureCanvasSVG
except ImportError:
Figure = None
FigureCanvas = None
logger.warning("error importing matplotlib. graphics rendering disabled.")
def _default_range(pos):
"""
determine a default range from actual values.
@param pos: (numpy.ndarray) 1-dimensional structured array of parameter values.
@return: range_min, range_max are dictionaries of the minimum and maximum values of each parameter.
"""
names = regular_params(pos.dtype.names)
range_min = {}
range_max = {}
for name in names:
range_min[name] = pos[name].min()
range_max[name] = pos[name].max()
return range_min, range_max
def _prune_constant_params(pnames, range_min, range_max):
"""
remove constant parameters from the list and range
@param pnames: (list)
@param range_min: (dict)
@param range_max: (dict)
@return:
"""
del_names = [name for name in pnames if range_max[name] <= range_min[name]]
for name in del_names:
pnames.remove(name)
del range_min[name]
del range_max[name]
def render_genetic_chart(output_file, input_data_or_file, model_space=None, generations=None, title=None, cmap=None,
canvas=None):
"""
produce a genetic chart from a given population.
a genetic chart is a pseudo-colour representation of the coordinates of each individual in the model space.
the axes are the particle number and the model parameter.
the colour is mapped from the relative position of a parameter value within the parameter range.
the chart should illustrate the diversity in the population.
converged parameters will show similar colours.
by comparing charts of different generations, the effect of the optimization algorithm can be examined.
though the chart type is designed for the genetic algorithm, it may be useful for other algorithms as well.
the function requires input in one of the following forms:
- a result (.dat) file or numpy structured array.
the array must contain regular parameters, as well as the _particle and _gen columns.
the function generates one chart per generation unless the generation argument is specified.
- a population (.pop) file or numpy structured array.
the array must contain regular parameters, as well as the _particle columns.
- a pmsco.optimizers.population.Population object with valid data.
the graphics file format can be changed by providing a specific canvas. default is PNG.
this function requires the matplotlib module.
if it is not available, the function raises an error.
@param output_file: path and base name of the output file without extension.
a generation index and the file extension according to the file format are appended.
@param input_data_or_file: a numpy structured ndarray of a population or result list from an optimization run.
alternatively, the file path of a result file (.dat) or population file (.pop) can be given.
file can be any object that numpy.genfromtxt() can handle.
@param model_space: model space can be a pmsco.project.ModelSpace object,
any object that contains the same min and max attributes as pmsco.project.ModelSpace,
or a dictionary with the keys 'min' and 'max' that provides the corresponding ModelSpace dictionaries.
by default, the model space boundaries are derived from the input data.
if a model_space is specified, only the parameters listed in it are plotted.
@param generations: (int or sequence) generation index or list of indices.
this index is used in the output file name and for filtering input data by generation.
if the input data does not contain the generation, no filtering is applied.
by default, no filtering is applied, and one graph for each generation is produced.
@param title: (str) title of the chart.
the title is a {}-style format string, where {base} is the output file name and {gen} is the generation.
default: derived from file name.
@param cmap: (str) name of colour map supported by matplotlib.
default is 'jet'.
other good-looking options are 'PiYG', 'RdBu', 'RdYlGn', 'coolwarm'.
@param canvas: a FigureCanvas class reference from a matplotlib backend.
if None, the default FigureCanvasAgg is used which produces a bitmap file in PNG format.
some other options are:
matplotlib.backends.backend_pdf.FigureCanvasPdf or
matplotlib.backends.backend_svg.FigureCanvasSVG.
@return (str) path and name of the generated graphics file.
empty string if an error occurred.
@raise TypeError if matplotlib is not available.
"""
try:
pos = np.copy(input_data_or_file.pos)
range_min = input_data_or_file.model_min
range_max = input_data_or_file.model_max
generations = [input_data_or_file.generation]
except AttributeError:
try:
pos = np.atleast_1d(np.genfromtxt(input_data_or_file, names=True))
except TypeError:
pos = np.copy(input_data_or_file)
range_min, range_max = _default_range(pos)
pnames = regular_params(pos.dtype.names)
if model_space is not None:
try:
# a ModelSpace-like object
range_min = model_space.min
range_max = model_space.max
except AttributeError:
# a dictionary-like object
range_min = model_space['min']
range_max = model_space['max']
try:
pnames = range_min.keys()
except AttributeError:
pnames = range_min.dtype.names
pnames = list(pnames)
_prune_constant_params(pnames, range_min, range_max)
if generations is None:
try:
generations = np.unique(pos['_gen'])
except ValueError:
pass
files = []
path, base = os.path.split(output_file)
if generations is not None and len(generations):
if title is None:
title = "{base} gen {gen}"
for generation in generations:
idx = np.where(pos['_gen'] == generation)
gpos = pos[idx]
gtitle = title.format(base=base, gen=int(generation))
out_filename = "{base}-{gen}".format(base=os.fspath(output_file), gen=int(generation))
out_filename = _render_genetic_chart_2(out_filename, gpos, pnames, range_min, range_max,
gtitle, cmap, canvas)
files.append(out_filename)
else:
if title is None:
title = "{base}"
gtitle = title.format(base=base, gen="")
out_filename = "{base}".format(base=os.fspath(output_file))
out_filename = _render_genetic_chart_2(out_filename, pos, pnames, range_min, range_max, gtitle, cmap, canvas)
files.append(out_filename)
return files
def _render_genetic_chart_2(out_filename, pos, pnames, range_min, range_max, title, cmap, canvas):
"""
internal part of render_genetic_chart()
this function calculates the relative position in the model space,
sorts the positions array by particle index,
and calls plot_genetic_chart().
@param out_filename:
@param pos:
@param pnames:
@param range_max:
@param range_min:
@param cmap:
@param canvas:
@return: out_filename
"""
spos = np.sort(pos, order='_particle')
rpos2d = np.zeros((spos.shape[0], len(pnames)))
for index, pname in enumerate(pnames):
rpos2d[:, index] = (spos[pname] - range_min[pname]) / (range_max[pname] - range_min[pname])
out_filename = plot_genetic_chart(out_filename, rpos2d, pnames, title=title, cmap=cmap, canvas=canvas)
return out_filename
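The normalization step can be sketched in isolation; parameter names, ranges and particle positions below are made up for the example:

```python
import numpy as np

# structured array of 3 particles with 2 model parameters
pos = np.array([(0.0, 5.0), (1.0, 7.5), (2.0, 10.0)],
               dtype=[('dx', float), ('dz', float)])
range_min = {'dx': 0.0, 'dz': 5.0}
range_max = {'dx': 2.0, 'dz': 10.0}
pnames = ['dx', 'dz']
# map each parameter value to its relative position within the parameter range
rpos2d = np.zeros((pos.shape[0], len(pnames)))
for index, pname in enumerate(pnames):
    rpos2d[:, index] = (pos[pname] - range_min[pname]) / (range_max[pname] - range_min[pname])
# every entry of rpos2d now lies between 0 and 1
```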
def plot_genetic_chart(filename, rpos2d, param_labels, title=None, cmap=None, canvas=None):
"""
produce a genetic chart from the given data.
a genetic chart is a pseudo-colour representation of the coordinates of each individual in the model space.
the chart should highlight the amount of diversity in the population
and - by comparing charts of different generations - the changes due to mutation.
the axes are the model parameter (x) and particle number (y).
the colour is mapped from the relative position of a parameter value within the parameter range.
in contrast to render_genetic_chart() this function contains only the drawing code.
it requires input in the final form and does not do any checks, conversion or processing.
the graphics file format can be changed by providing a specific canvas. default is PNG.
this function requires the matplotlib module.
if it is not available, the function raises an error.
@param filename: path and name of the output file without extension.
@param rpos2d: (two-dimensional numpy array of numeric type)
relative positions of the particles in the model space.
dimension 0 (y-axis) is the particle index,
dimension 1 (x-axis) is the parameter index (in the order given by param_labels).
all values must be between 0 and 1.
@param param_labels: (sequence) list or tuple of parameter names.
@param title: (str) string to be printed as chart title. default is 'genetic chart'.
@param cmap: (str) name of colour map supported by matplotlib.
default is 'jet'.
other good-looking options are 'PiYG', 'RdBu', 'RdYlGn', 'coolwarm'.
@param canvas: a FigureCanvas class reference from a matplotlib backend.
if None, the default FigureCanvasAgg is used which produces a bitmap file in PNG format.
some other options are:
matplotlib.backends.backend_pdf.FigureCanvasPdf or
matplotlib.backends.backend_svg.FigureCanvasSVG.
@raise TypeError if matplotlib is not available.
"""
if canvas is None:
canvas = FigureCanvas
if cmap is None:
cmap = 'jet'
if title is None:
title = 'genetic chart'
fig = Figure()
canvas(fig)
ax = fig.add_subplot(111)
im = ax.imshow(rpos2d, aspect='auto', cmap=cmap, origin='lower')
im.set_clim((0.0, 1.0))
ax.set_xticks(np.arange(len(param_labels)))
ax.set_xticklabels(param_labels, rotation=45, ha="right", rotation_mode="anchor")
ax.set_ylabel('particle')
ax.set_title(title)
cb = ax.figure.colorbar(im, ax=ax)
cb.ax.set_ylabel("relative value", rotation=-90, va="bottom")
out_filename = "{base}.{ext}".format(base=filename, ext=canvas.get_default_filetype())
fig.savefig(out_filename)
return out_filename
def render_swarm(output_file, input_data, model_space=None, title=None, cmap=None, canvas=None):
"""
render a two-dimensional particle swarm population.
this function generates a schematic rendering of a particle swarm in two dimensions.
particles are represented by their position and velocity, indicated by an arrow.
the model space is projected on the first two (or selected two) variable parameters.
in the background, a scatter plot of results (dots with pseudocolor representing the R-factor) can be plotted.
the chart type is designed for the particle swarm optimization algorithm.
the function requires input in one of the following forms:
- position (.pos), velocity (.vel) and result (.dat) files or the respective numpy structured arrays.
the arrays must contain regular parameters, as well as the `_particle` column.
the result file must also contain an `_rfac` column.
- a pmsco.optimizers.population.Population object with valid data.
the graphics file format can be changed by providing a specific canvas. default is PNG.
this function requires the matplotlib module.
if it is not available, the function raises an error.
@param output_file: path and base name of the output file without extension.
a generation index and the file extension according to the file format are appended.
@param input_data: a pmsco.optimizers.population.Population object with valid data,
or a sequence of position, velocity and result arrays.
the arrays must be structured ndarrays corresponding to the respective Population members.
alternatively, the arrays can be referenced as file paths
in any format that numpy.genfromtxt() can handle.
@param model_space: model space can be a pmsco.project.ModelSpace object,
any object that contains the same min and max attributes as pmsco.project.ModelSpace,
or a dictionary with the two keys 'min' and 'max' that provide the corresponding ModelSpace dictionaries.
by default, the model space boundaries are derived from the input data.
if a model_space is specified, only the parameters listed in it are plotted.
@param title: (str) title of the chart.
the title is a {}-style format string, where {base} is the output file name and {gen} is the generation.
default: derived from file name.
@param cmap: (str) name of colour map supported by matplotlib.
default is 'plasma'.
other good-looking options are 'viridis', 'plasma', 'inferno', 'magma', 'cividis'.
@param canvas: a FigureCanvas class reference from a matplotlib backend.
if None, the default FigureCanvasAgg is used which produces a bitmap file in PNG format.
some other options are:
matplotlib.backends.backend_pdf.FigureCanvasPdf or
matplotlib.backends.backend_svg.FigureCanvasSVG.
@return (str) path and name of the generated graphics file.
empty string if an error occurred.
@raise TypeError if matplotlib is not available.
"""
try:
range_min = input_data.model_min
range_max = input_data.model_max
pos = np.copy(input_data.pos)
vel = np.copy(input_data.vel)
rfac = np.copy(input_data.results)
generation = input_data.generation
except AttributeError:
try:
pos = np.atleast_1d(np.genfromtxt(input_data[0], names=True))
vel = np.atleast_1d(np.genfromtxt(input_data[1], names=True))
rfac = np.atleast_1d(np.genfromtxt(input_data[2], names=True))
except TypeError:
pos = np.copy(input_data[0])
vel = np.copy(input_data[1])
rfac = np.copy(input_data[2])
range_min, range_max = _default_range(rfac)
pnames = regular_params(pos.dtype.names)
if model_space is not None:
try:
# a ModelSpace-like object
range_min = model_space.min
range_max = model_space.max
except AttributeError:
# a dictionary-like object
range_min = model_space['min']
range_max = model_space['max']
try:
pnames = range_min.keys()
except AttributeError:
pnames = range_min.dtype.names
pnames = list(pnames)
_prune_constant_params(pnames, range_min, range_max)
pnames = pnames[0:2]
files = []
if len(pnames) == 2:
params = {pnames[0]: [range_min[pnames[0]], range_max[pnames[0]]],
pnames[1]: [range_min[pnames[1]], range_max[pnames[1]]]}
out_filename = plot_swarm(output_file, pos, vel, rfac, params, title=title, cmap=cmap, canvas=canvas)
files.append(out_filename)
else:
logging.warning("model space must be two-dimensional and non-degenerate.")
return files
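The file-based input form accepted by render_swarm() can be sketched with an in-memory table; the column names besides the bookkeeping columns are hypothetical:

```python
import io
import numpy as np

# a minimal .pos-style table such as render_swarm() reads via np.genfromtxt
pos_text = """_gen _particle dlat dz
0 0 2.5 1.2
0 1 3.1 1.8
"""
pos = np.atleast_1d(np.genfromtxt(io.StringIO(pos_text), names=True))

# regular parameters are the columns that do not start with an underscore
pnames = [n for n in pos.dtype.names if not n.startswith('_')]
```

np.atleast_1d guards against the zero-dimensional array that genfromtxt returns for a single data row.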
def plot_swarm(filename, pos, vel, rfac, params, title=None, cmap=None, canvas=None):
"""
plot a two-dimensional particle swarm population.
this is a sub-function of render_swarm() containing just the plotting commands.
the graphics file format can be changed by providing a specific canvas. default is PNG.
this function requires the matplotlib module.
if it is not available, the function raises an error.
@param filename: path and base name of the output file without extension.
a generation index and the file extension according to the file format are appended.
@param pos: structured ndarray containing the positions of the particles.
@param vel: structured ndarray containing the velocities of the particles.
@param rfac: structured ndarray containing positions and R-factor values.
this array is independent of pos and vel.
it can also be set to None if results should be suppressed.
@param params: dictionary of two parameters to be plotted.
the keys correspond to columns of the pos, vel and rfac arrays.
the values are lists [minimum, maximum] that define the axis range.
@param title: (str) title of the chart.
the title is a {}-style format string, where {base} is the output file name and {gen} is the generation.
default: derived from file name.
@param cmap: (str) name of colour map supported by matplotlib.
default is 'plasma'.
other good-looking options are 'viridis', 'plasma', 'inferno', 'magma', 'cividis'.
@param canvas: a FigureCanvas class reference from a matplotlib backend.
if None, the default FigureCanvasAgg is used which produces a bitmap file in PNG format.
some other options are:
matplotlib.backends.backend_pdf.FigureCanvasPdf or
matplotlib.backends.backend_svg.FigureCanvasSVG.
@return (str) path and name of the generated graphics file.
empty string if an error occurred.
@raise TypeError if matplotlib is not available.
"""
if canvas is None:
canvas = FigureCanvas
if cmap is None:
cmap = 'plasma'
if title is None:
title = 'swarm map'
pnames = list(params.keys())
fig = Figure()
canvas(fig)
ax = fig.add_subplot(111)
if rfac is not None:
try:
s = ax.scatter(rfac[pnames[0]], rfac[pnames[1]], s=5, c=rfac['_rfac'], cmap=cmap, vmin=0, vmax=1)
except (KeyError, ValueError):
# _rfac or parameter column missing
pass
else:
cb = ax.figure.colorbar(s, ax=ax)
cb.ax.set_ylabel("R-factor", rotation=-90, va="bottom")
p = ax.plot(pos[pnames[0]], pos[pnames[1]], 'co')
q = ax.quiver(pos[pnames[0]], pos[pnames[1]], vel[pnames[0]], vel[pnames[1]], color='c')
ax.set_xlim(params[pnames[0]])
ax.set_ylim(params[pnames[1]])
ax.set_xlabel(pnames[0])
ax.set_ylabel(pnames[1])
ax.set_title(title)
out_filename = "{base}.{ext}".format(base=filename, ext=canvas.get_default_filetype())
fig.savefig(out_filename)
return out_filename

View File

@ -7,16 +7,13 @@ interface and implementation are subject to change.
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2018 by Paul Scherrer Institut @n
@copyright (c) 2018-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import logging
import math
import numpy as np
@ -135,9 +132,8 @@ def render_ea_scan(filename, data, scan_mode, canvas=None, is_modf=False):
im.set_cmap("RdBu_r")
dhi = max(abs(dlo), abs(dhi))
dlo = -dhi
im.set_clim((dlo, dhi))
im.set_clim((-1., 1.))
try:
# requires matplotlib 2.1.0
ti = cb.get_ticks()
ti = [min(ti), 0., max(ti)]
cb.set_ticks(ti)
@ -213,9 +209,8 @@ def render_tp_scan(filename, data, canvas=None, is_modf=False):
# im.set_cmap("coolwarm")
dhi = max(abs(dlo), abs(dhi))
dlo = -dhi
pc.set_clim((dlo, dhi))
pc.set_clim((-1., 1.))
try:
# requires matplotlib 2.1.0
ti = cb.get_ticks()
ti = [min(ti), 0., max(ti)]
cb.set_ticks(ti)
@ -226,9 +221,12 @@ def render_tp_scan(filename, data, canvas=None, is_modf=False):
# im.set_cmap("inferno")
# im.set_cmap("viridis")
pc.set_clim((dlo, dhi))
ti = cb.get_ticks()
ti = [min(ti), max(ti)]
cb.set_ticks(ti)
try:
ti = cb.get_ticks()
ti = [min(ti), max(ti)]
cb.set_ticks(ti)
except AttributeError:
pass
out_filename = "{0}.{1}".format(filename, canvas.get_default_filetype())
fig.savefig(out_filename)

View File

@ -40,23 +40,20 @@ the scan and domain handlers call methods of the project class to invoke project
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015-18 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import datetime
from functools import reduce
import logging
import math
import numpy as np
import os
from pathlib import Path
from pmsco.compat import open
import pmsco.data as md
@ -377,7 +374,7 @@ class SingleModelHandler(ModelHandler):
keys = [key for key in self.result]
keys.sort(key=lambda t: t[0].lower())
vals = (str(self.result[key]) for key in keys)
filename = self._project.output_file + ".dat"
filename = Path(self._project.output_file).with_suffix(".dat")
with open(filename, "w") as outfile:
outfile.write("# ")
outfile.write(" ".join(keys))
@ -437,11 +434,11 @@ class ScanHandler(TaskHandler):
if project.combined_scan is not None:
ext = md.format_extension(project.combined_scan)
filename = project.output_file + ext
filename = Path(project.output_file).with_suffix(ext)
md.save_data(filename, project.combined_scan)
if project.combined_modf is not None:
ext = md.format_extension(project.combined_modf)
filename = project.output_file + ".modf" + ext
filename = Path(project.output_file).with_suffix(".modf" + ext)
md.save_data(filename, project.combined_modf)
return len(self._project.scans)
@ -695,7 +692,7 @@ class EmitterHandler(TaskHandler):
the estimate is based on the start parameters, scan 0 and domain 0.
"""
super(EmitterHandler, self).setup(project, slots)
mock_model = self._project.create_model_space().start
mock_model = self._project.model_space.start
mock_index = dispatch.CalcID(-1, 0, 0, -1, -1)
n_emitters = project.cluster_generator.count_emitters(mock_model, mock_index)
return n_emitters

View File

@ -304,7 +304,7 @@ class GridSearchHandler(handlers.ModelHandler):
super(GridSearchHandler, self).setup(project, slots)
self._pop = GridPopulation()
self._pop.setup(self._project.create_model_space())
self._pop.setup(self._project.model_space)
self._invalid_limit = max(slots, self._invalid_limit)
self._outfile = open(self._project.output_file + ".dat", "w")

View File

@ -554,7 +554,7 @@ class Population(object):
however, the patch is applied only upon the next execution of advance_population().
an info or warning message is printed to the log
depending on whether the filed contained a complete dataset or not.
depending on whether the file contained a complete dataset or not.
@attention patching a live population is a potentially dangerous operation.
it may cause an optimization to abort because of an error in the file.
@ -1209,7 +1209,7 @@ class PopulationHandler(handlers.ModelHandler):
return self._pop_size
def setup_population(self):
self._pop.setup(self._pop_size, self._project.create_model_space(), **self._project.optimizer_params)
self._pop.setup(self._pop_size, self._project.model_space, **self._project.optimizer_params)
def cleanup(self):
super(PopulationHandler, self).cleanup()

View File

@ -6,12 +6,12 @@ PEARL Multiple-Scattering Calculation and Structural Optimization
this is the top-level interface of the PMSCO package.
all calculations (any mode, any project) start by calling the run_project() function of this module.
the module also provides a command line parser for common options.
the module also provides a command line and a run-file/run-dict interface.
for parallel execution, prefix the command line with mpi_exec -np NN, where NN is the number of processes to use.
note that in parallel mode, one process takes the role of the coordinator (master).
the master does not run calculations and is idle most of the time.
to benefit from parallel execution on a work station, NN should be the number of processors plus one.
to benefit from parallel execution on a work station, NN should be the number of processors.
on a cluster, the number of processes is chosen according to the available resources.
all calculations can also be run in a single process.
@ -25,26 +25,35 @@ refer to the projects folder for examples.
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015-18 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import argparse
from builtins import range
import datetime
import logging
import importlib
import os.path
import commentjson as json
from pathlib import Path
import sys
from mpi4py import MPI
try:
from mpi4py import MPI
mpi_comm = MPI.COMM_WORLD
mpi_size = mpi_comm.Get_size()
mpi_rank = mpi_comm.Get_rank()
except ImportError:
MPI = None
mpi_comm = None
mpi_size = 1
mpi_rank = 0
pmsco_root = Path(__file__).resolve().parent.parent
if str(pmsco_root) not in sys.path:
sys.path.insert(0, str(pmsco_root))
import pmsco.dispatch as dispatch
import pmsco.files as files
@ -71,40 +80,36 @@ def setup_logging(enable=False, filename="pmsco.log", level="WARNING"):
@param enable: (bool) True=enable logging to the specified file,
False=do not generate a log (null handler).
@param filename: (string) path and name of the log file.
@param filename: (Path-like) path and name of the log file.
if this process is part of an MPI communicator,
the function inserts a dot and the MPI rank of this process before the extension.
if the filename is empty, logging is disabled.
@param level: (string) name of the log level.
must be the name of one of "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL".
if empty or invalid, the function raises a ValueError.
if empty, logging is disabled.
if not a valid level, defaults to "WARNING".
@return None
"""
numeric_level = getattr(logging, level.upper(), None)
if not isinstance(numeric_level, int):
raise ValueError('Invalid log level: %s' % level)
logger = logging.getLogger("")
logger.setLevel(numeric_level)
logformat = '%(asctime)s (%(name)s) %(levelname)s: %(message)s'
formatter = logging.Formatter(logformat)
enable = enable and str(filename) and level
numeric_level = getattr(logging, level.upper(), logging.WARNING)
root_logger = logging.getLogger()
root_logger.setLevel(numeric_level)
if enable:
mpi_comm = MPI.COMM_WORLD
mpi_size = mpi_comm.Get_size()
if mpi_size > 1:
mpi_rank = mpi_comm.Get_rank()
root, ext = os.path.splitext(filename)
filename = root + "." + str(mpi_rank) + ext
p = Path(filename)
filename = p.with_suffix(f".{mpi_rank}" + p.suffix)
log_format = '%(asctime)s (%(name)s) %(levelname)s: %(message)s'
formatter = logging.Formatter(log_format)
handler = logging.FileHandler(filename, mode="w", delay=True)
handler.setLevel(numeric_level)
handler.setFormatter(formatter)
else:
handler = logging.NullHandler()
logger.addHandler(handler)
root_logger.addHandler(handler)
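The rank-tag insertion used by setup_logging() can be sketched on its own; the file name is arbitrary:

```python
from pathlib import Path

# insert the MPI rank before the file extension: pmsco.log -> pmsco.3.log
mpi_rank = 3
p = Path("output/pmsco.log")
filename = p.with_suffix(f".{mpi_rank}{p.suffix}")
```

with_suffix() replaces the last suffix, so prepending the rank to the original suffix yields the desired two-part extension.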
def set_common_args(project, args):
@ -124,67 +129,58 @@ def set_common_args(project, args):
@return: None
"""
log_file = "pmsco.log"
if args.data_dir:
project.data_dir = args.data_dir
if args.output_file:
project.set_output(args.output_file)
log_file = args.output_file + ".log"
project.output_file = args.output_file
if args.db_file:
project.db_file = args.db_file
if args.log_file:
log_file = args.log_file
setup_logging(enable=args.log_enable, filename=log_file, level=args.log_level)
logger.debug("creating project")
mode = args.mode.lower()
if mode in {'single', 'grid', 'swarm', 'genetic', 'table'}:
project.mode = mode
else:
logger.error("invalid optimization mode '%s'.", mode)
if args.pop_size:
project.optimizer_params['pop_size'] = args.pop_size
if args.seed_file:
project.optimizer_params['seed_file'] = args.seed_file
if args.seed_limit:
project.optimizer_params['seed_limit'] = args.seed_limit
if args.table_file:
project.optimizer_params['table_file'] = args.table_file
project.log_file = args.log_file
if args.log_level:
project.log_level = args.log_level
if not args.log_enable:
project.log_file = ""
project.log_level = ""
if args.mode:
project.mode = args.mode.lower()
if args.time_limit:
project.set_timedelta_limit(datetime.timedelta(hours=args.time_limit))
project.time_limit = args.time_limit
if args.keep_files:
if "all" in args.keep_files:
cats = set([])
else:
cats = files.FILE_CATEGORIES - set(args.keep_files)
cats -= {'report'}
if mode == 'single':
cats -= {'model'}
project.files.categories_to_delete = cats
if args.keep_levels > project.keep_levels:
project.keep_levels = args.keep_levels
if args.keep_best > project.keep_best:
project.keep_best = args.keep_best
project.keep_files = args.keep_files
if args.keep_levels:
project.keep_levels = max(args.keep_levels, project.keep_levels)
if args.keep_best:
project.keep_best = max(args.keep_best, project.keep_best)
def run_project(project):
"""
run a calculation project.
@param project:
@return:
the function sets up logging, validates the project, chooses the handler classes,
and passes control to the pmsco.dispatch module to run the calculations.
@param project: fully initialized project object.
the validate method is called as part of this function after setting up the logger.
@return: None
"""
# log project arguments only in rank 0
mpi_comm = MPI.COMM_WORLD
mpi_rank = mpi_comm.Get_rank()
log_file = Path(project.log_file)
if not log_file.name:
log_file = Path(project.job_name).with_suffix(".log")
if log_file.name:
log_file.parent.mkdir(exist_ok=True)
log_level = project.log_level
else:
log_level = ""
setup_logging(enable=bool(log_level), filename=log_file, level=log_level)
if mpi_rank == 0:
project.log_project_args()
project.validate()
optimizer_class = None
if project.mode == 'single':
optimizer_class = handlers.SingleModelHandler
@ -221,6 +217,34 @@ def run_project(project):
logger.error("undefined project, optimizer, or calculator.")
def schedule_project(project, run_dict):
"""
schedule a calculation project.
the function validates the project and submits a job to the scheduler.
@param project: fully initialized project object.
the validate method is called as part of this function.
@param run_dict: dictionary holding the contents of the run file.
@return: None
"""
assert mpi_rank == 0
setup_logging(enable=False)
project.validate()
schedule_dict = run_dict['schedule']
module = importlib.import_module(schedule_dict['__module__'])
schedule_class = getattr(module, schedule_dict['__class__'])
schedule = schedule_class(project)
schedule.set_properties(module, schedule_dict, project)
schedule.run_dict = run_dict
schedule.validate()
schedule.submit()
class Args(object):
"""
arguments of the main function.
@ -233,7 +257,7 @@ class Args(object):
values as the command line parser.
"""
def __init__(self, mode="single", output_file="pmsco_data"):
def __init__(self):
"""
constructor.
@ -242,12 +266,8 @@ class Args(object):
other parameters may be required depending on the project
and/or the calculation mode.
"""
self.mode = mode
self.pop_size = 0
self.seed_file = ""
self.seed_limit = 0
self.data_dir = ""
self.output_file = output_file
self.output_file = ""
self.db_file = ""
self.time_limit = 24.0
self.keep_files = files.FILE_CATEGORIES_TO_KEEP
@ -256,13 +276,9 @@ class Args(object):
self.log_level = "WARNING"
self.log_file = ""
self.log_enable = True
self.table_file = ""
def get_cli_parser(default_args=None):
if not default_args:
default_args = Args()
def get_cli_parser():
KEEP_FILES_CHOICES = files.FILE_CATEGORIES | {'all'}
parser = argparse.ArgumentParser(
@ -290,56 +306,45 @@ def get_cli_parser(default_args=None):
# for simplicity, the parser does not check these requirements.
# all parameters are optional and accepted regardless of mode.
# errors may occur if implicit requirements are not met.
parser.add_argument('project_module',
parser.add_argument('project_module', nargs='?',
help="path to custom module that defines the calculation project")
parser.add_argument('-m', '--mode', default=default_args.mode,
parser.add_argument('-r', '--run-file',
help="path to run-time parameters file which contains all program arguments. " +
"must be in JSON format.")
parser.add_argument('-m', '--mode',
choices=['single', 'grid', 'swarm', 'genetic', 'table'],
help='calculation mode')
parser.add_argument('--pop-size', type=int, default=default_args.pop_size,
help='population size (number of particles) in swarm or genetic optimization mode. ' +
'default is the greater of 4 or the number of calculation processes.')
parser.add_argument('--seed-file',
help='path and name of population seed file. ' +
'population data of previous optimizations can be used to seed a new optimization. ' +
'the file must have the same structure as the .pop or .dat files.')
parser.add_argument('--seed-limit', type=int, default=default_args.seed_limit,
help='maximum number of models to use from the seed file. ' +
'the models with the best R-factors are selected.')
parser.add_argument('-d', '--data-dir', default=default_args.data_dir,
parser.add_argument('-d', '--data-dir',
help='directory path for experimental data files (if required by project). ' +
'default: working directory')
parser.add_argument('-o', '--output-file', default=default_args.output_file,
parser.add_argument('-o', '--output-file',
help='base path for intermediate and output files.')
parser.add_argument('-b', '--db-file', default=default_args.db_file,
parser.add_argument('-b', '--db-file',
help='name of an sqlite3 database file where the results should be stored.')
parser.add_argument('--table-file',
help='path and name of population table file for table optimization mode. ' +
'the file must have the same structure as the .pop or .dat files.')
parser.add_argument('-k', '--keep-files', nargs='*', default=default_args.keep_files,
parser.add_argument('-k', '--keep-files', nargs='*',
choices=KEEP_FILES_CHOICES,
help='output file categories to keep after the calculation. '
'by default, cluster and model (simulated data) '
'of a limited number of best models are kept.')
parser.add_argument('--keep-best', type=int, default=default_args.keep_best,
parser.add_argument('--keep-best', type=int,
help='number of best models for which to keep result files '
'(at each node from root down to keep-levels).')
parser.add_argument('--keep-levels', type=int, choices=range(5),
default=default_args.keep_levels,
help='task level down to which result files of best models are kept. '
'0 = model, 1 = scan, 2 = domain, 3 = emitter, 4 = region.')
parser.add_argument('-t', '--time-limit', type=float, default=default_args.time_limit,
parser.add_argument('-t', '--time-limit', type=float,
help='wall time limit in hours. the optimizers try to finish before the limit.')
parser.add_argument('--log-file', default=default_args.log_file,
parser.add_argument('--log-file',
help='name of the main log file. ' +
'under MPI, the rank of the process is inserted before the extension.')
parser.add_argument('--log-level', default=default_args.log_level,
parser.add_argument('--log-level',
help='minimum level of log messages. DEBUG, INFO, WARNING, ERROR, CRITICAL.')
feature_parser = parser.add_mutually_exclusive_group(required=False)
feature_parser.add_argument('--log-enable', dest='log_enable', action="store_true",
help="enable logging. by default, logging is on.")
feature_parser.add_argument('--log-disable', dest='log_enable', action='store_false',
help="disable logging. by default, logging is on.")
parser.set_defaults(log_enable=default_args.log_enable)
parser.set_defaults(log_enable=True)
return parser
@ -350,52 +355,135 @@ def parse_cli():
@return: Namespace object created by the argument parser.
"""
default_args = Args()
parser = get_cli_parser(default_args)
parser = get_cli_parser()
args, unknown_args = parser.parse_known_args()
return args, unknown_args
def import_project_module(path):
def import_module(module_name):
"""
import the custom project module.
import a custom module by name.
imports the project module given its file path.
the path is expanded to its absolute form and appended to the python path.
import a module given its file path or module name (like in an import statement).
@param path: path and name of the module to be loaded.
path is optional and defaults to the python path.
if the name includes an extension, it is stripped off.
preferably, the module name should be given as in an import statement.
as the top-level pmsco directory is on the python path,
the module name will begin with `projects` for a custom project module or `pmsco` for a core pmsco module.
in this case, the function just calls importlib.import_module.
if a file path is given, i.e., `module_name` links to an existing file and has a `.py` extension,
the function extracts the directory path,
inserts it into the python path,
and calls importlib.import_module on the stem of the file name.
@note the file path remains in the python path.
this option should be used carefully to avoid breaking file name resolution.
@param module_name: file path or module name.
file path is interpreted relative to the working directory.
@return: the loaded module as a python object
"""
path, name = os.path.split(path)
name, __ = os.path.splitext(name)
path = os.path.abspath(path)
sys.path.append(path)
project_module = importlib.import_module(name)
return project_module
p = Path(module_name)
if p.is_file() and p.suffix == ".py":
path = str(p.parent.resolve())
module_name = p.stem
if path not in sys.path:
sys.path.insert(0, path)
module = importlib.import_module(module_name)
return module
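The file-path branch can be exercised with a throw-away module; the file name and attribute are invented for the demo:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# write a throw-away module file to demonstrate the file-path branch
tmpdir = Path(tempfile.mkdtemp())
(tmpdir / "demo_project.py").write_text("ANSWER = 42\n")

# same logic as import_module(): strip the .py file into a path and a module name
module_name = str(tmpdir / "demo_project.py")
p = Path(module_name)
if p.is_file() and p.suffix == ".py":
    path = str(p.parent.resolve())
    module_name = p.stem
    if path not in sys.path:
        sys.path.insert(0, path)
module = importlib.import_module(module_name)
```

note that the temporary directory stays on sys.path afterwards, which is the caveat the docstring warns about.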
def main_dict(run_params):
"""
main function with dictionary run-time parameters
this starts the whole process with all direct parameters.
the command line is not parsed.
no run-file is loaded (just the project module).
@param run_params: dictionary with the same structure as the JSON run-file.
@return: None
"""
project_params = run_params['project']
module = importlib.import_module(project_params['__module__'])
try:
project_class = getattr(module, project_params['__class__'])
except KeyError:
project = module.create_project()
else:
project = project_class()
project._module = module
project.directories['pmsco'] = Path(__file__).parent
project.directories['project'] = Path(module.__file__).parent
project.set_properties(module, project_params, project)
run_project(project)
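A minimal run_params dictionary might look like this; the module and class names are hypothetical and only illustrate the structure main_dict() expects:

```python
# hypothetical run parameters mirroring the JSON run-file structure;
# '__module__' and '__class__' are looked up by main_dict()
run_params = {
    'project': {
        '__module__': 'projects.demo.demo_project',  # invented module path
        '__class__': 'DemoProject',                  # optional; otherwise create_project() is called
        'mode': 'single',
        'output_file': 'work/demo/demo_run',
    }
}
```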
def main():
"""
main function with command line parsing
this function starts the whole process with parameters from the command line.
if the command line contains a run-file parameter, it determines the module to load and the project parameters.
otherwise, the command line parameters apply.
the project class can be specified either in the run-file or the project module.
if the run-file specifies a class name, that class is looked up in the project module and instantiated.
otherwise, the module's create_project is called.
@return: None
"""
args, unknown_args = parse_cli()
if args:
module = import_project_module(args.project_module)
try:
project_args = module.parse_project_args(unknown_args)
except NameError:
project_args = None
try:
with open(args.run_file, 'r') as f:
rf = json.load(f)
except (AttributeError, TypeError):
# no run file given; fall back to command line arguments
rf = {}
rfp = {'__module__': args.project_module}
else:
rfp = rf['project']
module = import_module(rfp['__module__'])
try:
project_args = module.parse_project_args(unknown_args)
except AttributeError:
project_args = None
try:
project_class = getattr(module, rfp['__class__'])
except (AttributeError, KeyError):
project = module.create_project()
set_common_args(project, args)
try:
module.set_project_args(project, project_args)
except NameError:
pass
else:
project = project_class()
project_args = None
project._module = module
project.directories['pmsco'] = Path(__file__).parent
project.directories['project'] = Path(module.__file__).parent
project.set_properties(module, rfp, project)
set_common_args(project, args)
try:
if project_args:
module.set_project_args(project, project_args)
except AttributeError:
pass
try:
schedule_enabled = rf['schedule']['enabled']
except KeyError:
schedule_enabled = False
if schedule_enabled:
schedule_project(project, rf)
else:
run_project(project)
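The run-file branch of main() expects a JSON document along these lines; the module path is hypothetical, and the standard json module stands in for commentjson here:

```python
import io
import json

# hypothetical run file contents as main() would parse them
run_file_text = """
{
  "project": {
    "__module__": "projects.demo.demo_project",
    "mode": "single"
  },
  "schedule": {
    "enabled": false
  }
}
"""
rf = json.load(io.StringIO(run_file_text))

# the schedule section is optional; a missing or disabled section means run locally
schedule_enabled = rf.get('schedule', {}).get('enabled', False)
```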

View File

@ -19,36 +19,32 @@ the ModelSpace and CalculatorParams classes are typically used unchanged.
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import collections
import copy
import datetime
import git
import logging
import numpy as np
import os.path
from pathlib import Path
import socket
import sys
from pmsco.calculators.calculator import InternalAtomicCalculator
from pmsco.calculators.edac import EdacCalculator
import pmsco.cluster as mc
import pmsco.cluster
import pmsco.config as config
from pmsco.compat import open
import pmsco.data as md
import pmsco.database as database
import pmsco.dispatch as dispatch
import pmsco.files as files
import pmsco.handlers as handlers
import pmsco.database
import pmsco.dispatch
import pmsco.files
import pmsco.handlers
from pmsco.helpers import BraceMessage as BMsg
logger = logging.getLogger(__name__)
@ -157,6 +153,34 @@ class ModelSpace(object):
"""
return ParamSpace(self.start[name], self.min[name], self.max[name], self.step[name])
def set_param_dict(self, d):
"""
initialize model space from dictionary.
@param d: dictionary with two levels:
the top level are parameter names,
the second level the space descriptors 'start', 'min', 'max', 'step' and 'width'.
see add_param() for possible combinations.
@return: None
"""
self.__init__()
for k, v in d.items():
self.add_param(k, **v)
def get_param_dict(self):
"""
return model space parameters in dictionary form
the top level are parameter names,
the second level the space descriptors 'start', 'min', 'max' and 'step'.
@return: dict
"""
d = {}
for name in self.start:
d[name] = {'start': self.start[name], 'min': self.min[name], 'max': self.max[name], 'step': self.step[name]}
return d
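The round trip between the two dictionary forms can be sketched with a minimal stand-in that mirrors the class' four per-parameter dicts (the parameter names `dAB` and `pol` are made up for illustration; the real class lives in pmsco.project):

```python
# two-level dictionary as accepted by ModelSpace.set_param_dict()
model_space_dict = {
    "dAB": {"start": 2.1, "min": 1.8, "max": 2.5, "step": 0.05},
    "pol": {"start": 0.0, "min": -15.0, "max": 15.0, "step": 1.0},
}

class MiniModelSpace:
    """stand-in mirroring the four per-parameter dicts of ModelSpace"""
    def __init__(self):
        self.start, self.min, self.max, self.step = {}, {}, {}, {}

    def set_param_dict(self, d):
        for name, spec in d.items():
            self.start[name] = spec["start"]
            self.min[name] = spec["min"]
            self.max[name] = spec["max"]
            self.step[name] = spec["step"]

    def get_param_dict(self):
        return {name: {"start": self.start[name], "min": self.min[name],
                       "max": self.max[name], "step": self.step[name]}
                for name in self.start}

ms = MiniModelSpace()
ms.set_param_dict(model_space_dict)
assert ms.get_param_dict() == model_space_dict
```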
class CalculatorParams(object):
"""
@ -568,9 +592,166 @@ class Scan(object):
self.raw_data[dim] = grid[i].reshape(-1)
self.raw_data['i'] = 1
def load(self):
return self
class ScanKey(config.ConfigurableObject):
"""
create a Scan object based on a project-supplied dictionary
this class can be used in a run file to create a scan object based on the scan_dict attribute of the project.
this may be convenient if your project selectively uses scans out of a long list of data files
and you don't want to clutter up the run file with parameters that don't change.
to do so, set the key property to match an item of scan_dict.
the load method will look up the corresponding scan_dict item and construct the final Scan object.
"""
def __init__(self, project=None):
super().__init__()
self.key = ""
self.project = project
def load(self, dirs=None):
"""
load the selected scan as specified in the project's scan dictionary
the method uses ScanLoader or ScanCreator as an intermediate.
@return a new Scan object which contains the loaded data.
"""
scan_spec = self.project.scan_dict[self.key]
if 'positions' in scan_spec:
loader = ScanCreator()
else:
loader = ScanLoader()
for k, v in scan_spec.items():
setattr(loader, k, v)
scan = loader.load(dirs=dirs)
return scan
class ScanLoader(config.ConfigurableObject):
"""
create a Scan object from a data file reference
this class can be used in a run file to create a scan object from an experimental data file.
to do so, fill the properties with values as documented.
the load() method is called when the project is run.
"""
## @var filename (string)
# file name from which the scan should be loaded.
# the file name can contain a format specifier like {project} to include the base path.
## @var emitter (string)
# chemical symbol and, optionally following, further specification (chemical state, environment, ...)
# of photo-emitting atoms.
# the interpretation of this string is up to the project and its cluster generator.
# it should, however, always start with a chemical element symbol.
#
# examples: 'Ca' (calcium), 'CA' (carbon A), 'C a' (carbon a), 'C 1' (carbon one), 'N=O', 'FeIII'.
## @var initial_state (string)
# nl term of initial state
#
# in the form expected by EDAC, for example: '2p1/2'
## @var is_modf (bool)
# declares whether the data file contains the modulation function rather than intensity values
#
# if false, the project will calculate a modulation function from the raw data
def __init__(self):
super().__init__()
self.filename = ""
self.emitter = ""
self.initial_state = "1s"
self.is_modf = False
def load(self, dirs=None):
"""
load the scan according to specification
create a new Scan object and load the file by calling Scan.import_scan_file().
@return a new Scan object which contains the loaded data file.
"""
scan = Scan()
filename = config.resolve_path(self.filename, dirs)
scan.import_scan_file(filename, self.emitter, self.initial_state)
if self.is_modf:
scan.modulation = scan.raw_data
return scan
class ScanCreator(config.ConfigurableObject):
"""
create a Scan object from string expressions
this class can be used in a run file to create a scan object from python expressions,
such as lists, ranges or numpy functions.
to do so, fill the properties with values as documented.
the load() method is called when the project is run.
@note the raw_data property of the scan cannot be filled this way.
thus, the class is useful in `single` calculation mode only.
"""
## @var filename (string)
# name of the file which should receive the scan data.
# the file name can contain a format specifier like {project} to include the base path.
## @var positions (dict)
# dictionary specifying the scan positions
#
# the dictionary must contain four keys: 'e', 't', 'p', 'a' representing the four scan axes.
# each key holds a string that contains a python expression.
# the string is evaluated using python's built-in eval() function.
# the expression must evaluate to an iterable object or numpy ndarray of the scan positions.
# the `np` namespace can be used to access numpy functions.
#
# example:
# the following dictionary generates a hemispherical scan
# self.positions = {'e': '100', 't': 'np.linspace(0, 90, 91)', 'p': 'range(0, 360, 2)', 'a': '0'}
## @var emitter (string)
# chemical symbol and, optionally following, further specification (chemical state, environment, ...)
# of photo-emitting atoms.
# the interpretation of this string is up to the project and its cluster generator.
# it should, however, always start with a chemical element symbol.
#
# examples: 'Ca' (calcium), 'CA' (carbon A), 'C a' (carbon a), 'C 1' (carbon one), 'N=O', 'FeIII'.
## @var initial_state (string)
# nl term of initial state
#
# in the form expected by EDAC, for example: '2p1/2'
def __init__(self):
super().__init__()
self.filename = ""
self.positions = {'e': None, 't': None, 'p': None, 'a': None}
self.emitter = ""
self.initial_state = "1s"
def load(self, dirs=None):
"""
create the scan according to specification
@return a new Scan object which contains the created scan array.
"""
scan = Scan()
positions = {}
for axis in self.positions.keys():
positions[axis] = np.atleast_1d(np.asarray(eval(self.positions[axis])))
scan.define_scan(positions, self.emitter, self.initial_state)
scan.filename = config.resolve_path(self.filename, dirs)
return scan
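The expansion performed by `load()` can be reproduced standalone; the example positions dict is the hemispherical scan from the class documentation:

```python
import numpy as np

# positions dict as it would appear in a run file or project code
positions_spec = {'e': '100', 't': 'np.linspace(0, 90, 91)',
                  'p': 'range(0, 360, 2)', 'a': '0'}

# the same expansion ScanCreator.load() performs:
# eval each expression and promote scalars to 1-d arrays
positions = {axis: np.atleast_1d(np.asarray(eval(expr)))
             for axis, expr in positions_spec.items()}

assert positions['e'].shape == (1,)    # scalar energy becomes a 1-element axis
assert positions['t'].shape == (91,)   # 91 theta steps
assert positions['p'].shape == (180,)  # range(0, 360, 2) has 180 steps
```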
# noinspection PyMethodMayBeStatic
class Project(object):
class Project(config.ConfigurableObject):
"""
base class of a calculation project.
@ -609,17 +790,18 @@ class Project(object):
#
## @var scans (list of Scan objects)
# list of experimental or scan files for which calculations are to be run.
# list of experimental scans for which calculations are to be run.
#
# the list must be populated by calling the add_scan() method.
# this should be done in the create_project() function, or through the command line arguments.
# during project initialization, this list must be populated with Scan, ScanLoader or ScanCreator objects.
# while Scan objects contain all scan data, the latter two classes contain only scan specifications
# which are expanded (i.e. files are loaded or arrays are calculated) just before the calculations start.
# the Project.add_scan() method is a short-cut to create the respective scan object from few arguments.
# before the calculation starts, all objects are converted into fully specified Scan objects
# and scan data is loaded or calculated.
#
# the modulation function is calculated internally.
# if your scan files contain the modulation function (as opposed to intensity),
# you must add the files in the create_project() function.
# the command line does not support loading modulation functions.
#
# @c scans must be considered read-only. use project methods to change it.
# there are two ways to fill this list:
# either the project code fills it as a part of its initialization (create_project),
# or the list is populated via the run-file.
## @var domains (list of arbitrary objects)
# list of domains for which calculations are to be run.
@ -661,28 +843,22 @@ class Project(object):
# set this argument to False only if the calculation is a continuation of a previous one
# without any changes to the code.
## @var data_dir
# directory path to experimental data.
## @var directories
# dictionary for various directory paths.
#
# the project should load experimental data (scan files) from this path.
# this attribute receives the --data-dir argument from the command line
# if the project parses the common arguments (pmsco.set_common_args).
#
# it is up to the project to define where to load scan files from.
# if the location of the files may depend on the machine or user account,
# the user may want to specify the data path on the command line.
## @var output_dir (string)
# directory path for data files produced during the calculation, including intermediate files.
# home: user's home directory.
# data: where to load experimental data (scan files) from.
# project: directory of the project module.
# output: where to write output and intermediate files.
# temp: for temporary files.
#
# output_dir and output_file are set at once by @ref set_output.
## @var output_file (string)
## @var output_file (Path)
# file name root for data files produced during the calculation, including intermediate files.
#
# the file name should include the path. the path must also be set in @ref output_dir.
#
# output_dir and output_file are set at once by @ref set_output.
# this is the concatenation of self.directories['output'] and self.job_name.
# assignment to this property will update the two basic attributes.
## @var db_file (string)
# name of an sqlite3 database file where the calculation results should be stored.
@ -694,14 +870,17 @@ class Project(object):
#
# the actual wall time may be longer by the remaining time of running calculations.
# running calculations will not be aborted.
#
# the time_limit property is an alternative representation as hours.
# reading and writing accesses timedelta_limit.
## @var combined_scan
# combined raw data from scans.
# updated by add_scan().
# updated by self.load_scans().
## @var combined_modf
# combined modulation function from scans.
# updated by add_scan().
# updated by self.load_scans().
## @var files
# list of all generated data files with metadata.
@ -741,14 +920,17 @@ class Project(object):
#
def __init__(self):
super().__init__()
self._module = None
self.mode = "single"
self.job_name = ""
self.job_name = "pmsco0"
self.job_tags = {}
self.git_hash = ""
self.description = ""
self.features = {}
self.cluster_format = mc.FMT_EDAC
self.cluster_generator = mc.LegacyClusterGenerator(self)
self.cluster_format = pmsco.cluster.FMT_EDAC
self.cluster_generator = pmsco.cluster.LegacyClusterGenerator(self)
self._model_space = None
self.scans = []
self.domains = []
self.optimizer_params = {
@ -758,39 +940,170 @@ class Project(object):
'recalc_seed': True,
'table_file': ""
}
self.data_dir = ""
self.output_dir = ""
self.output_file = "pmsco_data"
self.directories = {
"home": Path.home(),
"work": Path.cwd(),
"data": "",
"project": "",
"output": "",
"temp": ""}
self.log_file = ""
self.log_level = "WARNING"
self.db_file = ':memory:'
self.timedelta_limit = datetime.timedelta(days=1)
self.combined_scan = None
self.combined_modf = None
self.files = files.FileTracker()
self.files = pmsco.files.FileTracker()
self.keep_files = list(pmsco.files.FILE_CATEGORIES_TO_KEEP)
self.keep_levels = 1
self.keep_best = 10
self.handler_classes = {
'model': handlers.SingleModelHandler,
'scan': handlers.ScanHandler,
'domain': handlers.DomainHandler,
'emit': handlers.EmitterHandler,
'region': handlers.SingleRegionHandler
'model': pmsco.handlers.SingleModelHandler,
'scan': pmsco.handlers.ScanHandler,
'domain': pmsco.handlers.DomainHandler,
'emit': pmsco.handlers.EmitterHandler,
'region': pmsco.handlers.SingleRegionHandler
}
self.atomic_scattering_factory = InternalAtomicCalculator
self.multiple_scattering_factory = EdacCalculator
self._tasks_fields = []
self._db = database.ResultsDatabase()
self._db = pmsco.database.ResultsDatabase()
def validate(self):
"""
validate the project parameters before starting the calculations
the method checks and fixes attributes that may cause trouble or go unnoticed if they are wrong.
in addition, it fixes attributes which may be incomplete after loading a run-file.
failed critical checks raise an exception (AssertionError, AttributeError, KeyError, ValueError).
checks that cause an attribute to revert to its default are logged as warnings.
the following attributes are fixed silently:
- scattering factories that are declared as string are looked up in the project module.
- place holders in the directories attribute are resolved.
- place holders in the output_file attribute are resolved.
- output_file and output_dir are made consistent (so that output_file includes output_dir).
- the create_model_space() method is called if the model_space attribute is undefined.
- scan data are loaded.
@note to check the syntax of a run-file, set the calculation mode to 'validate' and run pmsco.
this will pass the validate method but will stop execution before calculations are started.
@raise AssertionError if a parameter is not correct.
@raise AttributeError if a class name cannot be resolved.
"""
assert self.mode in {"single", "swarm", "genetic", "grid", "table", "test", "validate"}
if isinstance(self.atomic_scattering_factory, str):
self.atomic_scattering_factory = getattr(self._module, self.atomic_scattering_factory)
if isinstance(self.multiple_scattering_factory, str):
self.multiple_scattering_factory = getattr(self._module, self.multiple_scattering_factory)
self.directories = {k: config.resolve_path(Path(v), self.directories) for k, v in self.directories.items()}
assert len(str(self.output_file))
d = config.resolve_path(self.directories['output'], self.directories)
f = config.resolve_path(self.output_file, self.directories)
self.output_file = Path(d, f)
self.directories['output'] = self.output_file.parent
if self._model_space is None or not self._model_space.start:
logger.warning("undefined model_space attribute, trying project's create_model_space")
self._model_space = self.create_model_space()
self.load_scans()
@property
def data_dir(self):
return self.directories['data']
@data_dir.setter
def data_dir(self, path):
self.directories['data'] = Path(path)
@property
def output_dir(self):
return self.directories['output']
@output_dir.setter
def output_dir(self, path):
self.directories['output'] = Path(path)
@property
def output_file(self):
return Path(self.directories['output'], self.job_name)
@output_file.setter
def output_file(self, filename):
"""
set path and base name of output file.
path is copied to the output_dir attribute.
the file stem is copied to the job_name attribute.
@param filename: (PathLike)
"""
p = Path(filename)
s = str(p.parent)
if s and s != ".":
self.directories['output'] = p.parent
s = str(p.stem)
if s:
self.job_name = s
else:
raise ValueError("invalid output file name")
@property
def time_limit(self):
return self.timedelta_limit.total_seconds() / 3600
@time_limit.setter
def time_limit(self, hours):
self.timedelta_limit = datetime.timedelta(hours=hours)
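A minimal stand-in illustrating the documented hours semantics of `time_limit` backed by `timedelta_limit` (sketch only; the real property lives on Project):

```python
import datetime

class TimeLimited:
    """stand-in mirroring Project.time_limit (hours) <-> timedelta_limit"""
    def __init__(self):
        self.timedelta_limit = datetime.timedelta(days=1)

    @property
    def time_limit(self):
        return self.timedelta_limit.total_seconds() / 3600

    @time_limit.setter
    def time_limit(self, hours):
        self.timedelta_limit = datetime.timedelta(hours=hours)

t = TimeLimited()
t.time_limit = 36
assert t.timedelta_limit == datetime.timedelta(days=1, hours=12)
assert t.time_limit == 36.0
```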
def create_model_space(self):
"""
create a project.ModelSpace object which defines the allowed range for model parameters.
this method must be implemented by the actual project class.
the ModelSpace object must declare all model parameters used in the project.
there are three ways for a project to declare the model space:
1. implement the @ref create_model_space method.
this is the older way and may become deprecated in a future version.
2. assign a ModelSpace to the self.model_space property directly
(in the @ref validate method).
3. declare the model space in the run-file.
this method is called by the validate method only if self._model_space is undefined.
@return ModelSpace object
"""
return None
@property
def model_space(self):
"""
ModelSpace object that defines the allowed range for model parameters.
there are three ways for a project to declare the model space:
1. implement the @ref create_model_space method.
this is the older way and may become deprecated in a future version.
2. assign a ModelSpace to the self.model_space property directly
(in the @ref validate method).
3. declare the model space in the run-file.
initially, this property is None.
"""
return self._model_space
@model_space.setter
def model_space(self, value):
if isinstance(value, ModelSpace):
self._model_space = value
elif hasattr(value, 'items'):
self._model_space = ModelSpace()
self._model_space.set_param_dict(value)
else:
raise ValueError("incompatible object type")
def create_params(self, model, index):
"""
create a CalculatorParams object given the model parameters and calculation index.
@ -816,11 +1129,15 @@ class Project(object):
self.combined_scan = None
self.combined_modf = None
def add_scan(self, filename, emitter, initial_state, is_modf=False, modf_model=None, positions=None):
def add_scan(self, filename, emitter, initial_state, is_modf=False, positions=None):
"""
add the file name of reference experiment and load it.
the extension must be one of msc_data.DATATYPES (case insensitive)
add a scan specification to the scans list.
this is a shortcut for adding a ScanCreator or ScanLoader object to the self.scans list.
the creator or loader are converted into full Scan objects just before the calculation starts
(in the self.setup() method).
the extension must be one of pmsco.data.DATATYPES (case insensitive)
corresponding to the meaning of the columns in the file.
caution: EDAC can only calculate equidistant, rectangular scans.
@ -831,9 +1148,6 @@ class Project(object):
* intensity vs theta, phi, or alpha
* intensity vs theta and phi (hemisphere or hologram scan)
the method calculates the modulation function if @c is_modf is @c False.
it also updates @c combined_scan and @c combined_modf which may be used as R-factor comparison targets.
@param filename: (string) file name of the experimental data, possibly including a path.
the file is not loaded when the optional positions argument is present,
but the filename may serve as basename for output files (e.g. modulation function).
@ -852,57 +1166,64 @@ class Project(object):
@param is_modf: (bool) declares whether the file contains the modulation function (True),
or intensity (False, default). In the latter case, the modulation function is calculated internally.
@param modf_model: (dict) model parameters to be passed to the modulation function.
@return (Scan) the new scan object (which is also a member of self.scans).
"""
scan = Scan()
if positions is not None:
scan.define_scan(positions, emitter, initial_state)
scan.filename = filename
scan = ScanCreator()
scan.positions = positions
else:
scan.import_scan_file(filename, emitter, initial_state)
scan = ScanLoader()
scan.is_modf = is_modf
scan.filename = filename
scan.emitter = emitter
scan.initial_state = initial_state
self.scans.append(scan)
if modf_model is None:
modf_model = {}
return scan
if scan.raw_data is not None:
if is_modf:
scan.modulation = scan.raw_data
else:
def load_scans(self):
"""
load all scan data.
initially, the self.scans list may contain objects of different classes (Scan, ScanLoader, ScanCreator)
depending on the project initialization.
this method loads all data, so that the scans list contains only Scan objects.
also, the self.combined_scan and self.combined_modf fields are calculated from the scans.
"""
has_raw_data = True
has_mod_func = True
loaded_scans = []
for idx, scan in enumerate(self.scans):
scan = scan.load(dirs=self.directories)
loaded_scans.append(scan)
if scan.modulation is None:
try:
scan.modulation = self.calc_modulation(scan.raw_data, modf_model)
scan.modulation = self.calc_modulation(scan.raw_data, self.model_space.start)
except ValueError:
logger.error("error calculating the modulation function of experimental data.")
scan.modulation = None
else:
scan.modulation = None
logger.error(f"error calculating the modulation function of scan {idx}.")
has_raw_data = has_raw_data and scan.raw_data is not None
has_mod_func = has_mod_func and scan.modulation is not None
self.scans = loaded_scans
if scan.raw_data is not None:
if self.combined_scan is not None:
dt = md.common_dtype((self.combined_scan, scan.raw_data))
d1 = md.restructure_data(self.combined_scan, dt)
d2 = md.restructure_data(scan.raw_data, dt)
self.combined_scan = np.hstack((d1, d2))
else:
self.combined_scan = scan.raw_data.copy()
if has_raw_data:
stack1 = [scan.raw_data for scan in self.scans]
dtype = md.common_dtype(stack1)
stack2 = [md.restructure_data(data, dtype) for data in stack1]
self.combined_scan = np.hstack(tuple(stack2))
else:
self.combined_scan = None
if scan.modulation is not None:
if self.combined_modf is not None:
dt = md.common_dtype((self.combined_modf, scan.modulation))
d1 = md.restructure_data(self.combined_modf, dt)
d2 = md.restructure_data(scan.modulation, dt)
self.combined_modf = np.hstack((d1, d2))
else:
self.combined_modf = scan.modulation.copy()
if has_mod_func:
stack1 = [scan.modulation for scan in self.scans]
dtype = md.common_dtype(stack1)
stack2 = [md.restructure_data(data, dtype) for data in stack1]
self.combined_modf = np.hstack(tuple(stack2))
else:
self.combined_modf = None
return scan
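The stacking of scans into `combined_scan` can be sketched as follows; treating `md.common_dtype` and `md.restructure_data` as reducing the arrays to their shared fields before `np.hstack` is an assumption about those helpers, and the field names are made up:

```python
import numpy as np

# two structured arrays with overlapping but unequal fields
a = np.zeros(3, dtype=[('e', 'f4'), ('i', 'f4')])
b = np.zeros(2, dtype=[('e', 'f4'), ('t', 'f4'), ('i', 'f4')])

# reduce both to the common fields (assumed role of common_dtype /
# restructure_data), then stack into one combined array
common = [n for n in a.dtype.names if n in b.dtype.names]
dt = [(n, 'f4') for n in common]
a2 = np.zeros(a.shape, dtype=dt)
b2 = np.zeros(b.shape, dtype=dt)
for n in common:
    a2[n] = a[n]
    b2[n] = b[n]
combined = np.hstack((a2, b2))

assert combined.shape == (5,)
assert combined.dtype.names == ('e', 'i')
```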
def clear_domains(self):
"""
clear domains.
@ -933,42 +1254,6 @@ class Project(object):
"""
self.domains.append(domain)
def set_output(self, filename):
"""
set path and base name of output file.
path and name are copied to the output_file attribute.
path is copied to the output_dir attribute.
if the path is missing, the destination is the current working directory.
"""
self.output_file = filename
path, name = os.path.split(filename)
self.output_dir = path
self.job_name = name
def set_timedelta_limit(self, timedelta, margin_minutes=10):
"""
set the walltime limit with a safety margin.
this method sets the internal self.timedelta_limit attribute.
by default, a safety margin of 10 minutes is subtracted from the main argument
in order to increase the probability that the process ends in time.
if this is not wanted, the project class may override the method and provide its own margin.
the method is typically called with the command line time limit from the main module.
@note the safety margin could be applied at various levels.
it is done here because it can easily be overridden by the project subclass.
to keep run scripts simple, the command line can be given the same time limit
as the job scheduler of the computing cluster.
@param timedelta: (datetime.timedelta) max. duration of the calculation process (wall time).
@param margin_minutes: (int) safety margin in minutes to subtract from timedelta.
"""
self.timedelta_limit = timedelta - datetime.timedelta(minutes=margin_minutes)
def log_project_args(self):
"""
send some common project attributes to the log.
@ -981,6 +1266,14 @@ class Project(object):
@return: None
"""
try:
for key in self.directories:
val = self.directories[key]
lev = logging.WARNING if val else logging.DEBUG
logger.log(lev, f"directories['{key}']: {val}")
logger.warning("output file: {0}".format(self.output_file))
logger.warning("database: {0}".format(self.db_file))
logger.warning("atomic scattering: {0}".format(self.atomic_scattering_factory))
logger.warning("multiple scattering: {0}".format(self.multiple_scattering_factory))
logger.warning("optimization mode: {0}".format(self.mode))
@ -990,15 +1283,11 @@ class Project(object):
lev = logging.WARNING if val else logging.DEBUG
logger.log(lev, "optimizer_params['{k}']: {v}".format(k=key, v=val))
logger.warning("data directory: {0}".format(self.data_dir))
logger.warning("output file: {0}".format(self.output_file))
logger.warning("database: {0}".format(self.db_file))
_files_to_keep = files.FILE_CATEGORIES - self.files.categories_to_delete
_files_to_keep = pmsco.files.FILE_CATEGORIES - self.files.categories_to_delete
logger.warning("intermediate files to keep: {0}".format(", ".join(_files_to_keep)))
for idx, scan in enumerate(self.scans):
logger.warning(f"scan {idx}: {scan.filename} ({scan.emitter} {scan.initial_state}")
logger.warning(f"scan {idx}: {scan.filename} ({scan.emitter} {scan.initial_state})")
for idx, dom in enumerate(self.domains):
logger.warning(f"domain {idx}: {dom}")
@ -1247,16 +1536,26 @@ class Project(object):
"""
self.git_hash = self.get_git_hash()
fields = ["rfac"]
fields.extend(dispatch.CalcID._fields)
fields.extend(pmsco.dispatch.CalcID._fields)
fields.append("secs")
fields = ["_" + f for f in fields]
mspace = self.create_model_space()
model_fields = list(mspace.start.keys())
model_fields = list(self.model_space.start.keys())
model_fields.sort(key=lambda name: name.lower())
fields.extend(model_fields)
self._tasks_fields = fields
with open(self.output_file + ".tasks.dat", "w") as outfile:
if 'all' in self.keep_files:
cats = set([])
else:
cats = pmsco.files.FILE_CATEGORIES - set(self.keep_files)
cats -= {'report'}
if self.mode == 'single':
cats -= {'model'}
self.files.categories_to_delete = cats
Path(self.output_file).parent.mkdir(parents=True, exist_ok=True)
tasks_file = Path(self.output_file).with_suffix(".tasks.dat")
with open(tasks_file, "w") as outfile:
outfile.write("# ")
outfile.write(" ".join(fields))
outfile.write("\n")
@ -1311,7 +1610,8 @@ class Project(object):
values_dict['_rfac'] = parent_task.rfac
values_dict['_secs'] = parent_task.time.total_seconds()
values_list = [values_dict[field] for field in self._tasks_fields]
with open(self.output_file + ".tasks.dat", "a") as outfile:
tasks_file = Path(self.output_file).with_suffix(".tasks.dat")
with open(tasks_file, "a") as outfile:
outfile.write(" ".join(format(value) for value in values_list) + "\n")
db_id = self._db.insert_result(parent_task.id, values_dict)
@ -1548,11 +1848,11 @@ class Project(object):
"""
_files = {}
xyz_filename = filename + ".xyz"
cluster.save_to_file(xyz_filename, fmt=mc.FMT_XYZ)
cluster.save_to_file(xyz_filename, fmt=pmsco.cluster.FMT_XYZ)
_files[xyz_filename] = 'cluster'
xyz_filename = filename + ".emit.xyz"
cluster.save_to_file(xyz_filename, fmt=mc.FMT_XYZ, emitters_only=True)
cluster.save_to_file(xyz_filename, fmt=pmsco.cluster.FMT_XYZ, emitters_only=True)
_files[xyz_filename] = 'cluster'
return _files

pmsco/schedule.py (new file, 309 lines)

@ -0,0 +1,309 @@
"""
@package pmsco.schedule
job schedule interface
this module defines common infrastructure to submit a pmsco calculation job to a job scheduler such as slurm.
the schedule can be defined as part of the run-file (see pmsco module).
users may derive sub-classes in a separate module to adapt to their own computing cluster.
the basic call sequence is:
1. create a schedule object.
2. initialize its properties with job parameters.
3. validate()
4. submit()
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
import collections.abc
import commentjson as json
import datetime
import logging
from pathlib import Path
import shutil
import subprocess
import pmsco.config
logger = logging.getLogger(__name__)
class JobSchedule(pmsco.config.ConfigurableObject):
"""
base class for job schedule
this class defines the abstract interface and some utilities.
derived classes may override any method, but should call the inherited method.
usage:
1. create object, assigning a project instance.
2. assign run_file.
3. call validate.
4. call submit.
this class' properties should not be listed in the run file - they will be overwritten.
"""
## @var enabled (bool)
#
# this parameter signals whether pmsco should schedule a job or run the calculation.
# it is not directly used by the schedule classes but by the pmsco module.
# it must be defined in the run file and set to true to submit the job to a scheduler.
# it is set to false in the run file copied to the job directory so that the job script starts the calculation.
def __init__(self, project):
super(JobSchedule, self).__init__()
self.project = project
self.enabled = False
self.run_dict = {}
self.job_dir = Path()
self.job_file = Path()
self.run_file = Path()
# directory that contains the pmsco and projects directories
self.pmsco_root = Path(__file__).parent.parent
def validate(self):
"""
validate the job parameters.
make sure all object attributes are correct for submission.
@return: None
"""
self.pmsco_root = Path(self.project.directories['pmsco']).parent
output_dir = Path(self.project.directories['output'])
assert self.pmsco_root.is_dir()
assert (self.pmsco_root / "pmsco").is_dir()
assert (self.pmsco_root / "projects").is_dir()
assert output_dir.is_dir()
assert self.project.job_name
self.job_dir = output_dir / self.project.job_name
self.job_dir.mkdir(parents=True, exist_ok=True)
self.job_file = (self.job_dir / self.project.job_name).with_suffix(".sh")
self.run_file = (self.job_dir / self.project.job_name).with_suffix(".json")
def submit(self):
"""
submit the job to the scheduler.
as of this class, the method does the following:
1. copy source files
2. copy a patched version of the run file.
3. write the job file (_write_job_file must be implemented by a derived class).
@return: None
"""
self._copy_source()
self._fix_run_file()
self._write_run_file()
self._write_job_file()
def _copy_source(self):
"""
copy the source files to the job directory.
the source_dir and job_dir attributes must be correct.
the job_dir directory must not exist and will be created.
this is a utility method used internally by derived classes.
job_dir/pmsco/pmsco/**
job_dir/pmsco/projects/**
job_dir/job.sh
job_dir/job.json
@return: None
"""
source = self.pmsco_root
dest = self.job_dir / "pmsco"
ignore = shutil.ignore_patterns(".*", "~*", "*~")
shutil.copytree(source / "pmsco", dest / "pmsco", ignore=ignore)
shutil.copytree(source / "projects", dest / "projects", ignore=ignore)
def _fix_run_file(self):
"""
fix the run file.
patch some entries of self.run_dict so that it can be used as run file.
the following changes are made:
1. set schedule.enabled to false so that the calculation is run.
2. set the output directory to the job directory.
3. set the log file to the job directory.
@return: None
"""
self.run_dict['schedule']['enabled'] = False
self.run_dict['project']['directories']['output'] = str(self.job_dir)
self.run_dict['project']['log_file'] = str((self.job_dir / self.project.job_name).with_suffix(".log"))
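The three patches can be sketched on a plain dictionary (the paths and job name below are hypothetical):

```python
from pathlib import Path

# run dict as loaded from the original run file
run_dict = {
    "schedule": {"enabled": True},
    "project": {"directories": {"output": "/tmp/out"}, "log_file": ""},
}
job_dir = Path("/tmp/out/job1")
job_name = "job1"

# the three changes made by _fix_run_file()
run_dict['schedule']['enabled'] = False
run_dict['project']['directories']['output'] = str(job_dir)
run_dict['project']['log_file'] = str((job_dir / job_name).with_suffix(".log"))

assert run_dict['schedule']['enabled'] is False
assert Path(run_dict['project']['log_file']).name == "job1.log"
```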
def _write_run_file(self):
"""
copy the run file.
this is a JSON dump of self.run_dict to the self.run_file file.
@return: None
"""
with open(self.run_file, "wt") as f:
json.dump(self.run_dict, f, indent=2)
def _write_job_file(self):
"""
create the job script.
this method must be implemented by a derived class.
the script must be written to the self.job_file file.
don't forget to make the file executable.
@return: None
"""
pass
class SlurmSchedule(JobSchedule):
"""
job schedule for a slurm scheduler.
this class implements commonly used features of the slurm scheduler.
host-specific features and the creation of the job file should be done in a derived class.
derived classes must, in particular, implement the _write_job_file method.
they can override other methods, too, but should call the inherited method first.
1. copy the source trees (pmsco and projects) to the job directory
2. copy a patched version of the run file.
3. call the submission command
the public properties of this class should be assigned from the run file.
"""
def __init__(self, project):
super(SlurmSchedule, self).__init__(project)
self.host = ""
self.nodes = 1
self.tasks_per_node = 8
self.wall_time = datetime.timedelta(hours=1)
self.signal_time = 600
self.manual = True
@staticmethod
def parse_timedelta(td):
"""
parse time delta input formats
converts a string or dictionary from run-file into datetime.timedelta.
@param td:
str: [days-]hours[:minutes[:seconds]]
dict: days, hours, minutes, seconds - at least one needs to be defined. values must be numeric.
datetime.timedelta - native type
@return: datetime.timedelta
"""
if isinstance(td, str):
dt = {}
d = td.split("-")
if len(d) > 1:
dt['days'] = float(d.pop(0))
t = d[0].split(":")
try:
dt['hours'] = float(t.pop(0))
dt['minutes'] = float(t.pop(0))
dt['seconds'] = float(t.pop(0))
except (IndexError, ValueError):
pass
td = datetime.timedelta(**dt)
elif isinstance(td, collections.abc.Mapping):
td = datetime.timedelta(**td)
return td
def validate(self):
super(SlurmSchedule, self).validate()
self.wall_time = self.parse_timedelta(self.wall_time)
assert self.job_dir.is_absolute()
def submit(self):
"""
call the sbatch command
if manual is true, the job files are generated but the job is not submitted.
@return: None
"""
super(SlurmSchedule, self).submit()
args = ['sbatch', str(self.job_file)]
print(" ".join(args))
if self.manual:
print("manual run - job files created but not submitted")
else:
cp = subprocess.run(args)
cp.check_returncode()
class PsiRaSchedule(SlurmSchedule):
"""
job schedule for the Ra cluster at PSI.
this class selects specific features of the Ra cluster,
such as the partition and node type (24 or 32 cores).
it also implements the _write_job_file method.
"""
## @var partition (str)
#
# the partition is selected based on wall time and number of tasks by the validate() method.
# it should not be listed in the run file.
def __init__(self, project):
super(PsiRaSchedule, self).__init__(project)
self.partition = "shared"
def validate(self):
super(PsiRaSchedule, self).validate()
assert self.nodes <= 2
assert self.tasks_per_node <= 24 or self.tasks_per_node == 32
assert self.wall_time.total_seconds() >= 60
if self.wall_time.total_seconds() > 24 * 60 * 60:
self.partition = "week"
elif self.tasks_per_node < 24:
self.partition = "shared"
else:
self.partition = "day"
assert self.partition in ["day", "week", "shared"]
def _write_job_file(self):
lines = []
lines.append('#!/bin/bash')
lines.append('#SBATCH --export=NONE')
lines.append(f'#SBATCH --job-name="{self.project.job_name}"')
lines.append(f'#SBATCH --partition={self.partition}')
lines.append(f'#SBATCH --time={int(self.wall_time.total_seconds() / 60)}')
lines.append(f'#SBATCH --nodes={self.nodes}')
lines.append(f'#SBATCH --ntasks-per-node={self.tasks_per_node}')
if self.tasks_per_node > 24:
lines.append('#SBATCH --cores-per-socket=16')
# 0 - 65535 seconds
# currently, PMSCO does not react to signals properly
# lines.append(f'#SBATCH --signal=TERM@{self.signal_time}')
lines.append(f'#SBATCH --output="{self.project.job_name}.o.%j"')
lines.append(f'#SBATCH --error="{self.project.job_name}.e.%j"')
lines.append('module load psi-python36/4.4.0')
lines.append('module load gcc/4.8.5')
lines.append('module load openmpi/3.1.3')
lines.append('source activate pmsco')
lines.append(f'cd "{self.job_dir}"')
lines.append(f'mpirun python pmsco/pmsco -r {self.run_file.name}')
lines.append(f'cd "{self.job_dir}"')
lines.append('rm -rf pmsco')
lines.append('exit 0')
self.job_file.write_text("\n".join(lines))
self.job_file.chmod(0o755)
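The wall-time formats accepted by SlurmSchedule.parse_timedelta can be illustrated with a standalone sketch. This is a re-implementation for demonstration only; the method above is authoritative:

```python
import datetime

# demonstration sketch of the parse_timedelta input formats:
# "[days-]hours[:minutes[:seconds]]" strings, mappings, or native timedelta
def parse_timedelta(td):
    if isinstance(td, str):
        dt = {}
        d = td.split("-")
        if len(d) > 1:
            dt['days'] = float(d.pop(0))
        t = d[0].split(":")
        try:
            dt['hours'] = float(t.pop(0))
            dt['minutes'] = float(t.pop(0))
            dt['seconds'] = float(t.pop(0))
        except (IndexError, ValueError):
            pass  # missing fields are simply skipped
        td = datetime.timedelta(**dt)
    elif isinstance(td, dict):
        td = datetime.timedelta(**td)
    return td

print(parse_timedelta("1-12:30"))     # 1 day, 12:30:00
print(parse_timedelta({"hours": 2}))  # 2:00:00
```

Note that partial strings such as "0:30" also parse, because fields that are absent are silently left at zero.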



@ -0,0 +1,93 @@
{
// line comments using // or # prefix are allowed as an extension of JSON syntax
"project": {
"__module__": "projects.twoatom.twoatom",
"__class__": "TwoatomProject",
"job_name": "twoatom0002",
"job_tags": [],
"description": "",
"mode": "single",
"directories": {
"data": "",
"output": ""
},
"keep_files": [
"cluster",
"model",
"scan",
"report",
"population"
],
"keep_best": 10,
"keep_levels": 1,
"time_limit": 24,
"log_file": "",
"log_level": "WARNING",
"cluster_generator": {
"__class__": "TwoatomCluster",
"atom_types": {
"A": "N",
"B": "Ni"
},
"model_dict": {
"dAB": "dNNi",
"th": "pNNi",
"ph": "aNNi"
}
},
"atomic_scattering_factory": "InternalAtomicCalculator",
"multiple_scattering_factory": "EdacCalculator",
"model_space": {
"dNNi": {
"start": 2.109,
"min": 2.0,
"max": 2.25,
"step": 0.05
},
"pNNi": {
"start": 15.0,
"min": 0.0,
"max": 30.0,
"step": 1.0
},
"V0": {
"start": 21.966,
"min": 15.0,
"max": 25.0,
"step": 1.0
},
"Zsurf": {
"start": 1.449,
"min": 0.5,
"max": 2.0,
"step": 0.25
}
},
"domains": [
{
"default": 0.0
}
],
"scans": [
{
"__class__": "mp.ScanCreator",
"filename": "twoatom_energy_alpha.etpai",
"emitter": "N",
"initial_state": "1s",
"positions": {
"e": "np.arange(10, 400, 5)",
"t": "0",
"p": "0",
"a": "np.linspace(-30, 30, 31)"
}
}
],
"optimizer_params": {
"pop_size": 0,
"seed_file": "",
"seed_limit": 0,
"recalc_seed": true,
"table_file": ""
}
}
}
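Run files like the one above allow // and # line comments as an extension of JSON syntax; PMSCO parses them with the commentjson package, which this release adds to the requirements. A minimal stdlib-only sketch of the same idea, handling full-line comments only (commentjson itself is more complete):

```python
import json

# hedged sketch: drop full-line // and # comments, then parse with stdlib json
def loads_with_comments(text):
    kept = []
    for line in text.splitlines():
        stripped = line.lstrip()
        if stripped.startswith("//") or stripped.startswith("#"):
            continue
        kept.append(line)
    return json.loads("\n".join(kept))

run = loads_with_comments("""
{
  // line comments are allowed in run files
  "project": {"job_name": "twoatom0002"}
}
""")
print(run["project"]["job_name"])  # twoatom0002
```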


@ -0,0 +1,90 @@
{
// line comments using // or # prefix are allowed as an extension of JSON syntax
"project": {
"__module__": "projects.twoatom.twoatom",
"__class__": "TwoatomProject",
"job_name": "twoatom0001",
"job_tags": [],
"description": "",
"mode": "single",
"directories": {
"data": "",
"output": ""
},
"keep_files": [
"cluster",
"model",
"scan",
"report",
"population"
],
"keep_best": 10,
"keep_levels": 1,
"time_limit": 24,
"log_file": "",
"log_level": "WARNING",
"cluster_generator": {
"__class__": "TwoatomCluster",
"atom_types": {
"A": "N",
"B": "Ni"
},
"model_dict": {
"dAB": "dNNi",
"th": "pNNi",
"ph": "aNNi"
}
},
"atomic_scattering_factory": "InternalAtomicCalculator",
"multiple_scattering_factory": "EdacCalculator",
"model_space": {
"dNNi": {
"start": 2.109,
"min": 2.0,
"max": 2.25,
"step": 0.05
},
"pNNi": {
"start": 15.0,
"min": 0.0,
"max": 30.0,
"step": 1.0
},
"V0": {
"start": 21.966,
"min": 15.0,
"max": 25.0,
"step": 1.0
},
"Zsurf": {
"start": 1.449,
"min": 0.5,
"max": 2.0,
"step": 0.25
}
},
"domains": [
{
"default": 0.0
}
],
"scans": [
{
// class name as it would be used in the project module
"__class__": "mp.ScanLoader",
// any placeholder key from project.directories can be used
"filename": "{project}/twoatom_hemi_250e.etpi",
"emitter": "N",
"initial_state": "1s",
"is_modf": false
}
],
"optimizer_params": {
"pop_size": 0,
"seed_file": "",
"seed_limit": 0,
"recalc_seed": true,
"table_file": ""
}
}
}


@ -308,14 +308,12 @@ def set_project_args(project, project_args):
@param project_args: (Namespace object) project arguments.
"""
scans = ['tp250e']
scans = []
try:
if project_args.scans:
scans = project_args.scans
else:
logger.warning(BMsg("missing scan argument, using {0}", scans[0]))
except AttributeError:
logger.warning(BMsg("missing scan argument, using {0}", scans[0]))
pass
for scan_key in scans:
scan_spec = project.scan_dict[scan_key]
@ -337,7 +335,7 @@ def parse_project_args(_args):
parser = argparse.ArgumentParser()
# main arguments
parser.add_argument('-s', '--scans', nargs="*", default=['tp250e'],
parser.add_argument('-s', '--scans', nargs="*",
help="nick names of scans to use in calculation (see create_project function)")
parsed_args = parser.parse_args(_args)


@ -1,3 +1,4 @@
python >= 3.6
attrdict
fasteners
numpy >= 1.13
@ -11,3 +12,4 @@ matplotlib
future
swig
gitpython
commentjson


@ -10,20 +10,17 @@ to run the tests, change to the directory which contains the tests directory, an
@author Matthias Muntwiler, matthias.muntwiler@psi.ch
@copyright (c) 2015-18 by Paul Scherrer Institut @n
@copyright (c) 2015-21 by Paul Scherrer Institut @n
Licensed under the Apache License, Version 2.0 (the "License"); @n
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function
import mock
import numpy as np
import os
from pathlib import Path
import unittest
import pmsco.data as data
@ -31,6 +28,103 @@ import pmsco.dispatch as dispatch
import pmsco.project as project
class TestModelSpace(unittest.TestCase):
def setUp(self):
self.d1 = {
"A": {"start": 2.1, "min": 2.0, "max": 3.0, "step": 0.05},
"B": {"start": 15.0, "min": 0.0, "max": 30.0, "step": 1.0}}
self.d2 = {
"C": {"start": 22.0, "min": 15.0, "max": 25.0, "step": 1.0},
"D": {"start": 1.5, "min": 0.5, "max": 2.0, "step": 0.25}}
def test_add_param(self):
ms = project.ModelSpace()
ms.start['A'] = 2.1
ms.min['A'] = 2.0
ms.max['A'] = 3.0
ms.step['A'] = 0.05
ms.add_param("E", 5.0, 1.0, 9.0, 0.2)
ms.add_param("F", 8.0, width=6.0, step=0.5)
d_start = {'A': 2.1, 'E': 5.0, 'F': 8.0}
d_min = {'A': 2.0, 'E': 1.0, 'F': 5.0}
d_max = {'A': 3.0, 'E': 9.0, 'F': 11.0}
d_step = {'A': 0.05, 'E': 0.2, 'F': 0.5}
self.assertDictEqual(ms.start, d_start)
self.assertDictEqual(ms.min, d_min)
self.assertDictEqual(ms.max, d_max)
self.assertDictEqual(ms.step, d_step)
def test_get_param(self):
ms = project.ModelSpace()
ms.add_param("A", **self.d1['A'])
ms.add_param("B", **self.d1['B'])
result = ms.get_param('B')
expected = {'start': 15.0, 'min': 0.0, 'max': 30.0, 'step': 1.0}
self.assertIsInstance(result, project.ParamSpace)
self.assertEqual(result.start, expected['start'])
self.assertEqual(result.min, expected['min'])
self.assertEqual(result.max, expected['max'])
self.assertEqual(result.step, expected['step'])
def test_set_param_dict(self):
ms = project.ModelSpace()
ms.set_param_dict(self.d1)
ms.set_param_dict(self.d2)
d_start = {'C': 22.0, 'D': 1.5}
d_min = {'C': 15.0, 'D': 0.5}
d_max = {'C': 25.0, 'D': 2.0}
d_step = {'C': 1.0, 'D': 0.25}
self.assertDictEqual(ms.start, d_start)
self.assertDictEqual(ms.min, d_min)
self.assertDictEqual(ms.max, d_max)
self.assertDictEqual(ms.step, d_step)
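test_add_param above exercises a width shorthand of ModelSpace.add_param: add_param("F", 8.0, width=6.0, step=0.5) yields min 5.0 and max 11.0, so the bounds appear to be centered on start. A one-line sketch of that relation, inferred from the expected dictionaries rather than copied from the implementation:

```python
# inferred from the test expectations: bounds are start +/- width / 2
def param_bounds(start, width):
    return start - width / 2, start + width / 2

print(param_bounds(8.0, 6.0))  # (5.0, 11.0)
```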
class TestScanCreator(unittest.TestCase):
"""
test case for @ref pmsco.project.ScanCreator class
"""
def test_load_1(self):
"""
test the load method, case 1
test for:
- correct array expansion of an ['e', 'a'] scan.
- correct file name expansion with place holders and pathlib.Path objects.
"""
sc = project.ScanCreator()
sc.filename = Path("{test_p}", "twoatom_energy_alpha.etpai")
sc.positions = {
"e": "np.arange(10, 400, 5)",
"t": "0",
"p": "0",
"a": "np.linspace(-30, 30, 31)"
}
sc.emitter = "Cu"
sc.initial_state = "2p3/2"
p = Path(__file__).parent / ".." / "projects" / "twoatom"
dirs = {"test_p": p,
"test_s": str(p)}
result = sc.load(dirs=dirs)
self.assertEqual(result.mode, ['e', 'a'])
self.assertEqual(result.emitter, sc.emitter)
self.assertEqual(result.initial_state, sc.initial_state)
e = np.arange(10, 400, 5)
a = np.linspace(-30, 30, 31)
t = p = np.asarray([0])
np.testing.assert_array_equal(result.energies, e)
np.testing.assert_array_equal(result.thetas, t)
np.testing.assert_array_equal(result.phis, p)
np.testing.assert_array_equal(result.alphas, a)
self.assertTrue(Path(result.filename).is_file(), msg=f"file {result.filename} not found")
class TestScan(unittest.TestCase):
"""
test case for @ref pmsco.project.Scan class