Payu: a workflow manager for some ACCESS models

23/08/2024

:rocket: Release: 1.1.5

https://github.com/payu-org/payu/releases/tag/1.1.5

Getting Started:

This version of payu can now be accessed on Gadi:

module use /g/data/vk83/modules
module load payu/1.1.5

Updates

:white_check_mark: Get module executables from paths added by loaded environment modules (#439, #482)
:white_check_mark: Allow generic tracers with cesm_cmeps driver (#433)
:white_check_mark: Remove unnecessary UM config files (#455)
:white_check_mark: Replace UM um_env.py configuration file with yaml file (#459)
:white_check_mark: User scripts and postscript updates to error handling, environment variables and shell features (#452, #467, #445)
:white_check_mark: Adding date based pruning for ACCESS-ESM1.5 (#465)
:white_check_mark: Changes to manifest logic (#475)
:white_check_mark: Enable separate ice_history.nml & cice_in.nml settings (#483)
:white_check_mark: Replace CICE start date calculations (#484)
:white_check_mark: Add command line flag to disable metadata generation and related commits (#447)

For a full list of pull requests, including additional bug fixes, see the Payu GitHub 1.1.5 release.

Notes

Python version 3.10

Released payu environment modules from payu/1.1.5 onwards are based on Python version 3.10.

UM um_env.py files

UM um_env.py configuration files are no longer supported and should be replaced with um_env.yaml files. Existing um_env.py files can be converted to yaml files, using this script.

Loading model executables using model modules

Payu can now find model executables by searching paths added to $PATH by model environment modules. This simplifies config.yaml: only the name of the executable is required, and changing model versions is simpler and less error-prone. For example:

# Modules for loading model executables
modules:
  use:
      - /g/data/vk83/modules
  load:
      - access-esm1p5/2024.05.0

...
-  name: ocean
   model: mom
   exe: fms_ACCESS-CM.x

Previously, the executable had to be specified as a full path. For executables built by spack, these paths included a hash and were long and complicated. For example:

    exe: /g/data/vk83/apps/spack/0.22/restricted/ukmo/release/linux-rocky8-x86_64_v4/intel-19.0.3.199/mom5-git.access-esm1.5_2024.06.20_access-esm1.5-wxxrc3ivrjz76yx565ddkuuiwoqpalko/bin/fms_ACCESS-CM.x

Loaded modules in config.yaml must be unique to ensure the correct model executable is used. This means modules must be specified with a version, and modules of the same name and version cannot be present in multiple module directories.
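To illustrate why a bare executable name is enough once a model module has extended $PATH, the sketch below resolves a name against a PATH-style string using Python's standard library. It only illustrates PATH lookup and is not payu's implementation; the resolve_exe helper is a hypothetical name.

```python
import os
import shutil


def resolve_exe(name, path=None):
    """Return the full path of an executable found on a PATH-style string.

    Hypothetical helper: illustrates PATH lookup only, not payu's own code.
    """
    # Fall back to the process PATH unless an explicit search path is given.
    search_path = os.environ.get("PATH", "") if path is None else path
    found = shutil.which(name, path=search_path)
    if found is None:
        raise FileNotFoundError(f"{name} not found on search path")
    return found
```

Because the first matching directory wins, two loaded modules that both provide an executable of the same name would make the result depend on PATH order, which is one reason loaded modules need to be unique.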

Updates to user processing scripts

Payu now exports some current run information to environment variables so they can be accessed in post-processing scripts:

  • PAYU_CURRENT_RUN - The current run number, e.g. 0 for the first run, 1 for the second run
  • PAYU_ARCHIVE_DIR - Full path to the archive directory, which contains all the output and restart subdirectories
  • PAYU_CURRENT_OUTPUT_DIR, PAYU_CURRENT_RESTART_DIR - Full paths to the current output and restart directories, e.g. for the first run, these would be /path/to/archive/output000 and /path/to/archive/restart000
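As a minimal sketch of how a Python post-processing script might read these variables, the helper below collects them into a dictionary. The variable names are those listed above; the payu_run_info function and its optional env argument (used only to make the helper easy to test) are hypothetical.

```python
import os
from pathlib import Path


def payu_run_info(env=None):
    """Collect the run information payu exports for user scripts.

    Hypothetical helper; `env` defaults to the process environment.
    """
    env = os.environ if env is None else env
    return {
        # Run counter: 0 for the first run, 1 for the second, and so on.
        "run": int(env["PAYU_CURRENT_RUN"]),
        "archive": Path(env["PAYU_ARCHIVE_DIR"]),
        "output": Path(env["PAYU_CURRENT_OUTPUT_DIR"]),
        "restart": Path(env["PAYU_CURRENT_RESTART_DIR"]),
    }
```

A script invoked by payu could call payu_run_info() with no arguments; a missing variable raises KeyError, which makes it obvious when the script is run outside payu.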

User script and postscript command calls can now also include shell-specific features such as file redirections, pipes and environment variables (which are expanded). So it’s now possible to run commands such as:

userscripts:
    setup: echo "some_data" > input.txt
    archive: some_script.sh $PAYU_CURRENT_OUTPUT_DIR

If a user script exits with an error, payu run execution halts. If this is not desirable, error handling will need to be added to post-processing scripts. Previously, only a warning was issued if a user script exited with an error.
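For a post-processing step whose failure should not halt the run, one possible approach is to catch the error inside the script and still exit with status 0. The sketch below assumes a Python user script; run_tolerant is a hypothetical helper and not part of payu.

```python
import sys


def run_tolerant(task):
    """Run task(); report any failure on stderr but return exit status 0.

    Hypothetical helper for non-critical post-processing steps.
    """
    try:
        task()
    except Exception as exc:  # deliberately broad: this step must not stop the run
        print(f"warning: post-processing failed: {exc}", file=sys.stderr)
    return 0
```

A script would typically finish with sys.exit(run_tolerant(main)), where main is its own entry point. Steps whose failure should stop the experiment are better left unwrapped so that payu halts.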

Changes to manifest logic

Payu manifests store information about files in the work directory of an experiment. They are used to track changes to files over the course of an experiment for provenance, and to ensure an experiment run can be reproduced. There are three manifest types: executable, input and restart files (exe.yaml, input.yaml and restart.yaml respectively).

The logic for updating manifests and enforcing reproducibility has been greatly simplified:

  • Stored manifests from previous runs are used as the source of truth for full (md5) hashes, for all manifest types. Previously, this was the case only for input manifests.
  • Fast, change-sensitive hashes (binhash by default) are calculated at each payu setup. If a fast hash matches the value in the stored manifest, the full hash from the stored manifest is used. This avoids re-calculating slow md5 hashes where possible.
  • All changes to config.yaml are now correctly detected, for example a different executable path or new input file paths.
  • The scaninputs option has been removed. It allowed existing file paths to change but did not scan for new inputs. With it removed, only paths configured in config.yaml, or found by searching the input and restart directories, are added to the work directory.
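The fast-hash shortcut described in the list above can be sketched as follows. This is an illustration of the idea only, not payu's code: fast_hash is a simple stand-in built from file size and modification time (it is not the real binhash), and the entry layout with "fast" and "md5" keys is an assumption made for this example.

```python
import hashlib
import os


def fast_hash(path):
    """Cheap change-sensitive stand-in (size and mtime); NOT the real binhash."""
    st = os.stat(path)
    return f"{st.st_size}:{st.st_mtime_ns}"


def full_hash(path):
    """md5 of the whole file contents, read in 1 MiB chunks."""
    md5 = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            md5.update(chunk)
    return md5.hexdigest()


def update_entry(path, stored=None):
    """Return a manifest entry, reusing the stored md5 when the fast hash matches."""
    fast = fast_hash(path)
    if stored is not None and stored["fast"] == fast:
        # Fast hash unchanged: trust the stored full hash and skip the slow md5.
        return {"fast": fast, "md5": stored["md5"]}
    # No stored entry, or the file looks different: compute the full hash.
    return {"fast": fast, "md5": full_hash(path)}
```

The trade-off is that the stored manifest is trusted whenever the fast hash matches, so the expensive md5 is only computed for files that are new or appear to have changed.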

Enforcing reproducibility

Setting reproduce to true in config.yaml, or via a command-line option, makes payu check that files have not changed since the previous run. It is possible to set reproduce for each manifest type separately.

When reproduce for a manifest type is set to true, payu will refuse to run if:

  • Full hash changes: calculated md5 hashes differ from the full hashes in the stored manifest
  • New files: files are found in the work directory that were not in the stored manifest
  • Missing files: files in the stored manifest are not present in the work directory

If the full path to a file has changed or a fast hash has changed, but the full hash still matches the stored full hash (so it is effectively the same file), the manifest will be updated.
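The three refusal conditions listed earlier can be sketched as a comparison of two mappings from work-directory path to full hash. This is a simplified illustration under that assumed data layout; reproduce_errors is a hypothetical function and payu's actual checks operate on its own manifest structures.

```python
def reproduce_errors(stored, current):
    """Compare {work_path: md5} mappings and list reasons to refuse a run."""
    errors = []
    # Full hash changes: same work path, different md5.
    for path in sorted(stored.keys() & current.keys()):
        if stored[path] != current[path]:
            errors.append(f"changed: {path}")
    # New files: present now but absent from the stored manifest.
    for path in sorted(current.keys() - stored.keys()):
        errors.append(f"new: {path}")
    # Missing files: in the stored manifest but no longer present.
    for path in sorted(stored.keys() - current.keys()):
        errors.append(f"missing: {path}")
    return errors
```

An empty list means the run may proceed. Because the comparison uses only work-directory paths and full hashes, a file whose source path or fast hash changed while its md5 stayed the same produces no error, consistent with the manifest simply being updated in that case.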

Changes to config.yaml, for example a different executable path or new input file paths, are now correctly picked up. Previously, specifying reproducibility would only add paths in the manifests to the work directory and raise errors if those files were modified.

For more information on manifests, see the payu documentation for configuring your experiment and manifests content and tracking.

Disabling metadata + UUID generation and commits

To update manifests without auto-updating metadata:

payu setup --metadata-off 

The --metadata-off/-m command line flag was added to make it more convenient to update released configurations which do not include a UUID.

Previously, the only way to disable generation of a new UUID, an updated metadata.yaml file and the related git commits was via config.yaml:

metadata:
    enable: false

This option is only available with the payu setup and payu sweep commands. Disabling metadata for payu run still needs to be done via config.yaml.

Support

Replies to this topic are disabled.

If you have specific questions about this release, follow the guidelines for requesting help from ACCESS-NRI.

If you have questions about payu, create a topic in the category that best matches the model you are using, or in the Technical category, and tag it with payu. If you require assistance, follow the guidelines for requesting help from ACCESS-NRI.

Credits

Development was by @jo-basevi, @spencerwong, @dougiesquire, @anton, @Aidan and @TommyGatti.
