Error with payu and loading modules

I’m trying to run CABLE with payu using an experiment @clairecarouge had made a while back: GitHub - ccarouge/cbl_bench_carbon: Inputs to run CABLE spatial benchmarking.

When running the payu run command, the following error was produced (in /home/189/sb8430/cbl_bench_carbon/benchmark.e79641801):

Currently Loaded Modulefiles:
 1) pbs   2) openmpi/4.1.4(default)  
MODULE ERROR DETECTED: GLOBALERR intel-mpi/2019.5.281 cannot be loaded due to a conflict.
(Detailed error information and backtrace has been suppressed, set $MODULES_ERROR_BACKTRACE to unsuppress.)

Loading intel-mpi/2019.5.281
  ERROR: intel-mpi/2019.5.281 cannot be loaded due to a conflict.
    HINT: Might try "module unload openmpi" first.
payu: Model exited with error code 127; aborting.

Does anyone know what is causing the error?

Steps taken:

  1. Clone the payu repository and check out the CABLE model driver branch, then install payu locally.
  2. Check out the experiment and edit the exe, input and restart paths in the config file. This is what the config file looks like:
# PBS flags
queue: normal
walltime: 1:00:00
ncpus: 16
# mem: 64GB

# mpirun:
#    - --bind-to none

jobname: benchmark

# Model config
model: cable
exe: /home/189/sb8430/cable/trunk/offline/cable-mpi
input:
    - /g/data/tm70/ccc561/MetFiles/CRUNCEP
    - /home/189/sb8430/cbl_bench_carbon/inputs


collate: False

runspersub: 2

restart: /home/189/sb8430/cbl_bench_carbon/inputs

# Required option! Ensure the same run is repeated every time the benchmark is done
repeat: True
  3. Make a cable.res.yaml file in the cbl_bench_carbon/inputs directory and copy the following contents into cable.res.yaml:
year: 1904
  4. Run payu setup from the experiment directory.
  5. Run payu sweep and then payu run. (A condensed shell sketch of these steps is shown after this list.)
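For anyone wanting to reproduce these steps, here is a condensed shell sketch. The payu branch name is a placeholder and the repository URLs are assumptions; substitute the actual CABLE driver branch and your own paths:

# Install payu from the CABLE model driver branch (branch name is a placeholder)
git clone https://github.com/payu-org/payu.git
cd payu
git checkout <cable-driver-branch>
pip install --user .

# Set up and run the experiment
cd ~
git clone https://github.com/ccarouge/cbl_bench_carbon.git
cd cbl_bench_carbon
# ... edit config.yaml paths and create inputs/cable.res.yaml ("year: 1904") ...
payu setup
payu sweep
payu run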

@Aidan do you know what is causing the error?

payu inspects your executable to try to guess which MPI library is required to run it. Effectively it runs ldd:

$ ldd /home/189/sb8430/cable/trunk/offline/cable-mpi | grep libmpi
	libmpifort.so.12 => /apps/intel-mpi/2019.5.281/intel64/lib/libmpifort.so.12 (0x000014ce80833000)
	libmpi.so.12 => /apps/intel-mpi/2019.5.281/intel64/lib/release/libmpi.so.12 (0x000014ce7f82e000)
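
From those library paths payu infers that intel-mpi/2019.5.281 is needed. If you want to double-check which module provides those paths, something like this should show the matching prepend-path entries (illustrative; module show writes to stderr, hence the redirect):

$ module show intel-mpi/2019.5.281 2>&1 | grep prepend-path
# expect entries under /apps/intel-mpi/2019.5.281/...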

The error says:

Currently Loaded Modulefiles:
 1) pbs   2) openmpi/4.1.4(default)  

and

ERROR: intel-mpi/2019.5.281 cannot be loaded due to a conflict.

so it seems that it doesn’t like that you’ve already got openmpi loaded. Can you try unloading it first? If that doesn’t work, check you aren’t doing any module load calls in your ~/.bashrc or equivalent.
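
A quick way to check for stray module calls in your shell start-up files (a minimal sketch; add any other start-up files you use):

$ grep -nE "module (load|use)" ~/.bashrc ~/.bash_profile ~/.profile 2>/dev/null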

I’m not sure how openmpi/4.1.4 is getting loaded beforehand. I’ve purged my modules and I don’t have any module load calls in my ~/.bashrc file. Everything in my ~/.bashrc seems harmless:

# .bashrc

# Source global definitions (Required for modules)
if [ -f /etc/bashrc ]; then
	. /etc/bashrc
fi

if in_interactive_shell; then
    # This is where you put settings that you'd like in
    # interactive shells. E.g. prompts, or aliases
    # The 'module' command offers path manipulation that
    # will only modify the path if the entry to be added
    # is not already present. Use these functions instead of e.g.
    # PATH=${HOME}/bin:$PATH

    prepend_path PATH ${HOME}/bin
    prepend_path PATH ${HOME}/.local/bin
    
    if in_login_shell; then
	# This is where you place things that should only
	# run when you login. If you'd like to run a
	# command that displays the status of something, or
	# load a module, or change directory, this is the
	# place to put it
	# module load pbs
    module use /g/data/hh5/public/modules
    export SVN_EDITOR=vim
    prepend_path PATH /g/data/hh5/public/apps/nci_scripts # to use scripts such as uqstat
	# cd /scratch/${PROJECT}/${USER}
    fi

fi

# Anything here will run whenever a new shell is launched, which
# includes when running commands like 'less'. Commands that
# produce output should not be placed in this section.
#
# If you need different behaviour depending on what machine you're
# using to connect to Gadi, you can use the following test:
#
# if [[ $SSH_CLIENT =~ 11.22.33.44 ]]; then
#     Do something when I connect from the IP 11.22.33.44
# fi
#
# If you want different behaviour when entering a PBS job (e.g.
# a default set of modules), test on the $in_pbs_job variable.
# This will run when any new shell is launched in a PBS job,
# so it should not produce output
#
# if in_pbs_job; then
#     module purge
# fi

# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/etc/profile.d/conda.sh" ]; then
        . "/g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/etc/profile.d/conda.sh"
    else
        export PATH="/g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<

I have also tried adding a module purge line to my ~/.bashrc by uncommenting:

# if in_pbs_job; then
#     module purge
# fi

but that doesn’t prevent the issue.

Try payu-run rather than payu run and report back what happens. This will run payu directly on the login node, which at least tests it without submitting to the queue and spawning a whole new process. It will probably run OK seeing as you’re only asking for 16 CPUs, but don’t let it run to completion if the runtime is more than a few minutes.

In addition, you could try setting the MODULES_ERROR_BACKTRACE environment variable to get more verbose output.
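
If I remember the Environment Modules behaviour correctly (worth verifying against the modules version on Gadi), that variable is treated as a boolean, so something like this before the run should enable the backtrace:

export MODULES_ERROR_BACKTRACE=1
payu-run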

I’ve tried setting the MODULES_ERROR_BACKTRACE environment variable by running:

export MODULES_ERROR_BACKTRACE=unsuppress

But this didn’t seem to produce a more verbose error message when running payu-run. Here is the output:

$ payu-run
laboratory path:  /scratch/tm70/sb8430/cable
binary path:  /scratch/tm70/sb8430/cable/bin
input path:  /scratch/tm70/sb8430/cable/input
work path:  /scratch/tm70/sb8430/cable/work
archive path:  /scratch/tm70/sb8430/cable/archive
nruns: 1 nruns_per_submit: 2 subrun: 1
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
Setting up cable
Checking exe and input manifests
Updating full hashes for 1 files in manifests/exe.yaml
Creating restart manifest
Updating full hashes for 11 files in manifests/restart.yaml
Writing manifests/restart.yaml
Writing manifests/exe.yaml
payu: Found modules in /opt/Modules/v4.3.0
mod conda/analysis3-22.10
ERROR: Multiple (2) conda environments have been loaded, cannot unload with module
ERROR: Try 'conda deactivate' first

Unloading conda/analysis3-22.10
  ERROR: Module evaluation aborted
Currently Loaded Modulefiles:
 1) conda/analysis3-22.10(analysis:analysis3:default)   2) pbs   3) openmpi/4.1.4(default)
MODULE ERROR DETECTED: GLOBALERR intel-mpi/2019.5.281 cannot be loaded due to a conflict.
(Detailed error information and backtrace has been suppressed, set $MODULES_ERROR_BACKTRACE to unsuppress.)

Loading intel-mpi/2019.5.281
  ERROR: intel-mpi/2019.5.281 cannot be loaded due to a conflict.
    HINT: Might try "module unload openmpi" first.
git add /home/189/sb8430/cbl_bench_carbon/config.yaml
git add /home/189/sb8430/cbl_bench_carbon/cable.nml
git add manifests/input.yaml
git add manifests/restart.yaml
git add manifests/exe.yaml
git commit -am "2023-04-18 14:26:48: Run 0"
TODO: Check if commit is unchanged
mpirun  -np 16  /scratch/tm70/sb8430/cable/work/cbl_bench_carbon/cable-mpi
/home/189/sb8430/cbl_bench_carbon/cable.err /scratch/tm70/sb8430/cable/archive/cbl_bench_carbon/error_logs/cable.3494a6.err
/home/189/sb8430/cbl_bench_carbon/cable.out /scratch/tm70/sb8430/cable/archive/cbl_bench_carbon/error_logs/cable.3494a6.out
/home/189/sb8430/cbl_bench_carbon/job.yaml /scratch/tm70/sb8430/cable/archive/cbl_bench_carbon/error_logs/job.3494a6.yaml
/home/189/sb8430/cbl_bench_carbon/env.yaml /scratch/tm70/sb8430/cable/archive/cbl_bench_carbon/error_logs/env.3494a6.yaml
payu: Model exited with error code 127; aborting.

Is this a new error?

Regardless, I reproduced your error and fixed it by adding this to config.yaml:

mpi:
   module: intel-mpi

I’ve not tried using intel-mpi (we usually use openmpi). payu does some work to “normalise” the module setup: it assumes openmpi is the MPI implementation and so loads it. Some of that behaviour needs looking into, I’d say, but for the moment this is the work-around.
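
For what it’s worth, the underlying conflict is easy to reproduce by hand on a login node, which is a quick way to check whether a given combination of MPI modules will load (illustrative only):

$ module purge
$ module load openmpi/4.1.4
$ module load intel-mpi/2019.5.281
# fails with the same "cannot be loaded due to a conflict" error;
# per the hint in the error message, "module unload openmpi" first lets it load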

Created a payu issue about this


After revisiting this issue, it turns out that the Intel MPI library is not supported by payu, as payu passes a number of argument flags that are specific to the OpenMPI implementation (this is stated here).

The solution was to recompile the executable using the OpenMPI implementation.
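
For completeness, a rough sketch of rebuilding against OpenMPI and checking the result. The build script name and module versions are assumptions; check the offline/ directory of your CABLE checkout for the actual build script and match the modules available on Gadi:

$ module purge
$ module load intel-compiler netcdf openmpi/4.1.4
$ cd ~/cable/trunk/offline
$ ./build_mpi.ksh                     # assumed build script name
$ ldd cable-mpi | grep libmpi
# the libmpi paths should now point under /apps/openmpi/... rather than /apps/intel-mpi/...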