When running the payu run command, the following error was produced (in /home/189/sb8430/cbl_bench_carbon/benchmark.e79641801):
Currently Loaded Modulefiles:
1) pbs 2) openmpi/4.1.4(default)
MODULE ERROR DETECTED: GLOBALERR intel-mpi/2019.5.281 cannot be loaded due to a conflict.
(Detailed error information and backtrace has been suppressed, set $MODULES_ERROR_BACKTRACE to unsuppress.)
Loading intel-mpi/2019.5.281
ERROR: intel-mpi/2019.5.281 cannot be loaded due to a conflict.
HINT: Might try "module unload openmpi" first.
payu: Model exited with error code 127; aborting.
Currently Loaded Modulefiles:
1) pbs 2) openmpi/4.1.4(default)
and
ERROR: intel-mpi/2019.5.281 cannot be loaded due to a conflict.
so it seems it doesn’t like that you’ve already got openmpi loaded. Can you try unloading it first? If that doesn’t work, check that you aren’t doing any module load calls in your ~/.bashrc or equivalent.
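Something along these lines, as a rough sketch (standard module commands; the exact steps may vary depending on how your environment is set up):
$ module list             # confirm openmpi/4.1.4 is currently loaded
$ module unload openmpi   # remove the conflicting module
$ payu run                # then retry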
I’m not sure how openmpi/4.1.4 is getting loaded beforehand. I’ve purged my modules and I don’t have any module load calls in my ~/.bashrc file. Everything in my ~/.bashrc seems harmless:
# .bashrc
# Source global definitions (Required for modules)
if [ -f /etc/bashrc ]; then
    . /etc/bashrc
fi
if in_interactive_shell; then
    # This is where you put settings that you'd like in
    # interactive shells. E.g. prompts, or aliases

    # The 'module' command offers path manipulation that
    # will only modify the path if the entry to be added
    # is not already present. Use these functions instead of e.g.
    # PATH=${HOME}/bin:$PATH
    prepend_path PATH ${HOME}/bin
    prepend_path PATH ${HOME}/.local/bin

    if in_login_shell; then
        # This is where you place things that should only
        # run when you login. If you'd like to run a
        # command that displays the status of something, or
        # load a module, or change directory, this is the
        # place to put it

        # module load pbs
        module use /g/data/hh5/public/modules
        export SVN_EDITOR=vim
        prepend_path PATH /g/data/hh5/public/apps/nci_scripts # to use scripts such as uqstat
        # cd /scratch/${PROJECT}/${USER}
    fi
fi
# Anything here will run whenever a new shell is launched, which
# includes when running commands like 'less'. Commands that
# produce output should not be placed in this section.
#
# If you need different behaviour depending on what machine you're
# using to connect to Gadi, you can use the following test:
#
# if [[ $SSH_CLIENT =~ 11.22.33.44 ]]; then
# Do something when I connect from the IP 11.22.33.44
# fi
#
# If you want different behaviour when entering a PBS job (e.g.
# a default set of modules), test on the $in_pbs_job variable.
# This will run when any new shell is launched in a PBS job,
# so it should not produce output
#
# if in_pbs_job; then
# module purge
# fi
# >>> conda initialize >>>
# !! Contents within this block are managed by 'conda init' !!
__conda_setup="$('/g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/bin/conda' 'shell.bash' 'hook' 2> /dev/null)"
if [ $? -eq 0 ]; then
    eval "$__conda_setup"
else
    if [ -f "/g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/etc/profile.d/conda.sh" ]; then
        . "/g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/etc/profile.d/conda.sh"
    else
        export PATH="/g/data/hh5/public/apps/miniconda3/envs/analysis3-22.07/bin:$PATH"
    fi
fi
unset __conda_setup
# <<< conda initialize <<<
I have also tried adding a module purge line to my ~/.bashrc by uncommenting:
# if in_pbs_job; then
# module purge
# fi
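so that the block now reads (the same lines with the leading # removed):
if in_pbs_job; then
    module purge
fi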
but that doesn’t prevent the issue.
aidanheerdegen
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
Try payu-run rather than payu run and report back what happens. This will run payu directly on the login node, which at least tests it without submitting to the queue and spawning a whole new process. It will probably run OK seeing as you’re only asking for 16 CPUs, but don’t let it run to completion if the runtime is greater than a few minutes.
You could also try setting the MODULES_ERROR_BACKTRACE environment variable to get more verbose output.
I’ve tried setting the MODULES_ERROR_BACKTRACE environment variable by running:
export MODULES_ERROR_BACKTRACE=unsuppress
But this didn’t seem to produce a more verbose error message when running payu-run. Here is the output:
$ payu-run
laboratory path: /scratch/tm70/sb8430/cable
binary path: /scratch/tm70/sb8430/cable/bin
input path: /scratch/tm70/sb8430/cable/input
work path: /scratch/tm70/sb8430/cable/work
archive path: /scratch/tm70/sb8430/cable/archive
nruns: 1 nruns_per_submit: 2 subrun: 1
Loading input manifest: manifests/input.yaml
Loading restart manifest: manifests/restart.yaml
Loading exe manifest: manifests/exe.yaml
Setting up cable
Checking exe and input manifests
Updating full hashes for 1 files in manifests/exe.yaml
Creating restart manifest
Updating full hashes for 11 files in manifests/restart.yaml
Writing manifests/restart.yaml
Writing manifests/exe.yaml
payu: Found modules in /opt/Modules/v4.3.0
mod conda/analysis3-22.10
ERROR: Multiple (2) conda environments have been loaded, cannot unload with module
ERROR: Try 'conda deactivate' first
Unloading conda/analysis3-22.10
ERROR: Module evaluation aborted
Currently Loaded Modulefiles:
1) conda/analysis3-22.10(analysis:analysis3:default) 2) pbs 3) openmpi/4.1.4(default)
MODULE ERROR DETECTED: GLOBALERR intel-mpi/2019.5.281 cannot be loaded due to a conflict.
(Detailed error information and backtrace has been suppressed, set $MODULES_ERROR_BACKTRACE to unsuppress.)
Loading intel-mpi/2019.5.281
ERROR: intel-mpi/2019.5.281 cannot be loaded due to a conflict.
HINT: Might try "module unload openmpi" first.
git add /home/189/sb8430/cbl_bench_carbon/config.yaml
git add /home/189/sb8430/cbl_bench_carbon/cable.nml
git add manifests/input.yaml
git add manifests/restart.yaml
git add manifests/exe.yaml
git commit -am "2023-04-18 14:26:48: Run 0"
TODO: Check if commit is unchanged
mpirun -np 16 /scratch/tm70/sb8430/cable/work/cbl_bench_carbon/cable-mpi
/home/189/sb8430/cbl_bench_carbon/cable.err /scratch/tm70/sb8430/cable/archive/cbl_bench_carbon/error_logs/cable.3494a6.err
/home/189/sb8430/cbl_bench_carbon/cable.out /scratch/tm70/sb8430/cable/archive/cbl_bench_carbon/error_logs/cable.3494a6.out
/home/189/sb8430/cbl_bench_carbon/job.yaml /scratch/tm70/sb8430/cable/archive/cbl_bench_carbon/error_logs/job.3494a6.yaml
/home/189/sb8430/cbl_bench_carbon/env.yaml /scratch/tm70/sb8430/cable/archive/cbl_bench_carbon/error_logs/env.3494a6.yaml
payu: Model exited with error code 127; aborting.
aidanheerdegen
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
Is this a new error?
Regardless, I reproduced your error and fixed it by adding this to the config.yaml:
mpi:
    module: intel-mpi
I’ve not tried using intel-mpi (we usually use openmpi). Payu does some stuff to “normalise” the module setup: it assumes openmpi is the MPI implementation and so loads it. Some of that needs to be looked into I’d say, but for the moment this is the workaround.
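For reference, a config.yaml excerpt with that block added might look something like the following. Only the mpi block is the actual fix; the other keys are illustrative placeholders and should match whatever is already in your config.yaml:
queue: normal
ncpus: 16
model: cable
exe: cable-mpi

mpi:
    module: intel-mpi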