[I hope it’s ok, I moved the replies from the original post to here]
Thanks for providing that detail @sschroeter.
When I tried to reproduce your error I also found and reported the payu bug when checking out an experiment with an incompatible project/laboratory, so thanks!
I located the problem. payu examines the openmpi libraries that the executables are linked against and tries to load the modules necessary to “find” those important libraries.
The executables in that config are linked against libraries that don’t have a corresponding module that can be loaded:
$ ldd /g/data/ik11/inputs/access-om2/bin/fms_ACCESS-OM_97e3429_libaccessom2_1bb8904.x | grep -i libmpi.so
libmpi.so.40 => /apps/openmpi-mofed4.7-pbs19.2/4.0.1/lib/libmpi.so.40 (0x000014c292564000)
So payu reports an error when it runs
module load openmpi-mofed4.7-pbs19.2
as @Dhruv_Bhagtani surmised.
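To make the mechanism concrete, here is a minimal sketch of how a module name could be inferred from the `ldd` output above. This is not payu's actual code: the function name `mpi_module_from_ldd` and the assumption that libraries live under `/apps/<module>/<version>/` are mine, for illustration only.

```python
import re

def mpi_module_from_ldd(ldd_output):
    """Guess (module, version) from the libmpi line of `ldd` output.

    Assumes the library path looks like /apps/<module>/<version>/lib/...
    (hypothetical helper, not taken from payu).
    """
    pattern = re.compile(r"libmpi\.so\S*\s+=>\s+/apps/([^/]+)/([^/]+)/")
    for line in ldd_output.splitlines():
        match = pattern.search(line)
        if match:
            return match.group(1), match.group(2)
    return None  # executable is not linked against libmpi under /apps

# The line reported by ldd for the executable in this config
sample = (
    "libmpi.so.40 => /apps/openmpi-mofed4.7-pbs19.2/4.0.1/lib/libmpi.so.40 "
    "(0x000014c292564000)"
)
print(mpi_module_from_ldd(sample))
```

For that executable the guess is a module called `openmpi-mofed4.7-pbs19.2` at version `4.0.1`, which is exactly the modulefile that cannot be located.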
But I think this is a red herring. When I do as you suggest and use the most recent commit in the ryf9091_gadi branch:
git clone https://github.com/rmholmes/025deg_jra55_ryf my_new_example
cd my_new_example/
payu checkout -b new_branch_name ryf9091_gadi -r /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_gadi/restart250/
It is still using the same executable, which references the problematic library path:
$ ldd /g/data/ik11/inputs/access-om2/bin/fms_ACCESS-OM_97e3429_libaccessom2_1bb8904.x | grep libmpi.so
libmpi.so.40 => /apps/openmpi-mofed4.7-pbs19.2/4.0.1/lib/libmpi.so.40 (0x000014a873abd000)
I can confirm it also runs for me, but I do see the error messages in the output:
Currently Loaded Modulefiles:
1) pbs 2) openmpi/4.1.4(default)
ERROR: Unable to locate a modulefile for 'openmpi-mofed4.7-pbs19.2/4.0.1'
ERROR: Unable to locate a modulefile for 'openmpi-mofed4.7-pbs19.2/4.0.1'
ERROR: Unable to locate a modulefile for 'openmpi-mofed4.7-pbs19.2/4.0.1'
Note that these are just messages; the model runs fine.
I think the real difference is the later commit has changes to the coupling remapping weights:
diff --git a/namcouple b/namcouple
index ec7dd51..5cebfbe 100644
--- a/namcouple
+++ b/namcouple
@@ -94,7 +94,7 @@ P 0 P 0
#
LOCTRANS MAPPING SCRIPR
INSTANT
-../INPUT/rmp_jra55_cice_conserve.nc dst
+../INPUT/rmp_jra55_cice_1st_conserve.nc dst
CONSERV LR SCALAR LATLON 10 FRACNNEI FIRST
#########
# Field 02 : lwflx down
@@ -105,7 +105,7 @@ P 0 P 0
#
LOCTRANS MAPPING SCRIPR
INSTANT
-../INPUT/rmp_jra55_cice_conserve.nc dst
+../INPUT/rmp_jra55_cice_1st_conserve.nc dst
CONSERV LR SCALAR LATLON 10 FRACNNEI FIRST
##########
# Field 03 : rainfall
@@ -116,7 +116,7 @@ P 0 P 0
#
LOCTRANS MAPPING SCRIPR
INSTANT
-../INPUT/rmp_jra55_cice_conserve.nc dst
+../INPUT/rmp_jra55_cice_1st_conserve.nc dst
CONSERV LR SCALAR LATLON 10 FRACNNEI FIRST
##########
# Field 04 : snowfall
@@ -127,7 +127,7 @@ P 0 P 0
#
LOCTRANS MAPPING SCRIPR
INSTANT
-../INPUT/rmp_jra55_cice_conserve.nc dst
+../INPUT/rmp_jra55_cice_1st_conserve.nc dst
CONSERV LR SCALAR LATLON 10 FRACNNEI FIRST
##########
# Field 05 : surface pressure
@@ -138,7 +138,7 @@ P 0 P 0
#
LOCTRANS MAPPING SCRIPR
INSTANT
-../INPUT/rmp_jra55_cice_smooth.nc dst
+../INPUT/rmp_jra55_cice_patch.nc dst
CONSERV LR SCALAR LATLON 10 FRACNNEI FIRST
##########
# Field 06 : runoff. Runoff is passed on the destination grid.
@@ -158,7 +158,7 @@ P 0 P 0
#
LOCTRANS MAPPING SCRIPR
INSTANT
-../INPUT/rmp_jra55_cice_smooth.nc dst
+../INPUT/rmp_jra55_cice_patch.nc dst
CONSERV LR SCALAR LATLON 10 FRACNNEI FIRST
##########
# Field 08 : 2m air humidity
@@ -169,7 +169,7 @@ P 0 P 0
#
LOCTRANS MAPPING SCRIPR
INSTANT
-../INPUT/rmp_jra55_cice_smooth.nc dst
+../INPUT/rmp_jra55_cice_patch.nc dst
CONSERV LR SCALAR LATLON 10 FRACNNEI FIRST
##########
# Field 09 : 10m wind (u)
@@ -179,7 +179,7 @@ jrat cict LAG=0 SEQ=+1
P 0 P 0
#
MAPPING SCRIPR
-../INPUT/rmp_jra55_cice_smooth.nc dst
+../INPUT/rmp_jra55_cice_patch.nc dst
DISTWGT LR VECTOR LATLON 10 4 vwnd_ai
##########
# Field 10 : 10m wind (v)
@@ -189,7 +189,7 @@ jrat cict LAG=0 SEQ=+1
P 0 P 0
#
MAPPING SCRIPR
-../INPUT/rmp_jra55_cice_smooth.nc dst
+../INPUT/rmp_jra55_cice_patch.nc dst
DISTWGT LR VECTOR LATLON 10 4 uwnd_ai
#############################################################################
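The diff above is long but repetitive: in the hunks shown, four fields switch from the `conserve` weights to `1st_conserve`, and five switch from `smooth` to `patch`. A rough sketch of how to tally such substitutions from a unified diff follows. The function name `summarise_weight_changes` is hypothetical, and it assumes each hunk removes one line and adds one line, in order, as is the case here.

```python
from collections import Counter

def summarise_weight_changes(diff_text):
    """Count (old_file, new_file) substitutions in a unified diff.

    Assumes removed and added lines pair up in order, one per hunk
    (true for the namcouple diff above, not for diffs in general).
    """
    removed, added = [], []
    for line in diff_text.splitlines():
        # Skip the file headers and anything that is not a -/+ line
        if line.startswith(("---", "+++")):
            continue
        if not line or line[0] not in "+-":
            continue
        fields = line[1:].split()
        if not fields:
            continue
        # The first field is the remapping weights file path
        (removed if line[0] == "-" else added).append(fields[0])
    return Counter(zip(removed, added))

# A shortened excerpt of the diff, for illustration
excerpt = """\
--- a/namcouple
+++ b/namcouple
-../INPUT/rmp_jra55_cice_conserve.nc dst
+../INPUT/rmp_jra55_cice_1st_conserve.nc dst
-../INPUT/rmp_jra55_cice_smooth.nc dst
+../INPUT/rmp_jra55_cice_patch.nc dst
-../INPUT/rmp_jra55_cice_smooth.nc dst
+../INPUT/rmp_jra55_cice_patch.nc dst
"""
for (old, new), count in summarise_weight_changes(excerpt).items():
    print(f"{count} x {old} -> {new}")
```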
I’m doing a run now from restart250, but I would be very surprised if this is bitwise identical with @rmholmes’ control experiment at this point using the modified remapping weights.
This means a perturbation experiment forked at this point and using the later configuration cannot be cleanly compared with the control, as there will likely be differences due to the changed configuration alone.
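One simple way to test for bitwise identity is to compare checksums of corresponding files from the two runs. The sketch below is only an illustration: the helper names are hypothetical, and it assumes the files being compared contain no run-specific metadata such as timestamps, which would make the hashes differ even when the model state is the same.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def bitwise_identical(path_a, path_b):
    """True if the two files have exactly the same bytes."""
    return sha256_of(path_a) == sha256_of(path_b)

# Usage (paths are placeholders for a file from each experiment):
# bitwise_identical("control/restart_file", "new_run/restart_file")
```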