Test Case Coupled Step Error with ACCESS-CM2

Hi all,

I’m trying to run the u-cy339 tutorial test case for ACCESS-CM2 for the first time. I ran into this error in the coupled step that I’m unfamiliar with (job.err output):

????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
?  Error code: 19
?  Error from routine: CHECK_IOSTAT
?  Error message:
?        Error reading namelist domain
?        IoMsg: invalid reference to variable in NAMELIST input, unit 14, file /scratch/ng72/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR/STASHC, line 2565, position 10
?        Please check input list against code.
?  Error from processor: 0
?  Error number: 10
????????????????????????????????????????????????????????????????????????????????

[0] exceptions: An non-exception application exit occured.
[0] exceptions: whilst in a serial region
[0] exceptions: Task had pid=2950149 on host gadi-cpu-clx-1947.gadi.nci.org.au
[0] exceptions: Program is "/home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/toyatm"
Warning in umPrintMgr: umPrintExceptionHandler : Handler Invoked
gc_abort (Processor     0): Job aborted from ereport.
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 9.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.

Is there an invalid input in the UM coupling configuration?

Here is the start of the job.err output if that helps:

Using the cylc session localhost

Loading cylc7/24.03
  Loading requirement: mosrs-setup/2.0.1
- Package -----------------------------.- Versions --------.- Last mod. -------
Currently Loaded Modulefiles:
mosrs-setup/2.0.1                       default             2024/05/19 23:31:13
cylc7/24.03                             default             2024/05/06 03:30:21
python2-as-python                                           2019/11/04 03:02:35
openmpi/4.0.2                                               2022/02/14 19:20:11
fcm/2019.09.0                                               2020/12/14 04:10:00
pythonlib/f90nml/1.0.2                                      2020/12/14 04:15:56
[WARN] file:/home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR/IDEALISE: skip missing optional source: namelist:idealise
[WARN] file:/home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR/STASHC: skip missing optional source: namelist:exclude_package(:)
[WARN] file:/home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR/RECONA: skip missing optional source: namelist:trans(:)
[WARN] file:/home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR/IOSCNTL: skip missing optional source: namelist:lustre_control
[WARN] file:/home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR/IOSCNTL: skip missing optional source: namelist:lustre_control_custom_files
+ echo 'ACCESS COUPLED MODEL DRIVER,' /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled
+ [[ -z oasis3_mct ]]
+ [[ -z /home/272/jt0319/cylc-run/u-cy339/share/mom/exec/access-cm2/ACCESS-CM/fms_ACCESS-CM.x ]]
+ ATMOS_LINK=toyatm
+ OCEAN_LINK=mom5xx
+ ICE_LINK=cicexx
+ ln -sf /home/272/jt0319/cylc-run/u-cy339/share/fcm_make_um/build-atmos/bin/um-atmos.exe toyatm
+ ln -sf /home/272/jt0319/cylc-run/u-cy339/share/mom/exec/access-cm2/ACCESS-CM/fms_ACCESS-CM.x mom5xx
+ ln -sf /home/272/jt0319/cylc-run/u-cy339/share/cice/bin/cice5.exe cicexx
+ ATMOS_EXEC=toyatm
+ OCEAN_EXEC=mom5xx
+ export UM_NPES=576
+ UM_NPES=576
+ NPROC_MAX=576
+ export OCN_NPES=80
+ OCN_NPES=80
+ TOT_NPES=672
+ HIST_FILE=cy339.xhist
+ fix_cice_namelist.py /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ICE_RUNDIR/cice_in.nml /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ICE_RUNDIR/input_ice.nml
+ [[ 0 != 0 ]]
+ fix_mom_namelist.py /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/OCN_RUNDIR/input.nml
+ [[ 0 != 0 ]]
+ chmod u+w /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/CPL_RUNDIR/namcouple
+ fix_namcouple.py /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/CPL_RUNDIR/namcouple
+ [[ 0 != 0 ]]
+ chmod u+w /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/OCN_RUNDIR/INPUT/diag_table
+ fix_diag_table.py /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/OCN_RUNDIR/INPUT/diag_table
+ [[ 0 != 0 ]]
+ mkdir -p /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/OCN_RUNDIR/RESTART /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/OCN_RUNDIR/HISTORY
+ mkdir -p /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ICE_RUNDIR/RESTART /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ICE_RUNDIR/HISTORY
+ [[ 09500101 != 09500101 ]]
+ [[ true == \t\r\u\e ]]
+ [[ true == \f\a\l\s\e ]]
+ [[ true == \t\r\u\e ]]
+ [[ 09500101 == 09500101 ]]
+ echo 'Setting CONTINUE=false for WARM_RESTART_RUN'
+ export CONTINUE=false
+ CONTINUE=false
+ create_rankfile.py
+ export 'ACCESSRUNCMD=--rankfile /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/rankfile                      -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR -n 576 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/toyatm :                      -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/OCN_RUNDIR -n 80 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/mom5xx :                      -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ICE_RUNDIR -n 16 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/cicexx'
+ ACCESSRUNCMD='--rankfile /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/rankfile                      -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR -n 576 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/toyatm :                      -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/OCN_RUNDIR -n 80 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/mom5xx :                      -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ICE_RUNDIR -n 16 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/cicexx'
+ echo RUNCOMMAND, --rankfile /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/rankfile -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR -n 576 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/toyatm : -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/OCN_RUNDIR -n 80 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/mom5xx : -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ICE_RUNDIR -n 16 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/cicexx
+ cd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR
+ [[ -z 10.6 ]]
+ [[ ! 10.6 =~ ^([0-9])+\.([0-9])+$ ]]
+ export DR_HOOK=false
+ DR_HOOK=false
+ export DR_HOOK_OPT=noself
+ DR_HOOK_OPT=noself
+ export PRINT_STATUS=PrStatus_Normal
+ PRINT_STATUS=PrStatus_Normal
+ export UM_THREAD_LEVEL=MULTIPLE
+ UM_THREAD_LEVEL=MULTIPLE
+ export HISTORY=/home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR/History_Data/cy339.xhist
+ HISTORY=/home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR/History_Data/cy339.xhist
+ [[ false == \f\a\l\s\e ]]
+ rm -f /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR/History_Data/cy339.xhist
+ export HISTORY_TEMP=thist
+ HISTORY_TEMP=thist
+ export UM_NPES=576
+ UM_NPES=576
+ export NPROC=576
+ NPROC=576
+ export HOUSEKEEP=hkfile
+ HOUSEKEEP=hkfile
+ export STASHC=STASHC
+ STASHC=STASHC
+ export ATMOSCNTL=ATMOSCNTL
+ ATMOSCNTL=ATMOSCNTL
+ export SHARED_NLIST=SHARED
+ SHARED_NLIST=SHARED
+ export ERROR_FLAG=errflag
+ ERROR_FLAG=errflag
+ export STASHMASTER=..
+ STASHMASTER=..
+ export IDEALISE=IDEALISE
+ IDEALISE=IDEALISE
+ export IOSCNTL=IOSCNTL
+ IOSCNTL=IOSCNTL
+ export STDOUT_FILE=pe_output/cy339.fort6.pe
+ STDOUT_FILE=pe_output/cy339.fort6.pe
++ dirname pe_output/cy339.fort6.pe
+ mkdir -p pe_output
+ rm -f pe_output/cy339.fort6.pe0 pe_output/cy339.fort6.pe000
+ SIGNALS=EXIT
+ for S in $SIGNALS
+ trap FINALLY EXIT
+ export OMPI_MCA_hwloc_base_mem_alloc_policy=local_only
+ OMPI_MCA_hwloc_base_mem_alloc_policy=local_only
+ export OMPI_MCA_rmaps_base_mapping_policy=
+ OMPI_MCA_rmaps_base_mapping_policy=
+ mpirun --rankfile /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/rankfile -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ATM_RUNDIR -n 576 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/toyatm : -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/OCN_RUNDIR -n 80 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/mom5xx : -wd /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/ICE_RUNDIR -n 16 /home/272/jt0319/cylc-run/u-cy339/work/09500101/coupled/cicexx
oasis_init_comp: Calling MPI_Init

Cheers,

Joel

What does this file look like around this line number?
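For anyone wanting to pull out the same kind of context, a minimal sketch follows. It assumes only that STASHC is a plain-text file; the helper name `show_context` and the `radius` parameter are made up for illustration and are not part of any UM or Rose tooling.

```python
from pathlib import Path


def show_context(path, line_no, radius=8):
    """Return the lines around line_no (1-indexed), each prefixed with its number.

    Hypothetical helper for illustration only; not part of the UM or Rose.
    """
    # errors="replace" avoids a crash if the file has odd bytes in it
    lines = Path(path).read_text(errors="replace").splitlines()
    start = max(line_no - radius, 1)
    end = min(line_no + radius, len(lines))
    return [f"{n:>6} {lines[n - 1]}" for n in range(start, end + 1)]


# Example call (the path is a placeholder, adjust to your own run directory):
# print("\n".join(show_context("ATM_RUNDIR/STASHC", 2565)))
```

The line number to pass in is the one reported in the IoMsg part of the error message.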

Here is what I found (I’ve added line numbers on the left). Line 2565, which is causing the issue, is:

l_spml_ts=.false.

Here it is in context:

2557 /
2558 &domain
2559 dom_name='PLEV19',
2560 imn=0,
2561 imsk=1,
2562 iopa=1,
2563 iopl=3,
2564 iwt=0,
2565 l_spml_ts=.false.,
2566 plt=0,
2567 rlevlst=1000.000,925.000,850.000,700.000,600.000,500.000,400.000,
2568 300.000,250.000,200.000,150.000,100.000,70.000,50.000,
2569 30.000,20.000,10.000,5.0,1.0,
2570 spml_bot=0,
2571 spml_ew=0,
2572 spml_ns=0,
2573 spml_top=0,
2574 ts=.false.,
2575 /

Nothing obviously wrong there: the variable is present in the modern UM at um/src/control/top_level/rdbasis.F90#L155. This will probably need someone familiar with CM2.


Update: I refreshed the suite and did not click ‘auto-fix configurations’ this time, and the run now works. The auto-fix appears to have made some unnecessary changes. Thanks!
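To see what the auto-fix actually changed, one option is to diff a configuration file from before and after the fix. The sketch below assumes both versions are still available as plain-text files (for example, one from a fresh copy of the suite). The function name `config_diff` and the file names in the usage comment are hypothetical.

```python
import difflib
from pathlib import Path


def config_diff(before_path, after_path):
    """Return a unified diff (as a list of lines) between two text config files.

    Hypothetical helper for illustration only; an empty list means no differences.
    """
    before = Path(before_path).read_text(errors="replace").splitlines()
    after = Path(after_path).read_text(errors="replace").splitlines()
    return list(
        difflib.unified_diff(
            before,
            after,
            fromfile=str(before_path),
            tofile=str(after_path),
            lineterm="",
        )
    )


# Example call (both paths are placeholders):
# print("\n".join(config_diff("STASHC.before_autofix", "STASHC.after_autofix")))
```

Lines starting with `+` in the output exist only in the second file, which here would be the entries added by the auto-fix.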


Good to know, @Jatreutlein. I was just about to say that I had run the test case successfully for more than a month without getting any errors.
