Transferring UMUI experiments to gadi

spencerwong · 31 January 2024 02:46

Hi everyone,

I’ve been working to transfer some UMUI experiments from Accessdev to Gadi before Accessdev is decommissioned over the coming weeks, but I have run into some issues after following the instructions. After some trial and error, the UMUI goes through the processing and submission steps without showing any errors. Unfortunately the jobs don’t actually run, and the “umuisubmit_clr” script instead leaves behind the following errors:

Base build: failed

/local/spool/pbs/mom_priv/jobs/107267198.gadi-pbs.SC[10]: fcm: not found [No such file or directory]

The file that appears to cause the error in umuisubmit_clr ($UM_ROUTDIR/umbase/cfg/bld.cfg) exists when I follow the path, and so I’m unsure what’s causing the error.

I was just hoping to check whether anyone else has still been using the UMUI models (specifically version 7.3), and might have run into similar issues?

Many thanks,
Spencer

atteggiani · 31 January 2024 03:09

Hi @spencerwong,

I have not used UMUI on Gadi, but the error seems to be related to fcm not being found.

Is the fcm module loaded in your configuration?
The job would also need gdata/h22 in its PBS storage directive for that to work.

Cheers
Davide

spencerwong · 1 February 2024 23:46

Hi @atteggiani,

Thank you for having a look into this! That’s a good point about fcm. I’ve been loading in fcm before opening up the UMUI, but the UMUI ssh’s into localhost when submitting the run and so I’m guessing that the modules don’t stay loaded when it does that.

Interestingly, copies of umuisubmit_clr from runs I’ve submitted from Accessdev start off by loading the modules and specifying the PBS directives:

#!/bin/ksh
#PBS -l walltime=7200
#PBS -l mem=4000MB
#PBS -l ncpus=2
#PBS -o /home/565/sw6175/um_output/vavxb000.vavxb.d23288.t162235.comp.leave
#PBS -j oe
#PBS -q normal
#PBS -N vavxb.compile
#PBS -P w40
#PBS -W umask=0022
#PBS -l software=intel-compiler
#PBS -l storage=scratch/access+gdata/access+scratch/w40+gdata/w40

# Modules for Gadi
source /etc/profile.d/modules.sh
module purge
module use ~access/modules
module load intel-compiler/2019.3.199
module load openmpi/4.0.1
module load gcom/7.0_ompi.4.0.1
module load fcm
module load netcdf
module load oasis3/dummy-access1

export CPATH=${NETCDF_ROOT}/include/Intel:$CPATH

mkdir -p $(dirname /home/565/sw6175/um_output/vavxb000.vavxb.d23288.t162235.comp.leave)
 
export RUNID=vavxb
export USERID=$USER
export UM_ROUTDIR=/scratch/$PROJECT/$USER/um_builds/$USERID/$RUNID
export UM_RDATADIR=/scratch/$PROJECT/$USER/umui/$RUNID
export UM_EXENAME=$RUNID.exe

# Base build
# ~~~~~~~~~~
fcm build -v 3 -f -j 2 $UM_ROUTDIR/umbase/cfg/bld.cfg
...

but the copies of the same script from the runs I’ve tried to submit from gadi don’t try to load in any of the modules and don’t set any of the PBS directives:

export RUNID=vavxb
export USERID=$USER
export UM_ROUTDIR=/scratch/$PROJECT/$USER/um_builds/$USERID/$RUNID
export UM_RDATADIR=/scratch/$PROJECT/$USER/umui/$RUNID
export UM_EXENAME=$RUNID.exe

# Base build
# ~~~~~~~~~~
fcm build -v 3 -f -j 2 $UM_ROUTDIR/umbase/cfg/bld.cfg

I’m not sure why the two versions of the scripts are different, but am wondering whether that would be causing the errors when submitting from gadi.

Many thanks,
Spencer

atteggiani · 2 February 2024 02:10

Hmm that is very odd indeed.

I am not completely aware of the workflow for submitting UMUI runs from Gadi.
There may be a pre-script where all the module loads are carried out.

However, to check whether the problem is the fcm module (as it seems to be), I would add a few lines at the beginning of the umuisubmit_clr script (after all the export lines), something like:

cat <<EOF
++++++++++++++++++++++++++++++
START FCM TEST
EOF

fcm --version

cat <<EOF
END FCM TEST
++++++++++++++++++++++++++++++
EOF

Then submit the script again, and check the output.
I think you will get the same error, but you will still get the output:

++++++++++++++++++++++++++++++
START FCM TEST

(Note you should check this output in the output log file, not in the error log file).

In that case, you can try removing the previously added lines and, instead, adding a line that loads the fcm module:

module load fcm

In this case, if it still fails, it should be with a different error.
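If it helps, the same check can be made to fail loudly with a small guard near the top of the script. This is only a sketch: require_cmd is a made-up helper name (not part of UMUI or FCM), and it relies on nothing more than the standard `command -v` builtin.

```shell
# Hypothetical helper: report whether a command is visible on PATH
# in the shell that is actually running the job script.
require_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        echo "found: $1 -> $(command -v "$1")"
    else
        # Print PATH as well, since a missing module usually shows up there.
        echo "missing: $1 (PATH=$PATH)" >&2
        return 1
    fi
}

# Check for fcm before the build step is reached.
require_cmd fcm || echo "fcm is not available; load the fcm module first"
```

In a real job script you would probably replace the final `echo` with `exit 1` so the job stops before the build step.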
Let me know how it goes.

Cheers
Davide

spencerwong · 12 February 2024 06:06

Hi @atteggiani,

Thank you for your suggestions on this, and apologies for the delay in getting back to you!

I’ve done a bit more digging and have been able to get the model running. As you suggested, the issue was caused by the submission scripts not loading in fcm.

I wasn’t able to edit the umuisubmit_clr script directly, as the UMUI generates it and runs it all in one go when you submit a simulation. After some digging though, the difference between the Gadi and Accessdev versions of umuisubmit_clr appears to come from differences in another script called SUBMIT. SUBMIT is generated when you click “process” in the UMUI, and it’s used later to generate the other submission scripts when you click “submit” in the UMUI.

The SUBMIT script from an Accessdev run includes all the PBS and module instructions to paste into umuisubmit_clr:

...
# Added to SUBMIT to give Gadi PBS info
  cat >$tmp_compile<<EOF
#!/bin/ksh
#PBS -l walltime=$COMPTLIM
#PBS -l mem=$CMEMORY
#PBS -l ncpus=$NPROC
#PBS -o $OUTPUT_FILE
#PBS -j oe
#PBS -q $QUEUE
#PBS -N ${RUNID}.compile
#PBS -P $PROJECT
#PBS -W umask=0022
#PBS -l software=intel-compiler
#PBS -l storage=scratch/access+gdata/access+scratch/$PROJECT+gdata/$PROJECT

# Modules for Gadi
source /etc/profile.d/modules.sh
module purge
module use ~access/modules
module load intel-compiler/2019.3.199
module load openmpi/4.0.1
module load gcom/7.0_ompi.4.0.1
module load fcm
module load netcdf
module load oasis3/dummy-access1

export CPATH=\${NETCDF_ROOT}/include/Intel:\$CPATH

mkdir -p \$(dirname $OUTPUT_FILE)
EOF

but they are missing in the version from gadi.
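As a note for anyone reading the excerpt above, the backslashes matter. Inside an unquoted here-document (<<EOF), a plain $VAR is expanded when SUBMIT runs, whereas \$ and \${...} are written out literally and only evaluated later, when the generated script itself runs. A minimal sketch of that behaviour follows; the OUTPUT_FILE value and the temporary file are hypothetical stand-ins for what the UMUI would set.

```shell
# Hypothetical values standing in for what the UMUI would provide.
OUTPUT_FILE=/tmp/um_output/example.comp.leave
tmp_compile=$(mktemp)

# Unquoted delimiter: $OUTPUT_FILE is expanded now, escaped \$ is kept for later.
cat > "$tmp_compile" <<EOF
#PBS -o $OUTPUT_FILE
export CPATH=\${NETCDF_ROOT}/include/Intel:\$CPATH
mkdir -p \$(dirname $OUTPUT_FILE)
EOF

# Show what was actually written to the generated script.
cat "$tmp_compile"
```

The generated file contains the literal path on the #PBS line, while ${NETCDF_ROOT}, $CPATH and $(dirname ...) are left unexpanded, which matches the shape of the Accessdev umuisubmit_clr shown earlier in the thread.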

There’s a file located on Accessdev at /projects/access/gadi/inserts/um7.3/compile.pbs which contains these instructions, and so I’m wondering whether it’s just that the UMUI on gadi doesn’t look for or find it. I’ve attached it below for future reference.

Anyway, the workaround that ended up working was to replace the Gadi SUBMIT script with the Accessdev one (attached) and adjust its experiment ID settings in between the processing and submission steps. I’m surprised it actually worked but am glad it did!
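A rough sketch of that swap is below. It assumes the only run-specific content in SUBMIT is the five-letter run ID (vavxb in the examples above), which may not hold for every experiment. retarget_submit and the file paths are hypothetical names used for illustration only.

```shell
# Hypothetical helper: copy a SUBMIT file generated for one run ID and
# rewrite every occurrence of that ID to another.
# Assumption: the old ID does not appear as part of unrelated text.
retarget_submit() {
    old_id=$1
    new_id=$2
    src=$3
    dst=$4
    sed "s/$old_id/$new_id/g" "$src" > "$dst"
}

# Example call (paths are placeholders, not real UMUI locations):
# retarget_submit vavxb vavxc SUBMIT_vavxb_accessdev.txt SUBMIT
```

It is worth diffing the result against the original before submitting, since a plain text substitution cannot tell a run ID from a coincidental match.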

Cheers,
Spencer

compile.pbs.txt (710 Bytes)
SUBMIT_vavxb_accessdev.txt (10.5 KB)

atteggiani · 13 February 2024 03:37

Hi @spencerwong,

For the future, you can still edit the umuisubmit_clr script (created by UMUI), and then submit it by running:

qsub ~/path/to/the/umuisubmit_clr

This way, you don’t necessarily have to set up your experiment from UMUI (for testing for example).
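As an illustration of that kind of manual edit, the sketch below copies a generated script and inserts module setup lines just before the first `fcm build` line. The function name add_fcm_setup is made up for this example, and the three inserted lines are simply taken from the Accessdev version of umuisubmit_clr quoted earlier in the thread. Whether they are sufficient on Gadi is an assumption.

```shell
# Hypothetical helper: write a copy of a generated script with module
# setup inserted immediately before the first line starting "fcm build".
add_fcm_setup() {
    src=$1
    dst=$2
    awk '
        !inserted && /^fcm build/ {
            print "source /etc/profile.d/modules.sh"
            print "module use ~access/modules"
            print "module load fcm"
            inserted = 1
        }
        { print }
    ' "$src" > "$dst"
}

# Example call (file names are placeholders):
# add_fcm_setup umuisubmit_clr umuisubmit_clr.patched
```

The patched copy can then be passed to qsub as described above. Note that the Gadi-generated script in this thread also lacked the #PBS directives, so those would likely need to be added to the copy as well.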

Yes, all umuisubmit scripts are created from the SUBMIT script (you could even run the SUBMIT script yourself to generate the umuisubmit scripts).

Anyway, I am happy to know you found a solution for your experiment.
Also note that you can add top/bottom scripts that run at the beginning/end of your experiment run.
These could be used, for example, to load any modules you require in your experiment.

You can add these scripts in UMUI > edit experiment > Input/Output Control and Resources > Scripts Inserts and Modifications
In the panel click “Using bottom and top script inserts” and you will be able to insert the directory and path to your top/bottom scripts.
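A top script for this purpose might look something like the sketch below. The module names are copied from the Accessdev-generated script earlier in this thread. The function name load_build_modules is hypothetical, and whether a top insert is applied to the compile job (rather than only the model run) is an assumption I have not verified.

```shell
# Hypothetical top-script insert: load the modules the build expects.
# Assumes the `module` command is already available in the job shell
# (the Accessdev script sources /etc/profile.d/modules.sh for this).
load_build_modules() {
    module use ~access/modules
    for m in intel-compiler/2019.3.199 openmpi/4.0.1 gcom/7.0_ompi.4.0.1 \
             fcm netcdf oasis3/dummy-access1; do
        # Stop at the first module that fails to load.
        module load "$m" || { echo "failed to load $m" >&2; return 1; }
    done
}

# In a real top script, call the function straight away:
# load_build_modules || exit 1
```

The `module purge` from the Accessdev script is deliberately left out here, since purging inside an insert could unload modules that the rest of the job relies on.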

Cheers
Davide
