I’m trying to extend the ACCESS-OM2 0.25 deg RYF run, which has already been run for 600+ years, with outputs and restarts in /g/data/ik11/outputs/access-om2-025/025deg_jra55_ryf9091_gadi.
I am using the configuration files in /g/data/ik11/configs/access-om2-025/025deg_jra55_ryf9091_gadi. I had to make one change in the config.yaml file: instead of /g/data4/ik11/inputs/access-om2/input_rc/, I am using /g/data4/ik11/inputs/access-om2/input_rc-DELETE/. Everything else is exactly the same.
This updated config version is kept in /home/156/db6174/access-om2/025deg_jra55_ryf_control-test for anyone to have a look. This config gives the following error:
Currently Loaded Modulefiles:
 1) pbs   2) openmpi/4.1.4(default)
ERROR: Unable to locate a modulefile for 'openmpi-mofed4.7-pbs19.2/4.0.1'
ERROR: Unable to locate a modulefile for 'openmpi-mofed4.7-pbs19.2/4.0.1'
ERROR: Unable to locate a modulefile for 'openmpi-mofed4.7-pbs19.2/4.0.1'
payu: Model exited with error code 1; aborting.
My guess is that since this config is 4-5 years old, it requests module versions that have since been updated and are no longer available to load. Has anyone encountered this error before, or does anyone have suggestions?
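One way to test that guess is to search the configuration directory for the module name from the error message. Below is a minimal Python sketch of that search; `find_references` is a hypothetical helper written for this post, not part of payu, and it only looks at top-level text files in the directory. If it finds nothing, the module request presumably comes from somewhere other than the config files.

```python
from pathlib import Path


def find_references(config_dir, needle):
    """Return (file, line number, line) for each top-level text line containing needle."""
    hits = []
    for path in sorted(Path(config_dir).iterdir()):
        if not path.is_file():
            continue  # skip sub-directories and broken links
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), number, line.strip()))
    return hits
```

For example, `find_references("/home/156/db6174/access-om2/025deg_jra55_ryf_control-test", "openmpi-mofed4.7-pbs19.2/4.0.1")` would list every top-level file and line in that control directory that mentions the missing module.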
@Dhruv_Bhagtani I’m not sure the configuration files in /g/data/ik11/configs/access-om2-025/025deg_jra55_ryf9091_gadi are kept up to date.
The ACCESS-OM2 control runs post on the hive forum says the configuration used is this one. That configuration doesn’t seem to refer to input_rc; it refers to input_20200530 instead. Can you have a go with those configs?
Hi @Dhruv_Bhagtani, if you want to extend the run you should clone the final configuration of the previous run so that everything is the same. You could use the ryf9091_gadi branch of the rmholmes/025deg_jra55_ryf repository on GitHub, or @rmholmes’s control directory on Gadi if that’s still around. To be really sure, it’s a good idea to re-do the final run and check that the restarts in the manifests in your new run are the same as before. See the Tutorials page of the COSIMA/access-om2 wiki on GitHub.
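The manifest check can be sketched roughly as follows. This assumes each manifest has already been read into a plain dictionary mapping a restart file path to its hash; how the manifest file is parsed is left out, and that dictionary layout is an assumption made for illustration rather than payu's actual format. `compare_restart_hashes` is a hypothetical helper name.

```python
def compare_restart_hashes(old, new):
    """Compare two {restart_file: hash} mappings (layout assumed, see above)."""
    shared = set(old) & set(new)
    return {
        # restart files listed in the original run but not in the new one
        "missing": sorted(set(old) - set(new)),
        # restart files listed only in the new run
        "extra": sorted(set(new) - set(old)),
        # restart files present in both runs but with different hashes
        "changed": sorted(name for name in shared if old[name] != new[name]),
    }
```

If all three lists come back empty, the re-done run used the same restart files as the original one.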
@rmholmes I’m having a similar issue with this module error. In my case, I’m attempting a warm start to do some perturbation experiments. The intention is to start at restart250, and I’ve updated the configs to point at input_20200530 as per your suggestion. The manifests show the correct filepaths as far as I can tell, but the job still returns the same module error that @Dhruv_Bhagtani originally reported. What else can I try?
I’ve never encountered that module issue before. I wonder if there are other differences between your configuration files and the ones used for the original run? I would suggest doing a comparison. By working forwards from the original configuration files I’ve linked above (which seem to have worked for @Dhruv_Bhagtani), you might be able to narrow down the issue. Let me know if that doesn’t help and I can look into it in more detail.
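As a rough illustration of that comparison, here is a minimal Python sketch that diffs the top-level files of two control directories using the standard library. `diff_configs` is a hypothetical helper, and the two directory arguments stand for a clone of the original configuration and your modified copy.

```python
import difflib
from pathlib import Path


def diff_configs(original_dir, modified_dir):
    """Return unified-diff lines for the top-level files of two config directories."""
    original, modified = Path(original_dir), Path(modified_dir)
    names = sorted(
        {p.name for d in (original, modified) for p in d.iterdir() if p.is_file()}
    )
    report = []
    for name in names:
        a, b = original / name, modified / name
        if not a.is_file():
            report.append(f"only in modified: {name}")
            continue
        if not b.is_file():
            report.append(f"only in original: {name}")
            continue
        # errors="replace" keeps the sketch from failing on non-UTF-8 files
        a_lines = a.read_text(encoding="utf-8", errors="replace").splitlines()
        b_lines = b.read_text(encoding="utf-8", errors="replace").splitlines()
        report.extend(
            difflib.unified_diff(
                a_lines, b_lines, fromfile=str(a), tofile=str(b), lineterm=""
            )
        )
    return report
```

Printing the result with `print("\n".join(diff_configs(original_dir, modified_dir)))` shows every changed line; identical files contribute nothing to the report.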