ACCESS-CM2 suite-run error

Hi there,

I followed the instructions at this link (Run ACCESS-CM - ACCESS-Hive Docs) to set up and run ACCESS-CM2 using suite u-dh101 (in my case the copy is named u-dl308). The model built successfully, but I encountered the following error in the “coupled” step.

/local/spool/pbs/mom_priv/jobs/129736469.gadi-pbs.SC: line 141: /home/561/hl1052/cylc-run/u-dl308/share/data/etc/um_ancils_gl: No such file or directory
2024-11-29T00:06:56Z CRITICAL - failed/EXIT

I checked the suite settings in “install_ancil – file – $ROSE_DATA/etc/um_ancils_gl”, the source is “/projects/access/data/ancil/access_cm2_n96e/O1/ancils_GA7.1_PD”.
But I’m not sure what I need to do to fix it. I would greatly appreciate any guidance on how to resolve this issue.
Thank you in advance!

I’m new to ACCESS-CM2, and I’m also unsure which suite I should choose for my research. Is there any document that lists all the suite names with descriptions? I noticed in the post “ACCESS-CM2 model suite names” that the old website (Log In - Confluence) no longer works. Is there a new website that displays this information? Thank you in advance.

Thanks,
Huazhen

Hi @huazhen, I’m on triage today and will take a look into your issue, routing it to the appropriate team member if needed.

In the interim: an error that begins with “/local/spool/pbs/mom_priv/jobs/129736469.gadi-pbs.SC” is often a symptom of a transient HPC problem, such as the job being assigned to a node that is under stress. I do not think that is your issue here, though.

While we work on a solution, I would suggest re-running to see if any further logging becomes available.

Cheers, Ben

@huazhen, I can access the ancil file, and nothing looks obviously wrong with it. It is possible that one of the paths listed in it is causing the problem.

The first port of call would be to ensure that you are indeed adding gdata/access to your job storage directives. Otherwise, a path to an accessible PBS out or error file for me to look at would be most useful in this situation.
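As a quick way to test this from a login or VDI session, here is a minimal Python sketch that reports whether a path is visible and readable. The helper name `check_readable` is made up for illustration, and the path is simply the ancil source quoted earlier in this thread. Note that a session can see a path that a PBS job cannot if the job's storage directives differ, so this is only a first check.

```python
import os


def check_readable(path):
    """Return a short status string describing whether `path` can be read."""
    # os.path.exists() is also False when a parent directory cannot be
    # traversed, so "missing" and "not visible" cannot be told apart here.
    if not os.path.exists(path):
        return "missing or not visible"
    if not os.access(path, os.R_OK):
        return "not readable"
    return "ok"


if __name__ == "__main__":
    # Ancil source path quoted in the first post; replace with any path
    # the suite refers to.
    ancil_source = "/projects/access/data/ancil/access_cm2_n96e/O1/ancils_GA7.1_PD"
    print(ancil_source, "->", check_readable(ancil_source))
```

If this prints anything other than "ok", the storage flags for the session or job would be the first thing to look at.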

Thank you for your suggestions. I just re-ran the model, and the same error message appeared during the “coupled” step. I have double-checked that gdata/access has been added to my VDI desktop storage.
The path to the error file is /scratch/m35/hl1052/cylc-run/u-dl308/log.20241129T044952Z/job/09900101/coupled/01/

Hi @huazhen, could you please open up read permissions on that file/directory? I do not have access.

Hi Ben,

I realise that you will have to be a member of project m35 to access those files. I could not change project permissions, so I copied files to the following path
/scratch/public/hl1052/log.20241129T044952Z/job/09900101/coupled/01

Please let me know if it is still not working.
Thanks,
Huazhen


Yes I can access it now, thank you.

It looks like there may be an issue with configuration and I have spoken to my colleague @spencerwong, who has some ideas as to why you are seeing this error. I will reassign your ticket over to him for further resolution.

Cheers, Ben

Hi @huazhen,

I have had a go at running the suite u-dl308 and have been able to reproduce the error where the model fails to find the um_ancils_gl file.

My understanding is that the um_ancils_gl file provides the model with paths to various ancillary files. This file is copied to the .../share/data/etc/um_ancils_gl location by the install_ancil task, which is usually run just before the coupled task.

The Cylc dependency graph near the beginning of the suite.rc file controls when the different tasks are run. In the suite u-dl308, the dependency graph contains the following:

{% if RECON %}
install_ancil => recon => coupled
{% endif %}
{% if UPDATE_SST %}
install_ancil => update_sst => coupled
{% endif %}

Here, install_ancil appears in the graph inside the {% if RECON %} and {% if UPDATE_SST %} if statements. This means that the install_ancil step will only run when either RECON=true or UPDATE_SST=true, which are controlled in the rose-suite.conf configuration file.
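To confirm which values a suite copy actually uses, a small Python sketch along these lines could read the two switches out of the rose-suite.conf text. This assumes the settings appear as plain `KEY=value` lines; the function name `read_suite_flags` and the example excerpt are illustrative and not copied from u-dl308.

```python
def read_suite_flags(conf_text, names=("RECON", "UPDATE_SST")):
    """Collect KEY=value settings for the given names from rose-suite.conf-style text."""
    flags = {}
    for raw in conf_text.splitlines():
        line = raw.strip()
        # Skip blank lines, comments and section headers such as [jinja2:suite.rc].
        if not line or line.startswith("#") or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep and key.strip() in names:
            # Treat anything other than "true" (case-insensitive) as False.
            flags[key.strip()] = value.strip().lower() == "true"
    return flags


# Hypothetical excerpt for illustration only.
example = """
[jinja2:suite.rc]
RECON=false
UPDATE_SST=false
"""

flags = read_suite_flags(example)
print(flags)
# With the graph quoted above, install_ancil is only scheduled if either flag is set.
print("install_ancil scheduled:", flags["RECON"] or flags["UPDATE_SST"])
```

To check a real suite, pass the contents of its rose-suite.conf file in place of `example`.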

In your copy of the suite, I believe UPDATE_SST=false (and the same for RECON), meaning that the install_ancil step is skipped, leading to the No such file or directory error. To run the suite with these settings, you can modify the Cylc dependency graph in the suite.rc file. Replacing the earlier section of the graph with:

{% if RECON %}
install_ancil => recon => coupled
{% endif %}
{% if UPDATE_SST %}
install_ancil => update_sst => coupled
{% endif %}
install_ancil => coupled

will ensure that install_ancil is always run before coupled, regardless of the settings.
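To illustrate the logic, here is a rough Python sketch that mimics how the two versions of the graph section expand for each combination of RECON and UPDATE_SST. It is a simplified model of the conditional blocks only, not how Cylc or Jinja2 actually process suite.rc, and the function names are made up for the example.

```python
from itertools import product


def graph_edges(recon, update_sst, patched):
    """Return the (upstream, downstream) edges the graph section would contain."""
    edges = set()
    if recon:
        edges |= {("install_ancil", "recon"), ("recon", "coupled")}
    if update_sst:
        edges |= {("install_ancil", "update_sst"), ("update_sst", "coupled")}
    if patched:
        # The extra unconditional line: install_ancil => coupled
        edges.add(("install_ancil", "coupled"))
    return edges


def runs_install_ancil(recon, update_sst, patched):
    """True if install_ancil appears anywhere in the resulting graph."""
    return any("install_ancil" in edge for edge in graph_edges(recon, update_sst, patched))


for recon, update_sst in product([False, True], repeat=2):
    original = runs_install_ancil(recon, update_sst, patched=False)
    modified = runs_install_ancil(recon, update_sst, patched=True)
    print(f"RECON={recon}, UPDATE_SST={update_sst}: original={original}, modified={modified}")
```

Only the case where both switches are false differs between the two versions: the original graph leaves install_ancil out entirely, while the modified one always includes it.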

Let me know if this helps to run the simulation!

Cheers,
Spencer

Hi Spencer @spencerwong ,

Thank you so much for your help. Yes, you are right: I have double-checked, and both RECON=false and UPDATE_SST=false are set in my case. I followed your suggestion, and it fixed the install_ancil problem. However, I now get a new error message during the “coupled” step. I have copied the error files to /scratch/public/hl1052/log.20241203T062225Z/job/09900101/coupled/01. I think the following message is the critical one. It looks like there is still an issue related to UPDATE_SST. Do you have any ideas on how to fix it? Thank you in advance!

FATAL from PE     1: ==>ocean_sbc_mod: temp_restore_tscale > 0.0 but cannot find INPUT/temp_sfc_restore.nc

Cheers,
Huazhen