Regional Ancillary Suite no longer running (again)

Hi all.

This appears to be the same problem that afflicted Sonya Fiddes back in August - see Regional Ancillary Suite no longer running

I checked out u-dg767 yesterday and it ran in the morning. I made some changes to the domain location in the afternoon, and it hasn’t worked since.

The error is a repeat of Sonya’s error back in August:

/g/data/access/TIDS/UM/ancil/atmos/master/vegetation/cover/cci/v3/vegetation_fraction.nc does not exist.

These files no longer exist. Only v1 does.

$ ls -lt /g/data/access/TIDS/UM/ancil/atmos/master/vegetation/cover/cci/
total 4
drwxr-xr-x+ 2 mrd599 access.admin 4096 Feb 26  2020 v1

The request for v3 is provided in app/ancil_lct/rose-app.conf

[env]
source=${ANCIL_MASTER}/vegetation/cover/cci/v3/vegetation_fraction.nc
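A quick pre-flight check along these lines could confirm whether the file that source resolves to is actually present before the suite is submitted. This is only a sketch: check_ancil_source is a made-up helper name, not part of the suite, and it assumes the master ancillary directory (the value of ANCIL_MASTER, which the error message above suggests is /g/data/access/TIDS/UM/ancil/atmos/master) is passed as the first argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the suite): report whether the CCI
# vegetation fraction file requested by app/ancil_lct/rose-app.conf exists.
#   $1 = master ancillary directory (assumed value of ANCIL_MASTER)
#   $2 = dataset version directory, defaults to v3
check_ancil_source() {
    local ancil_master="$1"
    local version="${2:-v3}"
    local src="${ancil_master}/vegetation/cover/cci/${version}/vegetation_fraction.nc"

    if [ ! -e "$src" ]; then
        echo "MISSING: $src"
        return 1
    elif [ ! -r "$src" ]; then
        echo "UNREADABLE: $src"
        return 2
    fi
    echo "OK: $src"
}
```

For example, check_ancil_source /g/data/access/TIDS/UM/ancil/atmos/master would print a MISSING line in the situation described here.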

Is it possible that I deleted the files in /g/data/access/ ?

Cheers,

Paul

Not sure how the files got deleted, but trying the rsync from JASMIN again at the moment.

Thanks Martin.

The files have returned.

$ ls -lt /g/data/access/TIDS/UM/ancil/atmos/master/vegetation/cover/cci/
total 12
drwxrwxr-x+ 2 mrd599 p66          4096 Apr  4  2020 v3
drwxrwxr-x+ 2 mrd599 access.admin 4096 Mar  5  2020 v1
drwxrwxr-x+ 2 mrd599 p66          4096 Sep 29  2018 v2

EDIT. Spoke too soon. The directories have synced but the files haven’t yet.

$ ls -lt /g/data/access/TIDS/UM/ancil/atmos/master/vegetation/cover/cci/v3/
total 0
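To see at a glance which version directories have been created but not yet populated, something like the following could be used. It is a minimal sketch: list_empty_versions is a hypothetical name, and it only looks at the immediate subdirectories of the directory it is given.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each immediate subdirectory of $1 that contains
# no entries, i.e. directories a sync has created but not yet filled.
list_empty_versions() {
    local parent="$1"
    find "$parent" -mindepth 1 -maxdepth 1 -type d -empty | sort
}
```

Running list_empty_versions /g/data/access/TIDS/UM/ancil/atmos/master/vegetation/cover/cci would list v3 while it is still empty, and print nothing once files are present in every version directory.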

@MartinDix Any update on the rsync status? I was hoping by this morning the files would have synced, but there is still nothing in /g/data/access/TIDS/UM/ancil/atmos/master/vegetation/cover/cci/v3/

All the master ancillary files should be restored now.

Wonderful! Thanks very much Martin.

Hi Martin,
Thanks for your work on this! I’m unfortunately seeing similar issues in: /g/data/access/TIDS/UM/ancil/data/parameters/
(i.e. data rather than atmos/master)

Emma

Hi Emma

I can confirm the same issue. The job ancil_cap_vegfrac fails with

forrtl: severe (29): file not found, unit 38, file /scratch/gb02/pag548/cylc-run/u-dg767/work/1/Gippsland_era5_ancil_cap_vegfrac/fort.38

The underlying link to v3 TIDS ancillary data directory is missing.

$ ls -l /g/data/access/TIDS/UM/ancil/data/parameters/IGBP_to_MOSES/latest

lrwxrwxrwx 1 mrd599 access.admin 2 Dec 14 2015 /g/data/access/TIDS/UM/ancil/data/parameters/IGBP_to_MOSES/latest -> **v3**
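A link like this, whose target has gone missing, can be found with a small scan for dangling symlinks. The sketch below is illustrative only: list_broken_links is a hypothetical helper, and the directory to scan is whatever root you pass in.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every symlink under $1 whose target does not
# exist, in the form "link -> target".
list_broken_links() {
    local root="$1"
    local link
    find "$root" -type l | sort | while IFS= read -r link; do
        # [ -e ] follows the link, so it is false when the target is missing
        if [ ! -e "$link" ]; then
            printf '%s -> %s\n' "$link" "$(readlink "$link")"
        fi
    done
}
```

For example, list_broken_links /g/data/access/TIDS/UM/ancil/data/parameters would report the latest link for as long as the v3 directory it points to is absent.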

I’m interested in why the install_cold task succeeds if this data is missing.

I will run through the task tomorrow to try and find out why.

This should be fixed now.

Still a mystery how these files got deleted but I’m changing the permissions so it shouldn’t happen again

Hi Martin - sorry, one more that seems to be missing now?
We need these ones for the global driving model…

/g/data/access/projects/access/umdir/ancil/atmos/n216e/orca025/land_sea_mask/etop01/v3/

It’s a real mystery! Hopefully once the permissions are changed it won’t happen again!

Some of the n216 files went missing but no other resolutions :confused:

Downloading now

Should be complete now.

JASMIN is now recommending Globus for data transfers (see JASMIN Help Site - Migration to Rocky Linux 9 2024). I wonder if this could be used for the TIDS data sync, though I don’t know much about how Globus works.

BUMP

I’m running some RAS tasks again and ancil_lct fails with:

 line 118: /home/548/pag548/cylc-run/rCM3-ancil-suite/share/data/etc/ancil_master_ants/
/vegetation/cover/cci/v3/vegetation_fraction.nc: Permission denied

These are the permissions in my local folder.

 ls -l share/data/etc/ancil_master_ants/vegetation/cover/cci/v3/
total 8203860
-rw-r--r--+ 1 mrd599 access.admin     285917 Aug 26  2025 c4_percent_1d.nc
-rw-r--r--+ 1 mrd599 access.admin       1985 Aug 26  2025 README.md
-rw-r--r--+ 1 mrd599 access.admin 8400436749 Aug 26  2025 vegetation_fraction.nc
-rw-r--r--+ 1 mrd599 access.admin        549 Aug 26  2025 vegetation_fraction.nc.attribution
-rw-r--r--+ 1 mrd599 access.admin        512 Aug 26  2025 vegetation_fraction.nc.license
-rw-r--r--+ 1 mrd599 access.admin         90 Aug 26  2025 vegetation_fraction.nc.restrictions

@Martin - can you try an rsync again, or whatever you need to do, to fix the permissions again?

Thanks.

Nothing should have changed with the permissions here. They should be readable by anyone in the access group.

Can you read ~access/umdir/ancil/atmos/master/vegetation/cover/cci/v3/README.md

Hi Martin. Thanks for the prompt reply.

Yes I can read that README file.

Ok this is weird. I’ve diffed the job scripts b/w

rCM3-test-UM-ancil/log/job/1/Lismore_era5_ancil_lct/NN/job

and the failed job

 rCM3-ancil-suite/log/job/1/Lismore_era5_ancil_lct/NN/job 

and I can’t see any structural differences. I’ll keep digging, I may have broken something else in the task, somewhere else in the rose/cylc environment.

Hi @Paul.Gregory,

Are the jobs still loading +gdata/access??

When files that exist can’t be read it is usually a disk loading issue …

Apologies if you have already checked that.

Yeah that was my first hunch too.

But the storage flags in rCM3-ancil-suite are identical to the defaults:

#PBS -l storage=gdata/access+gdata/hr22+gdata/ki32+scratch/access+gdata/gb02
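For anyone wanting to repeat this check on a generated job script, a rough sketch follows. job_has_storage is a hypothetical helper, and it assumes the storage request sits on its own "#PBS -l storage=..." line with entries separated by "+", as in the line above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the "#PBS -l storage=" line of the job
# script $1 lists the storage entry $2 (for example gdata/access).
# Assumes storage is the only resource on that directive line.
job_has_storage() {
    local job_script="$1"
    local wanted="$2"
    grep -E '^#PBS[[:space:]]+-l[[:space:]]+storage=' "$job_script" \
        | sed -E 's/^.*storage=//' \
        | tr '+' '\n' \
        | grep -Fx "$wanted" > /dev/null
}
```

It could be used as: job_has_storage rCM3-ancil-suite/log/job/1/Lismore_era5_ancil_lct/NN/job gdata/access && echo "gdata/access is mounted".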

I’ll keep digging.

Have you retried in case it was a transient disk issue?

Yes. The error repeated.

I’ve traced the source of the error to an upstream task. The permissions error on the master vegetation file is a red herring. The issue is a missing grid.nl file.

This was caused by me accidentally removing

rg01_std_model="custom"

from the rose-suite.conf file. The suite then assumes you are using a pre-defined grid located and swaps the ancil_lct_top task for an ancil_lct_nocustom task which attempts to link grid.nl to a specified location.

If you insert this line back into the rose-suite.conf file, the ancil_lct_top task will place the grid.nl file in the correct directory and the ancil_lct task will proceed as normal.
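A quick way to confirm the setting is present before running the suite might look like the sketch below. get_std_model is a hypothetical helper; it assumes the setting appears as a plain, uncommented rg01_std_model=... line, and it returns the value exactly as written, quotes included.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the value of rg01_std_model from the
# rose-suite.conf file $1, or "<unset>" if no active assignment is found.
# Commented-out or ignored lines (starting with # or !) are not matched.
get_std_model() {
    local conf="$1"
    local value
    value=$(sed -n -E 's/^rg01_std_model=//p' "$conf" | tail -n 1)
    echo "${value:-<unset>}"
}
```

If get_std_model rose-suite.conf prints <unset>, the line has been lost and the suite will fall back to the ancil_lct_nocustom path described above.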

Why the ancil_lct.py file complains about permissions on the master vegetation file, when the missing grid.nl is the actual issue, is a curious question.