OAS and RAS suites stalling for ~1 day on Gadi and then failing

Yes, I already did that at the beginning, so the same persistent session is still up and running.

I already got that heads up from @Paul.Gregory through 21st Century Weather.

OK. Yes well done for following @Paul.Gregory’s heads up.

Have you tried running the suite before? I would perhaps go in and do a rose suite-clean.

@Moulik see Qsub: cannot connect to server - #3 by Scott

Your problem might be related.

Thanks!

Yes, I did try. Just today I cleaned the suite first and then re-ran it.

Okay, then I will wait and watch. Thanks @cbengel

But just FYI, on that same RAS GUI there were other tasks as well (e.g., ANCIL_ANTS), and those ran successfully through Gadi PBS job submission.

Given the timing and what is going on, it is likely to be a Gadi-type issue. Otherwise, to get feedback on that specific problem you would need to post the contents of the file that can’t be found.

It might be an intermittent problem. I would wait and try again. If the problem persists then reach out tomorrow.

Okay, got it! Thanks @cbengel

Hi @cbengel

This time I got this error when trying to run the RAS suite (install_ugants task):

Using the cylc session mlftcr.mm6452.if69.ps.gadi.nci.org.au

Loading cylc7/24.03
Loading requirement: mosrs-setup/2.0.1
[FAIL] [Errno 2] No such file or directory: ‘/home/581/mm6452/cylc-run/u-bu503/share’
mkdir: cannot create directory ‘/home/581/mm6452/cylc-run/u-bu503’: File exists
mkdir: cannot create directory ‘/home/581/mm6452/cylc-run/u-bu503’: File exists
mkdir: cannot create directory ‘/home/581/mm6452/cylc-run/u-bu503’: File exists
2026-05-06T02:00:09Z CRITICAL - failed/ERR

Do you have any idea what the problem actually is?

Hi Moulik.

Have you been able to run any of the rose/cylc tutorial suites?

e.g Introduction to Rose/Cylc — 21st Century Weather Software Wiki

or

If not, can you please try running those suites and check that they perform as expected? This is just to rule out any underlying environment issues you might have.

Can you also confirm you’ve followed all the steps here?

Hi @Paul.Gregory, thanks for sharing those. No, I didn’t go through these, sorry! I started directly from the model set-up as described on the ACCESS website. I will go through those and get back to you here. Thanks again.

This indicates a storage flag issue. Check where your cylc run is putting its files with

readlink -f /home/581/mm6452/cylc-run/u-bu503/share

Make sure whichever disk this points to (e.g. /scratch/ab12) is in the PBS storage flags. You can set the storage flags in rose-suite.conf using e.g. NCI_STORAGE=scratch/ab12+gdata/de34, or add them in site/nci-gadi/flow-adds.cylc.
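As a rough illustration of that check, here is a minimal shell sketch. The two helper functions (path_to_storage_flag and storage_has_flag) are hypothetical names made up for this example, not part of rose, cylc or PBS, and the sketch assumes that storage flags follow the scratch/<project> and gdata/<project> pattern quoted above. It only compares strings; it does not query PBS.

```shell
# Hypothetical helper: map a resolved path to the storage flag that would cover it.
#   /scratch/ab12/...  -> scratch/ab12
#   /g/data/de34/...   -> gdata/de34
# Any other path (e.g. under /home) prints nothing and returns non-zero.
path_to_storage_flag() {
    path="$1"
    case "$path" in
        /scratch/*)
            rest="${path#/scratch/}"          # strip the leading /scratch/
            printf 'scratch/%s\n' "${rest%%/*}" ;;   # keep only the project code
        /g/data/*)
            rest="${path#/g/data/}"           # strip the leading /g/data/
            printf 'gdata/%s\n' "${rest%%/*}" ;;
        *)
            return 1 ;;
    esac
}

# Hypothetical helper: succeed if FLAG is a whole '+'-separated entry of STORAGE.
storage_has_flag() {
    storage="$1"
    flag="$2"
    case "+${storage}+" in
        *"+${flag}+"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example use (commented out; adjust the suite path and storage string to your own):
# target=$(readlink -f "$HOME/cylc-run/u-bu503/share")
# flag=$(path_to_storage_flag "$target")
# storage_has_flag "scratch/ab12+gdata/de34" "$flag" || echo "missing storage flag: $flag"
```

If the last command reports a missing flag, that is the entry to add to NCI_STORAGE.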

Thanks @Scott, great idea. Got me wondering: @Moulik, are you trying to run ACCESS-rAM3 from a project other than your default project? If so, it doesn’t work. Please reach out and we will tell you how to change your default project.

Hi @Scott, thanks! Maybe that’s the fault. I will check and get back to you here.

Re @cbengel - I believe I am running from my default project. I am starting my VDI session with if69, so that would be my default project, wouldn’t it? Could you please let me know how I can check which default project my suites are using?

@Moulik There is a better way but I can’t remember the command. You can run env | grep PROJECT

The reason why I asked the question before is that /scratch/$PROJECT is added into the suite by default and that is where the data should be written out. Anyway, please follow @Scott 's advice.

Default project is set in ~/.config/gadi-login.conf
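To compare the two, a small sketch like the one below could be used. It assumes that gadi-login.conf holds one "KEY value" pair per line (for example a line reading "PROJECT ab12"); that layout is an assumption here, so check your own file first. default_project_from_conf is a hypothetical helper name used only for this example.

```shell
# Hypothetical helper: print the PROJECT entry from a gadi-login.conf-style file.
# Assumed format: whitespace-separated "KEY value" pairs, one per line.
default_project_from_conf() {
    conf="${1:-$HOME/.config/gadi-login.conf}"
    # Print the second field of the first line whose first field is PROJECT.
    awk '$1 == "PROJECT" { print $2; exit }' "$conf"
}

# Example use (commented out):
# echo "login default project: $(default_project_from_conf)"
# echo "current session PROJECT: ${PROJECT:-unset}"
```

If the two values differ, the session is running under a project other than the login default.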

FYI, this is what my rose-suite.conf looks like:

Just a quick question - do I need to add gdata/hr22 to NCI_STORAGE as well?

Everything the workflow needs, apart from your input data/working project, should already be getting added in the site file.

Okay! So that’s not a problem then.