ACCESS-rAM3 Beta Feedback

About

This topic is a catch-all location for feedback for the ACCESS-rAM3 Beta Release.

Please reply to this topic if you have feedback on the ACCESS-rAM3 Beta. Feedback can point out problems you encountered or highlight what worked well.

If your feedback is involved and will require specific help, feel free to create a separate topic on the ACCESS-Hive Forum.

If you’re not sure, reply here and your query can be moved to a separate topic if required.

Thank you NRI for this release; it is already very useful. One of the known issues in the alpha release is:

Large amounts of forcing data are created.

One potential solution is to delete initial conditions (ics) and lateral boundary conditions (lbcs/cb) on the fly using the built-in housekeep app. In my free-running branch I’ve included the following lines to remove the ics/cb files:

=$HK_CYCLE:'*/*/*/ics/*'
=$HK_CYCLE:'*/*/*/lbcs/*_alabc_*'
=$HK_CYCLE:'*/*/ec_cb*'

These files are then deleted at the successful completion of each individual cycle within the simulation. In my test this reduced daily scratch storage from 200 GB to 30 GB, without removing any actual UM outputs.
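For anyone wanting to try this, the lines above are continuation entries of the prune setting in the housekeep app. A sketch of how they might sit in the app configuration, assuming the file layout shown later in this thread (your existing entries may differ):

    # Remove interim forcing files once the cycle they belong to has
    # completed successfully (sketch only; merge with your existing entries)
    [prune]
    prune{share/cycle}=$HK_CYCLE:'*/*/*/ics/*'
                      =$HK_CYCLE:'*/*/*/lbcs/*_alabc_*'
                      =$HK_CYCLE:'*/*/ec_cb*'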

You can see my branch changeset on MOSRS here.


Sometimes I need to redefine which project the suite will use for a) compute and b) scratch.

For a), my solution to date has been to change my default $PROJECT definition in ~/.config/gadi-login.conf, log in again and run.
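For example, the relevant line in ~/.config/gadi-login.conf looks something like the following (a sketch only; xy12 is a placeholder project code and the rest of the file should be left alone):

    PROJECT xy12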

Then for b) (with thanks to Dale Roberts’ post here), I have included the following at the top of my rose-suite.conf:

root-dir=*=/scratch/$PROJECT/$USER

Where $PROJECT can be updated as required, or left as the default.

Perhaps there are cleaner ways to do this through the GUI?


Note that there is also a command, switchproj, that will create a new shell with a modified $PROJECT:

$ switchproj -h
switchproj: run a copy of your shell or a command after 
changing the effective group id
Usage: switchproj [-h] [-a <argv0>] [-l] <group> [<command> [args ...]]
<group> may be a group name or gid number.
<group> must be in the user's list of groups.
Options:
  -a <argv0>
    Set argv[0] to <argv0>. If this option is specified and <command> is not
    present, its value will be ignored
  -l
    Prepend argv[0] of the command to run with a hyphen
  Note that if both -a and -l are used, they must be in the order shown above.
  -h
    Print this

Though I can’t find any documentation for it in the NCI Opus docs.
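For example (xy12 is a placeholder and must be one of your own groups), you can run a single command under another project, or start a whole new shell under it:

    $ switchproj xy12 id -gn    # run one command with effective group xy12
    xy12
    $ switchproj xy12           # or start a new copy of your shell under xy12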


Thank you @mlipson for your suggestion about the housekeeping command - I agree it would be useful for longer runs.

We decided to leave the intermediate working files in the directories so that beginner users can trace the inputs and outputs being created by RNS jobs. This decision was made for learning purposes, but I can definitely see the merit in routinely removing these files once those steps have been learned.

We can discuss having different branches of the suite with the output routinely removed or left behind and investigate the possibility of adding a switch to the RNS.

Thanks again for your feedback.


Quick question: it seems the beta release’s minimum run length is 24 hours. Is this correct? On the alpha I’ve been running 1-hour cases to test configurations and save SUs.
“Support is provided for 24 hour run length. Minimum run length is 24 hours.”

If you are comfortable changing the run length then you’re welcome to do so.

The intention with that statement is to make it quite clear what we support users in doing. We thought this was unlikely to be something many users wanted to do, and it would have involved adding more complexity to the instructions, so on balance we decided not to include it in this initial release.

If this is a feature that many users want, we could revisit that decision, but with the timeframes we have committed to and the current resourcing constraints, it would not be included in the initial release.


OK, thanks! I have already tried running a 1-hour case but received the error below. I just checked, and this occurs if I set CYCLE_INT_HR to either 1 or 3 hours, but the error goes away if it is 6 or more hours. Just noting, please ignore on a Friday 🙂

[FAIL] cylc validate -o /scratch/v46/cc6171/tmp/tmpUCb_ag --strict u-dg768 # return-code=1, stderr=
[FAIL] ERROR, trailing arrow: Lismore_d1000_GAL9_um_recon => Lismore_d0198_RAL3P2_um_recon =>


The error I was having on 1- or 3-hour runs may be because of: suite conf - Nesting Suite - General run options - CRUN_LEN = 6.
Setting CRUN_LEN = 1 allows runs of less than 6 hours, I think(?).
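If that is right, then a 1-hour test would use something like the following settings (a sketch only, using the variable names from this thread; set them wherever you normally edit the suite configuration):

    CYCLE_INT_HR=1    # hourly cycling for the short test case
    CRUN_LEN=1        # keep each UM run chunk no longer than the cycle interval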

Chermelle is on leave, otherwise I would defer to her. When we discussed supporting modification of the run-length “chunks”, Chermelle mentioned some complications that were non-trivial for inexperienced users, hence the decision not to support modifying it.

If this works for your testing use case, great!


Hi @mlipson. In your changeset, why do you use $HK_CYCLE rather than -PT0H? Which cycle directory do you mean to refer to: the current one or (e.g.) the previous one?

The reason that I ask is that I need to understand the exact effects of this change so that I can test it.

Hi Paul, the reason for using the variable $HK_CYCLE is to support free-running as well as “normal” cycling modes. Free-running needs some files from the previous cycle for the next cycle’s initial conditions, so the housekeep actions have to be one cycle behind ($HK_CYCLE… but I’m not sure what HK stands for! edit: ahhh, HouseKeep).

HK_CYCLE is defined in the suite.rc file with:

    {% if FREE_RUN %}
        HK_CYCLE = $CYCLE_OFFSET
    {% else %}
        HK_CYCLE = -PT0H
    {% endif %}

Edit: Paul, on reflection I might have applied this deletion delay to some of the variables unnecessarily, e.g. the ics shouldn’t need to be kept for the next cycle. So if you prefer to replace it with -PT0H and that works, then please do so. Thanks for spotting this.

@mlipson Thanks for the info. I am re-running the test of the changes on u-dg768 now.

@mlipson Could you please clarify what housekeeping can be done for the current cycle vs what needs to be done for the previous one? I am currently testing with

[pcl851@gadi-login-07 u-dg768]$ cat app/housekeep/opt/rose-app-nci-gadi.conf
[prune]
prune{share/cycle}=-PT0H:'*/*/*_da*'
                  =-PT0H:'*/*/*/ics/*'
                  =$HK_CYCLE:'*/*/*/lbcs/*_alabc_*'
                  =$HK_CYCLE:'*/*/ec_cb*'
                  =$HK_CYCLE:'*/*/*/*/*_da*'

Hi Paul,

I think you might be able to do

prune{share/cycle}=-PT0H:'*/*/*_da*'
                  =-PT0H:'*/*/*/ics/*'
                  =-PT0H:'*/*/*/lbcs/*_alabc_*'
                  =$HK_CYCLE:'*/*/ec_cb*'
                  =$HK_CYCLE:'*/*/*/*/*_da*'

as the linkprev task says it only acts on the boundaries of the driving model, which in our case is ec. But you would need to test with free-running mode on.

I left the last line unchanged; that is how it is in the RAS.

Thanks. For the moment, I have left it as I described above. Let’s see what the test comes up with.

@mlipson The job failed in 20220227T0000Z/Lismore_d1000_GAL9_um_recon_cyclen with

ls: cannot access '/home/851/pcl851/cylc-run/u-dg768/share/cycle/20220226T0000Z/Lismore/d1000/GAL9/ics/*a_da*': No such file or directory
2025-04-15T09:20:22Z CRITICAL - failed/EXIT

The project vk83 has been added to the PBS storage flags, which is stopping people who are not a member of that project from running the workflow.
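For context (illustrative only; the exact directives generated by the suite may differ), Gadi’s PBS storage request takes this general form, and a job that asks for a project the submitting user is not a member of will be rejected:

    #PBS -l storage=gdata/vk83+scratch/xy12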


Just wondering if it is OK to still be running the alpha version? I’m getting the following error when I try to run my old alpha suite. The beta works fine, so it may be a mistake on my part or something in the configuration.