Issues when changing domain in ACCESS-rAM3

Hi,

I’m new to running ACCESS-rAM3 and encountered some issues when I changed the domain.

The domain is set as follows:
CENTRE=[-35,130]
ERA_RES=[0.11,0.11]
ERA_SIZE=[320,420]
d1100_RES=ERA_RES
d1100_SIZE=[300,400]
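To see where a domain like this sits on the globe, a small Python sketch can turn CENTRE, RES and SIZE into latitude/longitude bounds. It rests on assumptions not confirmed in this thread: each pair is ordered [latitude, longitude] like CENTRE, both domains share the same CENTRE, and grid points are laid out symmetrically about it. The bounds are for the outermost grid-point centres, so add half a grid spacing if you want cell edges.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Domain:
    # ASSUMPTION: every pair is ordered (latitude, longitude), like CENTRE.
    centre: tuple[float, float]  # (lat, lon) of the domain centre, degrees
    res: tuple[float, float]     # (dlat, dlon) grid spacing, degrees
    size: tuple[int, int]        # (nlat, nlon) number of grid points

    def extents(self) -> tuple[float, float, float, float]:
        """(south, north, west, east) of the outermost grid-point centres."""
        half_lat = (self.size[0] - 1) * self.res[0] / 2
        half_lon = (self.size[1] - 1) * self.res[1] / 2
        lat0, lon0 = self.centre
        return lat0 - half_lat, lat0 + half_lat, lon0 - half_lon, lon0 + half_lon


def margin_points(outer: Domain, inner: Domain) -> tuple[float, float]:
    """Grid points between inner and outer edges on each side (same centre and res assumed)."""
    return (outer.size[0] - inner.size[0]) / 2, (outer.size[1] - inner.size[1]) / 2


era = Domain(centre=(-35.0, 130.0), res=(0.11, 0.11), size=(320, 420))
d1100 = Domain(centre=(-35.0, 130.0), res=(0.11, 0.11), size=(300, 400))

for name, dom in (("ERA", era), ("d1100", d1100)):
    s, n, w, e = dom.extents()
    print(f"{name}: lat {s:.3f} to {n:.3f}, lon {w:.3f} to {e:.3f}")
print("margin (points per side):", margin_points(era, d1100))
```

Under these assumptions the ERA domain spans roughly 52.5°S to 17.5°S, and d1100 sits 10 grid points inside it on every side.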

The RAS works fine, generating some ancillary files. I used the Check_UM_ancillaries notebook from the 21centuryweather/UM_configuration_tools repository on GitHub (UM_configuration_tools/notebooks) to check the ancillaries, and they seem to be fine (the output is "INFO : Your ancillaries appear to be okay over land").

The issue occurs when running RNS:

I was also struggling to find job.err for this task.

Can you help me with this?

Thanks in advance!

Hi @chenhui.jin,

If you click the little triangle on the left side, it will expand the list and show you which specific job failed. You can right click on that specific job and view the job.err file.


Fantastic! Thanks @reyhan.respati! I managed to find the job.err for the task.


The last two lines of job.err are:

[FAIL] um-atmos # return-code=9.
2026-02-19T05:59:27Z CRITICAL - failed/EXIT

job.err is attached below
job.err.txt (491.6 KB)

Hi @chenhui.jin

The error is given here in job.err:

????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
???!!!???!!!???!!!???!!!???!!!       ERROR        ???!!!???!!!???!!!???!!!???!!!
?  Error code: 4
?  Error code: 4
?  Error from routine: DECOMPOSE_FULL
?  Error from routine: DECOMPOSE_FULL
?  Error message: Too many processors in the North-South direction.The maximum permitted is 16
?  Error message: Too many processors in the North-South direction.The maximum permitted is 16
?  Error from processor: 15
?  Error from processor: 272
?  Error number: 16
?  Error number: 16
????????????????????????????????????????????????????????????????????????????????
????????????????????????????????????????????????????????????????????????????????

Can you tell me how many processors you are attempting to run with?

Have a look in ~/roses/u-by395/rose-suite.conf and search for rg01_rs01_m01_nproc, i.e. the number of processors allocated to region 1, resolution 1, model 1.

In the default setup, these values are:

rg01_rs01_m01_nproc=18,16

What are yours set to?
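If you would rather check this from a script than by eye, the sketch below pulls every rgNN_rsNN_mNN_nproc setting out of rose-suite.conf with a regular expression. It assumes the values appear as plain "key=N,M" lines like the default shown above; only lines that begin with the key are matched, so commented-out lines are skipped. The suite path is the one quoted in this thread; adjust it for your own suite.

```python
from __future__ import annotations

import re
from pathlib import Path

# Matches lines such as "rg01_rs01_m01_nproc=18,16" (region, resolution, model).
# Only lines that begin with the key are matched, so "#" comments are ignored.
NPROC_RE = re.compile(
    r"^[ \t]*(rg\d+_rs\d+_m\d+_nproc)[ \t]*=[ \t]*(\d+)[ \t]*,[ \t]*(\d+)[ \t]*$",
    re.MULTILINE,
)


def find_nproc(conf_text: str) -> dict[str, tuple[int, int]]:
    """Map each *_nproc key to its pair of processor counts."""
    return {m.group(1): (int(m.group(2)), int(m.group(3))) for m in NPROC_RE.finditer(conf_text)}


if __name__ == "__main__":
    conf = Path.home() / "roses" / "u-by395" / "rose-suite.conf"
    if conf.exists():
        for key, (a, b) in find_nproc(conf.read_text()).items():
            print(f"{key} = {a},{b}  ({a * b} cores in total)")
    else:
        print(f"{conf} not found")
```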

Hi @Paul.Gregory,

Mine is:

OK, your domain is slightly smaller than the default Lismore outer domain, which is 450x450.

Try reducing the number of processors to, say, 16x12. To do so, set

rg01_rs01_m01_nproc=16,12

in rose-suite.conf. Or you can use rose edit and change the values in the GUI.

Remember to reload your suite. You will have to run the _GAL9_LBCS and _GAL9_um_recon tasks again with the new decomposition before running the forecast tasks again.

Generally, when running on the normal compute nodes on Gadi (which have 48 cores each), we want the total number of cores to be a multiple of 48.

When the UM decomposes your grid across multiple cores, it needs to reserve a certain number of grid rows for the halo - the points shared between neighbouring sub-domains. The UM has limits on the number of points in both the halo region and the sub-domain itself.
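The arithmetic behind this advice can be sketched in a few lines of Python. Assuming the nproc pair is (east-west, north-south) and the domain is nx columns by ny rows (both assumptions for illustration), the sketch reports the total core count, how many 48-core Gadi nodes that fills, and the smallest sub-domain each processor receives. It does not reproduce the UM's own halo-dependent limits in DECOMPOSE_FULL, so treat it as a sanity check rather than a guarantee that a decomposition will be accepted.

```python
import math

CORES_PER_NODE = 48  # Gadi "normal" compute nodes, as noted above


def decomposition_report(nx: int, ny: int, nproc_ew: int, nproc_ns: int) -> dict:
    """Summarise a 2-D domain decomposition (illustrative, not the UM's own checks)."""
    total = nproc_ew * nproc_ns
    return {
        "total_cores": total,
        "nodes": math.ceil(total / CORES_PER_NODE),
        "multiple_of_48": total % CORES_PER_NODE == 0,
        # Integer division gives the smallest sub-domain; in a typical block
        # decomposition the leftover rows/columns go to some processors.
        "min_cols_per_pe": nx // nproc_ew,
        "min_rows_per_pe": ny // nproc_ns,
    }


# d1100 is 300 x 400 points; here we ASSUME 400 columns (east-west) by 300 rows.
for ew, ns in ((18, 16), (16, 12)):
    print(f"{ew}x{ns}:", decomposition_report(400, 300, ew, ns))
```

Both 18x16 (288 cores, 6 nodes) and 16x12 (192 cores, 4 nodes) are whole multiples of 48; the difference is how thin each processor's slice of the domain becomes.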


Thanks, reducing the number of processors fixed the problem! I noticed some discontinuities in the output. I assume they correspond to the edges of the d1100 domain?
[Figure: wind_gust_20160927T00-20160928T00]

Out of curiosity, if my domain were very large (say CENTRE=[-42.5,92.5]; ERA_RES=[0.2,0.2]; ERA_SIZE=[196,646]; d2000_RES=ERA_RES; d2000_SIZE=[176,626]), should I keep the number of processors as 18x16, or increase it?


There are slight ‘edge effects’ at the edges of your domain - that is normal.

Yes you should increase the number of cores if the domain becomes larger.

Sometimes the UM doesn’t like very rectangular domains; it prefers the regional models to be ‘squarer’. But you might be able to get that domain running.

For larger domain sizes (i.e. > 1000 x 1000) you should talk to someone in the Centre about using the optimised rAM3 config which runs an I/O server and contains various other optimisations. Otherwise, the standard configuration will start to hang on I/O throughput for large domain sizes when you throw lots of cores at it.
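One rough way to pick a larger core count is to scale a configuration that is known to work by the number of grid points (not the area in degrees), round up to whole 48-core nodes, and then split the total into an east-west by north-south layout whose sub-domains are close to square. The sketch below does that; the linear scaling and the "closest to square" rule are our own heuristics, not rules from the UM documentation, and the result still has to pass the UM's own decomposition checks.

```python
from __future__ import annotations

import math

CORES_PER_NODE = 48


def suggest_cores(n_points: int, ref_points: int, ref_cores: int) -> int:
    """Scale a known-good core count with grid-point count; round up to whole nodes."""
    raw = ref_cores * n_points / ref_points
    return max(CORES_PER_NODE, math.ceil(raw / CORES_PER_NODE) * CORES_PER_NODE)


def split_cores(total: int, nx: int, ny: int) -> tuple[int, int]:
    """Factor total into (ew, ns) so each sub-domain is as close to square as possible."""
    pairs = [(ew, total // ew) for ew in range(1, total + 1) if total % ew == 0]
    return min(pairs, key=lambda p: abs(math.log((nx / p[0]) / (ny / p[1]))))


# Reference: a 400 x 300 point domain that runs on 16 x 12 = 192 cores.
ref_points, ref_cores = 400 * 300, 192
for nx, ny in ((400, 300), (800, 600)):
    cores = suggest_cores(nx * ny, ref_points, ref_cores)
    print(f"{nx}x{ny}: {cores} cores -> nproc {split_cores(cores, nx, ny)}")
```

For domains approaching the > 1000 x 1000 range, adding cores alone is not enough, and the optimised configuration with an I/O server is the better route.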

I made a summary of the UM documentation which you can read through here:

Hi @Paul.Gregory, many thanks for the information!

Here is a better explanation of the NPROC error:


Hi @Paul.Gregory,

I have started test runs of ACCESS-rAM3 for the larger domain:

I managed to create some ancillary files; however, one ancillary file, canopy_height, is bad.

When running RNS, an error occurs at nci_hres_eccb:

job.err shows:

Is this error due to the bad canopy_height ancillary file, or due to the very rectangular domain?

Hi Chenhui.

Check the southernmost extent of your domain. If it extends beyond the BARRA-R domain, you will get errors.

The southern boundary of BARRA-R is at -57.97 (57.97°S).

See this post

https://21centuryweather.discourse.group/t/matching-gridpoints-in-barrar-and-outer-domain-in-access-ram3/2105/19

Can you identify where the missing canopy height values are? The last few cells of this notebook will help you track down their location: UM_configuration_tools/notebooks/Check_UM_ancillaries.ipynb in the 21centuryweather/UM_configuration_tools repository on GitHub.

Just change the lat/lon values in those cells to include the location of NaNs over land.

If the NaNs are located over a small isolated island, these instructions should fix the problem:

https://21centuryweather.discourse.group/t/errors-in-ram3-set-up/2053/47
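Once the canopy-height field, its coordinates and a land mask are loaded as NumPy arrays (reading the ancillary file is not shown; the notebook above has its own loading code), locating NaNs over land inside a lat/lon box takes only a few lines. The function name and argument layout below are hypothetical, not taken from Check_UM_ancillaries.ipynb.

```python
from __future__ import annotations

import numpy as np


def nan_over_land(lats, lons, field, land_mask, box=None) -> list[tuple[float, float]]:
    """Return (lat, lon) of land points where field is NaN.

    lats has shape (ny,), lons (nx,); field and land_mask have shape (ny, nx).
    box, if given, is (south, north, west, east) in degrees.
    """
    lat2d, lon2d = np.meshgrid(lats, lons, indexing="ij")
    bad = np.isnan(field) & np.asarray(land_mask, dtype=bool)
    if box is not None:
        south, north, west, east = box
        bad &= (lat2d >= south) & (lat2d <= north) & (lon2d >= west) & (lon2d <= east)
    return list(zip(lat2d[bad].tolist(), lon2d[bad].tolist()))


# Tiny synthetic example: one NaN over land near Macquarie Island, one NaN over ocean.
lats = np.array([-56.0, -55.0, -54.0])
lons = np.array([158.0, 159.0, 160.0])
field = np.ones((3, 3))
field[1, 1] = np.nan  # land point
field[0, 0] = np.nan  # ocean point, ignored
land = np.zeros((3, 3), dtype=bool)
land[1, 1] = True
print(nan_over_land(lats, lons, field, land, box=(-60.0, -50.0, 150.0, 165.0)))
```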

Hi @Paul.Gregory,

I checked the southernmost extent of era5 and d1100

For era5, the southernmost latitude is -58.900, and for d1100 it is -57.800. Is the BARRA-R limit (-57.97) for era5 or for d1100? It seems my domain exceeds the limit. Do I have to re-run the RAS to recreate the ancillaries?

Regarding the NaN in canopy height:

  • It seems the NaN occurs near (-55, 160)

  • Zooming in shows it at around (-55, 159)

  • It looks like the NaN also occurs at Macquarie Island.

  • I followed the instructions and edited the following line in /home/565/cj0591/cylc-run/u-bu503/share/contrib_apps/CanopyHeights/canopy_heights.py:
    loop_lim_y = index_nearest_neighbour.ydist2index(trees, 2000)

If you’re using BARRA-R for the land surface, you’ll need to regenerate any ancil domains that extend south of -57.97. Hence you will need to shift era5 northwards.

I would shift the southern edge of era5 AND d1100 northwards by the same amount to keep the same spacing between the edges of both domains.
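The size of that northward shift can be worked out directly. The sketch below computes the smallest shift, in whole grid spacings, that moves a domain's southernmost grid point to -57.97 or further north. Keeping the shift a whole number of grid spacings is our own choice, meant to keep the points on the same lattice as before; the 0.11-degree spacing in the example is the default rAM3 value and only an assumption for this domain.

```python
import math

BARRA_R_SOUTH = -57.97  # southern boundary of BARRA-R, from the post above


def northward_shift(south_edge: float, res: float, limit: float = BARRA_R_SOUTH) -> float:
    """Smallest whole-grid-spacing shift (degrees) that puts south_edge at or north of limit."""
    deficit = limit - south_edge
    if deficit <= 0:
        return 0.0  # already inside the limit
    # Small tolerance so an exact multiple of res is not rounded up by float error.
    return math.ceil(deficit / res - 1e-9) * res


shift = northward_shift(-58.900, 0.11)  # era5 southernmost latitude reported above
print(f"shift era5 and d1100 north by {shift:.2f} degrees")
print("d1100 alone would need a shift?", northward_shift(-57.800, 0.11) > 0)
```

Applying the same shift to both domains keeps the gap between their southern edges unchanged, as suggested above.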

You can use this notebook created by Scott Wales to help size your domain: UM_configuration_tools/notebooks/UM_plot_domain.ipynb in the 21centuryweather/UM_configuration_tools repository on GitHub.

The Macquarie Island canopy height issue is a known bug that can be fixed by increasing the search distance to 2,000.
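Why a larger search distance helps can be seen with a conceptual nearest-neighbour fill: a missing land point copies the value of the nearest valid point, but only if that point lies within the search distance, so an isolated island far from other vegetated land stays empty until the limit is raised. This is not the code in canopy_heights.py; the thread does not say what units the 2000 passed to ydist2index uses, so the kilometres below are purely our assumption.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km (inputs in degrees; arrays broadcast)."""
    lat1, lon1, lat2, lon2 = map(np.radians, (lat1, lon1, lat2, lon2))
    a = np.sin((lat2 - lat1) / 2) ** 2 + np.cos(lat1) * np.cos(lat2) * np.sin((lon2 - lon1) / 2) ** 2
    return 2 * EARTH_RADIUS_KM * np.arcsin(np.sqrt(a))


def fill_nearest(lat_missing, lon_missing, lat_valid, lon_valid, val_valid, max_km):
    """Fill each missing point from its nearest valid point within max_km; otherwise leave NaN."""
    lat_valid, lon_valid, val_valid = map(np.asarray, (lat_valid, lon_valid, val_valid))
    out = np.full(len(lat_missing), np.nan)
    for i, (lat, lon) in enumerate(zip(lat_missing, lon_missing)):
        dist = haversine_km(lat, lon, lat_valid, lon_valid)
        j = int(np.argmin(dist))
        if dist[j] <= max_km:
            out[i] = val_valid[j]
    return out


# A missing point about 111 km (1 degree of longitude at the equator) from the only valid one.
print(fill_nearest([0.0], [0.0], [0.0], [1.0], [5.0], max_km=100))  # too far: stays NaN
print(fill_nearest([0.0], [0.0], [0.0], [1.0], [5.0], max_km=200))  # within reach: filled
```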

I double-checked the settings in the RNS: I was actually using ERA5-land for the land surface when the error occurred.

Do I also need to mind the southern boundary of -57.97?

If you are using ERA5-land, your outer mesh must have a resolution of 0.1 degrees.

The default rAM3 suite uses 0.11 in the outer mesh because that matches the BARRA-R land surface grids.

From the NRI rAM3 Hive docs:

Currently, ACCESS-rAM3 only supports specific nest configurations that meet the following criteria:

The grid points of the RAS first inner nest (i.e., Resolution 2, because Resolution 1 always corresponds to the outer ERA5 domain) must align with those of the land-surface initial conditions dataset. Thus, the configuration of the RAS first inner nest (Resolution 2), including its position, dimension and resolution, needs to be modified accordingly. Note that the position of a nest is also influenced by the nested region position.

Have a read here: https://21centuryweather.discourse.group/t/matching-gridpoints-in-barrar-and-outer-domain-in-access-ram3/2105/2
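The alignment requirement can be checked numerically, one axis at a time. The sketch below assumes that a nest's grid points sit symmetrically about its centre and that the land-surface dataset has points at whole multiples of its spacing from a known origin (for example a regular 0.1-degree grid with a point at 0 degrees); both are assumptions, so check them against the actual coordinates of the dataset you use.

```python
import math


def axis_aligned(centre: float, res: float, npts: int,
                 data_origin: float, data_res: float, tol: float = 1e-6) -> bool:
    """True if every point centre + (k - (npts - 1) / 2) * res lies on the dataset grid."""
    if not math.isclose(res, data_res, abs_tol=tol):
        return False  # spacing must match the land-surface grid
    first = centre - (npts - 1) * res / 2
    offset = (first - data_origin) / data_res  # position in dataset grid units
    return abs(offset - round(offset)) < tol


# Latitude axis centred on -35: an odd number of points lands on a 0.1-degree grid,
# an even number is offset by half a grid spacing.
print(axis_aligned(-35.0, 0.1, 301, 0.0, 0.1))   # True
print(axis_aligned(-35.0, 0.1, 300, 0.0, 0.1))   # False
print(axis_aligned(-35.0, 0.11, 301, 0.0, 0.1))  # False: wrong spacing
```

For a different land-surface dataset, swap in that grid's own origin and spacing.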


Thanks @Paul.Gregory for your suggestion. The model is now able to run and generate some output:

I will get in touch with someone in the Centre later to talk about using the optimised ACCESS-rAM3 config for the large domain, and about my ideas for running ACCESS-rAM3 with some climatological-mean ERA5 datasets.

Thanks again!
