If you click the little triangle on the left side, it will expand the list and show you which specific job failed. You can right click on that specific job and view the job.err file.
```
????????????????????????????????????????????????????????????????????????????????
???!!!???!!!???!!!???!!!???!!! ERROR ???!!!???!!!???!!!???!!!???!!!
? Error code: 4
? Error from routine: DECOMPOSE_FULL
? Error message: Too many processors in the North-South direction.The maximum permitted is 16
? Error from processor: 15
? Error from processor: 272
? Error number: 16
????????????????????????????????????????????????????????????????????????????????
```
Can you tell me how many processors you are attempting to run with?
Have a look in ~/roses/u-by395/rose-suite.conf and search for rg01_rs01_m01_nproc i.e. the number of processors allocated to region 1, resolution 1, model 1.
OK, your domain is slightly smaller than the default Lismore outer domain, which is 450x450.
Try reducing your number of processors, say 16x12. So set
rg01_rs01_m01_nproc=16,12
in rose-suite.conf. Or you can use rose edit and change the values in the GUI.
Remember to reload your suite. You will have to run the _GAL9_LBCS and _GAL9_um_recon tasks again with the new decomposition before running the forecast tasks again.
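If you would rather not open the GUI, the same edit can be scripted with `sed`. This is only a sketch on a throwaway copy of the file (the `24,16` starting value is made up for the demo); in practice the file is `~/roses/u-by395/rose-suite.conf`, and the suite still needs to be reloaded and the LBC/recon tasks re-run afterwards, as described above.

```shell
# Demo on a temporary copy rather than a live suite.
conf=$(mktemp)
printf 'rg01_rs01_m01_nproc=24,16\n' > "$conf"   # made-up old value
# Replace the decomposition with 16 east-west x 12 north-south:
sed -i 's/^rg01_rs01_m01_nproc=.*/rg01_rs01_m01_nproc=16,12/' "$conf"
cat "$conf"
```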
Generally when running on the normal compute nodes on Gadi (which have 48 cores each) we want the total number of cores to be a multiple of 48.
When the UM decomposes your grid across multiple cores, you need to reserve a certain number of grid rows for the halo - the points that are shared between each sub-domain. The UM has limits on the numbers of points in the halo region and the sub-domain itself.
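These constraints can be sanity-checked before submitting with a small script like the one below. The North-South cap of 16 and the 48 cores per node come from this thread; the minimum rows per sub-domain is an assumed placeholder, so check the UM documentation for the real halo-related limits of your configuration.

```python
def check_decomposition(nx_grid, ny_grid, nproc_ew, nproc_ns,
                        cores_per_node=48, max_ns=16, min_rows=4):
    """Illustrative checks for a UM domain decomposition.

    max_ns is the North-South processor cap reported by DECOMPOSE_FULL
    in the error above; min_rows is an ASSUMED placeholder for the UM's
    halo-related minimum sub-domain size, not the real limit.
    """
    problems = []
    total = nproc_ew * nproc_ns
    if nproc_ns > max_ns:
        problems.append(f"{nproc_ns} N-S processors exceeds the cap of {max_ns}")
    if total % cores_per_node != 0:
        problems.append(f"{total} cores is not a multiple of {cores_per_node}")
    if ny_grid // nproc_ns < min_rows:
        problems.append("sub-domains have too few rows for the halo")
    return problems

# A 450x450 domain like the default Lismore one:
print(check_decomposition(450, 450, 16, 17))  # 272 cores: N-S cap exceeded
print(check_decomposition(450, 450, 16, 12))  # 192 cores = 4 full Gadi nodes: OK
```

Note that 16x17 = 272 cores matches the failing processor number in the error output above, which is consistent with an over-decomposed North-South direction.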
Thanks, reducing the number of processors fixed the problem! I noticed there is some discontinuity in the output. I assume that corresponds to the domain of d1100?
Out of curiosity, if my domain is very large (say CENTRE=[-42.5,92.5]; ERA_RES=[0.2,0.2]; ERA_SIZE=[196,646]; d2000_RES=ERA_RES; d2000_SIZE=[176,626]), should I increase the number of cores?
There are slight ‘edge effects’ at the edges of your domain - that is normal.
Yes, you should increase the number of cores if the domain becomes larger.
Sometimes the UM doesn't like very rectangular domains and prefers the regional models to be 'squarer'. But you might be able to get that domain running.
For larger domain sizes (i.e. > 1000 x 1000) you should talk to someone in the Centre about using the optimised rAM3 config which runs an I/O server and contains various other optimisations. Otherwise, the standard configuration will start to hang on I/O throughput for large domain sizes when you throw lots of cores at it.
I made a summary of the UM documentation which you can read through here:
I checked the southernmost extent of era5 and d1100.
For era5, the southernmost point is -58.900, and for d1100 it is -57.800. I was wondering if the BARRA-R limit (-57.97) applies to era5 or d1100. It seems my domain exceeds the limit? Do I have to re-run RAS to recreate the ancillaries?
It looks like the NaN also occurs at Macquarie Island.
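Comparing the two extents against the limit directly (values copied from above; whether the limit applies per-domain is exactly my question, but the comparison itself is straightforward):

```python
BARRA_R_SOUTH_LIMIT = -57.97  # southernmost latitude quoted for BARRA-R

def exceeds_barra_r(south_edge, limit=BARRA_R_SOUTH_LIMIT):
    """True if a domain's southern edge lies south of the BARRA-R limit."""
    return south_edge < limit

# Southernmost extents measured above:
for name, south in [("era5", -58.900), ("d1100", -57.800)]:
    status = "exceeds" if exceeds_barra_r(south) else "is within"
    print(f"{name}: southern edge {south} {status} the BARRA-R limit")
```

So only the era5 domain's southern edge falls outside the BARRA-R extent.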
I followed the instructions and edited the line in /home/565/cj0591/cylc-run/u-bu503/share/contrib_apps/CanopyHeights/canopy_heights.py to: loop_lim_y = index_nearest_neighbour.ydist2index(trees, 2000)
If you’re using the BARRA-R for the land surface, you’ll need to regenerate any ancil domains that exceed -57.97. Hence you will need to shift era5 northwards.
I would shift the southern edge of era5 AND d1100 northwards by the same amount to keep the same spacing between the edges of both domains.
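As a worked example of shifting both edges by the same amount, using the extents quoted earlier in the thread and assuming the 0.2° ERA5 grid spacing from the example domain (so the shift is rounded up to a whole number of grid rows — this rounding rule is my assumption, not something RAS enforces):

```python
import math

def northward_shift(south_edge, limit=-57.97, spacing=0.2):
    """Smallest whole-number-of-rows shift (in degrees) that brings
    south_edge north of the limit. Assumes a regular lat grid."""
    if south_edge >= limit:
        return 0.0
    rows = math.ceil((limit - south_edge) / spacing)
    return rows * spacing

shift = northward_shift(-58.900)         # driven by the era5 southern edge
print(shift)                             # apply to BOTH era5 and d1100
print(-58.900 + shift, -57.800 + shift)  # new southern edges
```

Here a 1.0° shift (5 rows at 0.2°) moves era5 to -57.9 and d1100 to -56.8, both north of -57.97, while preserving the spacing between the two edges.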
Currently, ACCESS-rAM3 only supports specific nest configurations that meet the following criteria:
The grid points of the RAS first inner nest (i.e., Resolution 2, because Resolution 1 always corresponds to the outer ERA5 domain) must align with those of the land-surface initial-conditions dataset. Thus the configuration of the RAS first inner nest (Resolution 2), including its position, dimensions and resolution, needs to be modified accordingly. Note that the position of a nest is also influenced by the position of the nested region.
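A hedged sketch of what "the inner nest grid points must align with the land-surface dataset" could look like as a check. The alignment test and tolerance here are my assumptions about the criterion described above, not the actual RAS code:

```python
def aligned(nest_origin, nest_spacing, lsm_origin, lsm_spacing, tol=1e-6):
    """True if every nest grid point falls on a land-surface grid point.

    Assumes both grids are regular in latitude/longitude; this mirrors
    the criterion described above, not the RAS implementation.
    """
    # Spacings must be commensurate: the nest spacing a whole multiple
    # of the land-surface spacing (or equal to it).
    ratio = nest_spacing / lsm_spacing
    if abs(ratio - round(ratio)) > tol:
        return False
    # Origins must differ by a whole number of land-surface cells.
    offset = (nest_origin - lsm_origin) / lsm_spacing
    return abs(offset - round(offset)) < tol

print(aligned(-42.5, 0.2, -90.0, 0.1))   # True: 0.2 = 2 x 0.1, origin on-grid
print(aligned(-42.55, 0.2, -90.0, 0.1))  # False: origin off-grid by half a cell
```

The same check would be applied independently in latitude and longitude, and the nest position would then be adjusted until both pass.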
Thanks @Paul.Gregory for your suggestion. The model is now able to run and generate some output:
I will be in touch with someone in the Centre later to talk about using the optimised ACCESS-rAM3 config for the large domain and my ideas in running ACCESS-rAM3 with some climatological mean ERA5 datasets.