ESMF mesh and MOM6 domain masks are inconsistent

This check occurs before the model even runs - this is during payu setup. So I don’t see how it can be related to changing the masktable.

It’s a bit concerning that it says the max available PEs is 0.

What are ocn_ntasks / ocn_nthreads / ocn_pestride / ocn_rootpe set to in the PELAYOUT_attributes section?

If you push your config to GitHub I’ll have a look.


Ok just summarising my latest progress.

Remember I had trouble running @Helen’s original configuration (140x249 Tasmanian domain at 0.05 resolution, requesting 100 CPUs on a 10x10 layout), so I attempted to incrementally change @mmr0’s original config (70x125 at 0.1 with 16 CPUs on a 4x4).

I’ve been able to run @mmr0 's configuration for the following layouts:

| Layout | Runtime |
|--------|---------|
| 4x4    | 09:48   |
| 8x8    | 05:12   |
| 10x10  | 04:07   |
| 12x12  | 03:49   |

All of these worked with no mask table in MOM_layout

I then regenerated the 10x10 layout, but using the higher-resolution grid (i.e. 140x249 at 0.05 resolution).

This first failed with:

FATAL from PE    20: Discrepancy detected between ESMF mesh and internal MOM6 domain sizes. Check mask table.

So I added back the MASKTABLE entry in MOM_layout

MASKTABLE = mask_table.2.10x10

Ran again. This failed with

FATAL from PE     3: ERROR: Difference between ESMF Mesh and MOM6 domain coords is greater than parameter EPS_OMESH. n, lonMesh(n), lon
(n), diff_lon, EPS_OMESH=        1    147.2500000000000       145.1250000000000             0.21250D+01          0.10000D-03

This implies the grids aren’t aligned.
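That check is essentially a point-by-point comparison of the mesh and model longitudes against the EPS_OMESH tolerance. A minimal pure-Python sketch of the idea (the function name and structure are illustrative — the real check lives in mom_cap.F90, not here):

```python
# Sketch of the coordinate-consistency check: flag the first point where the
# ESMF mesh and MOM6 domain longitudes differ by more than EPS_OMESH.
EPS_OMESH = 1.0e-4  # default tolerance shown in the error message

def first_coord_mismatch(lon_mesh, lon_mom, eps=EPS_OMESH):
    """Return (n, mesh_lon, mom_lon, diff) for the first disagreeing point,
    or None if the two coordinate lists agree to within eps.
    n is 1-based, matching the Fortran index in the error message."""
    for n, (lm, lo) in enumerate(zip(lon_mesh, lon_mom), start=1):
        diff = abs(lm - lo)
        if diff > eps:
            return n, lm, lo, diff
    return None

# Reproduce the reported failure: lonMesh(1) = 147.25 vs lon(1) = 145.125
print(first_coord_mismatch([147.25], [145.125]))  # (1, 147.25, 145.125, 2.125)
```

In practice you’d read the mesh centre coordinates and the model grid longitudes (e.g. with xarray), flatten both, and pass them in; a whole-degree offset like the 2.125 above usually means the mesh was built from a different hgrid than the one the model is running on.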

As an aside, I did a diff on the two directories:

  • The first created using Helen’s instructions
  • The second created by incrementally moving from Madi’s configuration into Helen’s.

The diffs of consequence are:

diff access-rom3-PG/datm_in access-rom3-MR-10by10-hires/datm_in
11,12c11,12
<   model_maskfile = "./INPUT/access-rom3-ESMFmesh.nc"
<   model_meshfile = "./INPUT/access-rom3-ESMFmesh.nc"
---
>   model_maskfile = "./INPUT/access-rom3-nomask-ESMFmesh.nc"
>   model_meshfile = "./INPUT/access-rom3-nomask-ESMFmesh.nc"
Only in access-rom3-MR-10by10-hires/: docs
diff access-rom3-PG/drof_in access-rom3-MR-10by10-hires/drof_in
3,4c3,4
<   model_maskfile = "./INPUT/access-rom3-ESMFmesh.nc"
<   model_meshfile = "./INPUT/access-rom3-ESMFmesh.nc"
---
>   model_maskfile = "./INPUT/access-rom3-nomask-ESMFmesh.nc"
>   model_meshfile = "./INPUT/access-rom3-nomask-ESMFmesh.nc"

i.e. Madi’s config uses the nomask mesh file.

There are also differences in MOM_input regarding OBC_SEGMENT definitions

diff access-rom3-PG/MOM_input access-rom3-MR-10by10-hires/MOM_input
109a110,133
> OBC_SEGMENT_001 = "J=0,I=0:N,FLATHER,ORLANSKI,NUDGED,ORLANSKI_TAN,NUDGED_TAN" !
>                                 ! Documentation needs to be dynamic?????
> OBC_SEGMENT_001_VELOCITY_NUDGING_TIMESCALES = 0.3, 360.0 !   [days] default = 0.0
>                                 ! Timescales in days for nudging along a segment, for inflow, then outflow.
>                                 ! Setting both to zero should behave like SIMPLE obcs for the baroclinic
>                                 ! velocities.
> OBC_SEGMENT_002 = "J=N,I=N:0,FLATHER,ORLANSKI,NUDGED,ORLANSKI_TAN,NUDGED_TAN" !
>                                 ! Documentation needs to be dynamic?????
> OBC_SEGMENT_002_VELOCITY_NUDGING_TIMESCALES = 0.3, 360.0 !   [days] default = 0.0
>                                 ! Timescales in days for nudging along a segment, for inflow, then outflow.
>                                 ! Setting both to zero should behave like SIMPLE obcs for the baroclinic
>                                 ! velocities.
> OBC_SEGMENT_003 = "I=0,J=N:0,FLATHER,ORLANSKI,NUDGED,ORLANSKI_TAN,NUDGED_TAN" !
>                                 ! Documentation needs to be dynamic?????
> OBC_SEGMENT_003_VELOCITY_NUDGING_TIMESCALES = 0.3, 360.0 !   [days] default = 0.0
>                                 ! Timescales in days for nudging along a segment, for inflow, then outflow.
>                                 ! Setting both to zero should behave like SIMPLE obcs for the baroclinic
>                                 ! velocities.
> OBC_SEGMENT_004 = "I=N,J=0:N,FLATHER,ORLANSKI,NUDGED,ORLANSKI_TAN,NUDGED_TAN" !
>                                 ! Documentation needs to be dynamic?????
> OBC_SEGMENT_004_VELOCITY_NUDGING_TIMESCALES = 0.3, 360.0 !   [days] default = 0.0
>                                 ! Timescales in days for nudging along a segment, for inflow, then outflow.
>                                 ! Setting both to zero should behave like SIMPLE obcs for the baroclinic
>                                 ! velocities.
242a267,275
> OBC_SEGMENT_001_DATA = "U=file:forcing_obc_segment_001.nc(u),V=file:forcing_obc_segment_001.nc(v),SSH=file:forcing_obc_segment_001.nc(eta),TEMP=file:forcing_obc_segment_001.nc(temp),SALT=file:forcing_obc_segment_001.nc(salt)" !
>                                 ! OBC segment docs
> OBC_SEGMENT_002_DATA = "U=file:forcing_obc_segment_002.nc(u),V=file:forcing_obc_segment_002.nc(v),SSH=file:forcing_obc_segment_002.nc(eta),TEMP=file:forcing_obc_segment_002.nc(temp),SALT=file:forcing_obc_segment_002.nc(salt)" !
>                                 ! OBC segment docs
> OBC_SEGMENT_003_DATA = "U=file:forcing_obc_segment_003.nc(u),V=file:forcing_obc_segment_003.nc(v),SSH=file:forcing_obc_segment_003.nc(eta),TEMP=file:forcing_obc_segment_003.nc(temp),SALT=file:forcing_obc_segment_003.nc(salt)" !
>                                 ! OBC segment docs
> OBC_SEGMENT_004_DATA = "U=file:forcing_obc_segment_004.nc(u),V=file:forcing_obc_segment_004.nc(v),SSH=file:forcing_obc_segment_004.nc(eta),TEMP=file:forcing_obc_segment_004.nc(temp),SALT=file:forcing_obc_segment_004.nc(salt)" !
>                                 ! OBC segment docs

i.e. Helen’s MOM_input is missing all the OBC_SEGMENT information (is this correct? Did I delete this accidentally?)

Anyway, I’ll do a check on the lat/lon coordinates of the mesh files in Madi’s config and see if I can figure out the source of the error.

To run Madi’s config at 0.05 resolution with 140x249, do I need to change the mask file to use the ‘nomask’ version?

I believe both will run, and I think both should give the same result. There are some cases where the nomask version is needed for fields to be coupled conservatively.

Ok I checked the dimensions of my access-rom3-ESMFmesh.nc and they were still the ‘low-res’ versions (i.e. 70 x 149).

So I regenerated the ESMF mesh files and ran again. I now get the same error I logged here: ACCESS-ROM3 setup instructions - #55

FATAL from PE    86: ERROR: ESMF mesh and MOM6 domain masks are inconsistent! - MOM n, maskMesh(n), mask(n) =      306        
 1         0

FATAL from PE    82: ERROR: ESMF mesh and MOM6 domain masks are inconsistent! - MOM n, maskMesh(n), mask(n) =       81        
 1         0

Image              PC                Routine            Line        Source             
access-om3-MOM6    0000000002E3B304  mpp_mod_mp_mpp_er          72  mpp_util_mpi.inc
access-om3-MOM6    0000000001E4F06F  mom_error_handler         154  MOM_error_handler.F90
access-om3-MOM6    0000000001B83460  mom_cap_mod_mp_in        1246  mom_cap.F90

So I’ve now generated the same error by building a config two separate ways.

I’ve checked my ocean_mask.nc and access-rom3-ESMFmesh.nc file and they are identical.
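For reference, locating the offending cells is just a flattened comparison of the two masks. A hedged sketch, with plain lists standing in for the arrays you would read from the mesh file’s elementMask and from ocean_mask.nc:

```python
def mask_mismatches(mask_mesh, mask_mom):
    """Indices (1-based, matching the Fortran 'n' in the error message) where
    the ESMF mesh mask and the MOM6 domain mask disagree (1 = ocean, 0 = land)."""
    return [(n, mm, mo)
            for n, (mm, mo) in enumerate(zip(mask_mesh, mask_mom), start=1)
            if mm != mo]

# A mesh that marks cell 3 as ocean while the model mask has it as land,
# the same pattern as "maskMesh(n), mask(n) = 306  1  0" in the log:
print(mask_mismatches([1, 1, 1, 0], [1, 1, 0, 0]))  # [(3, 1, 0)]
```

Converting n back to a (j, i) pair with `divmod(n - 1, nx)` then shows where on the grid the disagreement sits, which helps distinguish a genuine masking problem from a wrong-resolution mesh.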

Some questions:

  1. Does anyone know which file the “MOM6 domain mask” in the above error message refers to?
  2. Does anyone have a config in git somewhere of the Tassie domain at 140x249 that will work out of the box?
  3. @dougiesquire - do you want to catch up some time to run me through the masking process? Will ‘automasking’ potentially solve this?

Thanks for reading

@Paul.Gregory I think it is important to use the nomask version of the ESMF mesh in certain places. Could you please take a look at this config and check that you are using masked and unmasked versions in the same places.

Here is where ‘ESMFmesh.nc’ is used in my current config directory
config.yaml:

- /g/data/vk83/configurations/inputs/access-om3/share/meshes/share/2024.09.16/JRA55do-datm-ESMFmesh.nc
- /g/data/vk83/configurations/inputs/access-om3/share/meshes/share/2024.09.16/JRA55do-drof-ESMFmesh.nc
- /scratch/gb02/pag548/regional_mom6_configs/tassie-access-om2-forced/access-rom3-ESMFmesh.nc
- /scratch/gb02/pag548/regional_mom6_configs/tassie-access-om2-forced/access-rom3-nomask-ESMFmesh.nc

datm_in:

model_maskfile = "./INPUT/access-rom3-nomask-ESMFmesh.nc"
model_meshfile = "./INPUT/access-rom3-nomask-ESMFmesh.nc"

drof_in:

model_maskfile = "./INPUT/access-rom3-nomask-ESMFmesh.nc"
model_meshfile = "./INPUT/access-rom3-nomask-ESMFmesh.nc"

nuopc.runconfig:

mesh_atm = ./INPUT/access-om2-1deg-nomask-ESMFmesh.nc
mesh_ice = ./INPUT/access-om2-1deg-ESMFmesh.nc
mesh_mask = ./INPUT/access-rom3-ESMFmesh.nc
mesh_ocn = ./INPUT/access-rom3-ESMFmesh.nc
mesh_rof = ./INPUT/access-om2-1deg-nomask-ESMFmesh.nc

If I search for just ‘mask’, the following strings also match:

fd.yaml:     - standard_name: Sg_icemask
fd.yaml:     - standard_name: Sg_icemask_coupled_fluxes
fd.yaml:     - standard_name: Si_imask
fd.yaml:       alias: ice_mask
fd.yaml:       description: sea-ice export - ice mask
fd.yaml:     - standard_name: So_omask
fd.yaml:       alias: ocean_mask
fd.yaml:     - standard_name: mask
ice_in:  maskhalo_bound = .true.
ice_in:  maskhalo_dyn = .true.
ice_in:  maskhalo_remap = .true.
ice_in:  f_tmask        = .false. , f_umask        = .false.
ice_in:  f_nmask        = .false. , f_emask        = .false.

I’ve checked this config against

In the GitHub .yaml the JRA and access meshes are defined in a different order:

    - /g/data/vk83/configurations/inputs/access-om3/share/meshes/global.1deg/2024.01.25/access-om2-1deg-ESMFmesh.nc
    - /g/data/vk83/configurations/inputs/access-om3/share/meshes/global.1deg/2024.01.25/access-om2-1deg-nomask-ESMFmesh.nc
    - /g/data/vk83/configurations/inputs/access-om3/share/meshes/share/2024.09.16/JRA55do-datm-ESMFmesh.nc
    - /g/data/vk83/configurations/inputs/access-om3/share/meshes/share/2024.09.16/JRA55do-drof-ESMFmesh.nc

Those are the only differences I can spot (besides the presence of rom3 meshes in some places and om2 meshes in others).

Should all mesh files in nuopc.runconfig use the regional rom3 files and not the global om2 ones?

Hi @Paul.Gregory, neither of these should be an issue. Different versions of the notebook put the boundary conditions in MOM_override.

No, we will still need the global one for, e.g., the atmosphere.

I am fairly confident all of these should be the access-rom3 mesh

Hopefully you can delete the mesh_ice line. It looks like you are using nomask and mask correctly, but they should all be the rom3 mesh files.

The mesh files used for the input files for the data atmosphere and data runoff are set in the streams.xml files (e.g. here). The data model (CDEPS) then remaps the data from the input mesh, set in streams.xml, to the model_meshfile set in datm_in. (I think … I’m not confident why there is a model_meshfile in datm_in and a mesh_atm in nuopc.runconfig.)

I remembered … don’t try this. CMEPS does some checks of this file even if it’s not being used.

@Paul.Gregory I have put my latest working configuration here
I put it together about a week ago using the latest version of the instructions. If you upload your latest configuration then I am happy to have a go at it to see if I can get it to work.

Note that I have moved all these posts into a new thread to make it easier to find and follow for newcomers.


Hi Helen.

I used your config and changed your paths in config.yaml from

/scratch/tm70/hm6113/regional/accessom3/accessom3-test/etc.

to point to my regional meshes at

/scratch/gb02/pag548/regional_mom6_configs/tassie-access-om2-forced

I now generate a different error. We no longer have a mask-related issue; it’s a time-related issue with the boundary conditions. Some snippets:

FATAL from PE     0: field u_segment_001 Array size mismatch in time_interp_external. Array "data" is too small. shape(data)= 
  121    1   75

FATAL from PE    20: time_interp_external 2: time 731215 (20030101.000000 is before range of list 734868-734872(20130101.00000
0 - 20130105.000000),file=INPUT/forcing_obc_segment_003.nc,field=u_segment_003

Here are the time contents of my first obc_segment file:

$ ncdump -v time forcing_obc_segment_001.nc 
netcdf forcing_obc_segment_001 {
dimensions:
	time = UNLIMITED ; // (5 currently)
variables:
	double time(time) ;
		time:_FillValue = NaN ;
		time:calendar = "julian" ;
		time:units = "days since 2013-01-01 00:00:00" ;
data:

 time = 0, 1, 2, 3, 4 ;
}

I don’t have permissions to view /scratch/tm70/hm6113/regional/accessom3/accessom3-test/ to compare against your input obc_segment files.

Ahh, it looks like I have my dates as 2003 (which is not the 2013 in the instructions!)
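A quick way to confirm this kind of mismatch before resubmitting is to compare the model start date against the span of each forcing file’s time axis, using just the ‘days since …’ units string and the time values from ncdump. A sketch (hedged: it ignores the Julian-calendar subtleties that MOM6’s time_interp_external handles properly):

```python
from datetime import datetime, timedelta

def obc_covers_start(start_ymd, units, times):
    """True if start_ymd (an int like 20130101, as in nuopc.runconfig) lies
    within the time axis described by a 'days since YYYY-MM-DD hh:mm:ss'
    units string and a list of day offsets."""
    epoch = datetime.strptime(units.removeprefix("days since "),
                              "%Y-%m-%d %H:%M:%S")
    start = datetime.strptime(str(start_ymd), "%Y%m%d")
    lo = epoch + timedelta(days=min(times))
    hi = epoch + timedelta(days=max(times))
    return lo <= start <= hi

units = "days since 2013-01-01 00:00:00"
print(obc_covers_start(20030101, units, [0, 1, 2, 3, 4]))  # False: the 2003/2013 clash
print(obc_covers_start(20130101, units, [0, 1, 2, 3, 4]))  # True
```

Running this over all four forcing_obc_segment_00*.nc files would catch the “time … is before range of list” fatal at the desk rather than in the queue.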

I have placed my files on gadi:

/scratch/public/hm6113

but these will be deleted soon so copy them across to a folder somewhere.

To change the dates it is in nuopc.runconfig line 276:

start_ymd = 20030101

Change this to

start_ymd = 20130101

I’m not sure, but you may need to run

payu setup

after making this change.
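For anyone scripting this rather than hand-editing, the start_ymd line can be rewritten in place. A small sketch (it assumes the plain `key = value` layout shown above, and the anchored match deliberately leaves restart_ymd alone):

```python
import re

def set_start_ymd(runconfig_text, new_ymd):
    """Rewrite the start_ymd entry in nuopc.runconfig text, preserving
    indentation. restart_ymd is untouched because the pattern is anchored
    to the start of the line (only whitespace may precede 'start_ymd')."""
    return re.sub(r"(?m)^(\s*start_ymd\s*=\s*)\d+",
                  lambda m: m.group(1) + str(new_ymd),
                  runconfig_text)

snippet = "     restart_ymd = -999\n     start_ymd = 20030101\n"
print(set_start_ymd(snippet, 20130101))
```

The same pattern works for any of the scalar CLOCK_attributes entries; just swap the key name.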


Hi Helen.

I made the change to nuopc.runconfig:

$ grep start_ymd nuopc.runconfig 
     restart_ymd = -999
     start_ymd = 20130101

But the error remains.

I then repeated the run but linked to /scratch/public/hm6113 in my config.yaml, so that all my inputs were:

exe: access-om3-MOM6
input:
    - /g/data/vk83/configurations/inputs/access-om3/share/meshes/share/2024.09.16/JRA55do-datm-ESMFmesh.nc
    - /g/data/vk83/configurations/inputs/access-om3/share/meshes/share/2024.09.16/JRA55do-drof-ESMFmesh.nc
    - /g/data/vk83/configurations/inputs/access-om3/share/grids/global.1deg/2020.10.22/topog.nc
    - /g/data/vk83/configurations/inputs/access-om3/cice/grids/global.1deg/2024.05.14/grid.nc
    - /g/data/vk83/configurations/inputs/access-om3/mom/surface_salt_restoring/global.1deg/2020.05.30/salt_sfc_restore.nc
    - /g/data/vk83/configurations/inputs/access-om3/cice/grids/global.1deg/2024.05.14/kmt.nc
    - /g/data/vk83/configurations/inputs/access-om3/cice/initial_conditions/global.1deg/2023.07.28/iced.1900-01-01-10800.nc
    - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0
    - /scratch/public/hm6113/hgrid.nc
    - /scratch/public/hm6113/vcoord.nc
    - /scratch/public/hm6113/bathymetry.nc
    - /scratch/public/hm6113/init_tracers.nc
    - /scratch/public/hm6113/init_eta.nc
    - /scratch/public/hm6113/init_vel.nc
    - /scratch/public/hm6113/forcing_obc_segment_001.nc
    - /scratch/public/hm6113/forcing_obc_segment_002.nc
    - /scratch/public/hm6113/forcing_obc_segment_003.nc
    - /scratch/public/hm6113/forcing_obc_segment_004.nc
    - /scratch/public/hm6113/grid_spec.nc
    - /scratch/public/hm6113/ocean_mosaic.nc
    - /scratch/public/hm6113/access-rom3-ESMFmesh.nc
    - /scratch/public/hm6113/access-rom3-nomask-ESMFmesh.nc
    - /scratch/public/hm6113/land_mask.nc 

The error remains.

OK, so this is strange. Remember, I’m now running using your config at
git@github.com:helenmacdonald/ACCESS-rOM3-tas.git

Here is the git diff between what I’m running on disk and what is in your repo, for config.yaml and nuopc.runconfig:

$ git diff origin/expt -- .
diff --git a/config.yaml b/config.yaml
index 33e3780..67f2b70 100644
--- a/config.yaml
+++ b/config.yaml
@@ -28,21 +28,21 @@ input:
     - /g/data/vk83/configurations/inputs/access-om3/cice/grids/global.1deg/2024.05.14/kmt.nc
     - /g/data/vk83/configurations/inputs/access-om3/cice/initial_conditions/global.1deg/2023.07.28/iced.1900-01-01-10800.nc
     - /g/data/qv56/replicas/input4MIPs/CMIP6/OMIP/MRI/MRI-JRA55-do-1-4-0
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/hgrid.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/vcoord.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/bathymetry.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/init_tracers.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/init_eta.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/init_vel.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/forcing_obc_segment_001.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/forcing_obc_segment_002.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/forcing_obc_segment_003.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/forcing_obc_segment_004.nc  
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/grid_spec.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/ocean_mosaic.nc 
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/access-rom3-ESMFmesh.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/access-rom3-nomask-ESMFmesh.nc
-    - /scratch/tm70/hm6113/regional/accessom3/accessom3-test/land_mask.nc
+    - /scratch/public/hm6113/hgrid.nc
+    - /scratch/public/hm6113/vcoord.nc
+    - /scratch/public/hm6113/bathymetry.nc
+    - /scratch/public/hm6113/init_tracers.nc
+    - /scratch/public/hm6113/init_eta.nc
+    - /scratch/public/hm6113/init_vel.nc
+    - /scratch/public/hm6113/forcing_obc_segment_001.nc
+    - /scratch/public/hm6113/forcing_obc_segment_002.nc
+    - /scratch/public/hm6113/forcing_obc_segment_003.nc
+    - /scratch/public/hm6113/forcing_obc_segment_004.nc
+    - /scratch/public/hm6113/grid_spec.nc
+    - /scratch/public/hm6113/ocean_mosaic.nc
+    - /scratch/public/hm6113/access-rom3-ESMFmesh.nc
+    - /scratch/public/hm6113/access-rom3-nomask-ESMFmesh.nc
+    - /scratch/public/hm6113/land_mask.nc 
 collate: false
 runlog: true
 metadata: 

diff --git a/nuopc.runconfig b/nuopc.runconfig
index cf5f7a8..0162674 100644
--- a/nuopc.runconfig
+++ b/nuopc.runconfig
@@ -273,7 +273,7 @@ CLOCK_attributes::
      restart_ymd = -999
      rof_cpl_dt = 99999 #not used
      start_tod = 0
-     start_ymd = 20030101
+     start_ymd = 20130101
      stop_n = 2
      stop_option = ndays
      stop_tod = 0

Are we at the point where we think there is an underlying configuration/environment issue?

Hi @Paul.Gregory, just checking - is “the error remains” referring to this error:

ESMF mesh and MOM6 domain masks are inconsistent!

or this one:
field u_segment_001 Array size mismatch in time_interp_external
?

I am using payu 1.1.6, upgraded from payu 1.1.5.

Sorry.

It’s still generating warnings and errors related to time dimensions, e.g.

WARNING from PE     9: categorize_axes: Failed to identify x- and y- axes in the axis list (nx_segment_004, ny_segment_004, time) of a varia
ble being read from INPUT/forcing_obc_segment_004.nc

FATAL from PE     9: time_interp_external 2: time 734868 (20130101.000000 is after range of list 731215-731219(20030101.000000 - 20030105.00
0000),file=INPUT/forcing_obc_segment_001.nc,field=u_segment_001

FYI I’m inquiring at NCI about help with an MPI Fortran debugging session. I have lots of serial Fortran debugging experience (using gdb and ifort), so I’m thinking that debugging this regional MOM6 configuration will be helpful for tracking down these errors.

Will also be good for my own MOM6 education.

I’ll follow the spack-MOM6 compilation instructions again, and make sure I add debug flags. Then I’ll use this exe for my own config.

But I’ll also try with payu 1.1.6


In the meantime - I will redo my example so the time matches everyone else’s.

6 posts were split to a new topic: Setting build options for OM3

@Paul.Gregory
I haven’t tried recently, but I am expecting them to be the same.
We wanted to have the global and regional models both on the symmetric executable, but there are some restart reproducibility issues, so we need to leave the +mom_symmetric step in for now.
If you are using the most recent ACCESS-om3 version, you may need to follow the regional mom6 setup instructions from the start as there have been some changes in the global configurations that we base the regional configurations on (so the executable will not be compatible with the input files).

@Paul.Gregory, I now have a simulation with an error similar to yours. For me, it was caused by mixing and matching the input boundary condition (forcing_obc_segment_00*.nc) files and configuration files (MOM_*). There are different versions of the regional-mom6 package, which has resulted in different versions of the boundary condition files floating around. The forcing_obc_segment_00*.nc files created with one version are incompatible with the MOM_* files created with a different version.

Basically, there are differences in the order that the boundary conditions are specified and the direction they go in.

This line:

OBC_SEGMENT_001 = "J=N,I=N:0,FLATHER,ORLANSKI,NUDGED,ORLANSKI_TAN,NUDGED_TAN" !None

(and the others like it, one for each of the four segments) tells the model which edge forcing_obc_segment_001.nc is on, and which direction the boundary is read in. It differs between versions to reflect the differences in the netCDF files.
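The positional part of that string encodes both pieces of information. A small parser sketch — the edge mapping assumes MOM6’s usual convention (J=0 south, J=N north, I=0 west, I=N east) and only handles full-edge segments like the ones in this config:

```python
def parse_obc_segment(spec):
    """Split an OBC_SEGMENT value like "J=N,I=N:0,FLATHER,..." into the edge
    the segment sits on and the direction it is traversed."""
    fixed, span = spec.split(",")[:2]         # e.g. "J=N" and "I=N:0"
    axis, pos = fixed.split("=")              # which index is held fixed
    sweep_axis, sweep = span.split("=")       # which index sweeps the edge
    start, end = sweep.split(":")
    edge = {("J", "0"): "south", ("J", "N"): "north",
            ("I", "0"): "west",  ("I", "N"): "east"}[(axis, pos)]
    return edge, f"{sweep_axis} from {start} to {end}"

# Segment 002 from the config above: north edge, read from I=N back to I=0.
print(parse_obc_segment("J=N,I=N:0,FLATHER,ORLANSKI,NUDGED,ORLANSKI_TAN,NUDGED_TAN"))
```

If the notebook that wrote forcing_obc_segment_002.nc assumed the opposite sweep (I=0:N), the data and the segment definition disagree even though each looks plausible on its own — which is exactly the mix-and-match failure described above.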

Not too sure if this is what is causing your error, but it would be good to check that you are using the MOM_* files that were made with the same notebook as the input netCDF files.
