Forcing ACCESS-OM2 using ESM1.5 data

Hi @ongqingyee,

My understanding is that fld_s08i234: surface_runoff_flux comes from the land model, and refers to surface runoff over the land points rather than the runoff going into the ocean. If you have access to it, the variable fld_s26i004: water_flux_into_sea_water_from_rivers should contain the river runoff going into the ocean from the UM.

If the runoff from the UM hasn’t been saved, you might be able to take it from the ocean output. The runoff variable saved by MOM will contain the same data, but regridded onto the ocean grid.

Based on the coupling variables sent to the ocean in ESM1.5 here, I don't think there was any equivalent to licalvf (it's being added in 1.6!). I believe all the freshwater flux from the ice sheets was distributed into the liquid runoff around the coastlines.

I followed the instructions here and ended up with this error, which I traced back to here.

I am slightly confused by how the MOM5 ESM1.5 component code is showing up when running ACCESS-OM2 with ESM ocean restarts.

There are a few different versions of the MOM5 code in different places, and the latest release of OM2 should be using the code from this repository. I think this error is coming from here in the FMS library used by OM2.

The MOM5 restarts contain checksums of each variable's data, e.g.:

	double temp(Time, zaxis_1, yaxis_1, xaxis_1) ;
		temp:long_name = "temp" ;
		temp:units = "none" ;
		temp:checksum = "DCA837A688447387" ;

When starting from a restart, the model reads in the restart variables and recalculates the checksums. It then compares these to the checksums stored in the restart file, exiting with an error if they don’t match.

Based on the earlier discussion, did you end up vertically interpolating the ESM1.5 restarts? As this would change the restart data, it would cause this check to fail. It looks like you can add

checksum_required=.false.

to ocean/input.nml in the fms_io_nml section to switch off this check.
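For reference, the relevant block in ocean/input.nml would then look something like this (assuming an fms_io_nml group is already present; if not, it can be added):

```
&fms_io_nml
    checksum_required = .false.
/
```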


Hi Spencer, thanks so much for the info.

Re the MOM5 restarts, I did not end up interpolating anything, as I only realised in hindsight that the 1-degree ACCESS-OM2 also has 50 vertical levels - i.e., to my understanding, the same grid as ACCESS-ESM1.5. So I just added the lines below to the config.yaml file, and I'm not sure where the error came from if I did not modify the restart files?

restart:
  /g/data/vk83/configurations/inputs/access-esm1p5/modern/pre-industrial/restart

Hi @ongqingyee, just noting that the ACCESS-OM2-1deg and ACCESS-ESM1.5 ocean grids are not exactly the same. Specifically:

  • the t-cell areas are slightly different over the tripole region (largest at/near the North Pole)
  • the land-sea masks differ (OM2 has 27 more wet cells than ESM1.5)

We’ll fix up these annoying inconsistencies in future releases, but I suspect the second point above is why the checksums don’t match.

Have you tried @spencerwong's suggestion?


Hi @dougiesquire, thank you for clarifying. Unfortunately @spencerwong's suggestion did not work, throwing up these errors - which, from what you said, are due to OM2 now having some empty cells?

For the purposes of this project we wanted to replicate the ocean component of ESM1.5 as much as possible. However, my understanding is that regridding the restarts from ESM to the OM2 grid and changing the masking, while keeping the very small inconsistencies in mind, is much easier than changing the masks/t-cell areas in OM2 to match those of ESM1.5?

[gadi-cpu-clx-0240:984247:0:984247] Caught signal 8 (Floating point exception: floating-point divide by zero)
[gadi-cpu-clx-0248:2339332:0:2339332] Caught signal 8 (Floating point exception: floating-point divide by zero)
[gadi-cpu-clx-0248:2339362:0:2339362] Caught signal 8 (Floating point exception: floating-point divide by zero)
==== backtrace (tid: 984247) ====
 0 0x0000000000012990 __funlockfile()  :0
 1 0x0000000000e59c2d ocean_thickness_mod_mp_thickness_restart_()  /scratch/tm70/tm70_ci/tmp/spack-stage/spack-stage-mom5-git.2023.11.09=2023.11.09-qji4nlmr6utrribaiyhew$
 2 0x0000000000eab0d2 ocean_thickness_mod_mp_ocean_thickness_init_()  /scratch/tm70/tm70_ci/tmp/spack-stage/spack-stage-mom5-git.2023.11.09=2023.11.09-qji4nlmr6utrribaiy$
 3 0x000000000044b765 ocean_model_mod_mp_ocean_model_init_()  /scratch/tm70/tm70_ci/tmp/spack-stage/spack-stage-mom5-git.2023.11.09=2023.11.09-qji4nlmr6utrribaiyhewe4je6$
 4 0x000000000041daa3 MAIN__()  /scratch/tm70/tm70_ci/tmp/spack-stage/spack-stage-mom5-git.2023.11.09=2023.11.09-qji4nlmr6utrribaiyhewe4je6mifguz/spack-src/src/accessom_$
 5 0x00000000004101e2 main()  ???:0
 6 0x000000000003a7e5 __libc_start_main()  ???:0
 7 0x00000000004100ee _start()  ???:0
=================================
==== backtrace (tid: 2339332) ====
 (identical frames to tid 984247)
=================================
forrtl: error (75): floating point exception
...

I’d say so, yes. You shouldn’t need to do any regridding since the cell locations are the same between the two models. I think just extrapolating values at the additional OM2 wet cells should get you past the issue.
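In case a sketch helps, this is the kind of fill I mean: nearest-neighbour extrapolation of each field into the cells that are newly wet, using scipy's Euclidean feature transform. The function and array names here are placeholders, not anything from the actual restarts:

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fill_new_wet_cells(data, old_mask, new_mask):
    """Fill cells that are wet in new_mask but land in old_mask with the
    value of the nearest cell that was already wet in old_mask."""
    land = old_mask == 0
    # for every old land point, get the indices of the nearest old wet point
    idx = distance_transform_edt(land, return_distances=False, return_indices=True)
    filled = data.copy()
    filled[land] = data[tuple(i[land] for i in idx)]
    # keep points that are land in the new mask as zero
    filled[new_mask == 0] = 0.0
    return filled

# toy example with a single cell that is land in ESM but wet in OM2
data = np.array([[1.0, 2.0], [0.0, 4.0]])
old_mask = np.array([[1, 1], [0, 1]])
new_mask = np.array([[1, 1], [1, 1]])
print(fill_new_wet_cells(data, old_mask, new_mask))
```

In practice you'd loop this over every level and variable in each restart file, using the ESM1.5 mask as old_mask and the OM2 mask as new_mask.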

In case it’s useful, here’s some code to open and compare the two masks:

import xarray as xr
import cartopy.crs as ccrs
import matplotlib.pyplot as plt

# Load grid and mask data
esm_grid_spec = xr.open_dataset(
    "/g/data/vk83/experiments/inputs/access-esm1p5/modern/share/ocean/grids/mosaic/global.1deg/2020.05.19/grid_spec.nc"
).compute()

om2_mask = xr.open_dataset(
    "/g/data/vk83/experiments/inputs/access-om2/ocean/grids/bathymetry/global.1deg/2020.10.22/ocean_mask.nc"
).compute()
om2_hgrid = xr.open_dataset(
    "/g/data/vk83/experiments/inputs/access-om2/ocean/grids/mosaic/global.1deg/2020.05.30/ocean_hgrid.nc"
).compute()

# Add/rename dims/coordinates consistently
esm_mask = esm_grid_spec[["wet", "y_T", "x_T"]]
esm_mask = esm_mask.rename({"wet": "mask", "y_T": "geolat_t", "x_T": "geolon_t"})
esm_mask = esm_mask.drop_vars(("grid_y_T", "grid_x_T"))
esm_mask = esm_mask.set_coords(("geolat_t", "geolon_t"))

om2_mask = om2_mask.rename({"ny": "grid_y_T", "nx": "grid_x_T"})
om2_mask = om2_mask.assign_coords({
    "geolat_t": (("grid_y_T", "grid_x_T"), om2_hgrid["y"].values[1:-1:2, 1:-1:2]),
    "geolon_t": (("grid_y_T", "grid_x_T"), om2_hgrid["x"].values[1:-1:2, 1:-1:2])
})

# Plot the difference in the masks
fig, ax = plt.subplots(figsize=(13, 5), subplot_kw=dict(projection=ccrs.PlateCarree()))
(om2_mask - esm_mask)["mask"].plot(ax=ax, x="geolon_t", y="geolat_t", transform=ccrs.PlateCarree())
ax.coastlines()

This should produce a plot that shows where the additional OM2 wet cells are.


Thank you @spencerwong @dougiesquire for your help so far!

I filled in the discrepancies between the OM2 and ESM masks and turned off the checksum, and ended up with this segmentation error.

[gadi-cpu-clx-0240:1169220:0:1169220] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x732b000)
fms_ACCESS-OM.x: malloc.c:2415: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) &&$
[gadi-cpu-clx-0240:1169214:0:1169214] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7cac000)
[gadi-cpu-clx-0240:1169218:0:1169218] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7c65000)
[gadi-cpu-clx-0240:1169208:0:1169208] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x6e65000)
[gadi-cpu-clx-0240:1169314:0:1169314] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7863000)
[gadi-cpu-clx-0240:1169217:0:1169217] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x6f58000)
[gadi-cpu-clx-0240:1169199:0:1169199] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7837000)
[gadi-cpu-clx-0240:1169216:0:1169216] Caught signal 11 (Segmentation fault: address not mapped to object at address 0x7f18000)
malloc(): corrupted top size
...

I thought there might be an issue with the restart files to begin with, as in here (the directories have changed since then, so it is slightly unclear whether the restarts I was using have been updated from PI-02 or not). I was using the restart files Tilo directed me to (/g/data/vk83/configurations/inputs/access-esm1p5/modern/pre-industrial/restart).

I also did the filling in of the discrepancies for the PI-02 restarts in the directory mentioned here (/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/PI-02). Both before and after filling in the missing wet cells I got an ideal passive tracer error:


FATAL from PE   210: temp is not initialized as an ideal passive tracer,  it is not initialized as constant, and it does not exist in the file INPUT/ocean_temp_salt.res.ncAll tracers must have initialization specified.  There is no default.


...

The temp variable exists in ocean_temp_salt.res.nc, so I'm not sure what this means?
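For anyone else hitting this, one quick thing to rule out: MOM looks tracers up by name, so if the regridding step renamed a variable or dimension when writing the file back out, the reader could miss it. A toy check along these lines (the small dataset here just stands in for the real restart, with the names from the ncdump earlier in the thread):

```python
import numpy as np
import xarray as xr

# toy stand-in for INPUT/ocean_temp_salt.res.nc; the real file has the
# dims (Time, zaxis_1, yaxis_1, xaxis_1) shown in the ncdump above
shape = (1, 2, 2, 2)
dims = ("Time", "zaxis_1", "yaxis_1", "xaxis_1")
ds = xr.Dataset({"temp": (dims, np.zeros(shape)),
                 "salt": (dims, np.zeros(shape))})

print(list(ds.data_vars))  # expect ['temp', 'salt']
print(ds["temp"].dims)     # expect ('Time', 'zaxis_1', 'yaxis_1', 'xaxis_1')
```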

Hmmm, that’s weird. I’m not sure I can be much more help without access to your Payu configuration, inputs etc. In which project are things currently?

The experiment with PI-02 restarts (with wet cells filled in) is : /home/561/qo9901/access-om2/1deg_esm1p5_forc_PI02_restarts, while that of the vk83 restarts is /home/561/qo9901/access-om2/1deg_esm1p5_forc_vk83_restarts_filled. The restart files sit on /g/data/if69/qo9901/ocean_restart_frm_ESM_PI-02/restart_filled_om2
and /g/data/if69/qo9901/ocean_restart_frm_ESM_vk83/restart_filled_om2.

I did consider that using a different restart year was messing with things - I used year 700 following Hannah's post and Spencer's instructions, but the cice .nml files are different between OM2 and ESM, so I couldn't follow them exactly.

Thanks so much!

Sorry I didn't chip in earlier, but actually the whole vertical grid in ESM1.5 is different to that of the 50-level ACCESS-OM2. I know that ESM goes down to 6000 m depth while I think OM2 goes to 5500 m, and indeed all of the grid levels below 200 m are different to some extent.


Thanks for clarifying @dkhutch, you’re right.

@ongqingyee (and you) did already flag the need to re-interpolate the vertical structure above. Are you including this step, @ongqingyee?
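In case a sketch is useful, the vertical remap can be done with a 1-D interpolation along the depth axis, something like this (toy depth values here; the real ESM1.5 and OM2 zt arrays would come from the respective grid files, so treat this as an illustration of the idea rather than a vetted script):

```python
import numpy as np
import xarray as xr

# toy source profile on ESM-like depths
z_esm = np.array([5.0, 15.0, 30.0, 60.0])
temp = xr.DataArray(20.0 - 0.1 * z_esm, dims="zaxis_1",
                    coords={"zaxis_1": z_esm}, name="temp")

# target OM2-like depths; allow extrapolation past the last source level
z_om2 = np.array([5.0, 20.0, 40.0, 55.0])
temp_om2 = temp.interp(zaxis_1=z_om2, method="linear",
                       kwargs={"fill_value": "extrapolate"})
print(temp_om2.values)  # → [19.5 18.  16.  14.5]
```

How you extrapolate below the deepest source level (and how land cells are handled) matters, so it's worth sanity-checking the result before running.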


Thanks @dkhutch and Dougie, I did not re-interpolate, having assumed that the same number of vertical levels meant the grids were consistent across ESM and OM2. My bad, I could have checked - I'll interpolate them now. Cheers!

Thanks @ongqingyee , note also that this might not necessarily be the cause of the crash, so please let us know if it’s still stuck.


Thanks everyone for your input so far. I regridded vertically for the PI-02 restarts and another set of restarts (/g/data/p73/archive/CMIP6/ACCESS-ESM1-5/SSP-585-10-re1/restart/ocn), and both runs had the same segmentation error as here.

I am still working on this, but it looks like there is a memory allocation error arising from interpolating the restarts, like in this Stack Overflow post, which manifested as a segmentation error. I'm not entirely sure how this happens but will keep looking.

Hi Ellie,
Is the updated version located here?

Thanks Dave, the directories with that segmentation error are /home/561/qo9901/access-om2/1deg_esm1p5_forc_PI02_restarts, and /home/561/qo9901/access-om2/1deg_esm1p5_forc_restart (with the SSP restarts).

The directory you pointed to used the vk83 restarts, and ended up with a cice error after interpolation.

Thanks @ongqingyee,
I had a poke around with it and had the same error with malloc as you. For some reason the run doesn’t actually stop when this error occurs, which I found odd. In any case, I was hoping we could find the lines of code where the error occurs. I get this message:

fms_ACCESS-OM.x: malloc.c:2415: sysmalloc: Assertion `(old_top == initial_top (av) && old_size == 0) || ((unsigned long) (old_size) >= MINSIZE && prev_inuse (old_top) && ((unsigned long) old_end & (pagesize - 1)) == 0)' failed.
malloc(): corrupted top size

@dougiesquire Given this error comes from malloc.c (which I believe is a basic C-library thing), is there any way we can trace back to find where in the fms_ACCESS-OM.x code the error originates from? Is there an executable that is compiled for “debug” mode that we could try instead?

Hi @ongqingyee ,
Ok, so having a bit more of a play around with this. I have a couple of ideas. First, I tried replicating your directory:

/home/561/qo9901/access-om2/1deg_esm1p5_forc_restart

However, this also yielded the same malloc error as before. I tried instead just to run a simpler case where I still use all your remapping and forcing from the atmosphere, but begin from a cold start of the ocean as per the ACCESS-OM2 release configurations.
I tried running such a case here:

/home/157/dkh157/access-om2/1deg_esm1p5_coldstart

And this seems to run without error for at least the first 18 min, i.e. seems to kind of work?

I then tried taking one step towards your restart configuration, substituting only the temperature-salinity restart file from your “filled” configuration, which I ran in this directory:

/home/157/dkh157/access-om2/1deg_esm1p5_coldstart_v2

Note, the only difference here is the T-S restart file. That run crashed, with errors saying the temperature and salinity are out of range etc. So, I'm wondering if you could please send the script you used to interpolate the T-S file? It might be worth re-examining this in case some strange values are popping up in there somewhere.

Otherwise, I would note that in your “filled” restarts you are also interpolating some of the velocity restarts, such as:

ocean_thickness.res.nc
ocean_velocity_advection.res.nc
ocean_velocity.res.nc
ocean_density.res.nc
ocean_frazil.res.nc

For what it's worth, I never try to interpolate those onto a different mask - I've generally found it too complicated to get right! Instead, if I'm going between topography configurations (which you are here, because it's going from the ESM grid to the OM2 grid), I always revert to a cold start where you don't specify any velocities and only use the T-S restart.

So, perhaps let’s try to interrogate what’s happening to the T-S restart file, and try to get a cold start running.
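In case it helps with that re-examination, something along these lines could flag strange values in the interpolated T-S file (the bounds are just rough physical limits I'd pick, nothing official, and the toy arrays stand in for the real fields):

```python
import numpy as np

def report_out_of_range(name, data, mask, lo, hi):
    """Print min/max over wet cells and count values outside [lo, hi]."""
    wet = data[mask == 1]
    bad = int(np.sum((wet < lo) | (wet > hi)))
    print(f"{name}: min={wet.min():.2f} max={wet.max():.2f} out-of-range={bad}")
    return bad

# toy field; in practice read temp/salt from the interpolated
# ocean_temp_salt.res.nc and the wet mask from the OM2 grid files
temp = np.array([[10.0, 3.0], [99.0, -2.0]])  # 99.0 is clearly unphysical
mask = np.ones_like(temp, dtype=int)
report_out_of_range("temp", temp, mask, -3.0, 45.0)
# → temp: min=-2.00 max=99.00 out-of-range=1
```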


Looks like there are now some other things to look at first, but let me know if this would still be useful and I can compile MOM5 with debug flags (it might take me a day or two to get to, though).

Hi Dougie,
Thanks I think that having a debug executable would be really useful for this kind of situation because I feel that it must be possible to get a better sense of where in the model code this error is coming from.
Thanks,
Dave


Potentially this is fairly straightforward, and something anyone with write privileges to the ACCESS-1.5 deployment repo could do.

Would be good to document the process as a guide to doing something similar for other models. We could have a semi-permanent debug pre-release build for all models in case someone needed to use it.