Regridding with land-sea masks advice

Hi everyone!

I’m currently regridding some artificial SST and sea ice datasets from a 1.25 x ~0.94 grid to a 1.875 x 1.25 grid in order to use them as ancillary files for the ESM1.5 AMIP model, and was wondering if anyone had any advice/wisdom on what a good approach might be.

As shown below, the files have some sort of artificial data over land:
ER_SST_t_1870-01-16 12:00:00 (fig 1)

It’s been suggested that I mask these out so that they are not used during the regridding, and so I’ve obtained some land-sea masks for both the input and output grids.

I’ve been using xESMF to try and do the regridding, as it seems a bit more beginner friendly and has a bit more documentation than some of the other options I’ve seen. It’s also been suggested I just use bilinear interpolation. I tried a few different methods for masking out the land data, and found the results a bit strange.

1. Prior to regridding, setting all land values to NaN, and regridding without supplying any masks to the xe.Regridder call:
From what I understand, any output grid point that is next to a NaN in the input data will be set to NaN. This results in:
regrid_input_nand (Fig 2)

2. Supplying an input binary mask to the input grid in the xe.Regridder call:
This results in the following:
regrid_input_mask (Fig 3)

The results are almost identical; however, the second method actually outputs values at a few more cells than the first, specifically on Western coasts, with the extra values shown below.
diff_input_masked_input_nand (Fig 4)

It’s unclear to me how the second method produces data at these extra points. Some old discussions on the xESMF github suggest that when a mask is supplied to the regridder, it’s the same as setting the land values to zero and then interpolating.

This doesn’t seem exactly like what’s happening. If we manually set the land values to zero and regrid, we get soft edges:
regrid_input_zeroed (Fig 5)

Worth noting though: the extra values in Fig 4 do equal the values at the same locations in Fig 5, so maybe they are a result of interpolating to zero. It’s not clear to me if this is what’s happening; at least visually, they don’t look much lower than their neighbours.

In any case, I will need to then extrapolate the values so they cover the whole ocean on the output grid.
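As a rough sketch of what that extrapolation step amounts to, here is a toy nearest-neighbour fill in plain Python, followed by a check that every sea point on the output mask ends up with data. It is not what xESMF does internally: the function names are made up for illustration, distance is measured in grid-index space rather than on the sphere, and longitude is not wrapped.

```python
import math

def fill_nearest(field, target_ocean):
    """Fill missing ocean cells from the nearest cell that has data.

    Distance is measured in grid-index space and longitude is not
    wrapped; both are simplifications for this toy example.
    """
    nrow, ncol = len(field), len(field[0])
    # Donor cells are taken from the original field only, so filled
    # values never act as donors for other cells.
    donors = [(j, i) for j in range(nrow) for i in range(ncol)
              if not math.isnan(field[j][i])]
    if not donors:
        raise ValueError("no valid data to extrapolate from")
    filled = [row[:] for row in field]
    for j in range(nrow):
        for i in range(ncol):
            if target_ocean[j][i] and math.isnan(field[j][i]):
                dj, di = min(
                    donors,
                    key=lambda d: (d[0] - j) ** 2 + (d[1] - i) ** 2,
                )
                filled[j][i] = field[dj][di]
    return filled

def missing_ocean_points(field, target_ocean):
    """Return (row, col) of model sea points that still have no data."""
    return [(j, i) for j, row in enumerate(field) for i, v in enumerate(row)
            if target_ocean[j][i] and math.isnan(v)]

nan = math.nan
# Made-up regridded field: NaN where the regridder produced no value.
regridded = [[nan, nan, 290.0],
             [nan, nan, 291.0]]
# Made-up output land-sea mask: True for model sea points.
target_ocean = [[False, True, True],
                [True, True, True]]

filled = fill_nearest(regridded, target_ocean)
still_missing = missing_ocean_points(filled, target_ocean)
```

The land point at (0, 0) is left as NaN, and `still_missing` is the list to inspect before writing the ancillary file.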

I asked about the extra values on the xESMF github, and it was suggested there that the best option is to use the ‘conservative normed’ method instead. A possible issue here is that the conservative normed method might not be able to ensure the outputs are periodic in longitude (currently you can only specify that the grid is periodic when not using the conservative methods).
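As a rough illustration of why the normed variant gets suggested when masks are involved, here is a toy 1-D sketch in plain Python. It is not xESMF's implementation: the function and variable names are made up, the numbers are invented, and it assumes the destination cell is fully covered by the source cells listed.

```python
def conservative(values, overlaps, valid, normed):
    """Area-weighted mean of the source cells overlapping one output cell.

    overlaps: overlap area of each source cell with the output cell.
    valid:    False for masked (land) source cells.
    normed:   if True, divide by the unmasked overlap area instead of
              the full output cell area.
    """
    dst_area = sum(overlaps)  # assumes the output cell is fully covered
    ocean_area = sum(a for a, ok in zip(overlaps, valid) if ok)
    if ocean_area == 0:
        return float("nan")  # no unmasked source data at all
    weighted = sum(v * a for v, a, ok in zip(values, overlaps, valid) if ok)
    return weighted / (ocean_area if normed else dst_area)

sst = [285.0, 290.0, 292.0]    # first source cell is land (artificial value)
overlaps = [0.5, 1.0, 0.5]     # overlap areas with the output cell
valid = [False, True, True]

plain = conservative(sst, overlaps, valid, normed=False)   # 218.0
normed = conservative(sst, overlaps, valid, normed=True)   # about 290.67
```

In this toy case the plain version is diluted by the masked area, much like zero-filling, while the normed version stays within the range of the ocean values.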

The options I can think of at the moment are to:

  1. Regrid without any masks before applying the output grids mask. This would be straightforward but the coastal points will be influenced by the artificial land values.
  2. Regrid providing the xe.Regridder call with both the input and output masks, and getting it to extrapolate to the missing points. The issue with this would be that the mystery values in Fig 4 would be included, and would affect the extrapolation to some of the coastal points.
  3. Use the conservative normed method, and not have to worry about nans or 0’s affecting the output, but possibly having discontinuities at the meridian line (maybe this is unlikely to be severe if the input values are periodic anyway)

I’ll try the conservative normed method and see what it looks like, but was wondering if anyone here has any thoughts/recommendations on what could be a good way to do this regridding. Sorry for the long post, but thanks for having a read!



The most important thing here is the output mask - you need to make sure that there’s data on all of the model sea points. For that purpose it may be simplest to just use the unmasked data - if there is an island in your masked data that isn’t in the model’s land mask you would need to extrapolate the data anyway.

Your differences being only on the eastern coastlines are suspicious - are you sure the input masks are identical in both cases?

Yes, masking is different from setting the input value to zero - if you mask out a region, its values do not affect the output field. Think of increasing the resolution so that there’s a new ocean grid point on the coastline, between an existing land and sea point.

If you mask, the land point doesn’t affect the new grid point, the new point gets its values from the surrounding ocean points only. You get sharply defined coastlines.

If you set to zero, the new point will be getting the average of all grid points around it, the zero valued land points will affect it. You end up seeing blurred coastlines with unrealistically low values.
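The same thought experiment as a toy sketch in plain Python: one new coastal point sits halfway between a land point and an ocean point. The helper names and numbers are made up, and this only illustrates the idea described above. How a particular regridding library treats a partly masked stencil may differ (for example, it might leave the point unmapped rather than renormalise the weights).

```python
import math

def weighted_mean(values, weights):
    """Plain weighted average; a NaN in `values` propagates to the result."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)

def masked_weighted_mean(values, weights, valid):
    """Weighted average over valid points only, with weights renormalised."""
    kept = [(v, w) for v, w, ok in zip(values, weights, valid) if ok]
    if not kept:
        return math.nan  # nothing valid to interpolate from
    total = sum(w for _, w in kept)
    return sum(v * w for v, w in kept) / total

# Source points: [land, ocean]; the land point holds an artificial value.
sst = [285.0, 290.0]       # kelvin, made-up numbers
is_ocean = [False, True]
weights = [0.5, 0.5]       # the new point is halfway between the two

unmasked = weighted_mean(sst, weights)                   # 287.5, uses land value
zeroed = weighted_mean([0.0, 290.0], weights)            # 145.0, unrealistically low
nan_filled = weighted_mean([math.nan, 290.0], weights)   # nan
masked = masked_weighted_mean(sst, weights, is_ocean)    # 290.0, ocean only
```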


Hi Spencer & Scott,

I have another question similar to yours Spencer. Julie told me about the work you’ll be doing, it would be great to hear more once she’s back from leave.

I have an ocean basin mask file that I would like to interpolate to the MOM grid. In this file, land areas are set to -100 and each basin is assigned an integer. It’s similar to the MOM basin mask file (/g/data/access/access-cm2/input_O1/mom4/), but some changes have been made to the file I have according to an experiment protocol. It seems a bit trickier to do this interpolation than if there was valid data at all points.

What would be the best way to do this? If I use bilinear interpolation, I end up with missing values around the coastlines on cells that should be ocean, which will be an issue when the model runs.

On this page, files were regridded with ‘umtool regrid’, but I wasn’t able to find that tool.

Thanks for your help,


For category data where each integer represents a different basin use one of the nearest grid point methods. To avoid the missing data around coastlines use a mask on the input data to mask out the land points, and enable extrapolation in the regridding.
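A quick toy illustration in plain Python of why averaging methods are a poor fit for category data (the helper names are made up for this example): blending two basin codes produces a number that is either a different basin or not a valid code at all, while a nearest-point method only ever returns codes that exist in the input.

```python
def linear_interp(left, right, frac):
    """Linear blend between two neighbouring values."""
    return left + frac * (right - left)

def nearest_interp(left, right, frac):
    """Take whichever neighbour is closer."""
    return left if frac < 0.5 else right

# Two neighbouring cells that belong to basin 1 and basin 3.
blended = linear_interp(1, 3, 0.5)        # 2.0, which reads as basin 2
nearest = nearest_interp(1, 3, 0.4)       # 1, a code present in the input

# A land cell (-100) next to a basin 2 cell.
coast_blend = linear_interp(-100, 2, 0.5)  # -49.0, not a valid code
```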

Here’s a demo with the MOM basins file

import xarray
import xesmf

ds = xarray.open_dataset('/g/data/access/access-cm2/input_O1/mom4/')

# Variable to interpolate
basins = ds.BASIN_MASK[0,:,:]

# Make this test match the requirements - land points are -100
basins = basins.fillna(-100)

# Mask out land points
mask = basins != -100

# Create input dataset with data and mask
ds_in = xarray.Dataset({'basins': basins, 'mask': mask})

# Random output grid for demo
ds_out = xesmf.util.grid_2d(
    -120, 120, 0.4, -60, 60, 0.3
)

# Regridder with extrapolation
regridder = xesmf.Regridder(ds_in, ds_out, method='nearest_s2d', extrap_method="nearest_s2d")

# Do the regrid
basins_out = regridder(basins)



Hi Scott,

Thanks for this! It is strange that the differences are only on the eastern coasts, but the grids in both examples are taken from the same file, so I’m not too sure what the cause of the difference could be.

That makes sense with the difference between masking and setting the land values to zero, as it would be bad to get those unrealistically low values. It does make the examples in the github discussion where it appears to be doing that quite confusing though.

After reading through the links on the timesteps thread, it looks like the plan of regridding data that has already had the Karl Taylor/diddling procedure applied is quite problematic, and I’d best let the regridding be done beforehand.