I’m running some long simulations with ACCESS-ESM1.5. I’m trying to collate the output from the native output folders (one per year) into something more usable.
When I read in outputs from the ocean model, they look fine initially, but when I go to load them into memory, i.e. ds.load(), I get an error associated with the time index - some variant of cannot convert input 264800.0 with the unit 'D'.
The years start at 0301, so xarray uses cftime to decode the times (which usually works fine, and does indeed work for, e.g., the atmosphere model of these runs). I tried loading with decode_times=False and everything looks OK as far as I can tell.
Here’s a simple example of what doesn’t work for the ocean model outputs:
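(Reconstructed sketch of the call; the path below is a placeholder, not the real run directory.)
import xarray as xr
import glob
# Hypothetical path; the real runs keep one output folder per year under output*/ocean/
files = sorted(glob.glob('/path/to/run/output*/ocean/ocean-2d-surface_temp-1monthly-mean-*.nc'))
ds = xr.open_mfdataset(files)  # opening works and the times look fine
ds.load()                      # this is where it fails with the "cannot convert input 264800.0 with the unit 'D'" error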
I tried a few checks, like loading single files, but that didn’t clarify things for me (other than perhaps that the un-decoded times are floating-point values).
Anyone else managed to sort this out already? Otherwise flagging @jemmajeffree
Short hack:
(Assuming the slightly different error I hit is caused by the same underlying problem? With analysis3-25.06 I got OutOfBoundsTimedelta: Cannot convert 18018979200 seconds to timedelta64[ns] without overflow.)
It’s being caused by the time_bounds variable. Unless you need time_bounds right now, I’d recommend:
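Roughly this (a sketch: drop_timebounds here is just a small preprocess function that drops the offending variable, and the chunking/client settings are simply what worked for me):
import xarray as xr
import glob
from dask.distributed import Client

client = Client(threads_per_worker=1, memory_limit=0)

def drop_timebounds(ds):
    # Drop the problematic time_bounds variable from each file before they're combined
    return ds.drop_vars('time_bounds', errors='ignore')

varname = 'surface_temp'
floc = '/g/data/lg87/gf6872/indo-pacific-pacemakers/full_model_outputs/esm-pm-m60-release-preindustrial+concentrations-b7afa9c9/'
files = sorted(glob.glob(f'{floc}output*/ocean/ocean-2d-{varname}-1monthly-mean-*.nc'))
ds = xr.open_mfdataset(files,
                       preprocess=drop_timebounds,  # the bit that actually avoids the error
                       parallel=True,
                       chunks={'time': 12},
                       )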
Everything except preprocess=drop_timebounds is just about speed and memory optimisation (and making those icky pink boxes go away). The most important is memory_limit=0 in the dask client; otherwise it was killing my client when one worker had to use a bit of extra memory to bring everything together at the end of the load.
I’ll look into what’s actually going on for a more sustainable fix
The underlying problem is that ACCESS-ESM1-5 is saving time_bounds with units of “days” rather than “days since 0001-01-01 00:00:00”. Following CF conventions, xarray then decodes this as a timedelta rather than a time object. For reasons that aren’t clear to me, we end up with type np.timedelta64, which insists on having nanosecond precision, which is obviously ridiculous when you’re counting hundreds of years: 64 bits of nanoseconds overflow at roughly 292 years, so these runs blow straight past the limit.
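A quick back-of-the-envelope check of the overflow (264800.0 is just the example value from the original error):
import numpy as np

days = 264800.0                                         # an undecoded value in days, roughly 725 years
ns = days * 86400 * 1e9                                 # the same span in nanoseconds
print(ns > np.iinfo(np.int64).max)                      # True: it no longer fits in a 64-bit nanosecond count
print(np.iinfo(np.int64).max / (86400 * 1e9 * 365.25))  # ~292, the number of years at which timedelta64[ns] runs out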
The solution is to swap out the units attribute of time_bounds for something more reasonable before the times are decoded:
import xarray as xr
import glob
from dask.distributed import Client
client = Client(threads_per_worker=1, memory_limit=0)
varname = 'surface_temp'
floc = '/g/data/lg87/gf6872/indo-pacific-pacemakers/full_model_outputs/esm-pm-m60-release-preindustrial+concentrations-b7afa9c9/'
files = sorted(glob.glob(f'{floc}output*/ocean/ocean-2d-{varname}-1monthly-mean-*.nc'))
ds = xr.open_mfdataset(files,
                       parallel=True,
                       decode_times=False,      # Don't decode times from their "days since" units
                       decode_timedelta=False,  # Don't decode timedeltas (including the one that shouldn't be a timedelta)
                       chunks={'time': 12},
                       )
ds.time_bounds.attrs['units'] = ds.time.attrs['units'] # Turn time_bounds into a time not a timedelta
ds = xr.decode_cf(ds) # Okay, now decode times
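If you want a quick sanity check afterwards (optional, just what I’d look at):
print(ds.time_bounds.dtype)        # object dtype holding cftime dates now, rather than timedelta64[ns]
ds.isel(time=slice(0, 12)).load()  # a small load that should now go through without the overflow error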
Ideally we’d push this change into the model code rather than having to make the change when reading, at least for ESM1.6.
Aidan (Aidan Heerdegen, ACCESS-NRI Release Team Lead)
Oh god! I’ve definitely seen this and solved it before. ZOMBIE ERRORS!
Yes we definitely need to fix this. @dougiesquire is this still an issue for MOM5/ESM1.6?