I’m running some long simulations with ACCESS-ESM1.5. I’m trying to collate the output from the native output folders (one per year) into something more usable.
When I read in outputs from the ocean model, they look fine initially, but when I go to load them into memory, i.e. ds.load(), I get an error associated with the time index - some variant of cannot convert input 264800.0 with the unit 'D'.
The years start at 0301, so xarray uses cftime to decode the times (which usually works fine, and does indeed work for, e.g., the atmosphere model of these runs). I tried loading with decode_times=False and everything looks OK as far as I can tell.
Here’s a simple example of what doesn’t work for the ocean model outputs:
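(Reconstructed sketch of the call; the path below is a placeholder, not the real run directory.)
import xarray as xr
import glob
# Hypothetical path; the real runs keep one output folder per year under output*/ocean/
files = sorted(glob.glob('/path/to/run/output*/ocean/ocean-2d-surface_temp-1monthly-mean-*.nc'))
ds = xr.open_mfdataset(files)  # opening works and the times look fine
ds.load()                      # this is where it fails with the "cannot convert input 264800.0 with the unit 'D'" error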
I tried a few checks, like loading single files, but that didn’t clarify things for me (other than perhaps that the un-decoded times are floating-point values).
Anyone else managed to sort this out already? Otherwise flagging @jemmajeffree
Short hack:
(Assuming the slightly different error I hit is caused by the same underlying problem? With analysis3-25.06 I got OutOfBoundsTimedelta: Cannot convert 18018979200 seconds to timedelta64[ns] without overflow.)
It’s being caused by the time_bounds variable. Unless you need time_bounds right now, I’d recommend:
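Roughly this (a sketch: drop_timebounds here is just a small preprocess function that drops the offending variable, and the chunking/client settings are simply what worked for me):
import xarray as xr
import glob
from dask.distributed import Client

client = Client(threads_per_worker=1, memory_limit=0)

def drop_timebounds(ds):
    # Drop the problematic time_bounds variable from each file before they're combined
    return ds.drop_vars('time_bounds', errors='ignore')

varname = 'surface_temp'
floc = '/g/data/lg87/gf6872/indo-pacific-pacemakers/full_model_outputs/esm-pm-m60-release-preindustrial+concentrations-b7afa9c9/'
files = sorted(glob.glob(f'{floc}output*/ocean/ocean-2d-{varname}-1monthly-mean-*.nc'))
ds = xr.open_mfdataset(files,
                       preprocess=drop_timebounds,  # the bit that actually avoids the error
                       parallel=True,
                       chunks={'time': 12},
                       )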
Everything except preprocess=drop_timebounds is just about speed and memory optimisation (and making those icky pink boxes go away). The most important is memory_limit=0 in the dask client; otherwise it was killing my client when one worker had to use a bit of extra memory to bring everything together at the end of the load.
I’ll look into what’s actually going on for a more sustainable fix
The underlying problem is that ACCESS-ESM1-5 is saving time_bounds with units of “days” rather than “days since 0001-01-01 00:00:00”. Following CF conventions, xarray then decodes this as a timedelta rather than a time object. For reasons that aren’t clear to me, we end up with type np.timedelta64, which insists on having nanosecond precision, which is obviously ridiculous when you’re counting hundreds of years: 64 bits of nanoseconds overflow at roughly 292 years, so these runs blow straight past the limit.
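A quick back-of-the-envelope check of the overflow (264800.0 is just the example value from the original error):
import numpy as np

days = 264800.0                                         # an undecoded value in days, roughly 725 years
ns = days * 86400 * 1e9                                 # the same span in nanoseconds
print(ns > np.iinfo(np.int64).max)                      # True: it no longer fits in a 64-bit nanosecond count
print(np.iinfo(np.int64).max / (86400 * 1e9 * 365.25))  # ~292, the number of years at which timedelta64[ns] runs out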
The solution is to swap out the units attribute of time_bounds for something more reasonable before the times are decoded:
import xarray as xr
import glob
from dask.distributed import Client
client = Client(threads_per_worker=1, memory_limit=0)
varname = 'surface_temp'
floc = '/g/data/lg87/gf6872/indo-pacific-pacemakers/full_model_outputs/esm-pm-m60-release-preindustrial+concentrations-b7afa9c9/'
files = sorted(glob.glob(f'{floc}output*/ocean/ocean-2d-{varname}-1monthly-mean-*.nc'))
ds = xr.open_mfdataset(files,
                       parallel=True,
                       decode_times=False,      # Don't decode times from their "days since" units
                       decode_timedelta=False,  # Don't decode timedeltas (including the one that shouldn't be a timedelta)
                       chunks={'time': 12},
                       )
ds.time_bounds.attrs['units'] = ds.time.attrs['units'] # Turn time_bounds into a time not a timedelta
ds = xr.decode_cf(ds) # Okay, now decode times
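If you want a quick sanity check afterwards (optional, just what I’d look at):
print(ds.time_bounds.dtype)        # object dtype holding cftime dates now, rather than timedelta64[ns]
ds.isel(time=slice(0, 12)).load()  # a small load that should now go through without the overflow error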
Ideally we’d push this change into the model code rather than having to make the change when reading, at least for ESM1.6.
Aidan (Aidan Heerdegen, ACCESS-NRI Release Team Lead)
Oh god! I’ve definitely seen this and solved it before. ZOMBIE ERRORS!
Yes we definitely need to fix this. @dougiesquire is this still an issue for MOM5/ESM1.6?