Hi All. I’ve been doing a bit of work using ERA5 and ERA5-Land data recently, and I’ve come across a discrepancy in how this data is handled between older and newer versions of Xarray. Since Xarray version 2024.3.0, the dtype for ERA5 and ERA5-Land data returned from `xarray.open_dataset()` has changed from `np.float32` to `np.float64`. This is due to changes in the function that decodes the packed data. On NCI, ERA5 and ERA5-Land data is stored as `int16` data with scale and offset factors. In earlier versions of `xarray`, the decoding function would act on the `int16` type and return the fields as `float32`, as `float32` can exactly represent all integers up to 24 bits. In the later versions, it returns the fields in the same type as the scale factor, which is `float64`. This leads to small discrepancies at the limit of `float32` precision that compound when the fields are used as initial conditions in simulations. Below is the difference in the ERA5 `skt` (skin temperature) variable returned by Xarray 2023.12.0 and 2024.5.0 for the AUS2200 domain.
At this stage, it’s unstructured noise at a relative magnitude of around 10^-6 (~0.0002 K), which is what you’d expect for a difference at the limit of float32 precision. Every field converted from ERA5 would exhibit something like this. When passed through the reconfiguration program to create the initial conditions for the model, some structure begins to emerge.
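The size of the raw discrepancy can be reproduced without any ERA5 files. The snippet below is a minimal sketch using only NumPy: the `scale_factor` and `add_offset` values are made up for illustration (they are not taken from a real ERA5 file), and the two decoding paths are my approximation of "unpack in float32" versus "unpack in float64", not a copy of the Xarray internals.

```python
import numpy as np

# Hypothetical packing attributes (illustrative values only, not from a real file).
# netCDF attributes like these are typically read back as float64 scalars.
scale_factor = np.float64(0.0017)
add_offset = np.float64(280.0)

# Every packed value except the most negative one, which is often the fill value.
raw = np.arange(-32767, 32768, dtype=np.int16)

# float32 path: cast the packed integers to float32, then scale and offset
# in place so the array stays float32.
old = raw.astype(np.float32)
old *= scale_factor
old += add_offset

# float64 path: do the whole calculation in double precision.
new = raw.astype(np.float64) * scale_factor + add_offset

# Relative difference between the two decoded fields.
rel = np.abs(old.astype(np.float64) - new) / np.abs(new)
print(old.dtype, new.dtype, rel.max())
```

The maximum relative difference printed here should be a small multiple of float32 machine epsilon (about 1.2e-7), which is the same kind of unstructured noise as in the plot above.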
I then looked at what effect this had on the model by running 12 hours of AUS2200 with both sets of initial conditions. The results are interesting. The animation below shows the difference in air temperature at 1.5 m as the model progresses.
This plot is an absolute difference. By the end of the 12-hour run, a significant difference is visible across a lot of the domain. I also took a look at the rainfall amounts, and the change does appear to have shifted the rainfall by a small amount.
The affected versions of Xarray are present in `conda/analysis3-24.01` and later, so anyone who’s using ERA5 and ERA5-Land to initialise models (including via `era5grib`) should consider sticking to earlier versions of the analysis3 environments if they’re looking to reproduce or continue earlier model runs. Our AUS2200 runs have been using `conda/analysis3-23.01`, so they’re consistent in that regard. I’m currently working on an upgrade for `era5grib`, and I’ll attempt to reverse engineer the type conversion behaviour of the older Xarray versions as a compatibility mode to ensure that this kind of discrepancy doesn’t show up when the new version is installed.
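As a rough idea of what such a compatibility mode could look like, here is a sketch of unpacking the raw integers by hand in float32. The helper name `decode_packed_float32` is hypothetical and is not part of `era5grib` or Xarray. Casting the scale and offset to float32 before applying them is my assumption about what the older environments effectively did, so any real implementation would need to be checked bit for bit against output from one of the older analysis3 environments.

```python
import numpy as np


def decode_packed_float32(raw, scale_factor, add_offset, fill_value=None):
    """Hypothetical helper: unpack integer data to float32.

    Assumes the legacy behaviour is equivalent to applying the scale and
    offset in single precision. This is an assumption, not verified against
    older Xarray output.
    """
    data = raw.astype(np.float32)
    if fill_value is not None:
        # Mask missing values before scaling, using the raw packed values.
        data[raw == fill_value] = np.nan
    data *= np.float32(scale_factor)
    data += np.float32(add_offset)
    return data


# Usage with made-up packing attributes (illustrative values only).
raw = np.array([-32767, 0, 1000, 32767], dtype=np.int16)
out = decode_packed_float32(raw, 0.0017, 280.0, fill_value=-32767)
print(out.dtype, out)
```

With Xarray, the raw integers and their `scale_factor`, `add_offset` and `_FillValue` attributes can be obtained by passing `mask_and_scale=False` to `xarray.open_dataset()`, which skips the automatic unpacking so that a function like the one above can be applied instead.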