I’ve come across a problem with some of xarray’s .groupby('time.month') operations that I don’t understand. This might not be the right forum for it (should I post on the xarray GitHub?), but I thought I’d post here in case anyone else has noticed this.
I’m trying to calculate a climatology of the standard deviation of temperature and salinity as a function of depth and time (month). I’m using the conda 24.04 kernel. When I calculate the standard deviation using Dask, I get a noisy field and some missing values for salinity (the temperature calculation is fine!), even though the input array has no missing values. If I load the data into memory first, the standard deviation looks fine (see below). Perhaps the problem is related to the flox warnings that show up? I don’t really understand what these warnings are telling me.
I find it pretty concerning that .groupby('time.month').std('time') gives different results depending on whether the data is loaded or not.
Example
Loading data:
Calculating and plotting standard deviation using Dask:
Calculating and plotting standard deviation after loading into memory:
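For anyone who wants to try this without my dataset, here is a minimal sketch of the same comparison using synthetic data. The array values, sizes, chunking and variable names are all made up for illustration (float32 values near 35 with small variability, roughly like salinity), so it may or may not reproduce the problem. The third calculation switches flox off through xarray's use_flox option, which I assume would help show whether flox is involved.

```python
import numpy as np
import pandas as pd
import xarray as xr

# Synthetic stand-in for the salinity field (hypothetical values and sizes):
# float32 values close to 35 with small variability.
rng = np.random.default_rng(0)
time = pd.date_range("2000-01-01", periods=48, freq="MS")
depth = np.arange(0.0, 50.0, 10.0)
values = 35.0 + 0.01 * rng.standard_normal((time.size, depth.size))
salt = xr.DataArray(
    values.astype("float32"),
    coords={"time": time, "depth": depth},
    dims=("time", "depth"),
    name="salt",
)


def monthly_std(da):
    """Standard deviation for each calendar month, taken over time."""
    return da.groupby("time.month").std("time")


# 1) Data already in memory (NumPy-backed).
std_loaded = monthly_std(salt).compute()

# 2) Same data as a Dask array; the chunk size here is arbitrary.
salt_lazy = salt.chunk({"time": 12})
std_dask = monthly_std(salt_lazy).compute()

# 3) Dask again, with flox disabled so xarray uses its own groupby code.
with xr.set_options(use_flox=False):
    std_dask_noflox = monthly_std(salt_lazy).compute()

# Compare the three results: missing cells and largest differences.
print("missing cells (dask):", int(std_dask.isnull().sum()))
print("max |dask - loaded|:", float(abs(std_dask - std_loaded).max()))
print("max |dask, no flox - loaded|:", float(abs(std_dask_noflox - std_loaded).max()))
```

If the Dask result differs from the other two, that would at least narrow the problem down to the flox code path rather than to my data.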
If I use an older kernel (e.g. 23.01) the same problem occurs except that the missing cells are not consistent (different cells show up as missing).
Does anyone know why this occurs and how to avoid it? I won’t always be able to load data to run the calculation. Thanks in advance for any advice.