Intake vs mfdataset

I have been trying to use intake for some analysis with @KZCurtin on the 1/20th degree PanAntarctic model. I don’t personally use intake - I find that xr.open_mfdataset with a preprocessing function works smoothly 99.9% of the time - but intake seems like a cool tool with potential.

Anyway, we were trying to compute the average temperature over the top 500 m on the Antarctic shelf, and with intake (with and without preprocessing, and with the kwargs suggested in the cosima-recipes) the kernel crashes. Doing the same thing with mfdataset and preprocessing works just fine.
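For anyone following along, here is a rough sketch of the kind of mfdataset-plus-preprocessing workflow I mean. It is not the notebook code: the variable name `temp`, the depth coordinate `z_l`, the helper names, and the layer thicknesses are all placeholders I made up for illustration, and the synthetic dataset stands in for the real model output. It also skips the shelf mask and horizontal area weighting.

```python
import numpy as np
import xarray as xr

MAX_DEPTH = 500.0  # metres; depth limit for the average


def select_upper_ocean(ds):
    """Preprocess hook: keep only temperature in the top 500 m.

    `temp` and `z_l` are assumed names, not necessarily those in the model output.
    """
    return ds[["temp"]].sel(z_l=slice(0, MAX_DEPTH))


def open_temperature(paths):
    """Open many files lazily, trimming each one before they are combined."""
    return xr.open_mfdataset(
        paths,
        preprocess=select_upper_ocean,
        combine="by_coords",
        parallel=True,
    )


def depth_mean(temp, thickness):
    """Thickness-weighted mean over the depth dimension."""
    return temp.weighted(thickness).mean("z_l")


# Synthetic stand-in for one file: 2 time steps, 5 depth levels (made-up values).
depths = np.array([50.0, 150.0, 300.0, 450.0, 800.0])
profile = np.array([1.0, 2.0, 3.0, 4.0, 10.0])
ds = xr.Dataset(
    {"temp": (("time", "z_l"), np.tile(profile, (2, 1)))},
    coords={"time": [0, 1], "z_l": depths},
)

upper = select_upper_ocean(ds)  # drops the 800 m level
thickness = xr.DataArray(
    [100.0, 100.0, 200.0, 100.0],  # assumed layer thicknesses in metres
    dims="z_l",
    coords={"z_l": upper.z_l},
)
result = depth_mean(upper.temp, thickness)
```

The point of the preprocessing function is that each file is cut down to the variable and depth range of interest before xarray tries to combine everything, which keeps the task graph and memory use smaller.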

I am not looking for a specific solution; I am personally very happy to keep using mfdataset. But I did spend some time yesterday trying to make intake work. I am no dask wizard, and this could perhaps all be solved with some smart chunking magic, but I don’t think it is realistic to expect every intake user to be a dask expert, especially since I understand intake is aiming to provide a high-level way of opening datasets.

I have made a notebook showing this issue (Intake_vs_mfdatarray.ipynb · GitHub). I’ve used 28 cores on the normalbw queue.

@JuliaN Thanks for sending the notebook to illustrate the issue. @dougiesquire will look into it and will let you know what he finds so others can find the information if they have the same issue.

@dougiesquire See @CharlesTurner's reply here: he says intake ESM has a problem in analysis3-25.04. I have no idea whether this is linked to what is happening here, but it might be worth trying a previous version first.