Intake vs mfdataset

Just flagging - the intake to dask efficiently notebook in cosima-recipes contains some info on chunking, & I've currently got a PR open which adds a bit more detail to that & makes it clearer that chunking is involved.

The recommendation in that notebook is generally to start with chunks='auto' and go from there - but like Dougie says, it's kinda fundamentally impossible to give a perfect default chunking, because the best chunking depends on how the data will be accessed.
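To illustrate why no single default can be perfect, here is a minimal sketch (plain Python, no xarray or dask needed) that counts how many chunks two common access patterns would have to load under two different layouts. The array shape, the layouts and the helper name chunks_touched are all made up for illustration; they don't come from the notebook.

```python
def chunks_touched(selection, chunks):
    """Count the chunks a read has to load.

    selection: one (start, stop) index pair per dimension (stop exclusive).
    chunks:    chunk length per dimension, assuming a regular grid from index 0.
    """
    total = 1
    for (start, stop), chunk_len in zip(selection, chunks):
        first = start // chunk_len       # index of the first chunk hit
        last = (stop - 1) // chunk_len   # index of the last chunk hit
        total *= last - first + 1
    return total


# Hypothetical (time, lat, lon) array: 3650 daily steps on a 300 x 360 grid.
time_series = ((0, 3650), (150, 151), (180, 181))  # every time step at one point
single_map = ((0, 1), (0, 300), (0, 360))          # the full grid at one time step

layouts = {
    "map_friendly": (1, 300, 360),      # one chunk per time step
    "series_friendly": (3650, 10, 10),  # all of time in small spatial tiles
}

for name, chunks in layouts.items():
    print(name,
          "time series:", chunks_touched(time_series, chunks),
          "single map:", chunks_touched(single_map, chunks))
```

Each layout is ideal for one access pattern and needs orders of magnitude more chunks for the other, which is why 'auto' is only a starting point.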

In addition, from conda/analysis3-25.05 onwards, the access_intake_utils package can be used to inspect the disk chunking & adjust user-defined chunks so they respect it: see here. The tooling works on a list of files & on intake catalogues alike, & should also work on an xarray Dataset/DataArray if the internal file references are stored in the way it expects (not guaranteed though).
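For anyone wondering what "respecting disk chunking" means in practice, here's a rough sketch of the idea in plain Python. To be clear, this is NOT the access_intake_utils API: the function name align_to_disk_chunks, its arguments and the example sizes are hypothetical, & rounding up is just one possible rule. The point is that each requested chunk becomes a whole multiple of the on-disk chunk, so no disk chunk gets split across dask chunks & read twice.

```python
from math import ceil


def align_to_disk_chunks(requested, disk_chunks, shape):
    """Round requested chunk sizes up to whole multiples of the disk chunks.

    All three arguments map dimension name -> size. Dimensions missing from
    `requested` or `disk_chunks` fall back to the full dimension length.
    (Hypothetical helper for illustration only.)
    """
    aligned = {}
    for dim, dim_len in shape.items():
        disk = disk_chunks.get(dim, dim_len)
        want = requested.get(dim, dim_len)
        # At least one disk chunk, otherwise enough to cover the request.
        n_disk_chunks = max(1, ceil(want / disk))
        # Never exceed the dimension itself.
        aligned[dim] = min(n_disk_chunks * disk, dim_len)
    return aligned


# Made-up sizes: a year of daily data stored on disk as 1 x 100 x 120 chunks.
shape = {"time": 365, "lat": 300, "lon": 360}
disk_chunks = {"time": 1, "lat": 100, "lon": 120}
requested = {"time": 30, "lat": 250, "lon": 200}

aligned = align_to_disk_chunks(requested, disk_chunks, shape)
print(aligned)
```

The resulting dict has the same form as the chunks argument xarray accepts when opening files, so it could be passed straight through.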

I'll get that example folded into the PR because AFAIK xarray doesn't expose any information about disk chunks, nor make it clear that they are separate from dask chunks.