Intake vs mfdataset

CharlesTurner · 6 May 2025 23:49

Just flagging - the "intake to dask efficiently" notebook in cosima-recipes contains some info on chunking, & I’ve currently got a PR open which adds a bit more detail to that & makes it clearer that chunking is involved.

The recommendation in that notebook is generally to start with chunks = 'auto' and go from there - but like Dougie says, it’s kinda fundamentally impossible to give a perfect default chunking.

In addition, from conda/analysis3-25.05 onwards, the access_intake_utils package can be used to inspect the disk chunking & adjust user-defined chunks to respect it: see here. The tooling works on a list of files & on intake catalogues alike, & should also work on an xarray Dataset/DataArray if the internal file references are stored in the fashion it expects (not guaranteed though).
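
For anyone wondering what "respecting disk chunking" means in practice, here is a rough sketch of the idea. This is not the access_intake_utils API: the function name `align_to_disk_chunks`, the dimension names & all the numbers are made up for illustration. It just rounds each requested chunk length to a whole multiple of the disk chunk length along that dimension, so no dask chunk cuts through a disk chunk.

```python
def align_to_disk_chunks(requested, disk_chunks, dim_sizes):
    """Round each requested chunk length to a whole multiple of the disk chunk.

    Hypothetical helper for illustration only; not part of access_intake_utils.
    All three arguments are dicts keyed by dimension name.
    """
    aligned = {}
    for dim, want in requested.items():
        disk = disk_chunks[dim]
        size = dim_sizes[dim]
        # Nearest whole number of disk chunks, but at least one.
        n_disk = max(1, round(want / disk))
        # Never ask for a chunk longer than the dimension itself.
        aligned[dim] = min(n_disk * disk, size)
    return aligned


# Made-up example values: what the user asked for, what is on disk,
# and the full length of each dimension.
requested = {"time": 100, "yt_ocean": 500, "xt_ocean": 500}
disk_chunks = {"time": 1, "yt_ocean": 270, "xt_ocean": 360}
dim_sizes = {"time": 365, "yt_ocean": 2700, "xt_ocean": 3600}

print(align_to_disk_chunks(requested, disk_chunks, dim_sizes))
# {'time': 100, 'yt_ocean': 540, 'xt_ocean': 360}
```

The resulting dict could then be passed wherever you would otherwise pass your own chunks. The real tooling may well pick different sizes, so treat this only as a picture of the principle.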

I’ll get that example folded into the PR because AFAIK xarray doesn’t expose any information about disk chunks, nor make it clear they are separate to dask chunks.
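
To show why the distinction matters, here is a toy count of how many disk chunks get touched along a single dimension for a given dask chunk length. It assumes (as a simplification) that touching any part of a disk chunk costs a full read of that chunk, which is roughly the case for compressed chunked files. The function name `disk_chunks_read` & the sizes are hypothetical.

```python
def disk_chunks_read(dask_chunk, disk_chunk, dim_size):
    """Total disk chunk reads along one dimension for a given dask chunk length.

    Hypothetical illustration: every dask chunk reads each disk chunk it
    overlaps in full, so disk chunks straddling a boundary are read twice.
    """
    reads = 0
    for start in range(0, dim_size, dask_chunk):
        stop = min(start + dask_chunk, dim_size)
        first = start // disk_chunk        # first disk chunk overlapped
        last = (stop - 1) // disk_chunk    # last disk chunk overlapped
        reads += last - first + 1
    return reads


dim_size, disk_chunk = 3600, 360  # made-up dimension length & disk chunk length

print(disk_chunks_read(360, disk_chunk, dim_size))  # aligned with disk: 10
print(disk_chunks_read(500, disk_chunk, dim_size))  # misaligned: 17
```

In this toy case the misaligned dask chunks are larger, yet they trigger 17 disk chunk reads instead of 10, because several disk chunks straddle a dask chunk boundary & get read twice.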
