Intake Virtual Icechunk v0.2.0 Release Announcement: intake-esm style catalogues backed by Icechunk

CharlesTurner · 27 May 2026 00:48

What is intake-virtual-icechunk?

intake-virtual-icechunk is a Python package for building and reading Icechunk-backed catalogues from existing intake-esm datastores.

The goal is pretty simple: take an existing intake-esm datastore, build an Icechunk-backed store from it, and keep the same user-facing catalogue experience. In other words, using one of these catalogues should feel the same as using an intake-esm catalogue: the same style of search, selection, and opening data into xarray, without users needing to care about Icechunk-specific plumbing.

What are we releasing?

ACCESS-NRI has released v0.2.0 of intake-virtual-icechunk.

Together, the v0.1.0 and v0.2.0 releases move the package toward being faster, more reliable, and easier to use for building Icechunk-backed catalogues from existing intake-esm datastores.

What landed in v0.1.0?

The main milestone in v0.1.0 was that the package should work end-to-end for both:

• local filesystem stores (Gadi)
• S3 / S3-compatible object stores (Pawsey)

That release also improved Ceph-backed test coverage and bumped the minimum zarr version to support rectilinear chunk grids.

What’s new in v0.2.0?

The headline functional change in v0.2.0 is support for ingesting intake-esm datastores and reserialising them as an Icechunk store without virtualisation. That means the package can now handle a broader range of source datastores, including cases where a fully virtual workflow is not possible due to serialisation limitations.

Why a new package?

Performance

Current intake-esm catalogues often touch many more files than is strictly necessary to extract a subset of data for analysis. On Gadi, this can be a major and confusing performance limitation. For example, opening grid information files can take up to a minute in intake-esm, as xarray needs to touch all matching grid files in order to open just the first. In intake-virtual-icechunk, this takes less than a second, as all of this information is held within the catalog rather than computed on the fly.
No concatenation necessary: because an Icechunk store backs the catalog, all concatenation operations are performed at build time, not read time. For a dataset backed by 500 netCDF files, this reduces the typical time to open (not even load) the dataset from around 3 minutes to about 1.5 seconds.
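
To make the build-time versus read-time distinction concrete, here is a toy sketch using only the Python standard library. It is not how intake-virtual-icechunk or Icechunk is implemented: the small JSON files stand in for netCDF files, and the function names and the manifest layout are assumptions made up for illustration. It only shows why paying the concatenation cost once, at build time, means an open touches one index instead of every file.

```python
import json
import tempfile
from pathlib import Path


def write_fake_files(root, n_files, steps_per_file):
    """Write small JSON files standing in for netCDF files (hypothetical layout)."""
    paths = []
    for i in range(n_files):
        path = root / f"output_{i:03d}.json"
        start = i * steps_per_file
        path.write_text(json.dumps({"time": list(range(start, start + steps_per_file))}))
        paths.append(path)
    return paths


def open_by_concatenating(paths):
    """Read-time approach: every file is read to assemble the time axis."""
    time_axis = []
    for path in paths:
        time_axis.extend(json.loads(path.read_text())["time"])
    return time_axis, len(paths)  # second value: number of files touched


def build_manifest(paths, manifest_path):
    """Build-time step, run once: record which file holds which time steps."""
    entries = [{"path": str(p), "time": json.loads(p.read_text())["time"]} for p in paths]
    manifest_path.write_text(json.dumps(entries))


def open_from_manifest(manifest_path):
    """Read-time approach after a build: only the manifest is read."""
    entries = json.loads(manifest_path.read_text())
    time_axis = [t for entry in entries for t in entry["time"]]
    return time_axis, 1  # only the manifest itself was touched


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    paths = write_fake_files(root, n_files=500, steps_per_file=12)
    manifest = root / "manifest.json"
    build_manifest(paths, manifest)  # cost paid once, at build time
    slow_axis, slow_touched = open_by_concatenating(paths)
    fast_axis, fast_touched = open_from_manifest(manifest)

print(f"read-time concat touched {slow_touched} files; manifest open touched {fast_touched}")
```

Both routes produce the same time axis; the difference is how many files each open has to touch.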

Ergonomics

Although intake_virtual_icechunk retains the same API as intake_esm, combining all datasets into a single Icechunk store greatly reduces the work needed to obtain an xarray dataset. Time ranges can now be filtered on the dataset objects directly, without worrying about opening more files than necessary.
Dataset attributes are included in the datastore by default. If an attribute was written into a dataset, it will appear in the catalog, where you can search on it.
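
As a rough illustration of why time filtering no longer needs to open every file, here is a toy chunk index in plain Python. In xarray the selection itself would normally be a `ds.sel(time=slice(...))` call on the opened dataset; the `ChunkRef` class, the file names, and the index layout below are hypothetical and are not the package's or Icechunk's internal format.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkRef:
    """One entry of a toy chunk index: a time span and the file that holds it."""
    start: int  # first time step covered (inclusive)
    stop: int   # end of the covered span (exclusive)
    path: str   # hypothetical source file name


def build_index(n_files, steps_per_file):
    """Build a toy index where each file holds a contiguous block of time steps."""
    return [
        ChunkRef(i * steps_per_file, (i + 1) * steps_per_file, f"output_{i:03d}.nc")
        for i in range(n_files)
    ]


def files_for_range(index, start, stop):
    """Return only the source files whose span overlaps [start, stop)."""
    return [ref.path for ref in index if ref.start < stop and ref.stop > start]


index = build_index(n_files=500, steps_per_file=12)
needed = files_for_range(index, start=24, stop=48)
print(f"{len(needed)} of {len(index)} files needed: {needed}")
```

With the index in hand, a two-year window over monthly data maps to two source files out of 500, which is the kind of saving the single-store layout is aiming for.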

Reliability

The same Icechunk store backs both the catalog and the data within it. In intake-esm, if a file is moved, deleted, or renamed, the catalog can ‘go stale’ and break in confusing ways. In intake-virtual-icechunk, if data is moved or deleted, the catalog will tell you what has happened.
Catalog metadata is computed on the fly, so every time you ask for variables or variable_cell_methods, you get exactly what is in the dataset.
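
The 'tell you what has happened' idea can be sketched in a few lines of standard-library Python. This is only an illustration of the concept, not the package's actual behaviour or error messages: `MissingSourceError` and `check_sources` are hypothetical names, and the empty `.nc` files are placeholders.

```python
import tempfile
from pathlib import Path


class MissingSourceError(FileNotFoundError):
    """Raised when referenced source files are gone (hypothetical error type)."""


def check_sources(references):
    """Raise one descriptive error listing every referenced file that no longer exists."""
    missing = [ref for ref in references if not Path(ref).exists()]
    if missing:
        raise MissingSourceError(
            f"{len(missing)} of {len(references)} referenced files are missing: "
            + ", ".join(missing)
        )


with tempfile.TemporaryDirectory() as tmp:
    kept = Path(tmp) / "kept.nc"
    moved = Path(tmp) / "moved.nc"
    kept.touch()
    moved.touch()
    refs = [str(kept), str(moved)]

    check_sources(refs)  # both files exist, so nothing is raised

    moved.rename(Path(tmp) / "renamed.nc")  # simulate a file being renamed
    try:
        check_sources(refs)
    except MissingSourceError as err:
        message = str(err)
    else:
        message = ""

print(message)
```

The point is that the failure names the affected files up front, rather than surfacing later as an unrelated-looking read error.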

Future Proofing

Intake Virtual Icechunk uses the latest and most robust data tooling developed by the Pangeo and PyData communities.
Platform Agnostic: Icechunk supports file systems and all major object store interfaces. This means that catalogues built with this package can be stored on disk on Gadi, or in Acacia on Pawsey, and used in the same way regardless of the storage mechanism.
Zarr Based: Icechunk implements a transactional storage layer for Zarr. By transforming an intake-esm datastore to an intake-virtual-icechunk store, the underlying NetCDF dataset can be readily streamed around the planet without having to reserialise the data. Inode explosions are avoided, and alternative executors such as Cubed can be used instead of Dask.
For those interested in interactive dataset exploration and distribution, a sister package (intake-virtual-icechunk-ts) is also under development. It is designed to read these data catalogues in the browser and to support streaming interaction with the data they contain as an exploration mechanism.

How should users think about it?

This is not a whole new analysis interface.
Icechunk-backed catalogues built with intake-virtual-icechunk should be used in essentially the same way as an intake-esm datastore.
The storage backend changes, but the user-facing catalog workflow should stay familiar.

Currently, no Icechunk-backed catalogues are in the ACCESS-NRI Intake Catalog, and we will not make any default transitions until the new technology is fully mature. In the meantime, we will post here as we virtualise catalogues and make them publicly available.

Useful links

• v0.2.0 release notes: Release v0.2.0 · ACCESS-NRI/intake-virtual-icechunk · GitHub

• v0.1.0 release notes: Release v0.1.0 · ACCESS-NRI/intake-virtual-icechunk · GitHub

• Repository: ACCESS-NRI/intake-virtual-icechunk · GitHub. An intake plugin for building and reading Icechunk stores from existing esm-datastores via VirtualiZarr and intake-esm. Admins: @charles-turner-1 @rbeucher

• Issues: Issues · ACCESS-NRI/intake-virtual-icechunk · GitHub
