Icechunk: Earthmover open-sources their ArrayLake backend

Aidan · 16 October 2024 05:59

Earthmover.io are open sourcing Icechunk, the back-end software for their ArrayLake offering:

This is a seriously impressive bit of software that adds transactions and versioning to Zarr data stores.
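For anyone wondering what "transactions and versioning" means in practice, here is a toy sketch in plain Python. To be clear, this is not Icechunk's API: `ToyVersionedStore`, `Snapshot`, and every method name are hypothetical, made up only to illustrate the general idea that writes are staged, land together in a single commit, and leave older snapshots readable.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    """An immutable view of every chunk as of one commit."""
    snapshot_id: int
    message: str
    chunks: dict  # chunk key -> bytes


class ToyVersionedStore:
    """Hypothetical in-memory store for illustration; not Icechunk's API."""

    def __init__(self):
        # Snapshot 0 is the empty starting state.
        self._history = [Snapshot(0, "empty store", {})]
        self._pending = {}

    def write(self, key, data):
        # Staged writes stay invisible to readers until commit().
        self._pending[key] = data

    def commit(self, message):
        # All staged writes land together in one new snapshot.
        latest = self._history[-1]
        merged = {**latest.chunks, **self._pending}
        snap = Snapshot(latest.snapshot_id + 1, message, merged)
        self._history.append(snap)
        self._pending = {}
        return snap.snapshot_id

    def rollback(self):
        # Discard staged writes; committed history is untouched.
        self._pending = {}

    def read(self, key, snapshot_id=None):
        # Read from the latest snapshot, or from any earlier one by id.
        if snapshot_id is None:
            snap = self._history[-1]
        else:
            snap = self._history[snapshot_id]
        return snap.chunks[key]


store = ToyVersionedStore()
store.write("temp/0.0", b"v1")
first = store.commit("initial data")
store.write("temp/0.0", b"v2")
second = store.commit("corrected data")
```

After the second commit, a default read returns the corrected chunk, while passing `snapshot_id=first` still returns the original bytes. That is the property that matters for things like versioned model inputs.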

Thomas-Moore · 16 October 2024 22:24

I’m hoping Pawsey HPC engineers are having a serious look at this for their warm-tier object storage cluster, Acacia. And given the hope that NCI will increasingly include on-premises object store capacity, I hope NCI HPC engineers are getting across this too. @Aidan, what is your view on the best ways to socialise this across the community?

Aidan · 16 October 2024 22:53

Find some like-minded folks around here and look into use cases? I know @anton and @MartinDix are very keen on improving versioning of important data, like model inputs.

Proof of concept demos?

Get someone who is familiar with it to give a presentation?

Thomas-Moore · 17 October 2024 23:45

We might get a list of key people together and then ask the Earthmover folks to think about tailoring a virtual talk on how to use Icechunk with on-prem object storage?

Thomas-Moore · 18 October 2024 02:28

mdsumner · 5 December 2024 04:00

I had a chat with Ryan today; we’re setting up a trial account, which I’ll use to explore Pawsey object storage. I’ve struggled with the Rust dependency on the Docker/Singularity images I use on Pawsey. I know it’s not that hard, but it’s just another thing to add to my Python challenges. I think Earthmover would jump at working with anyone with GADI experience and object storage.

On another note, has anyone looked at Arkouda? There is a Pangeo Showcase talk on it: "Arkouda as an XArray backend for HPC!"

Just watched that and they’re interested in testers on HPC. I’m pretty comfy on Pawsey now, but only have limited experience on GADI with Python tooling (so if anyone wants to explore and hand-hold with me that’d be awesome).

Thomas-Moore · 5 December 2024 23:49

This is awesome.

I’m currently in an email chat with Ryan about setting up a virtual showcase for NCI / Pawsey / Australian folks in late January or early February.

rbeucher · 6 December 2024 01:13

That sounds great!

Thomas-Moore · 6 December 2024 01:34

Hey @mdsumner et al

Ryan would love to have a chat with core folks before he gives a wider showcase.

He’d like to speak to data users (and those who care about them) about what the current pain points and problems are. He’d like to ask us some questions to shape his presentation.

Can we target this preliminary chat for late January? Who should be on it? I’m happy to help coordinate and organise.

mdsumner · 9 December 2024 00:25

I’m keen. Anton Steketee, Lenneke Jong, Ben Raymond come to mind.

Aidan · 9 December 2024 08:58

Pinging @anton and @lmjong in case they’re interested in contributing.
