Experiment title : Global km-scale Hackathon Data
Summary :
In May 2025, 21st Century Weather, together with NCI and ACCESS-NRI, took part in the Global km-scale Hackathon (km-Scale Hackathon | HK25 homepage). Several simulations covering 2020-03-01 to 2021-03-01 were run by different teams worldwide. We transferred output from 3 models to Gadi: ICON Global, 2.5 km; UM Global, 5 km & 10 km; SCREAM Global, 3.25 km / 128 levels. Together these total ~100 TB in Zarr format.
Project qx55 is where this data is located and where it will be stored long-term. However, before it is ready for long-term storage, significant processing needs to be carried out to reduce inode usage (better chunking/sharding) and storage usage (higher compression). SUs are needed for this processing, and some storage is required as a staging area.
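To illustrate why chunking/sharding matters for inode usage, the sketch below counts how many stored objects one array produces for a given chunk or shard shape. Without sharding, a Zarr array typically stores one object (file) per chunk, so the count grows quickly when chunks are small. The array shape and the chunk/shard shapes here are hypothetical values chosen for illustration only; they are not the layout of the ICON, UM, or SCREAM datasets.

```python
import math


def count_objects(shape, block_shape):
    """Number of stored objects for one array.

    block_shape is the chunk shape (no sharding) or the shard shape
    (with sharding, where many chunks are packed into one object).
    """
    return math.prod(math.ceil(s / b) for s, b in zip(shape, block_shape))


# Hypothetical array: (time, cell). Assumed values, not taken from the real data.
shape = (8760, 1_000_000)

small_chunks = count_objects(shape, (24, 10_000))    # many small chunks
large_chunks = count_objects(shape, (240, 100_000))  # fewer, larger chunks
sharded = count_objects(shape, (960, 500_000))       # chunks grouped into shards

print(small_chunks, large_chunks, sharded)
```

Under these assumed shapes, moving from small chunks to larger chunks or shards cuts the object count per variable by orders of magnitude, which is the effect the processing step is aiming for. The best shapes for the real datasets depend on access patterns and would need to be chosen separately.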
Scientific motivation:
There were many different science goals for this data: km-Scale Hackathon | HK25 homepage
For this ML group, the motivation is that the data was very hard to obtain: it took a month of continuous transfers. It is global and of very high resolution, and could be used to train an AI emulator for downscaling. Several interesting data-driven models have also been trained on datasets with this type of grid and show good performance.
Experiment Name :
People : Sam Green @sam.green
Model: UM, ICON, SCREAM
Configuration:
Initial conditions:
Run plan:
Simulation details: km-Scale Hackathon | HK25 homepage
Total KSUs required : 50 kSU
Total storage required : 50TB
Storage lifetime : High chance just this quarter; Low chance the next quarter too.
Long term data plan : qx55 with further application to NCI or ACCESS-NRI to host qx55
Outputs: UM/ICON/SCREAM Zarr datasets with optimised storage/inode usage.
Restarts:
Related articles:
Analysis:
GitHub - digital-earths-global-hackathon/tools (Pre-processing and preparation of data/simulations) has notebooks that show how to analyse the data.
Digital Earths Global Hackathon Data Catalog is a catalogue containing all datasets that were available during the Hackathon.
Conclusion:
The Hackathon was very successful and has produced large datasets that benefit the Australian community for simulation analysis and training ML models. If other models/datasets in the Digital Earths Global Hackathon Data Catalog would also be useful to this ML community, we can transfer and store them too.