Share your experience working with ACCESS models and data

Hi everyone,

I’m working with ACCESS-NRI to collect some ideas for the future direction of our web services. Are there any areas in your work with ACCESS models or data that could benefit from better tooling? Please let us know in this thread.

For example, some feedback we’ve had so far is difficulty in finding and working with input datasets, and lack of access to data in the cloud.

4 Likes

This is something I perceive as an issue, but I am not sure how widely people agree. If researchers publish using ACCESS data, it is impractical for them to actually provide the data with the publication (with the exception of the CMIP datasets). So even if they provide their code, it's essentially impractical for others to replicate their analysis. This is something cloud access could help with (obviously noting bandwidth limitations).

Also, there are some science/government organisations that would be interested in ACCESS outputs but are not big NCI users. For example, cloud access should make the data easier to reach for researchers using R/Matlab/QGIS on their local machines, a different supercomputer, or cloud machines.

3 Likes

There are third party providers of “data lake” platforms for exactly our sort of complex, multidimensional data. Earthmover springs to mind.

2 Likes

https://www.eratos.com/products is also another one which has had some local discussion recently.

An in-house solution might be better for us (e.g. a STAC catalogue of Zarr assets/files?)
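To make the STAC idea concrete, here is a minimal sketch of what a STAC Item pointing at a Zarr store could look like, built with stdlib JSON only. The item ID, date range, bucket path, and asset key are all invented for illustration; a real catalogue would follow the STAC spec more fully (and would likely use a library such as pystac).

```python
import json

# Hypothetical STAC Item describing a Zarr store as a single asset.
# All IDs, hrefs, and dates below are made up for illustration.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "access-om2-01-example-output",
    "geometry": None,
    "bbox": None,
    "properties": {
        # datetime is null, so a start/end range is given instead
        "datetime": None,
        "start_datetime": "1958-01-01T00:00:00Z",
        "end_datetime": "2018-12-31T00:00:00Z",
    },
    "assets": {
        "zarr-store": {
            # Hypothetical object-store location of the Zarr data
            "href": "s3://example-bucket/access-om2-01/output.zarr",
            "type": "application/vnd+zarr",
            "roles": ["data"],
        }
    },
    "links": [],
}

# Items are plain JSON, so they round-trip trivially
print(json.dumps(item, indent=2)[:60])
```

The appeal of this approach is that the catalogue is just static JSON files: it can be served from the same object store as the data, with no server-side component required.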

1 Like

I’m wary of the big cloud providers, as they have a tendency to use research data as a honey pot to draw users into their paid services.

A public/government alternative is the Pawsey Acacia object store, where the same data is available internally and also publicly: access is controlled by the data owner.

To me that is the future for ACCESS experiment data and similar. Seamless access internally and externally.

2 Likes

To be clear, Earthmover is not a big cloud provider. Their main (only?) product, Arraylake, is a “data lake platform” that is compatible with all major object storage services, on-prem or in the cloud. It provides data cataloguing, version control, and access control for multidimensional data like ours. I think it looks pretty neat - there’s a demo in this video at 11:18.

(No I don’t receive any form of payment from Earthmover :grin:)

1 Like

Oh I didn’t mean to imply they are.

That does sound amazing.

Hi everyone,

Just jumping on here to give a +1 to hosting ACCESS data on the cloud. I’ve noticed publications and reviewers (rightly so) pulling us up if we don’t provide code and data that can, at minimum, recreate all paper figures from input data.

With ACCESS-OM2-01, this has meant (for me, at least) sharing some post-processed files on which some analysis has been done, and responding with “The raw data is far too large to share in its original format”.
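One low-tech complement to sharing post-processed files is publishing a checksum manifest alongside them, so readers can at least verify they have the exact files the figures were made from. A stdlib-only sketch (the filenames here are invented):

```python
import hashlib
import json
import tempfile
from pathlib import Path

def build_manifest(data_dir: str) -> dict:
    """Map each file under data_dir to its SHA-256 digest."""
    manifest = {}
    base = Path(data_dir)
    for path in sorted(base.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(base))] = digest
    return manifest

# Demo on two tiny invented files in a temporary directory
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "fig1_data.csv").write_text("time,sst\n0,14.2\n")
    (Path(d) / "fig2_data.csv").write_text("time,ssh\n0,0.03\n")
    manifest = build_manifest(d)
    print(json.dumps(manifest, indent=2))
```

The manifest JSON itself is tiny, so it can go straight into the paper's supplementary material or code repository even when the data cannot.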

Maybe others have a better way of sharing the ACCESS data that I don’t know about, but this is what I have done!

4 Likes

From the Arraylake overview:

We have lots more catalog features on our roadmap, including

  • Exposing data catalogs via standard REST APIs such as STAC and OGC
  • Interactive browsing via the Arraylake web application

+1 for an on-prem object store at national HPC centres + Zarr + kerchunk + catalogues + Arraylake-managed metadata

Q: do we know if NCI is committed to supporting a serious object store for the Gadi replacement cycle?

2 Likes