Share your experience working with ACCESS models and data

Hi everyone,

I’m working with ACCESS-NRI to collect some ideas for the future direction of our web services. Are there any areas in your work with ACCESS models or data that could benefit from better tooling? Please let us know in this thread.

For example, some feedback we’ve had so far is difficulty in finding and working with input datasets, and lack of access to data in the cloud.

4 Likes

This is something I perceive as an issue, but I am not sure how widely people agree. If researchers publish using ACCESS data, it is impractical for them to actually provide the data with the publication (with the exception of the CMIP datasets). So even if they provide their code, it's essentially impractical for others to replicate their analysis. This is something cloud access could help with (obviously noting bandwidth limitations).

Also, there are some science/government organisations that would be interested in ACCESS outputs but are not big NCI users. For example, cloud access should make the data easier to reach for researchers using R/Matlab/QGIS on their local machines, a different supercomputer, or cloud machines.

3 Likes

There are third party providers of “data lake” platforms for exactly our sort of complex, multidimensional data. Earthmover springs to mind.

2 Likes

https://www.eratos.com/products is also another one which has had some local discussion recently.

An in-house solution might be better for us (e.g. a STAC catalogue of Zarr assets/files?)
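To make the STAC idea concrete, here is a minimal sketch of what a STAC Item pointing at a Zarr store could look like, built with stdlib JSON only. The item ID, date range, bucket path, and asset key are all invented for illustration; a real catalogue would follow the STAC spec more fully (and would likely use a library such as pystac).

```python
import json

# Hypothetical STAC Item describing a Zarr store as a single asset.
# All IDs, hrefs, and dates below are made up for illustration.
item = {
    "type": "Feature",
    "stac_version": "1.0.0",
    "id": "access-om2-01-example-output",
    "geometry": None,
    "bbox": None,
    "properties": {
        # datetime is null, so a start/end range is given instead
        "datetime": None,
        "start_datetime": "1958-01-01T00:00:00Z",
        "end_datetime": "2018-12-31T00:00:00Z",
    },
    "assets": {
        "zarr-store": {
            # Hypothetical object-store location of the Zarr data
            "href": "s3://example-bucket/access-om2-01/output.zarr",
            "type": "application/vnd+zarr",
            "roles": ["data"],
        }
    },
    "links": [],
}

# Items are plain JSON, so they round-trip trivially
print(json.dumps(item, indent=2)[:60])
```

The appeal of this approach is that the catalogue is just static JSON files: it can be served from the same object store as the data, with no server-side component required.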

1 Like

I’m wary of the big cloud providers, as they have a tendency to use research data as a honey pot to draw users into their paid services.

A public/government alternative is the Pawsey Acacia object store, where the same data is available internally and also publicly: access is controlled by the data owner.

To me that is the future for ACCESS experiment data and similar. Seamless access internally and externally.

2 Likes

To be clear, Earthmover is not a big cloud provider. Their main (only?) product, Arraylake, is a “data lake platform” that is compatible with all major object storage services, on-prem or in the cloud. It provides data cataloguing, version control, and access control for multidimensional data like ours. I think it looks pretty neat - there’s a demo in this video at 11:18.

(No I don’t receive any form of payment from Earthmover :grin:)

1 Like

Oh I didn’t mean to imply they are.

That does sound amazing.

Hi everyone,

Just jumping on here to give a +1 to hosting ACCESS data on the cloud. I’ve noticed publications and reviewers (rightly so) pulling us up if we don’t provide code and data that can, at minimum, recreate all paper figures from input data.

With ACCESS-OM2-01, this has meant (for me, at least) sharing some post-processed files on which some analysis has been done, and responding with “The raw data is far too large to share in its original format”.
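One low-tech complement to sharing post-processed files is publishing a checksum manifest alongside them, so readers can at least verify they have the exact files the figures were made from. A stdlib-only sketch (the filenames here are invented):

```python
import hashlib
import json
import tempfile
from pathlib import Path

def build_manifest(data_dir: str) -> dict:
    """Map each file under data_dir to its SHA-256 digest."""
    manifest = {}
    base = Path(data_dir)
    for path in sorted(base.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[str(path.relative_to(base))] = digest
    return manifest

# Demo on two tiny invented files in a temporary directory
with tempfile.TemporaryDirectory() as d:
    (Path(d) / "fig1_data.csv").write_text("time,sst\n0,14.2\n")
    (Path(d) / "fig2_data.csv").write_text("time,ssh\n0,0.03\n")
    manifest = build_manifest(d)
    print(json.dumps(manifest, indent=2))
```

The manifest JSON itself is tiny, so it can go straight into the paper's supplementary material or code repository even when the data cannot.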

Maybe others have a better way of sharing the ACCESS data that I don’t know about, but this is what I have done!

4 Likes

From the Arraylake overview:

We have lots more catalog features on our roadmap, including

  • Exposing data catalogs via standard REST APIs such as STAC and OGC
  • Interactive browsing via the Arraylake web application

+1 for an on-prem object store at national HPC centres + Zarr + kerchunk + catalogues + Arraylake-managed metadata

Q: do we know if NCI is committed to supporting a serious object store for the Gadi replacement cycle?

2 Likes