I’m working with ACCESS-NRI to collect some ideas for the future direction of our web services. Are there any areas in your work with ACCESS models or data that could benefit from better tooling? Please let us know in this thread.
For example, some of the feedback we've had so far concerns difficulty finding and working with input datasets, and a lack of access to data in the cloud.
This is something I perceive as an issue, but I am not sure how much people agree. If researchers publish using ACCESS data, it is impractical for them to actually provide the data with the publication (with the exception of the CMIP datasets). So even if they provide their code, it's still impractical for others to replicate their analysis. This is something cloud access could help with (obviously noting bandwidth limitations).
Also, there are some science and government organisations that would be interested in ACCESS outputs but are not big NCI users. For example, cloud access should make the data easier to use for researchers working in R, Matlab or QGIS on their local machines, on a different supercomputer, or on cloud machines.
An in-house solution might be better for us (e.g. a STAC catalogue of Zarr assets/files?).
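To make the STAC-of-Zarr idea a bit more concrete, here is a minimal sketch, using only the Python standard library, of a STAC 1.0.0 Item that points at a single Zarr store. The item ID, URL and time range are hypothetical placeholders, not real ACCESS datasets, and the `application/vnd+zarr` media type is an assumption based on what some existing catalogues use. A real catalogue would more likely be built with a library such as pystac.

```python
import json


def make_zarr_item(item_id, zarr_href, start, end, bbox=(-180.0, -90.0, 180.0, 90.0)):
    """Build a minimal STAC 1.0.0 Item (as a plain dict) describing one Zarr store.

    All argument values used below are hypothetical placeholders.
    """
    west, south, east, north = bbox
    return {
        "type": "Feature",
        "stac_version": "1.0.0",
        "id": item_id,
        # Footprint as a closed GeoJSON polygon (first point repeated at the end).
        "geometry": {
            "type": "Polygon",
            "coordinates": [[
                [west, south], [east, south], [east, north], [west, north], [west, south]
            ]],
        },
        "bbox": list(bbox),
        "properties": {
            # STAC allows datetime to be null when a start/end range is given instead.
            "datetime": None,
            "start_datetime": start,
            "end_datetime": end,
        },
        "links": [],
        "assets": {
            "data": {
                "href": zarr_href,
                # Assumed media type for Zarr stores, as used by some STAC catalogues.
                "type": "application/vnd+zarr",
                "roles": ["data"],
            }
        },
    }


item = make_zarr_item(
    "access-om2-01-example",  # hypothetical item ID
    "https://example.org/access/access-om2-01/ocean_daily.zarr",  # placeholder URL
    "1990-01-01T00:00:00Z",
    "1999-12-31T23:59:59Z",
)
print(json.dumps(item, indent=2))
```

Because the Item is plain JSON, it can be served as a static file alongside the Zarr stores, whether they live on NCI, an object store or a commercial cloud.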
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
I’m wary of the big cloud providers, as they have a tendency to use research data as a honey pot to draw users into their paid services.
A public/government alternative is the Pawsey Acacia object store, where the same data is available both internally and publicly: access is controlled by the data owner.
To me, that is the future for ACCESS experiment data and similar datasets: seamless access both internally and externally.
To be clear, Earthmover is not a big cloud provider. Their main (only?) product, Arraylake, is a "data lake platform" that is compatible with all major object storage services, on-prem or in the cloud. It provides data cataloguing, version control and access control for multidimensional data like ours. I think it looks pretty neat; there's a demo in this video at 11:18.
(No, I don't receive any form of payment from Earthmover.)
Aidan
(Aidan Heerdegen, ACCESS-NRI Release Team Lead)
Just jumping on here to give a +1 to hosting ACCESS data on the cloud. I’ve noticed publications and reviewers (rightly so) pulling us up if we don’t provide code and data that can, at minimum, recreate all paper figures from input data.
With ACCESS-OM2-01, this has meant (for me, at least) sharing some post-processed files on which some analysis has been done, and responding with “The raw data is far too large to share in its original format”.
Maybe others have a better way of sharing the ACCESS data that I don’t know about, but this is what I have done!