Poster: Connecting evaluation tools across the LSM community: ME.org Client + HPCPy

About

With the advent of modern evaluation tools (e.g. benchcab, modelevaluation.org), there is increasing demand for greater automation and deeper integration to simplify the process of running standard benchmarking experiments to test land surface model (LSM) developments. Previously, this was a manual process, requiring users to tediously upload large numbers of sizeable data files via a website.

Working closely with the team at modelevaluation.org, the Land Surface Modelling Team at ACCESS-NRI has developed the ME.org Client, a software package designed to automatically transfer data produced by our benchmarking software (benchcab) to modelevaluation.org, triggering remote analyses to shorten the feedback cycle from scientific question to actionable model statistics.

Poster

Note: this topic is part of the 2024 ACCESS Community Workshop Poster session

Hi @ben I scanned the QR code on your poster

Hi, am I able to get access to the HPCPy repo? My GitHub username is “frizwi”. We basically use an SSH connection (via FABRIC) to launch PBS jobs on Gadi, just wondering if this is more streamlined.
Thanks!

Hi @Aidan, thanks for putting the poster and QR code up for me. Did you have any questions?


Hi @frizwi, the repository is public and available at GitHub: ACCESS-NRI/hpcpy, a Python client for interacting with HPC scheduling systems. You should be able to access it, but let me know if not.

HPCPy currently supports launching jobs on the same machine on which it is installed. However, a remote-launch option like the one you describe would be possible to implement, and it is something I would like to do in the future.

The process of “installing” the job scripts on the server is just a matter of sending a script to a given directory and then issuing a remote SSH command to actually submit it. The only real concern is that whatever the job script actually does on the HPC needs all of its dependencies/libraries installed on the HPC itself, as HPCPy is not intended to replace the likes of Cylc.
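For anyone curious, the copy-then-submit workflow described above can be sketched in plain Python. To be clear, this is not HPCPy code: the host name, remote directory, and helper names below are hypothetical, and `qsub` assumes a PBS scheduler (as on Gadi) is available on the remote end.

```python
import shlex
import subprocess
from pathlib import Path


def build_remote_submit_commands(script: Path, host: str, remote_dir: str):
    """Build the two commands: copy the job script to the HPC, then
    submit it via a remote SSH command. Host/paths are hypothetical."""
    remote_path = f"{remote_dir}/{script.name}"
    copy_cmd = ["scp", str(script), f"{host}:{remote_path}"]
    # qsub assumes a PBS scheduler on the remote machine
    submit_cmd = ["ssh", host, f"qsub {shlex.quote(remote_path)}"]
    return copy_cmd, submit_cmd


def remote_submit(script: Path, host: str, remote_dir: str) -> str:
    """Copy the script over, submit it, and return the job ID that
    qsub prints on stdout."""
    copy_cmd, submit_cmd = build_remote_submit_commands(script, host, remote_dir)
    subprocess.run(copy_cmd, check=True)
    result = subprocess.run(submit_cmd, check=True, capture_output=True, text=True)
    return result.stdout.strip()
```

The only prerequisite, as Ben notes below, is passwordless SSH (key-based) access from the VM to the HPC login node.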

Does that answer your question? Happy to consider feature suggestions on GitHub if you wanted to contribute to development.

Thanks @ben for the proper link for the repo - yes, all good! I had “orgs” in it for some reason.

Okay, that makes sense about the use case for HPCPy. Our use case is that we have a web service running on a VM, via which the user configures/submits a model run, but the run needs to execute on HPC. At CSIRO’s HPC, our VM and the supercomputer share the same filesystem, so it’s easy to set everything up and then do the final submit via SSH. On NCI, perhaps the same thing could be achieved via Nirin (or Nectar?). I just feel like there should be better support from HPC centres for client tools, not just on the head/compute nodes.

No worries @frizwi,

I’ve some experience with the CSIRO HPC. I believe they are using SLURM rather than PBS, unless it has changed since I was last logged in? HPCPy has been developed to be scheduler-agnostic; however, the bulk of the development has been against PBS, as that is what a lot of users are familiar with.

That being said, it wouldn’t be hard to port the PBS directives to SLURM; it is just a matter of setting up the mappings and command templates.
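To illustrate what such a mapping might look like (this is a minimal sketch, not HPCPy's actual internals), the same abstract job settings can be rendered to either scheduler's header directives and submit command via per-scheduler templates. The directive names follow standard PBS and SLURM usage:

```python
# Per-scheduler templates: abstract setting -> concrete directive line.
# Illustrative only; HPCPy's real template structure may differ.
TEMPLATES = {
    "pbs": {
        "name": "#PBS -N {value}",
        "queue": "#PBS -q {value}",
        "walltime": "#PBS -l walltime={value}",
        "submit": "qsub {script}",
    },
    "slurm": {
        "name": "#SBATCH --job-name={value}",
        "queue": "#SBATCH --partition={value}",
        "walltime": "#SBATCH --time={value}",
        "submit": "sbatch {script}",
    },
}


def render_directives(scheduler: str, settings: dict) -> list[str]:
    """Render scheduler-specific header lines from abstract job settings."""
    tmpl = TEMPLATES[scheduler]
    return [tmpl[key].format(value=value) for key, value in settings.items()]
```

With this approach, the same `{"name": ..., "walltime": ...}` settings produce `#PBS` or `#SBATCH` headers depending on which template set is selected, which is the sense in which porting is "just mappings and command templates".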

As long as the VM can reach the HPC over SSH (with SSH keys set up), it is possible to achieve what you are suggesting.


Hi Ben,

ME.org sounds really cool. I guess my question was getting at: are we duplicating a whole lot of data? That is, is it easier to run the evaluation metrics locally too and just save the metrics? Or is that not really a problem?

Hi @Kim_Reid, that is a valid question and one that I have raised with the team at ME.org. The intent of the service is for people to run different configurations and evaluate them in a standardised way. So, in a sense, the data is not really duplicated if everyone is running different configurations.

Of course, eventually things are going to fill up over there, so we are looking at implementing a garbage-collection routine to clean up the data periodically, retaining only the results of the evaluation itself.

Storage is always going to be an issue, I am reminded of this every time I log in to Gadi where all my projects are above 99%…

Does that answer your question?

Hi @ben,

Yes, thanks for the response!

No worries, @Kim_Reid, anytime.

@frizwi, I’ve raised an issue for this functionality over here. Feel free to add your suggestions.
