COSIMA recipes / on-ramping for new people

Copying the google doc from the discussion at the COSIMA2022 workshop here for the record.

Much of this is already being discussed in other topics and should be followed up there, e.g.


Starter questions

  1. What are your biggest speedbumps? For data access, data analysis, running supplied executables, compiling your own executables
  2. How can we facilitate contributions of recipes etc? – Should we hold another COSIMA recipe hackathon? Maybe a prior round of asking people what they want accomplished. Clear document explaining how to contribute recipes?
  3. We are sharing examples of code mostly through slack now. Those messages expire and are hard to search. What would be a better way of sharing code?
  4. ACCESS-NRI is considering updating / replacing the COSIMA Cookbook with something more widely applicable (e.g. to ACCESS-CM2). What capabilities will it need? What is the right balance of generality to specificity?
  5. ACCESS-OM2 wiki: do people running experiments find this useful? Is there anything that should be updated?
  6. mom6-panan: who is using it? Do we want to create a wiki? NOAA-GFDL has one: Tutorials · NOAA-GFDL/MOM6-examples Wiki · GitHub
  7. How can ACCESS-NRI help in the above? What are the specific needs and priorities for model evaluation, diagnostics etc.?
  • Guidelines on how to contribute (github/hive/contact someone?).
  • What exactly will Hive be for and how will it work?
  • Cookbook is not scaling well with higher res models (1/10, 1/20). Can that be improved? Examples on how to handle these big data sets. Chunking. Dask!!!
  • How to encourage/facilitate sharing and developing code? We want diversity of examples.
  • Continuous Integration running the Documented Examples?
  • Can we use more verbose variable names AND make sure we always include METADATA? panant_tpot_drot_xy does not help…
  • Basics of accessing and using models for starters. – There are 2, 3 tutorials. Is it enough?

Hive and ACCESS-NRI

The exact role of the Hive is not clear to everyone in COSIMA, nor is the extent to which NRI staff are there to help. For example, are NRI staff going to help us improve code, or maintain the Hive? Both, or neither? Should the Hive host discussions about work in progress, or only finished and well-documented work?

  • Plan of action: maybe dedicate a COSIMA meeting (or half) to a briefing on the Hive, how to organise it and how to contribute?

COSIMA recipes

The recipes have become a bit outdated, and we could definitely populate them with more examples. We think people are not enticed to contribute because of

(i) lack of self-confidence,

(ii) underestimation of how helpful their code could be for others, or

(iii) the barrier of github lingo and the technicalities that come with it (“what’s a fork, what’s a PR, omg!… all seems complicated… I’m not a coder nor a computer scientist, I just want to analyse some SST…”)

Plan of action

  • Should we maybe do regular hackathons every X amount of time to guarantee (some sort of implicit) “continuous integration/testing” of the Documented examples? This would also help demonstrate to new members how to make contributions and how valuable they are.
  • Can the ACCESS-NRI staff help set up a script that will go through and run all notebooks in the Documented Examples/Tutorials directories in cosima-recipes and make sure they don’t error? This could be done every month or every quarter, or whenever, e.g., the conda environment changes.
  • Have a small FAQ on how to push to the github cosima-recipes repo for those not familiar with github. Or have this in the ACCESS-Hive somewhere and we put a link in the cosima-recipes README.
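
A minimal sketch of the notebook-checking script suggested above, assuming `jupyter nbconvert` is available in the active conda environment. The function names and the one-hour timeout are illustrative choices, and the subdirectory names are passed in rather than hard-coded, since the exact layout of cosima-recipes should be checked against the repository.

```python
import subprocess
import tempfile
from pathlib import Path


def find_notebooks(repo_root, subdirs):
    """Return every .ipynb file under the given subdirectories, sorted per directory."""
    notebooks = []
    for subdir in subdirs:
        directory = Path(repo_root, subdir)
        if not directory.is_dir():
            continue  # tolerate a directory that does not exist
        for path in sorted(directory.rglob("*.ipynb")):
            # Skip Jupyter's autosave copies.
            if ".ipynb_checkpoints" not in path.parts:
                notebooks.append(path)
    return notebooks


def build_command(notebook, output_dir, timeout=3600):
    """Build the nbconvert call that executes one notebook from top to bottom."""
    return [
        "jupyter", "nbconvert",
        "--to", "notebook",
        "--execute",
        f"--ExecutePreprocessor.timeout={timeout}",
        "--output-dir", str(output_dir),
        str(notebook),
    ]


def run_all(repo_root, subdirs, timeout=3600):
    """Execute each notebook and return the list of those that raised an error."""
    failed = []
    # Executed copies go to a scratch directory so the repository is untouched.
    with tempfile.TemporaryDirectory() as scratch:
        for notebook in find_notebooks(repo_root, subdirs):
            result = subprocess.run(build_command(notebook, scratch, timeout))
            if result.returncode != 0:
                failed.append(notebook)
    return failed
```

A scheduled job could call `run_all(".", ["DocumentedExamples", "Tutorials"])` (directory names assumed here) and report the returned list. Notebooks that read model output would need to run somewhere with access to that data, which this sketch does not address.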

Cosima cookbook + Big data

Some (or most) of the notebook examples do not scale well when using higher resolution models (1/10 of a degree and finer). Fixing this is a necessary improvement moving forward (ACCESS-NRI).
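
Some rough arithmetic shows why a notebook that is fine at 1/4 degree can fail at 1/10 degree. The sketch below estimates the uncompressed size of one 3D field. The grid dimensions are assumptions for illustration (roughly those of the 0.25 and 0.1 degree ACCESS-OM2 configurations) and should be checked against the actual output, as should the 4-byte value size.

```python
# Assumed (nx, ny, nz) grid sizes, for illustration only.
GRIDS = {
    "0.25 degree": (1440, 1080, 50),
    "0.1 degree": (3600, 2700, 75),
}


def field_size_gb(nx, ny, nz, nt=1, bytes_per_value=4):
    """Size in GB (1e9 bytes) of an uncompressed (nt, nz, ny, nx) array."""
    return nx * ny * nz * nt * bytes_per_value / 1e9


def summarise(grids=GRIDS, months=120):
    """Size of one snapshot and of `months` monthly means for each grid."""
    rows = {}
    for name, (nx, ny, nz) in grids.items():
        rows[name] = {
            "one_snapshot_gb": field_size_gb(nx, ny, nz),
            "full_record_gb": field_size_gb(nx, ny, nz, nt=months),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarise().items():
        print(f"{name}: {row['one_snapshot_gb']:.2f} GB per snapshot, "
              f"{row['full_record_gb']:.0f} GB for 10 years of monthly means")
```

Under these assumptions a single 3D snapshot grows from about 0.3 GB to about 2.9 GB, and ten years of monthly means from about 37 GB to about 350 GB. A calculation that loads the whole 1/4 degree record may fit on a large-memory node, whereas the same code at 1/10 degree has to be chunked.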

Model documentation and metadata

The number of COSIMA experiments is growing a lot. A lot of the experiments are indexed in the database without metadata. The names of the experiments are not enough to understand what they consist of.

  • Remind everyone to document their experiments and contributions.
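
If reminders are not enough, a check along the following lines could flag undocumented experiments before they are indexed. This is only a sketch: the required field names are hypothetical, not the database's confirmed schema, and the metadata is assumed to have already been read into a plain dictionary per experiment.

```python
# Hypothetical list of fields every experiment should document.
REQUIRED_FIELDS = ("contact", "email", "created", "description")


def missing_metadata(metadata, required=REQUIRED_FIELDS):
    """Return the required fields that are absent or blank in `metadata`."""
    missing = []
    for field in required:
        value = metadata.get(field)
        if value is None or not str(value).strip():
            missing.append(field)
    return missing


def check_experiments(experiments, required=REQUIRED_FIELDS):
    """Map experiment name -> missing fields, listing only experiments that fail."""
    report = {}
    for name, metadata in experiments.items():
        missing = missing_metadata(metadata or {}, required)
        if missing:
            report[name] = missing
    return report
```

For example, `check_experiments({"expt_a": {"contact": "A. Person"}})` would report that the made-up experiment `expt_a` lacks `email`, `created` and `description`.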

Wishlist for examples:

  • Calculate neutral density in the model (compare to observations)
  • Calculate data along/across a contour (1 km contour around Antarctica)

Dask and other model skills

Nobody is confident about their dask skills, and those skills are becoming increasingly necessary as we increase resolution in our models. Tutorials so far have been basic, and have not been very useful in everyday usage, e.g. doing calculations with 1/20 degree resolution models.

  • Could we call an expert to give us a more “tailored” tutorial?

Wilma commented:

We should advertise that we want people to share their code and help maintain the recipes. “Did one person ask you for your code? Write a recipe.”

comments in google doc:

Wilma:

Agree, we need to organise the scripts (e.g. analysis, visualisation) and have good names for them. At the moment I find myself using recipes for parts of their code that they are not advertised for.

Navid:

You are very kind here. I’d say “Enforce metadata in the experiments included in the database.”
Otherwise it’s just a collection of random numbers that only a few know what they are. Then cosima-database becomes exclusive to only those who know what those variables are.

Andrew:

Agreed.

Navid’s comments:

We had 2-3 dask tutorials with @angus which were helpful.

We want more dask tutorials. But the problem is that if we leave it vague then the tutorials become “basic”. Let’s find 4-5 example notebooks that are OK-ish for 1/4 deg but blow up at 1/10 or 1/20 degree output. Then we can use those as the basis for the “Dask and how non-dask-developers can use it to analyse big data” tutorial?

That sounds like a great idea. I would be happy to contribute to this, as would others on the ACCESS-NRI team: @rbeucher might be the best candidate as he is dealing closely with model evaluation/analysis, e.g. cookbook.

Hopefully this update to dask:

will solve some of these issues.

This does seem important (along with the note that all conversations should be moved from slack onto this forum). We have a talk lined up this week and were also going to discuss the potential hackathon in January, so there’s not heaps of time. Would a 5 min spiel from someone at the NRI be useful?

Sure. Can always have a follow-up in another meeting, as I’m sure people will have more questions, especially once they start getting to grips with the forum.

Ok sounds good, you’re on.