Newest `conda env` kernels silently change model output analysis results

I was quite alarmed to see this issue, which reports that some model analysis results silently change when moving to the newest conda analysis env!

I'm bringing it to everyone's attention in case other people have been getting puzzling results and struggling to figure out where they are coming from!

If you have seen similar issues, please post on that issue – we are trying to sort it out.


Others have definitely had this same issue with the notebook tagged above. @polinash @hrsdawson @Wilton_Aguiar

Not sure about other recipes though.


I would encourage people to please report these things sooner rather than later.

Others might be silently struggling to interpret the weird results they have been getting…


I’m one of the silently struggling :sweat:


First, I started getting memory allocation errors when I re-ran the code on a slightly longer dataset. So I switched to the newer conda env (conda-23.10); the memory allocation issue went away, but the plots of cross-contour transports didn’t make any sense.
Upon seeing this post, I re-ran the code with older envs (23.01 and earlier): the plotting looks alright, but the notebook can’t handle the longer dataset I need and keeps killing workers, etc.

In my case, I need the newer condas to handle my dataset and the older condas to compute the transport correctly. In the current situation, it seems like I can’t have both… :melting_face:

Slight sideways comment here:

A python package can have tests written for its methods.

Is there any best practice for “writing tests” against an important collection of notebooks which a community uses as “working” or “operational” code? Or is this kind of forum discussion, plus counting on a careful eye, the only and best approach?

Those silent failures are the scary ones… :fearful:

Best practice would be a set of tests that does the same thing as our notebooks, created by a completely independent team using different software tools. That’s an impractical approach for anything but the most critical software.

A possible approach we could take is regression testing: save a reference set of results from runs of the notebooks, and then compare against them before and after a change is made.
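A minimal sketch of what that could look like, assuming a hypothetical reference file saved with a known-good environment (the paths, variable choice and tolerance below are placeholders, not an actual setup):

```python
# Regression-test sketch (hypothetical paths and tolerance): compare a result
# recomputed with the environment under test against a saved reference.
import xarray as xr

REFERENCE = "reference_results/cross_contour_transport.nc"  # saved with a known-good env
CURRENT = "current_results/cross_contour_transport.nc"      # recomputed with the env under test


def test_cross_contour_transport_unchanged():
    reference = xr.open_dataset(REFERENCE)
    current = xr.open_dataset(CURRENT)
    # Fail loudly if the results drift beyond floating-point noise.
    xr.testing.assert_allclose(current, reference, rtol=1e-6)
```

Run with pytest against each new conda/analysis3 release, a test like this would flag exactly the kind of silent change reported above.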

For us, the issues arise because we don’t use a fixed set of dependencies (i.e. python packages) and instead want to keep up to date with the latest releases. So a user might use any version of conda/analysis3, and we hope that the results don’t change between versions. If the design of some dependency has changed, then the results might change, or worse, a bug might have been introduced and the results change for the wrong reason.

Full disclosure: in a former (pre-science) life I was a professional software tester (Y2K basically funded my Masters!)

The ‘correct’ way to do this would be to document test cases for each function, with expected outputs for a given set of inputs, as a ‘test script’. (The word ‘script’ here is ambiguous: it means a document rather than a code script.)

The design of each function’s test cases should focus on cases close to sensible numeric limits (we called these ‘boundary conditions’, but again, that term is ambiguous in ocean modelling). So, for example, does a function behave sensibly with a salinity close to zero, compared to a negative salinity, which is physically impossible (except in the old HadGEM model…)?

Obviously these test cases can all be implemented in an executable test script, but there’s value in just sitting down with pen and paper and thinking about what the sensible use cases of a function are first.
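As a purely illustrative example of what such boundary-condition tests could look like once written down (the function here is a made-up placeholder, not something from the recipes):

```python
# Boundary-condition tests, pytest style. `potential_density` is a toy
# placeholder; the point is probing inputs near the physical limits.
import pytest


def potential_density(salinity, temperature):
    """Toy stand-in for the function under test."""
    if salinity < 0:
        raise ValueError("salinity cannot be negative")
    return 1000.0 + 0.8 * salinity - 0.2 * temperature


def test_salinity_near_zero_gives_finite_result():
    # Close to the physical lower limit: should still return something sensible.
    assert potential_density(0.0, 10.0) > 0


def test_negative_salinity_fails_loudly():
    # Physically impossible input: better to raise than to return nonsense.
    with pytest.raises(ValueError):
        potential_density(-1.0, 10.0)
```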

In an ideal world we should be doing this as standard (and encouraging students to), so every jupyter notebook would start with a cell outlining the test cases. But, it is extra work…


I resonate with @Thomas-Moore’s comment.

But cosima-recipes is not a python package but rather a collection of notebooks. These notebooks use python packages which, if tested properly, should catch changes in behaviour; and if a change is intentional, they should issue a deprecation warning (e.g. “method your_fav_method() will behave differently from version X.Y.Z”) or something similar.

Regression tests (which @anton suggests) are a way to catch issues like that, and we could discuss implementing some that would run automatically once a week on the HPC.

I also think that it’s a very good idea to try to convey to people the notion of “testing the boundaries” of a method/function they write (comment by @willrhobbs). This is an extremely useful concept. I hadn’t really thought about it in the formal way @willrhobbs discussed it. I don’t think we should enforce this for the notebooks, since that would make the barrier for newcomers contributing to the recipes even higher. But it’s such a useful concept that one should at least keep it in the back of their mind.

I so often see code that is not general enough and whose limitations are neither documented nor asserted. For example, someone writes a method/function that works only for a very particular case and will fail if things are slightly different. Then someone else, who naively sees that such a method/function exists, uses it for their own case and gets nonsense results.

def compute_zonal_mean(dataarray):
    return dataarray.mean('xt_ocean')

might suggest that this function computes the zonal mean. But in reality, it computes the zonal mean only for data arrays that have xt_ocean as a dimension, and it also assumes that xt_ocean runs along constant latitude values. This function will silently give wrong results if used in the Arctic and will fail if used with MOM6 or MITgcm output.
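One way to make those limitations explicit, as a sketch only (the check and error message are just one possible choice, and the constant-latitude assumption still can’t be verified from the DataArray alone, so it goes in the docstring):

```python
import xarray as xr


def compute_zonal_mean(dataarray: xr.DataArray) -> xr.DataArray:
    """Mean along the 'xt_ocean' dimension.

    Assumes 'xt_ocean' runs along lines of constant latitude, so this is a
    true zonal mean only where that holds (e.g. not in the tripolar Arctic
    region, and not for MOM6 or MITgcm output with different dimension names).
    """
    if "xt_ocean" not in dataarray.dims:
        raise ValueError(
            f"expected dimension 'xt_ocean', got dimensions {tuple(dataarray.dims)}"
        )
    return dataarray.mean("xt_ocean")
```

At least the missing-dimension case now fails loudly instead of silently, and the docstring warns about the assumption that can’t be checked automatically.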

I’d like to touch on these issues at the July 1st Workshop that we are organising.