Proposal for new contribution guidelines for CABLE

Below you will find a proposal for defining new roles and responsibilities for the development of CABLE. These would come into place after moving CABLE to Git and GitHub. We are trying to strike a balance between keeping developing CABLE simple and ensuring high-quality contributions are made both in terms of science and software.

Please reply to this post with your feedback.


This document explains the roles and responsibilities of the CABLE’s developers and ACCESS-NRI for developing and maintaining the CABLE’s source code.

Note: items in italics listed here are not currently implemented or only partially implemented. They will only apply after their implementation.

Developer’s responsibilities

The developer is the primary author of some code development for CABLE.

Development phase

During the development of a new feature or a bug fix, the developer is responsible for:

  • writing the code modifications in accordance with the CABLE’s coding standards.
  • avoid adding inputs/parameters in a manner that is hard-wired into the code.
  • recording their work with Git and providing adequate information in their commit messages, issues and pull requests on GitHub.
  • opening a pull request on GitHub early in the development (recommended after the first commit) so the community is aware of the work currently in progress. The developer should follow the guidelines to document their pull requests.
  • keeping the pull request updated regularly by pushing new work to GitHub.
  • keeping their code synchronised with the released versions of CABLE. It is expected developers will synchronise their code after each released version of CABLE. ACCESS-NRI will not accept submissions to CABLE from a diverged code base (the special case of historical branches excepted).
  • documenting their code changes according to the documentation guidelines:
    • the User guide must be updated for any change impacting information there. E.g. when adding/removing file inputs or namelist options.
    • the science must be documented within the source code using the FORD formatting.
  • implementing new features within a switch, set in the CABLE namelist file. It can be either a new switch or a new option for an existing switch.
  • resolving any conflicts arising from merging with the main version of CABLE. ACCESS-NRI team can assist with this task.
  • ensuring all automatic tests on GitHub are successful. In case some tests fail for reasons independent of the proposed changes, this responsibility will not apply. The maintainers of the CABLE source code will be able to bypass the test if necessary.
  • if you are introducing new inputs for CABLE:
    • add information about the source of the inputs and the code that has been used to prepare the input files
    • save the pre-processing code in a GitHub repository (ideally under the CABLE-LSM organisation)
    • make the data available to all CABLE users

Submission phase

Once a developer is ready to submit their development for addition to the release version of CABLE, they are responsible for:

  • submitting their development for addition to the next release of CABLE regularly. It is expected feature branches will be merged into the release version of CABLE at least annually. See the section on features with long development times for details on how to handle these.
  • submitting the documentation changes with the code changes and not as a separate submission.
  • running the standard scientific analysis with benchcab for:
    • comparison of the main CABLE code and the feature branch without the new feature turned on
    • comparison of the main CABLE code and the feature branch with the new feature turned on. Not required for long-term development cases (see below).
  • providing the link to the benchcab results in the pull request on GitHub. The results should compare the feature code with the latest state of the main CABLE code. If the main CABLE code changes during the review process, a new run of benchcab will be required.
  • asking for a review by the reviewers’ team.
  • being responsive to the reviewer’s comments.
  • implementing all suggestions from the reviewer that have been agreed on.
  • after approval from the reviewers’ team, merging their development branch into the main branch of CABLE.

ACCESS-NRI responsibilities

The ACCESS-NRI team is responsible for:

  • supporting development work by advising developers on questions about the implementation of new features.
  • supporting developers in keeping their development branch synchronised with the main CABLE code. In particular, the ACCESS-NRI team will help with resolving merge conflicts.
  • reviewing new submissions for their adherence to the code design and the coding standards.
  • supporting developers through the submission process as required. This could include support on:
    • how to run benchcab
    • how to write the pull request description
    • how to use the pull request interface to understand and reply to reviews.
  • releasing a new CABLE version every 6 months. See the release process for details of what it entails.
  • maintaining the automated tests.

Release process

  • synchronising the source code in the CABLE repository with JULES repository.
  • ensuring CABLE in JULES passes the rose-stem tests.
  • implementing new rose-stem tests for JULES as required.
  • ensuring ACCESS-AM passes its rose-stem tests.
  • implementing new rose-stem tests for ACCESS-AM as required.
  • running standard experiments with the new release version
  • running the standard scientific analysis with benchcab and publishing the analysis results in the CABLE documentation for that release.
  • updating the standard scientific analysis in benchcab with additional scientific configurations if required.

Long-term development

Some development work can necessitate more than 6 months or a year to be fully ready. For these cases, we still recommend submitting the development in stages. This means some development will be submitted before it is ready to be used for general use. Follow these guidelines:

  • make sure each development stage allows the code to be compiled. Ask for advice from the ACCESS-NRI team on how to plan the development if necessary.
  • develop your changes within a switch.
  • use benchcab to ensure the results from CABLE are unchanged if your switch is turned off.
  • add your switch to the list of switches protected by the switch “developer_only”. This ensures your new additions can only be used if both your switch and the “developer_only” switch are on.
  • highlight, in the CABLE documentation, that the feature is not ready for general use.

Process when a “developer_only” feature is ready for mainstream use

When the developer of a new feature wants to take the feature out of the “developer_only” switch, they need to submit a new pull request with these code changes:

  • any relevant new code and documentation modifications for the introduced feature.
  • remove the feature switch from the list protected by the “developer_only” switch
  • update the documentation of the feature switch in the CABLE’s User Guide to remove any restriction on the use of that switch
  • provide all the benchcab results as per the guidelines for submission, including comparing the code with the new feature turned on and the latest version of the main CABLE code.
1 Like

Thanks, Claire!

Sometimes, tests can fail but for reasons unrelated to the work done as part of the pull request. E.g. maybe an external library is updated/downgraded and breaks something elsewhere in the code. Is it possible an authorized person (i.e. an ACCESS-NRI responsibility?) can merge a pull request if there are failed tests that are unrelated?

Indeed, I’ll add that. That’s an easy one.

What about changes that require new input data, e.g. something irrigation related?

  • Requirements for documenting the process of going from original data to whatever CABLE is actually using.
  • Preprocessing code in a repo somewhere (ideally CABLE github)
  • Making the data available somewhere for all CABLE users?
  • Making it available in some form that we can eventually build into a pipeline for generating all the online CABLE ancillary files?

Is there a licence agreement for CABLE contributors that describes who has the IP?

Does benchcab have any computational performance measures?

Is it possible that a proposed new scheme could be so expensive that you wouldn’t add it even on a switch?

Could a new scheme give such a poor comparison with observations that you wouldn’t add it even on a switch?

The License is here: https://github.com/CABLE-LSM/CABLE/blob/main/License.md That’s a BSD license modified by CSIRO. The copyright is owned by CSIRO.

About the input data, I haven’t dealt with it yet. This will come later and will likely include the creation of a collection of input data for CABLE that is publicly available in some form. I could add an item to that effect in the developer’s responsibilities knowing it will not be applicable right away.

Providing the code to pre-process the data is a good idea and easy to add now.

The last point for generating all CABLE (and JULES?) ancillary files is a good idea but I don’t think I have a clear enough idea of what form the data should be in for this. So I feel I would not know how to write a guideline for this right away. Or it would be too vague to be useful.

For benchcab and the performance questions, I am still not sure of the process to add new configurations to the standard set in benchcab. We will not test all combinations of switches, at least it won’t be the default set. So we have to define when to add something new to the set.
About the computational cost, it currently takes about 46min to run the full set of tests. There are other tests that take a lot less time (about 3-8min) but these are intended for quick feedback during development and not at submission time.

For the question about the scientific performance of a new scheme, as long as the results are within an acceptable physical range, the new scheme will be accepted in CABLE. This comes from the fact a perfectly valid new scheme could clash with an undiscovered bug or something like that and give poor performance. It is why it is important to publish the scientific performance of each release for various configurations so users can check them before using the code.
Additionally, accepting the poorly performing new scheme allows others to experiment to try and identify the reason for the performance.

Thanks all.

An element that I’m not in agreement here is in the submission phase

  • after approval from the reviewers’ team, merging their development branch into the main branch of CABLE.

A couple of reasons - first this gives the broader community write access to the trunk (which is a bad idea), second (in the unlikely event of multiple submissions within a release cycle) it risks creating issues (not formal conflicts) in the trunk because of how/the sequence the merging has been done.

Realistically (I would say) the final merge needs to be an NRI responsibility and be accompanied by either checks that the trunk hasn’t moved on since the initial branching and/or rerunning of benchcab based on the updated proto-trunk.

While it may be part of the coding standards - I would also add that developers need to

  • avoid adding (further) inputs/parameters in a manner that is hard-wired into the code.

We probably need to outline the approach/steps needed for things to be moved from developer_only to community use - I can’t think of anything special TBH but since we’re implementing this it should be noted one way or the other.

(Now in the correct place)

There’s an interesting balance here between freedom to operate and not making the code base so convoluted it’s unmanageable.

Generally I would say that we (CABLE community and the NRI) shouldn’t be proscriptive in this manner on current/recent developments, but equally

  • we shouldn’t be allowing expensive/poor performing options to be required be safeguarded under benchcab (what forms the set of configurations to test in benchcab is a different topic)
  • we/NRI need to devise a means of assessing options and if necessary removing them from the code base

It would be appropriate to include guidance around what size of update can be accommodated in one go vs having to split up.

For the NRI’s consideration - how are we going to manage the situation that the CABLE trunk evolves (all tests pass) yet something breaks either the ACCESS-AM and/or UM-JULES rose tests?

Do you say to the developer ‘sorry can’t go in to the CABLE trunk’ or do we allow for a difference between the CABLE trunk and the version in MOSRS?

Given how long UM/JULES tickets take I can see this situation as highly likely to occur - and it’s also possible that the conflict will not originate in CABLE.

I had problems in the past with tickets because the code was not at an appropriate standard in the first place. This means the code was not properly indented, there were multiple blank lines, etc.
I have to admit that I find it very, very hard to read code that is not properly formatted. It is hard to find which END belongs to which block, etc. I have to indent it, which creates lots of differences to the original code and created lots of concerns with the tickets in the past.

Thus I think the code should be brought to a coding standard before such guidelines come into place. You should give the appropriate settings for standard editors. For example I have different setups for different projects in my Emacs config, such as:

  (defun f90-eddypro-style ()
    (interactive)
    (setq f90-do-indent 4)
    (setq f90-if-indent 4)
    (setq f90-type-indent 4)
    (setq f90-program-indent 4)
    (setq f90-associate-indent 4)
    (setq f90-critical-indent 4)
    (setq f90-continuation-indent 4)
    )
  (defun f90-default-style ()
    (interactive)
    (setq f90-do-indent 3)
    (setq f90-if-indent 3)
    (setq f90-type-indent 3)
    (setq f90-program-indent 2)
    (setq f90-associate-indent 2)
    (setq f90-critical-indent 2)
    (setq f90-continuation-indent 5)
    )

which I can easily switch (e.g. M-x f90-eddypro-style).

Kind regards,
Matthias

No it doesn’t give everyone write access. In GitHub, only GitHub will have write access to the main branch. Not even ACCESS-NRI. So merging in this case means pressing the button on GitHub, no matter who does it. The reason I prefer the original author to do it is that it gives them a last chance to retract the submission if they discover something is not right.

That is the run of benchcab that the person submitting the changes will have to provide. If the review takes so long that the state of the proto-trunk has changed during the review, a new run of benchcab will be required. I will add this to the list.

Thanks for the feedback, Matthias. I agree that reading poorly formatted code is really hard. At the same time, I feel that if we delay the implementation of guidelines for new submissions, we will end up chasing our tails, accepting new code submissions that brought us further off.

I would rather approach the issue in this way:

  • implement the guidelines
  • implement the coding standards
  • provide developers with settings for editors that we know. Thanks for the settings in Emacs as I had no idea how to write these.
  • go on a formatting spree for CABLE existing code to bring it up to standard quickly. This would only focus on the formatting of the code since that is what can create a lot of whitespace differences.

Note that GitHub has the option to hide the differences with only whitespace changes from the review panel. This would help to deal with submissions that contain formatting changes and science changes.

Does that seem reasonable to you?

About the size of a submission, I am still trying to formulate this one.

Hey Claire,

I think that one should do the effort and go through all the files and do some refactoring. The formatting is really easy. In Emacs (C- is Ctrl, M- is Alt or Option):

  1. Open file: C-x C-f
  2. Set Fortran style (here default, should be defined for CABLE): M-x f90-default-style
  3. Mark whole buffer: C-x h
  4. Remove tabs: M-x untabify
  5. Mark whole buffer: C-x h
  6. Remove trailing spaces: M-x delete-trailing-whitespace
  7. Mark whole buffer: C-x h
  8. Indent code: M-x indent-region
  9. Replace all multiline blank lines by single blank line: M-x query-replace-regex and then replace ^J\{3,100\} by ^J^J where ^J is for newline and can be inserted for example by C-q C-j

The last step asks you if you want to replace at each occurrence. Typing ! replaces all the following occurrences.

I did that on our branch of CABLE on every file. Does not take long and is really worth the effort.
So you could even record 2-9 into a macro and then you only have to press ! at the end. Tell me if you want me to provide you with one. The macros are written in the init files of Emacs, can be read rather easily and hence be adapted to your liking.