Data
Various forms of data are stored in the CABLE SVN repository in addition to the CABLE source code:
- Tumbarumba test case
- User scripts to prepare or post-process data for/from CABLE
- Ancillary input data for CABLE: spatial data already formatted for CABLE to use as initial conditions and static data.
- Archival data
We propose to separate the CABLE source code from all other data, so that the CABLE repository holds only CABLE itself and its documentation.
Tumbarumba test case
The Tumbarumba test case is distributed with the part of CABLE’s source code that is specific to offline simulations, so that people can get the source code, compile it and run the test case directly. However, this makes it very hard to distinguish which files belong to the source code and which belong to the experiment setup and output. It is much cleaner to run experiments from a directory separate from the source code.
We propose to move the Tumbarumba test case to a separate git repository. This repository would contain all the data needed to run CABLE at the Tumbarumba flux site. This approach gives us a template for sharing test cases, which makes it easy to extend the collection of test cases for CABLE. The template would have to be adapted for spatial simulations, as we wouldn’t share large datasets (e.g. meteorological forcing) via GitHub, but it could follow the same principle.
In addition, on a shared server (at NCI, for example), this setup makes it possible to provide a pre-compiled CABLE executable, for instance as a module. Users then don’t have to worry about the source code and can deal only with the input information.
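As an illustration of running an experiment from a directory separate from the source code, the following is a minimal sketch only. It assumes a test-case repository has already been cloned locally and that a CABLE executable is available by name or path (for example through a module on a shared server). The function name `prepare_run_dir`, the `outputs` subdirectory and the directory layout are hypothetical and are not part of CABLE.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_run_dir(template_dir: Path, run_dir: Path, cable_exe: str) -> list[str]:
    """Copy a test-case template into a fresh run directory.

    Returns the command to launch from inside run_dir.
    All names here are illustrative assumptions, not CABLE conventions.
    """
    if run_dir.exists():
        raise FileExistsError(f"{run_dir} already exists; refusing to overwrite")
    # Copy the experiment setup (namelist, site data, ...) from the template.
    # The CABLE source tree is never read or modified.
    shutil.copytree(template_dir, run_dir)
    # Keep model output next to the setup, away from any source code.
    (run_dir / "outputs").mkdir(exist_ok=True)
    # The executable is only referenced, e.g. one provided by a module.
    return [cable_exe]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        # Stand-in for a cloned test-case repository.
        template = Path(tmp) / "template"
        template.mkdir()
        (template / "cable.nml").write_text("&cable\n/\n")
        run_dir = Path(tmp) / "run_001"
        command = prepare_run_dir(template, run_dir, "cable")
        print(command, sorted(p.name for p in run_dir.iterdir()))
```

The returned command could then be launched with `subprocess.run(command, cwd=run_dir, check=True)`, so that every file the run reads or writes sits in the run directory.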
User scripts
There are currently no scripts distributed with the trunk version of CABLE, but other branches include various scripts in addition to the CABLE code.
For branches that need to be transferred to the GitHub CABLE repository, user scripts should ideally be moved to separate GitHub repositories.
Scripts used by a single user
In this case, the user can choose the solution they prefer. The only constraints are that the scripts cannot stay within the CABLE repository and that the SVN repository will not stay around forever. The proposed solution is for the user to move their scripts to repositories under their own GitHub account.
Scripts used and developed by a team
For these scripts, it is important to keep collaborative development possible. We propose that the developers of these scripts move them to repositories under the CABLE LSM GitHub organisation.
Ancillary input data
This is the data stored under CABLE-AUX. We propose to manage this type of data as reference datasets, following the standards for data management. This would allow us to satisfy two requirements:
- access to the data from any machine for anyone who wants to use CABLE
- versioning of the data independently from the CABLE source code
This will take some time to put in place. We propose a transition period during which the CABLE code is on GitHub while the ancillary data is still sourced from the SVN repository (under CABLE-AUX).
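One possible way to version ancillary data independently of the source code is a checksum manifest that is published with each data release. The sketch below is an assumption for illustration, not the mechanism chosen for CABLE-AUX. The function names and the idea of a manifest are hypothetical, and only Python standard-library calls are used.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256sum(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(data_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to data_dir) to its checksum."""
    return {
        p.relative_to(data_dir).as_posix(): sha256sum(p)
        for p in sorted(data_dir.rglob("*"))
        if p.is_file()
    }


def changed_files(data_dir: Path, manifest: dict[str, str]) -> list[str]:
    """List files that were added, removed or modified since the manifest."""
    current = build_manifest(data_dir)
    return sorted(
        name
        for name in set(manifest) | set(current)
        if manifest.get(name) != current.get(name)
    )


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        data_dir = Path(tmp)
        # Placeholder file standing in for an ancillary dataset.
        (data_dir / "gridinfo.nc").write_bytes(b"placeholder contents")
        manifest = build_manifest(data_dir)
        print(changed_files(data_dir, manifest))
```

A manifest of this kind, stored with a data version label, would let a user on any machine check that their local copy matches the release they intend to use, whatever version of the code they run.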
Archival data
Some people may have used the CABLE repository as a means to archive their model setup, for example to comply with journal requirements. This means we cannot remove access to the SVN repository for some time. See the discussion about the future of the SVN repository for more details.
For the future, we propose that users archive their setups with their institutional archival systems, GitHub, Zenodo, or a combination of these.