Payu functionality, naming of output folders

I’m using the newly released version of payu and finding the new output directory naming rather frustrating. Previously, the output folder had the same name as the control folder, which was convenient. Now a (random?) experiment ID gets appended to the end, which is not useful (quite the opposite!) for my use case. I’m wondering if there is a simple way (e.g., a flag I can set: legacy = true maybe!) to return to the previous default behaviour. Reading the docs, I can see that setting

experiment: "control_folder"

in config.yaml will have the desired effect, but is there a way to do this without having to manually add the control folder name in config.yaml for every run…?

Thanks in advance.
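For the `experiment:` route mentioned above, here is a minimal sketch of a helper that fills in the control folder name so it does not have to be typed by hand for every run. The function name `pin_experiment_name` is hypothetical and not part of payu. It assumes `config.yaml` sits at the top of the control folder and that the folder name contains no characters needing YAML escaping.

```python
from pathlib import Path


def pin_experiment_name(control_dir):
    """Append `experiment: "<control folder name>"` to config.yaml if not already set.

    Hypothetical helper, not part of payu. Returns True if the file was changed.
    """
    control = Path(control_dir).resolve()
    config = control / "config.yaml"  # assumed location of the payu config
    text = config.read_text()
    # Only a top-level key counts; indented or commented lines are ignored.
    if any(line.startswith("experiment:") for line in text.splitlines()):
        return False
    sep = "" if text.endswith("\n") or not text else "\n"
    config.write_text(f'{text}{sep}experiment: "{control.name}"\n')
    return True
```

Calling `pin_experiment_name(".")` from inside a control folder before the first run would be one way to use it. It works on plain text rather than parsing the YAML, so existing comments and formatting in `config.yaml` are left as they are.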

Hi @CallumJShakespeare, I agree and prefer the old way, where the output folder has the same name as the control folder. So far I have dealt with this by using a legacy version of payu, e.g.

module load payu/1.1.6

perhaps not a long-term solution but it does allow you to make your archive folder the “old” way.

payu creates a symlink to the archive folder where the outputs are stored, so they shouldn’t be difficult to find.
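A script can follow that symlink instead of needing to know the appended ID in advance. This is a minimal sketch, assuming the symlink in the control folder is named `archive`; the function name `resolve_archive` is hypothetical and not part of payu.

```python
from pathlib import Path


def resolve_archive(control_dir):
    """Return the real archive directory behind the control folder's symlink."""
    link = Path(control_dir) / "archive"  # assumed symlink name
    if not link.is_symlink():
        raise FileNotFoundError(f"no archive symlink at {link}")
    # resolve() follows the symlink to the actual archive path,
    # whatever ID has been appended to the directory name.
    return link.resolve()
```

For example, `resolve_archive("/path/to/control_folder")` returns the full archive path, which can then be passed to analysis scripts unchanged.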

Adding part of the experiment_uuid means work and archive directories avoid namespace clashes. Previously, if you gave experiments in different directories the same name, payu would find the existing archive directory for that name and assume it was the same experiment.

Not all change is bad.

There is a work-around to achieve the behaviour you want, but I strongly advise against it.

I’m curious why this strong preference?

Appending the UUID is a small change to the automated naming of the archive directory, and it solves a long-standing problem that can trip up both new and experienced users without them realising it.

I don’t understand the “long standing problem” you’re referring to here? Edit: ahh okay, I see it comes about if one has multiple directories with experiments in them. I’ve never done that, so never had this problem.

My objection is that previously I knew in advance exactly how the output directory would be named, so it was simple to find programmatically. Now it’s an extra step to find out what the appended exp-id is before I know what directory name to use in scripts… hence it’s just an extra annoyance that seems entirely unnecessary, as it worked perfectly well before…

Don’t worry, I am using the work around… and creating a bunch of symlinks to deal with the ones I’ve already run that are oddly named :)

It might be that it affected users who cloned shared model configurations, e.g. COSIMA experiment configurations, and used them as-is. If they did this in different directories and used the same PROJECT code to run them, payu would find the archive from the initial experiment and happily begin writing to it rather than beginning a new experiment.

You can get the previous behaviour by setting

metadata:
  enable: False

in your config.yaml.

When you do this payu will no longer create a metadata.yaml file or generate a unique experiment ID.

We’re in the process of adding the ability to add these unique IDs to the global attributes of the model output netCDF files to improve provenance and tracking. This will not be available if you disable metadata generation and would make it harder to ingest model data into an ACCESS-NRI intake catalogue.

However, I take your point @Aidan that I could use the archive symlink that’s already in the control folder, and this would be a way of dealing with it.

Thanks! I’ll do that. I don’t use intake and generally don’t have a need to share my output from bespoke models so provenance and tracking is not a priority…

Good that we have a solution that works for you.

Fair enough, however provenance can be very useful when a data file gets moved out of its payu context, e.g. copied to another directory.