ACCESS-ESM1.6 development

Hi @Jhan, the new restart hasn’t been added to the default configurations yet. Let me know if it’s ready to be added.

We have a 100-year run. Rachel is about to look at some things. Based on what we have looked at already, I doubt there will be a problem at this stage, but we should wait, just in case.


Apologies from me for the stand-up today. I’ve started processing Jhan’s run with the new vegetation distribution to extract relevant variables for plotting, but there are quite a few years where the conversion to netCDF hasn’t been successful.

I’m converting them manually, but there are dozens, so it may take a while.

This shouldn’t be necessary. You can use

payu collate -d archive/output???

but replace output??? with a glob matching the outputs that weren’t collated, or just pass a list of those directories.

But if they failed because the collation job overran, then the collation walltime should be increased first.
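A minimal Python sketch of the "pass a list of those directories" approach. It assumes it is run from the experiment’s control directory (where the `archive` link lives), that output directories follow the zero-padded `outputNNN` naming seen in the paths in this thread, and it only uses the `payu collate -d <dir>` form quoted above, one directory per call. `collate_commands` and `run_all` are hypothetical helper names, and the default is a dry run that only prints the commands.

```python
import subprocess
from pathlib import Path


def collate_commands(output_numbers, archive="archive"):
    """Build one `payu collate -d <dir>` command per output directory.

    Assumes the zero-padded outputNNN naming seen in the archive paths
    in this thread (e.g. output096).
    """
    return [
        ["payu", "collate", "-d", str(Path(archive) / f"output{n:03d}")]
        for n in output_numbers
    ]


def run_all(commands, dry_run=True):
    """Print the commands (dry run) or run them one after another."""
    for cmd in commands:
        if dry_run:
            print(" ".join(cmd))
        else:
            # check=True raises CalledProcessError on the first failure,
            # so later directories are not attempted until it is fixed.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Example: output096, reported in this thread as failing to convert.
    run_all(collate_commands([96]))
```

Passing `dry_run=False` actually runs the commands; running them one at a time keeps each collation’s log separate and easy to inspect.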

We also have the option of running the job with more CPUs and parallelising the collation to make it run faster.

Can you reply with the path to an example output directory that failed to collate?

I think I am using the script that payu collate points to, esm1p5_convert_nc.
The walltime has already been increased. Failed jobs time out after taking three times longer than usual, though. Setting the collation walltime to 5 hours is probably a bit much.

Failed UM-to-netCDF conversion:
/scratch/p66/jxs599/access-esm/archive/dev-base1-expt-7db342ca/output096

@Jhan where is the experiment control directory for this experiment?

Directories that have gaps (which we’re definitely interested in being able to look at in more detail):
/scratch/p66/jxs599/access-esm/archive/dev-base1-expt…
/scratch/p66/jxs599/access-esm/archive/dev-LUH3…
/scratch/p66/jxs599/access-esm/archive/dev-l-rev-cor…

I also seem to remember that our long run has gaps (around year 400), so:
/scratch/p66/prb599/access-esm/PI-concentrations/

(Sorry, there may be errors in here; Gadi’s playing up for me at the moment.)

Thanks Ian. Can you point me to the control directories for those experiments? Or if they’re on GitHub the repos?

Sorry, I don’t know that; you’d need to ask @Jhan and @pearseb.

PS: Pearse’s long run is at /scratch/p66/pjb581/access-esm/pi_concentrations-expt-c55f7217/


For example, /scratch/p66/pjb581/access-esm/pi_concentrations-expt-c55f7217/output398/atmosphere/ failed during collation/conversion.

I’ll have to give you the ones from the base and LUH3 restart runs, the two runs we need for comparison; just give me a minute to organise this. I’ve hopefully just got our land version of the new executable (which now uses only plain vanilla sources, except for forcing the restart in).

/scratch/p66/jxs599/access-esm/archive/feat-sublimation-expt-32122cc3/output025/
/scratch/p66/jxs599/access-esm/archive/feat-sublimation-expt-32122cc3/output026/
/scratch/p66/jxs599/access-esm/archive/feat-sublimation-expt-32122cc3/output055/
/scratch/p66/jxs599/access-esm/archive/feat-sublimation-expt-32122cc3/output060/

/scratch/p66/jxs599/access-esm/archive/feat-sand-clay-silt-expt-1fa5d2ce/restart027/
/scratch/p66/jxs599/access-esm/archive/feat-sand-clay-silt-expt-1fa5d2ce/restart028/
/scratch/p66/jxs599/access-esm/archive/feat-sand-clay-silt-expt-1fa5d2ce/restart056/
/scratch/p66/jxs599/access-esm/archive/feat-sand-clay-silt-expt-1fa5d2ce/restart057/
/scratch/p66/jxs599/access-esm/archive/feat-sand-clay-silt-expt-1fa5d2ce/restart062/
/scratch/p66/jxs599/access-esm/archive/feat-sand-clay-silt-expt-1fa5d2ce/restart064/

/scratch/p66/jxs599/access-esm/archive/PI-case2e-expt-1c3bc8fd/output016/
/scratch/p66/jxs599/access-esm/archive/PI-case2e-expt-1c3bc8fd/output024/
/scratch/p66/jxs599/access-esm/archive/PI-case2e-expt-1c3bc8fd/output025/
/scratch/p66/jxs599/access-esm/archive/PI-case2e-expt-1c3bc8fd/output026/
/scratch/p66/jxs599/access-esm/archive/PI-case2e-expt-1c3bc8fd/output030/
/scratch/p66/jxs599/access-esm/archive/PI-case2e-expt-1c3bc8fd/output031/
/scratch/p66/jxs599/access-esm/archive/PI-case2e-expt-1c3bc8fd/output057/
/scratch/p66/jxs599/access-esm/archive/PI-case2e-expt-1c3bc8fd/output063/

Thanks. @spencerwong found some logs and there are broadly two classes of error: exceeding walltime and errors accessing files. The latter are more common than the former.

The errors accessing files are very strange and look to me like problems with the Lustre file system. We are running with this as a working hypothesis, as the current configurations produce over 30,000 ocean files that are collated into a total of 177 files.

@spencerwong is testing the io_layout parameter as a way to reduce the number of output files by an order of magnitude. This should make the collation a lot faster and put less pressure on the filesystem, which we’re hoping will solve both classes of error.
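A back-of-envelope sketch of why io_layout should cut the file count. It assumes every collated file is assembled from the same number of pieces, and that with io_layout = (ix, iy) each of the ix × iy I/O domains writes one piece (our understanding of the FMS convention, not verified against this configuration). The 30,000 and 177 figures come from the message above; the 4 × 4 layout is purely hypothetical.

```python
def pieces_per_file(n_uncollated, n_collated):
    """Average number of pieces combined into each collated file."""
    return n_uncollated / n_collated


def files_with_io_layout(n_collated, ix, iy):
    """Uncollated file count if each file is split into ix * iy pieces.

    Assumes one piece per I/O domain, as described in the lead-in.
    """
    return n_collated * ix * iy


if __name__ == "__main__":
    n_collated = 177
    print(f"~{pieces_per_file(30_000, n_collated):.0f} pieces per file now")
    # Hypothetical io_layout of 4 x 4 I/O domains
    after = files_with_io_layout(n_collated, 4, 4)
    print(f"{after} files with io_layout = 4,4")
```

Under these assumptions the current setup averages about 169 pieces per collated file, and a 4 × 4 layout would give 2,832 files, roughly a tenth of the current count.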

We’ll report back when @spencerwong has done some testing.

If you need any more examples, here are some from the long run (/scratch/p66/pjb581/access-esm/pi_concentrations-expt-c55f7217/):
output000, output001, output002, output003, output004, output005, output084, output085, output088, output090, output091, output092, output093, output094, output216, output303, output352, output398, output444, output454, output564, output565, output569, output570, output596, output608

They all ‘failed’, but in different ways.

Of these, output000 to output005 successfully created netCDF files but didn’t fully clean out the old files (.pg files remain), so they are not really failures.

The rest are proper ‘failures’ in that the scripts didn’t convert the .pa and .pe files.
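The two failure modes described above can, in principle, be told apart from a directory listing. The sketch below is a hypothetical helper that assumes raw UM files carry `.pa`, `.pe` or `.pg` in their names and converted files additionally end in `.nc`; the real naming may differ, so the patterns would need checking against an actual `atmosphere/` directory before use.

```python
from pathlib import Path

REQUIRED_STREAMS = ("pa", "pe")   # streams that must be converted
ALL_STREAMS = ("pa", "pe", "pg")  # any raw leftovers worth reporting


def classify_atmosphere(filenames):
    """Classify an atmosphere/ listing into one of three outcomes."""
    raw = {s: 0 for s in ALL_STREAMS}
    converted = {s: 0 for s in ALL_STREAMS}
    for name in filenames:
        for stream in ALL_STREAMS:
            if f".{stream}" in name:  # naming assumption, see lead-in
                if name.endswith(".nc"):
                    converted[stream] += 1
                else:
                    raw[stream] += 1
    # Raw files with no netCDF counterpart at all: conversion never happened.
    missing = [s for s in REQUIRED_STREAMS if raw[s] and not converted[s]]
    if missing:
        return "not converted: " + ", ".join(missing)
    # Conversion happened but raw files were not cleaned up.
    leftovers = [s for s in ALL_STREAMS if raw[s]]
    if leftovers:
        return "converted, raw files remain: " + ", ".join(leftovers)
    return "ok"


def classify_dir(atmosphere_dir):
    """Apply classify_atmosphere to the files in a directory."""
    return classify_atmosphere(p.name for p in Path(atmosphere_dir).iterdir())
```

If the naming assumption holds, output000 to output005 would be expected to come out as "converted, raw files remain: pg" and the others as "not converted".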

Thanks Ian. That is a different failure mode than that identified in my previous message. Will look into it.

Hi all,

I’ve just been chatting to @clairecarouge and we have three ideas/updates:

  1. As the spin-up run is now progressing, we think it’s prudent to carefully document which exact model codes have been used at various phases of the spin-up. For instance, there was a wombat bugfix at some stage, and we are about to change executables. I have started a post where this can be documented here; it is a wiki, so anyone can edit it. @pearseb, perhaps you’re the person to fill in a first draft? Then people such as @tiloz, @matthew.chamberlain and @spencerwong might need to add some finer details. For inspiration, there’s an example from OM2 here; note that you can add additional rows to the table as needed.
  2. @clairecarouge and I thought it might be time to simplify where we hold discussions. With this in mind, I’d like to move this thread to the CMIP7 dev category and rename it to ACCESS-ESM1.6 development (making it public). The other chat for the land development would be closed. Let me know if you foresee any issues.
  3. We’ve moved the minutes to the dev category too, here

If we do not hear any objections to #2 above, we’ll move this thread as above on Friday.

Hi @pearseb, all,

@pearseb, thanks very much for filling out the table as requested, here.

One small thing: I think it would be good to document when the executable changed, or when input files were changed, etc.

I’m thinking of an additional section; here’s a possible template. I’ve just added some dummy data, but hopefully you get the idea. I’ll DM you the markdown below so you can just edit it.

@tiloz, @spencerwong, feel free to chime in on these details if you think your input is relevant.

Exe changes

This section gives additional detail on the run.

Years: 1 - 250
Exe versions: access-esm1p6/pr31-3
Inputs: /g/data/vk83/configurations/inputs/access-esm1p5/ … /g/data/vk83/prerelease/configurations/inputs/access-esm1p6/ …
Outputs: /scratch/p66/pjb581/access-esm/archive/pi_concentrations-expt-c55f7217
Restarts: /scratch/p66/pjb581/access-esm/archive/pi_concentrations-expt-c55f7217
Brief description: Initial spin-up run; the simulation was stopped when a bug was found in wombat sinking.
Other notes: The simulation was restarted after it crashed due to X; a smaller time step was used for one year, after which it continued without issues.
People: Pearse Buchanan (@pearseb), ?

Years: 251 - 650
Exe versions: access-esm1p6/pr31-5
Outputs: /scratch/p66/pjb581/access-esm/archive/pi_concentrations-expt-c55f7217
Restarts: /scratch/p66/pjb581/access-esm/archive/pi_concentrations-expt-c55f7217
Brief description:
Other notes:
People: Pearse Buchanan (@pearseb), ?
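If the table is kept consistent, it also becomes easy to check which executable produced a given model year. A minimal sketch, using the dummy values from the template above (not a record of the actual run); `Phase`, `check_contiguous` and `exe_for_year` are hypothetical names, and phases are assumed to be listed in year order.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass(frozen=True)
class Phase:
    first_year: int
    last_year: int
    exe_version: str


# Dummy values from the template above, not the real run history.
# Phases must be sorted by first_year.
PHASES = [
    Phase(1, 250, "access-esm1p6/pr31-3"),
    Phase(251, 650, "access-esm1p6/pr31-5"),
]


def check_contiguous(phases):
    """Raise if consecutive phases leave a gap or overlap in years."""
    for prev, nxt in zip(phases, phases[1:]):
        if nxt.first_year != prev.last_year + 1:
            raise ValueError(
                f"years {prev.last_year} and {nxt.first_year} are not adjacent"
            )


def exe_for_year(phases, year):
    """Return the executable version used for a model year."""
    starts = [p.first_year for p in phases]
    i = bisect_right(starts, year) - 1  # last phase starting at or before year
    if i < 0 or year > phases[i].last_year:
        raise KeyError(f"year {year} is not covered by any phase")
    return phases[i].exe_version
```

Running `check_contiguous` whenever a row is added would catch gaps between phases, such as missing years.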

Hi @cbull and @pearseb. I’ve just added in some details on which executables were used at different stages of the run. Feel free to add any corrections! If you have any more details on what changed between the executables, that would be great too!
