Running ACCESS-CM2 for less than 1 month

Hi all,

I am having some trouble running ACCESS-CM2 for 1-day-long integrations. The coupled job runs fine, but the history_postprocess part of the job fails.

(Note for others: in the netcdf_conversion rose-app.conf I turned off the monthly processing so it will work; see below.)

NETCDF_STREAMS=d,7,8

Background
I am using suite de209 if you want to try to replicate this.

  • I am running some ensemble experiments in ACCESS-CM2 and I want new initial conditions for each ensemble member.
  • Each member starts from a different point in the control when there are neutral ENSO conditions.
  • I have restart files from a 200-year control run, saved every 10 years. Only 5 of the 20 restart years are in a state that I want to use.

To create more ensemble members, I want to run the model for 1 day so we can create new restart files.

Problem
The model will set up and run the coupled job fine, but then fails in the post-processing in history_postprocess. From what I have gathered, this job fails because we only have daily outputs and the script is trying to operate on monthly files that don't exist.

Error message in history_postprocess

ls: cannot access 'fields*.nc': No such file or directory
ls: cannot access 'fields*.nc': No such file or directory
ls: cannot access 'fields*.nc': No such file or directory
ncatted: ERROR file iceh_[dm]* not found. It does not exist on the local filesystem, nor does it match remote filename patterns (e.g., http://foo or foo.bar.edu:file).
ncatted: HINT file-not-found errors usually arise from filename typos, incorrect paths, missing files, or capricious gods. Please verify spelling and location of requested file. If the file resides on a High Performance Storage System (HPSS) accessible via the 'hsi' command, then add the --hpss option and re-try command.
2024-03-12T03:00:19Z CRITICAL - failed/EXIT

The failure seems to happen in the history_postprocess.sh script, at the nco (ncatted) command:

for histfile in iceh_[dm]*; do
  # Fix the calendar attribute on each ice history file.
  # (If no files match, bash passes the literal pattern 'iceh_[dm]*'
  #  to ncatted, giving the "file not found" error above.)
  ncatted -a calendar,time,m,c,proleptic_gregorian ${histfile}
  mv -f ${histfile} ${ARCHIVEDIR}/history/ice
done

How can I make these steps work when the outputs are only daily?
I do have the restart files in my output directory, but I think it is best to check that the model is running as expected; there are no ice or ocn files in archive/history.


Hi Seb

Does the post-processing step matter for your work? i.e. I think you are just trying to generate restart files?

Is daily output turned on in cice_in.nml and diag_table?

Hi,
The post-processing only really matters in that I want to compare the restarted data to the control where it came from and ensure it looks the same. Otherwise, yes, the restart files should be enough to go off.

I haven’t changed the diag tables for ocean, ice, or atmosphere. I will turn off the monthly outputs and see how it goes.

Are there any daily iceh files produced?

In the initial run I am not sure. If they are produced, they are not copied over to de209/history/ice/ before the history_postprocess job fails.

Semi-related: when restarting from the restart file produced for the second day (e.g. 0990-01-02, i.e. Jan 2 of year 990), the post-processing also fails when trying to do the netcdf conversion and history postprocess. I can make this a separate topic and provide more details?

Ah ok. It looks like you need to update the post-processing script to catch the right files. The restart-file calendar issue sounds related, so I think leave it here.

I’m not familiar with rose - @paulleopardi do you have time to look at this?

I have taken a quick look and am having some difficulty navigating through the suite. It looks like it is not self-contained, in the sense that when I do

rosie co u-de209

I do not see history_postprocess.sh anywhere in the ~/roses/u-de209 directory tree. Where does this script come from?

It comes from this git repository - access-cm2-drivers/src at main · ACCESS-NRI/access-cm2-drivers · GitHub
This repository is cloned in the make drivers part, which is only in the suite.rc (this suite was ported from accessdev and required some changes for it to work).

Oh, I see now. Try commenting out the ‘ice_nc4.py’ step and changing

for histfile in iceh_[dm]*; do

to

for histfile in iceh*; do

ice_nc4.py is trying to merge the daily history output into a single file per month, but presumably failing because the month is incomplete.
The post-processing was only moving the iceh_d files that had been merged; changing it to move all iceh files should give you output to work with.
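For reference, the kind of check ice_nc4.py would need could look something like the sketch below. It decides whether a daily iceh file exists for every day of a month before attempting the merge. The filename pattern `iceh.YYYY-MM-DD.nc` is taken from the ncrcat command in the traceback later in this thread; the function name is hypothetical.

```python
import calendar
import re

def month_is_complete(filenames, year, month):
    """Return True if a daily CICE history file exists for every day
    of the given month (filenames assumed to be iceh.YYYY-MM-DD.nc)."""
    pattern = re.compile(r"iceh\.(\d{4})-(\d{2})-(\d{2})\.nc$")
    days_present = {
        int(m.group(3))
        for f in filenames
        if (m := pattern.match(f))
        and int(m.group(1)) == year
        and int(m.group(2)) == month
    }
    n_days = calendar.monthrange(year, month)[1]
    return days_present == set(range(1, n_days + 1))
```

ice_nc4.py could then skip the ncrcat merge (and just move the daily files to the archive) whenever this returns False.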

Thanks @anton!
That fixed the issue for post-processing one day of model run.

(edit: I just realized that I made the above change in history_postprocess.sh, not ice_nc4.py)

Is it worth me forking the git repo to modify the history_postprocess.sh script so that others could use this method more easily?

I will probably make another topic for the starting from day 2 as that seems a little more complex.

This feels sufficiently niche that I wouldn’t worry. I think the fix is to update ice_nc4.py so that it works when there isn’t a complete month of daily files.

I am guessing the issue is the same? i.e. the first day of the month is missing in the daily history output, so ice_nc4.py fails?

Yeah, something similar to do with not having the first day.

In history_postprocess, here is the error. It looks like the script wants to concatenate all dates from month 1 into the ice files but cannot. Ideally we would compute the monthly file from day 2 onwards for just this one month.

Traceback (most recent call last):
  File "/home/561/sm2435/cylc-run/u-de545/share/access-cm2-drivers/src/ice_nc4.py", line 60, in <module>
    subprocess.check_call(cmd, stderr=subprocess.STDOUT)
  File "/usr/lib64/python3.6/subprocess.py", line 311, in check_call
    raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['ncrcat', '-4', '--deflate', '4', 
'iceh.0990-01-01.nc', 'iceh.0990-01-02.nc', 'iceh.0990-01-03.nc', 'iceh.0990-01-04.nc', 'iceh.0990-01-05.nc', 'iceh.0990-01-06.nc', 'iceh.0990-01-07.nc', 'iceh.0990-01-08.nc', 'iceh.0990-01-09.nc', 'iceh.0990-01-10.nc', 'iceh.0990-01-11.nc', 'iceh.0990-01-12.nc', 'iceh.0990-01-13.nc', 'iceh.0990-01-14.nc', 'iceh.0990-01-15.nc', 'iceh.0990-01-16.nc', 'iceh.0990-01-17.nc', 'iceh.0990-01-18.nc', 'iceh.0990-01-19.nc', 'iceh.0990-01-20.nc', 'iceh.0990-01-21.nc', 'iceh.0990-01-22.nc', 'iceh.0990-01-23.nc', 'iceh.0990-01-24.nc', 'iceh.0990-01-25.nc', 'iceh.0990-01-26.nc', 'iceh.0990-01-27.nc', 'iceh.0990-01-28.nc', 'iceh.0990-01-29.nc', 'iceh.0990-01-30.nc', 'iceh.0990-01-31.nc', 'iceh_d.0990-01.nc']' returned non-zero exit status 1.
2024-03-14T03:13:30Z CRITICAL - failed/EXIT

In netcdf_conversion (converting UM files), here is the error. It occurs because no file was made for the whole month of January. This is happening now because I had previously turned off the monthly processing option when I only wanted the one day of output; since these runs need monthly data, the option needs to be on.

Traceback (most recent call last):
  File "/home/561/sm2435/cylc-run/u-de545/share/access-cm2-drivers/src/run_um2netcdf.py", line 121, in <module>
    um2netcdf4.process(input, output, args)
  File "/g/data/access/projects/access/apps/pythonlib/um2netcdf4/2.1/um2netcdf4.py", line 232, in process
    ff = mule.load_umfile(str(infile))
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-23.10/lib/python3.10/site-packages/mule/__init__.py", line 1844, in load_umfile
    with open(file_path, "rb") as open_file:
FileNotFoundError: [Errno 2] No such file or directory: '/scratch/e14/sm2435/archive/de545/history/atm/de545a.pd0990jan'
[FAIL] run_um2netcdf.py # return-code=1
2024-03-14T03:13:57Z CRITICAL - failed/EXIT                                           

The model won’t/shouldn’t be able to give monthly output if it hasn’t run for the whole month

Do you need the one day run?

i.e. are you changing the restart files after the one-day run? If not, and the first day of output looks like it’s correct, then restart again from the start of the year?

Or, can you just run the one month to make your restart files, instead of one day? You will still get output for the first day to check the experiment has been set up correctly

Hi sorry for the confusion.

I wanted to generate new restart files so I can run ensemble members for my experiments. I currently only have 5 suitable restart years for the experiment’s purpose and need to create more restart files to run more members.

So I thought a simple approach would be to restart from +1 (or 2, or 3…) days from the restarts I already have, creating new restart files at +1 day.

the ensemble structure would be like this, but for multiple different years:

Restart year   Restart day
0990           1
0990           2
0990           3
0990           4
0990           5

The model won’t/shouldn’t be able to give monthly output if it hasn’t run for the whole month

OK that makes sense. I won’t need to actually analyse the data from January (year 0).
But I would need to work out a way of skipping the full processing for this month so that the rest of the months can be processed afterwards.

Ok, I still don’t quite follow. But if the first month’s output doesn’t matter, maybe use os.path.exists() to check the file exists before the processing step? os.path — Common pathname manipulations — Python 3.12.2 documentation
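For run_um2netcdf.py, that check could be a small guard along these lines (a sketch only; the `should_process` wrapper name is hypothetical, and the path in the usage example is the one from the traceback above):

```python
import os

def should_process(infile):
    """Return True only if the UM history file actually exists.

    Guards the conversion step against the FileNotFoundError seen when
    the first month of a run is incomplete and no monthly file is
    written; in that case we skip rather than fail the whole task.
    """
    if not os.path.exists(infile):
        print(f"Skipping conversion: {infile} not found")
        return False
    return True
```

Usage would be something like `if should_process(infile): um2netcdf4.process(input, output, args)` in place of the unconditional call.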

Or wrap the lines in try ... except ... statements. 8. Errors and Exceptions — Python 3.12.2 documentation. Be careful with this option though, because it would be easy to mask errors you want reported to you.
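Applied to the ncrcat call in ice_nc4.py, the careful version of this would catch only CalledProcessError, so that other failures still surface (e.g. a missing ncrcat binary raises FileNotFoundError, which is deliberately not caught here). A sketch, with a hypothetical wrapper name:

```python
import subprocess

def merge_month(cmd):
    """Run the ncrcat command, tolerating a failure caused by an
    incomplete month of daily files.

    Catching only CalledProcessError (never a bare `except:`) means
    genuine problems - a missing executable, bad permissions - still
    propagate as errors instead of being silently masked.
    """
    try:
        subprocess.check_call(cmd, stderr=subprocess.STDOUT)
        return True
    except subprocess.CalledProcessError as err:
        print(f"ncrcat failed (exit {err.returncode}); "
              "leaving daily files unmerged")
        return False
```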

OK thanks, I will try this and see how it goes. I think an if os.path.exists(...) ... else check will be what I try.

Sorry that I wasn’t clear: I am doing ensemble pacemaker runs, so I need different initial conditions for each ensemble member.
The changed initial conditions come from different restart files, but I need to generate more of them.

No, all good - have a go and I can review (make a fork on GitHub).

How are the runs in the ensemble different? i.e. just making different restart files should still give the same long-term results when run…

The runs are different because I am doing SST restoring (pacemaker) in each. So the starting point when restoring begins is different; the restoring is trying to move towards that temperature from a different point.


I made some small changes to ice_nc4.py and run_um2netcdf.py which allow the scripts to skip the processing when not all of the files are present.

These changes have allowed the post-processing to finish. I can submit a pull request if you want, but here is the fork - GitHub - SebastianMckenna/access-cm2-drivers: Driver scripts for the ACCESS-CM2 coupled model. Changes made to allow post-processing to work when starting the model after the first day of a month

I only ran for 1 month to test, but note there may be funny things happening with the dates in the ocean files; e.g. ocean_daily.nc-09900201 contains dates 0990-01-02 to 0990-02-01 (the days in the month that were run in the job). I will run for a whole year and see what the outputs are like. This could affect the APP4 conversions…


You probably need to run the right number of days to get back in sync with the calendar months? Otherwise the model history/restart will always be out of sync by a day (or 2/3/4/5), and the first month of history output for every run will be incomplete.
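The number of days needed to get back onto month boundaries is easy to compute from the restart date, e.g. (a sketch; the function name is hypothetical, and it assumes the proleptic Gregorian calendar that the ncatted fix above writes into the ice files):

```python
import calendar

def days_to_month_end(year, month, day):
    """Days to run so the next restart lands on the 1st of the next month.

    E.g. restarting on 0990-01-02 needs 30 more days to finish January
    (proleptic Gregorian, as calendar.monthrange assumes).
    """
    last_day = calendar.monthrange(year, month)[1]
    return last_day - day + 1
```

Setting each ensemble member's first cycle length from this would bring all the members back into step with calendar months after the first (short) run.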