MOM6 Calendar Error (Potentially Payu related?)

I’ve got two problems related to running MOM6 on Setonix:

  1. After running up to 31st December of year one (1991), it won’t start a new run crossing over to year two, outputting the error:
FATAL from PE  2091: diag_manager_mod::register_diag_field:  file=ocean_month_z: Invalid date. Date=1992-02-31 00:00:00
  • This happens even when trying to run a single day. I suspect it’s due to how I ran the first year: rather than single monthly run increments, it’s been messy and staggered (e.g., run 5 days here, run 3 months there). This was due to troubleshooting and also the following payu error…
  2. Payu doesn’t listen to arguments given after payu run (e.g., payu run -n 3). This is working fine for @ChrisC28 on the same machine, so I might’ve done something along the way to disable this accidentally.

Has anyone come across either of these errors in the past?

It’s not wrong:

31st of February is pretty darn invalid. I wonder why it didn’t complain about the 30th?
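
For anyone wanting to sanity-check a date from a log like this, here is a minimal sketch. It assumes a 365-day no-leap calendar (February always has 28 days); the function name and the month-length table are illustrative, not part of FMS or MOM6.

```python
# Month lengths for a 365-day "no leap" calendar (an assumption for this
# sketch: February always has 28 days).
NOLEAP_DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def is_valid_noleap_date(year, month, day):
    """Return True if (year, month, day) exists in a 365-day no-leap calendar.

    The year is accepted only for readability; it does not affect the
    result because every year has the same month lengths.
    """
    if not 1 <= month <= 12:
        return False
    return 1 <= day <= NOLEAP_DAYS_IN_MONTH[month - 1]


# The date from the error message, then a date that does exist.
print(is_valid_noleap_date(1992, 2, 31))
print(is_valid_noleap_date(1992, 2, 28))
```

Under this assumption 30th February would be rejected as well, so the check only confirms the date is bad, not why FMS generated it.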

What calendar are you using? What forcing product?

You’re falling into a serious support crack using payu on Setonix. As in, it isn’t supported. @angus-g might be able to assist if there isn’t a simple solution.

For simple solutions:

  1. How are you invoking payu? (like literally cut n’ paste the command you’re using, and the stdout payu spits out)
  2. Are you using the same install as @ChrisC28?
  3. Do you have any aliases set that might be interfering, like say to python?

Thanks for these suggestions Aidan.

We are using the “NO LEAP” calendar, with JRA55-do RYF forcing.

(base) jreilly@setonix-01:/software/projects/pawsey0410/jreilly/eac_003_v5_copy1> payu run -n 2
128
128
----------
payu: warning: Job request includes 118 unused CPUs.
payu: warning: CPU request increased from 6154 to 6272
sbatch -A pawsey0410 --time=06:30:00 --ntasks=6272 --exclusive --ntasks-per-node=128 --wrap="/software/setonix/2022.11/software/cray-sles15-zen3/gcc-12.1.0/python-3.9.15-xjiu6sfxngs3gl5nmq6sqxlicihla66p/bin/python3.9 /software/projects/pawsey0410/jreilly/setonix/python/bin/payu-run"
Submitted batch job 1973498

The first two lines are from some debugging Chris and I were doing. There were also some other issues that Chris had fixed previously, related to strict specification of the number of tasks per node and the number of nodes, so you might see some differences from the Gadi version.

Separate installs, but as far as I can tell they are the same. I just wasn’t entirely sure how to point to Chris’ install in my conda environment. I tried conda-develop /path/to/chris/payu/library/ which didn’t seem to work.

No aliases.

Chris and I have been weighing up whether it is worth persisting with payu here, as it has been a bit of a headache for the past few months. And when we start to look at setting up IAF configurations, I anticipate a lot more challenges. Any advice on alternative approaches? Or best ways forward?

As for payu “not listening” to command-line arguments like -n 2: payu passes information such as the number of runs remaining to subsequent submissions via environment variables.

If anything is interfering with environment variables in your job submission, that might be the culprit.
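
One way to test this is to dump the relevant variables from inside the submitted job and compare them with what you see in the login shell. The sketch below is only illustrative: the `PAYU_` prefix is an assumption about how the variables are named, and `payu_env` is a hypothetical helper, so check the payu source for the names it actually sets.

```python
import os


def payu_env(environ=None, prefix="PAYU_"):
    """Return the environment variables whose names start with `prefix`.

    The "PAYU_" prefix is an assumption for illustration; it is not taken
    from the payu documentation.
    """
    environ = os.environ if environ is None else environ
    return {k: v for k, v in sorted(environ.items()) if k.startswith(prefix)}


if __name__ == "__main__":
    found = payu_env()
    if not found:
        print("No PAYU_* variables visible in this environment")
    for name, value in found.items():
        print(f"{name}={value}")
```

Running this at the start of the job script and finding nothing where the login shell shows values would suggest the scheduler is not exporting the environment to the job.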

It’d be a damn shame to have come this far and drop it now.

I don’t know what the plans are for ACCESS-NRI supporting users at Pawsey, but it is not an impossibility, and we’re about to get an influx of new staff in the next few months so resourcing some assistance is on the cards if we were to go down that route. I don’t want to give you false hope, but there is a glimmer …

As far as alternative approaches, I don’t really have a good suggestion. It needs someone to log in and take a gander, and I’m afraid I just don’t have the time to do that these days. Maybe if you asked @angus-g really really nicely he’d take a look.

Agreed. I can have a poke around

Thanks for that direction Aidan. I’m going to spend some time understanding the workflow of payu a bit more, then it might be good to catch up with @angus-g some time soon to find what’s tripping things up. Angus, if it’s alright with you, I’ll message you directly when I know enough to at least follow along with what you might think the problem is. Maybe we could catch up over zoom with @ChrisC28 for a quick discussion on best way forward on Setonix?

For what it’s worth, I don’t think this has anything to do with payu. It’s probably hitting some edge case in FMS setting up a monthly diagnostic file, due to the sporadic run segments. It probably makes sense to disable monthly diagnostics for such short run segments anyway, since they won’t contain anything.

Looks like you’re right (once again) with that suggestion of the FMS problem, @angus-g. It’s able to run now that I’ve removed the monthly diagnostics. It’s not that important now, but any suggestion on how to start saving monthly output without starting from day 0 again?

There’s still the issue of payu not responding to command line arguments but I should be able to find the problem for this one.

In regards to setting up Interannually Forced runs, are there any quick pointers? Otherwise I’ll have more of a read and then make a new post if/when I get stuck.

Thanks again!

I guess you’d probably want to run a segment to line back up with the start of a month, and then resume running segments of at least one month in length from there on.
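
As a rough sketch of that realignment, the length of the catch-up segment can be worked out from the current restart date. This assumes a 365-day no-leap calendar and a restart time of 00:00; the function name is hypothetical.

```python
# Month lengths for a 365-day "no leap" calendar (assumed for this sketch).
NOLEAP_DAYS_IN_MONTH = [31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31]


def days_to_next_month_start(month, day):
    """Days to run from 00:00 on (month, day) to reach the 1st of a month.

    Returns 0 if the restart date is already the 1st.
    """
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month: {month}")
    length = NOLEAP_DAYS_IN_MONTH[month - 1]
    if not 1 <= day <= length:
        raise ValueError(f"invalid day {day} for month {month}")
    if day == 1:
        return 0
    return length - day + 1


# Example: a restart sitting at 6 January needs a 26-day segment
# to land on 1 February.
print(days_to_next_month_start(1, 6))
```

After one segment of that length, monthly diagnostics could be switched back on and run lengths kept at whole months.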

I think payu prints its submission command when you run it. That would be a useful diagnostic for seeing whether it’s setting the correct environment variable but the variable isn’t making it through. Or perhaps look at the Slurm output for a run you expect to continue, and see if it fails to execute the resubmission command.

I don’t think there’s anything automated for this, unfortunately. @ashjbarnes might have some experience with this by now? You can fiddle around with the data_table after a year of running, although I expect FMS might complain about trying to interpolate beyond the end of the year unless everything is set up perfectly on the time axes.

Great, thanks. I’ll look into all of those points.

I found some good links on this forum post that should be helpful for the IAF configuration.

@ashjbarnes - have you made any progress on this front?

I think that pretty much mirrors my suggestion, except it automates editing the data_table to point to the relevant forcing files for the running year.

Yeah the calendar has been a pain! I made a forum post a few days ago outlining some real headaches with FMS’s documentation on the matter

What I’ve done in the automated pipeline is to reset the calendar of all my forcing files to be DAYS SINCE your experiment start date. I don’t yet know whether this will mess other things up, but at least it handles issues related to calendar type (unless, god forbid, you try to run your model on an actual leap year)

I haven’t automated data_table modification for IAF runs though and probably won’t. It would be awesome if someone put this functionality into the pipeline though!

If we use a sufficiently modern version of FMS, with -Duse_yaml specified when building, it actually supports a data_table.yaml. That would be pretty easy to process with Python-based tools if necessary. I’m not sure about how to automatically grab the date from the current/latest run though. Otherwise, just replacing a sentinel value like YEAR from a template file is easy enough.
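
The sentinel approach could look something like the sketch below. The template line, the file name in it, and the `render_data_table` helper are all hypothetical and only meant to show the substitution step; where the year comes from is left open, as noted above.

```python
def render_data_table(template_text, year, sentinel="YEAR"):
    """Replace every occurrence of `sentinel` with the four-digit year.

    This is a plain string replacement, so the sentinel must not appear
    anywhere else in the template (e.g. inside a field name).
    """
    return template_text.replace(sentinel, f"{year:04d}")


# Hypothetical template line in the legacy data_table layout; the file
# name is made up for illustration.
template = '"ATM", "t_bot", "tas", "./INPUT/forcing_tas_YEAR.nc", "bilinear", 1.0\n'

print(render_data_table(template, 1992), end="")
```

In practice the rendered text would be written to data_table before each run, with the template kept alongside it under a different name.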

I’m going to give this a crack over the next few days and can update next week with how I go. It seems quite out of my depth but worth a go haha

Oh! Don’t tease me @angus-g! A standard format?!