I am trying to run a CM2 suite which fails to successfully submit a job.
I have made a local copy of a suite to run an ensemble. This might not be the recommended approach, but it has worked for me before. The only difference from what I have done in the past (to my knowledge) is that I am running this under a different project. When I run rose suite-run, it opens the GUI, which shows the jobs as submitted, and they do appear in qstat, but only briefly before they disappear. The GUI suggests the jobs are still submitted, but gadi suggests otherwise.
I have tried running rose suite-clean, with no success.
Hi @wghuneke, thanks for posting your query! I’ll try to find someone to answer it, but they may not get back to you until next Monday. In the meantime, to help us figure out the problem, can you post any error messages or logs that you see? Could you elaborate on what you mean by “gadi suggests otherwise”?
Thanks,
Ed
Hi Wilma, I’ve had this issue (or similar) before!
I was restarting my simulation from a run that was saved to scratch/e14. When I updated to run the simulation using a different compute project, it couldn’t find the restart directory because the suite.rc file automatically updates the storage flags to use the compute project. Is this a similar situation to your case? If so, you could try updating the storage flags in the suite.rc file manually to make sure the restart directory can be found.
Others probably have better advice on this, but I’m wondering whether the “persistent session” needs to be restarted after you were recently added to project y99. You may have already done that.
I’ve tried running u-cy339 and changed the rose-suite.conf->PROJECT setting from my default project to lg87. This failed in a similar way to how you described.
The rose suite-run command first sets up a working directory on scratch, where the suite builds the executables and writes the output and logs. However, by default it creates it using the $PROJECT environment variable instead of the rose-suite.conf->PROJECT setting. E.g. with $PROJECT=tm70 and rose-suite.conf->PROJECT=lg87, the working directory is still set up in scratch/tm70:
[u-cy339]$ rose suite-run
[INFO] export CYLC_VERSION=7.9.7
...
[INFO] create: /scratch/tm70/sw6175/cylc-run/u-cy339
As @hrsdawson pointed out, the PBS storage flags are based on the rose-suite.conf->PROJECT setting. When this and the $PROJECT environment variable don’t match, the jobs are not given access to the scratch directory that holds the working directory, so the suite can’t find it and crashes.
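To make the failure mode concrete, here is a minimal Python sketch of the mismatch described above. It assumes the Gadi layout /scratch/<project> and /g/data/<project>, treats the PBS storage flag as a '+'-separated value such as scratch/lg87+gdata/lg87, and ignores anything Gadi might mount by default. The helper names (working_dir, path_is_covered) and the storage value are hypothetical, not taken from the CM2 suite.

```python
import posixpath


def working_dir(env_project, user, suite):
    """Cylc run directory as created by rose suite-run, built from $PROJECT (per the thread)."""
    return f"/scratch/{env_project}/{user}/cylc-run/{suite}"


def path_is_covered(path, storage_flag):
    """Simplified check: is `path` on a /scratch or /g/data project listed in storage_flag?"""
    entries = [entry for entry in storage_flag.split("+") if entry]
    # e.g. '/scratch/tm70/x' -> ['', 'scratch', 'tm70', 'x']
    parts = posixpath.normpath(path).split("/")
    if len(parts) > 2 and parts[1] == "scratch":
        needed = f"scratch/{parts[2]}"
    elif len(parts) > 3 and parts[1:3] == ["g", "data"]:
        needed = f"gdata/{parts[3]}"
    else:
        return True  # not a per-project filesystem; outside this simplified model
    return needed in entries


# $PROJECT=tm70, but the storage flags come from rose-suite.conf->PROJECT=lg87
# (the storage value below is a hypothetical example)
run_dir = working_dir("tm70", "sw6175", "u-cy339")
print(run_dir, path_is_covered(run_dir, "scratch/lg87+gdata/lg87"))
```

Here the run directory lands on scratch/tm70 while the jobs are only granted scratch/lg87 and gdata/lg87, so the check returns False, which matches the crash described above.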
It’s a bit awkward that the rose-suite.conf->PROJECT setting can’t control everything on its own, but there are a couple of workarounds. They affect the final file locations differently:
Add your default project to the PBS storage flags in suite.rc
In this case, the working directory will be set up on your default project’s scratch. If rose-suite.conf sets ARCHIVEBASE='/scratch/$PROJECT/$USER/archive', the model output will also be archived on the default project’s scratch.
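A minimal sketch of the first workaround, assuming the storage flag is the '+'-separated value passed to PBS’s -l storage option: append the default project’s scratch (tm70 in the example above) if it isn’t already listed. The helper add_storage and the starting value are hypothetical; in practice you would edit the storage flags in suite.rc by hand.

```python
def add_storage(storage_flag, *entries):
    """Return storage_flag with any missing entries appended, joined with '+'."""
    current = [entry for entry in storage_flag.split("+") if entry]
    for entry in entries:
        if entry not in current:  # avoid duplicating an entry that is already present
            current.append(entry)
    return "+".join(current)


# Hypothetical flags built from rose-suite.conf->PROJECT=lg87, plus the default project tm70
print(add_storage("scratch/lg87+gdata/lg87", "scratch/tm70"))
```

Since the archive directory in this case also sits under /scratch/tm70, the same added entry covers it.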
Switch the $PROJECT environment variable to match the rose-suite.conf->PROJECT setting when submitting the run. This can be done for the current gadi session with
switchproj <new-project>
In this case, the working directory and model output will be written to the new project’s scratch directory.
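Before running rose suite-run, a quick pre-flight check can catch the mismatch. This Python sketch assumes rose-suite.conf contains a line of the form PROJECT='lg87'; the function names are hypothetical and the parsing is deliberately simple (it does not handle Rose continuation lines, and lines commented out with ! are not matched).

```python
import re

# Assumed rose-suite.conf layout: a line such as PROJECT='lg87' (quotes optional)
_PROJECT_RE = re.compile(r"""^[ \t]*PROJECT[ \t]*=[ \t]*['"]?([A-Za-z0-9_]+)['"]?[ \t]*$""", re.MULTILINE)


def suite_project(conf_text):
    """Return the PROJECT value found in rose-suite.conf text, or None if absent."""
    match = _PROJECT_RE.search(conf_text)
    return match.group(1) if match else None


def check_project(conf_text, env_project):
    """Return a warning if $PROJECT differs from the suite's PROJECT, else None."""
    wanted = suite_project(conf_text)
    if wanted is None or wanted == env_project:
        return None
    return (f"$PROJECT is {env_project} but rose-suite.conf sets PROJECT={wanted}; "
            f"run 'switchproj {wanted}' before 'rose suite-run'")


example = "[jinja2:suite.rc]\nARCHIVEBASE='/scratch/$PROJECT/$USER/archive'\nPROJECT='lg87'\n"
print(check_project(example, "tm70"))
```

To use it on a real suite, pass the contents of rose-suite.conf and the value of os.environ["PROJECT"] instead of the example strings.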
That’s great! @spencerwong, since a few people have come across this issue, I’m wondering if it’s worth updating the Run ACCESS-CM2 instructions to include this detail (i.e. update the storage flags if changing the compute project for an existing run). What do you think?