It looks like the nqstat command might only show jobs in your default project, so by default it won’t show jobs you’ve submitted to lg87. To see if there is anything running on lg87, try nqstat -P lg87.
I tested running the CM2 suite from the Hive Docs documentation (u-cy339), specifying a custom project in the PROJECT field of rose-suite.conf.
Everything seems to work as expected, and the jobs that run through PBS use the specified project instead of my default one (see nqstat output below):
$ nqstat -P k10 -u dm5220
Job ID     S  Proj  User    Queue    Job Name        Used      Request   CPUs  CPU%
-----------------------------------------------------------------------------------
150061525  Q  k10   dm5220  express  fcm_make_um.09            00:40:00     6
150061527  Q  k10   dm5220  normal   make_cice.0950            00:05:00     1
150061529  Q  k10   dm5220  copyq    make_drivers.0            00:05:00     1
150061531  Q  k10   dm5220  copyq    make_mom.09500            00:05:00     1
You shouldn’t need to change anything other than the PROJECT field. The persistent-session project doesn’t matter for PBS jobs; it only matters for tasks within the suite that run in the background (and these are usually very quick and use only a few SUs).
It seems to me you are trying to run a different suite from the suggested CM2 one (u-cy339). In the nqstat output you shared, one Job Name is atmos.19000101, but there is no task with that name in the u-cy339 suite.
What suite are you trying to run using a different project?
Many thanks, Davide! I was expecting that we would only need to change PROJECT in rose-suite.conf, but somehow it’s not working for @hao, which is very strange. @hao manually changed the job ID; maybe you should start a new one and test, @hao? Thanks, Davide!
It is worth noting that the PROJECT used to submit a job to PBS can be specified in the Cylc PBS [[[directives]]], and that in some suites it may be hardcoded, for example {% set PROJECT = 'dx2' %}. Hardcoding is not good practice, but it is worth checking that it has not been done in your suite.
I’m actually running the suite u-cz934. I just tested a new one but it still didn’t work, so the issue might be related to the structure of this suite. I’ll check the Cylc PBS [[[directives]]]. Thanks again.
@Hao, the suite u-cz934 doesn’t seem to have any project directive set (it has a PROJECT field in rose-suite.conf, but that field isn’t used within suite.rc). Therefore, it currently always uses your default project.
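For anyone wanting to check a suite for both issues mentioned above (a hardcoded PROJECT, or a missing -P directive), here is a minimal Python sketch. It assumes the suite.rc text has already been read into a string; the function name audit_project_settings and the example fragment are hypothetical and not taken from any real suite.

```python
import re

# Matches a quoted literal assigned in Jinja2, e.g. {% set PROJECT = 'dx2' %}
HARDCODED_PROJECT = re.compile(
    r"\{%-?\s*set\s+PROJECT\s*=\s*['\"](\w+)['\"]\s*-?%\}"
)
# Matches a PBS project directive line, e.g. "-P = {{PROJECT}}"
PROJECT_DIRECTIVE = re.compile(r"^\s*-P\s*=", re.MULTILINE)


def audit_project_settings(suite_rc_text):
    """Return (hardcoded project codes, whether any -P directive is present)."""
    hardcoded = HARDCODED_PROJECT.findall(suite_rc_text)
    has_directive = PROJECT_DIRECTIVE.search(suite_rc_text) is not None
    return hardcoded, has_directive


# Illustrative fragment only (an assumption, not copied from a real suite).
example = """
{% set PROJECT = 'dx2' %}
[runtime]
    [[root]]
        [[[directives]]]
            -q = normal
"""
print(audit_project_settings(example))
```

A non-empty first element means a project code is hardcoded; a False second element means no -P directive was found, in which case PBS would fall back to the default project.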
Thanks again for your suggestions last week. I added a -P = {{PROJECT}} line to the [[[directives]]] section of u-cz934’s suite.rc, but the job stays in the “submitted” state and never actually runs.
I also tested u-cy339, setting PROJECT = 'lg87' in rose-suite.conf, but it was also stuck in the “submitted” state. Do you know what might cause this?
If the job has been submitted, there are many reasons why it may get stuck in the queue. First, check with qstat that the job has been submitted and is in the correct project. Second, check the comment on the job with qstat -f {JOB_ID}, as this will tell you things like “Not enough resources” or “Not enough CPUs”, which can help us determine the cause of the delay. Sometimes there is just a delay; other times, the comment will indicate that the job will never run. But if the job is in the queue and is in the right project, then this ‘Solved’ ticket is complete.
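The two checks above can be sketched in Python, assuming the text printed by qstat -f {JOB_ID} has been captured into a string. The sketch assumes the output consists of "key = value" lines, that the attributes are named project and comment, and that long values are wrapped onto tab-indented continuation lines; verify these against real output on your system. The helper names and the sample text are made up for illustration.

```python
def parse_qstat_full(text):
    """Parse 'key = value' lines from qstat -f style output into a dict."""
    attrs = {}
    key = None
    for line in text.splitlines():
        if line.startswith("\t") and key is not None:
            # Continuation of a wrapped value: drop only the leading tab.
            attrs[key] += line[1:]
        elif " = " in line:
            key, value = line.strip().split(" = ", 1)
            attrs[key] = value
    return attrs


def summarise(text, expected_project):
    """Report whether the job is in the expected project, plus its comment."""
    attrs = parse_qstat_full(text)
    return {
        "project_ok": attrs.get("project") == expected_project,
        "comment": attrs.get("comment", ""),
    }


# Made-up sample, shaped like the assumed format (not real scheduler output).
sample = (
    "Job Id: 1234.example\n"
    "    Job_Name = atmos\n"
    "    comment = Not enough\n"
    "\t resources\n"
    "    project = lg87\n"
)
print(summarise(sample, "lg87"))
```

If project_ok is False, the -P directive is not being picked up; if it is True, the comment is the next place to look for the reason the job is waiting.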
Not sure if this helps, but I have had a similar problem before. Have you checked that the storage flags in the suite.rc file contain all the paths/directories your suite needs to run? This has been a problem for me in the past when changing the compute project that I run my suite from.
Thanks a lot for all your suggestions! I checked again and realized the issue was indeed caused by the storage path setting in my suite.rc. I hadn’t included the scratch path of the previous project in the storage path. After adding it, the jobs are now running fine.
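For future readers, a small Python sketch of the storage check that resolved this. It assumes the storage directive value is a list of entries joined with "+" (for example scratch/lg87+gdata/access); the function name missing_storage and the project code xx00 are placeholders, so substitute the projects your suite actually reads from or writes to.

```python
def missing_storage(storage_value, required):
    """Return the required storage entries absent from the directive value."""
    mounted = {item.strip() for item in storage_value.split("+") if item.strip()}
    return [item for item in required if item not in mounted]


# Placeholder values: xx00 stands in for the previous project.
storage = "gdata/access+scratch/lg87"
needed = ["scratch/lg87", "scratch/xx00"]
print(missing_storage(storage, needed))
```

Any entry returned here would need to be appended to the storage flag in suite.rc before the jobs can access those paths.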