I have a suite that has run successfully, but still seems to think it is running.
rose sgc tells me "stopped with ‘succeeded’".
If I try:
cylc stop 'u-di850'
I get:
Cannot connect: https://cylc.slf563.jk72.ps.gadi.nci.org.au:43092/set_stop_cleanly?kill_active_tasks=False: HTTPSConnectionPool(host='cylc.slf563.jk72.ps.gadi.nci.org.au', port=43092): Max retries exceeded with url: /set_stop_cleanly?kill_active_tasks=False (Caused by NewConnectionError('<urllib3.connection.VerifiedHTTPSConnection object at 0x7f169ae32e90>: Failed to establish a new connection: [Errno -2] Name or service not known',))
cylc kill 'u-di850' gives the same response.
Anyone know how I can stop this suite so I can re-run it!?
I am sure someone else will have a more thorough response, but if you want to run it again in the short term, could you copy the roses/u-di850 directory to roses/u-di850.restart and then run the suite from the new directory?
It won’t solve the initial problem but might be a workaround …
If you are still having trouble with the job not stopping, please email help@nci.org.au; hopefully they can point you in the right direction. Hopefully the problem has resolved itself.
Thanks Paul and Chermelle - actually, the problem still persists. The rose commands all give me an error message similar to the original one. The cylc scan command is telling me:
ERROR: [Errno -2] Name or service not known: cylc.slf563.jk72.ps.gadi.nci.org.au
I had also suspected this was because I’ve somehow stuffed up my persistent sessions, and possibly this confirms that too. I’ve not got a lot of time today to look at it, but I will try to take another look, probably next week, and let you know if I resolve it. I’ll also email NCI help as suggested if I can’t figure it out then!
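As a rough first check (a sketch, not something anyone in this thread ran): "[Errno -2] Name or service not known" is the message the system resolver gives when `getaddrinfo` cannot look up a host name (the `EAI_NONAME` error on Linux/glibc). So a quick test is whether the `cylc.…` host name from the error resolves at all. `host_resolves` is a hypothetical helper; it only checks name resolution, not whether the suite or the persistent session itself is healthy.

```python
import socket


def host_resolves(hostname: str) -> bool:
    """Return True if the system resolver can look up hostname (hypothetical helper)."""
    try:
        socket.getaddrinfo(hostname, None)
    except socket.gaierror as err:
        # On Linux/glibc, err.errno == socket.EAI_NONAME (-2) is the
        # "Name or service not known" case reported by cylc above.
        print(f"{hostname}: {err}")
        return False
    return True
```

For example, `host_resolves("cylc.slf563.jk72.ps.gadi.nci.org.au")` returning False on a login node would be consistent with the persistent session behind that name no longer existing.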
Thanks Chermelle and Paul - I’ve managed to fix it - it was a persistent session problem. I guess the session had been killed during the maintenance they had to do a few weeks ago, and I just didn’t even think of it…