Can't kill rose/cylc suite

Hi all.

I have a repeat of the issue @sonyafiddes encountered back in 2024: Cannot kill a suite

But even if I kill my persistent session and restart it, I still can’t connect to my rose/cylc suite.

The situation:

$ rose suite-run -- --mode=simulation --hold
[FAIL] Suite "dummy" appears to be running:
[FAIL] Contact info from: "/home/<directory?>/cylc-run/dummy/.service/contact"
[FAIL]     CYLC_SUITE_HOST=gadi-login-02.gadi.nci.org.au
[FAIL]     CYLC_SUITE_OWNER=<username>
[FAIL]     CYLC_SUITE_PORT=43034
[FAIL]     CYLC_SUITE_PROCESS=3365851 python2 /home/548/pag548/code/cylc-flow/bin/cylc-run dummy --mode=simulation --hold
[FAIL] Try "cylc stop 'dummy'" first?

$ cylc scan

(returns nothing)

$ ps -fu <username>

(returns no cylc tasks)

$ rose suite-shutdown

Really shutdown dummy? [y or n (default)] y

Cannot connect: http://gadi-login-02.gadi.nci.org.au:43034/set_stop_cleanly?kill_active_tasks=False: HTTPConnectionPool(host='gadi-login-02.gadi.nci.org.au', port=43034): Max retries exceeded with url: /set_stop_cleanly?kill_active_tasks=False (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f7bedc05690>: Failed to establish a new connection: [Errno 111] Connection refused',))

[FAIL] cylc shutdown dummy --force # return-code=1

Killing my persistent session and restarting it has no effect.

I’m stumped.

Hi @Paul.Gregory,
I am not sure why Cylc thinks the suite is still running in your case.
However, it can happen (especially when you stop a suite using qdel ...) that Cylc doesn’t register the shutdown and so doesn’t delete the ~/cylc-run/<suite>/.service/contact file, which is the metadata file that tells Cylc how to contact the running suite.

In these cases, I would try deleting the file manually before trying to run the suite again:

rm -f ~/cylc-run/dummy/.service/contact

Small note

I don’t remember if you need to delete the actual file (in the /scratch directory) as well or only the symlinked one in your home directory.
You can test it out or remove both:

target="$HOME/cylc-run/dummy/.service/contact"
rm -f "$(readlink "$target")" "$target"
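
Before deleting, it may be worth confirming that the process recorded in the contact file is really gone. The following is only a minimal sketch, assuming the contact file uses the KEY=VALUE layout shown in the error output above, with the PID as the first field of CYLC_SUITE_PROCESS. The helper names contact_pid and remove_stale_contact are hypothetical and not part of Cylc or Rose.

```shell
#!/usr/bin/env bash
# Sketch: remove a Cylc contact file only if the process it records is gone.
# Assumption: the file contains a line "CYLC_SUITE_PROCESS=<pid> <command...>".

# Hypothetical helper: print the PID recorded in a contact file.
contact_pid() {
    sed -n 's/^CYLC_SUITE_PROCESS=\([0-9][0-9]*\).*/\1/p' "$1"
}

# Hypothetical helper: delete the contact file if its process is not running.
remove_stale_contact() {
    local contact="$1" pid
    [ -e "$contact" ] || { echo "no contact file at $contact"; return 0; }
    pid="$(contact_pid "$contact")"
    if [ -z "$pid" ]; then
        echo "could not read a PID from $contact; leaving it alone"
        return 2
    fi
    # kill -0 sends no signal; it only tests whether we could signal the PID.
    if kill -0 "$pid" 2>/dev/null; then
        echo "process $pid is still running; not removing $contact"
        return 1
    fi
    rm -f "$contact"
    echo "removed stale contact file $contact"
}
```

To use it, call remove_stale_contact "$HOME/cylc-run/dummy/.service/contact". The check only means something on the host named in CYLC_SUITE_HOST (gadi-login-02 in the output above), because PIDs are local to a machine. A PID that has since been reused by another of your own processes would also show up as "still running".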

Cheers Davide. Deleting that file in the cylc-run directory solved the problem.
