Accessdev is down?

Paola-CMS · 6 July 2023 22:46

Looks like accessdev is down again. We were trying to add your umami package to conda and I tried to connect to Jenkins to see if it succeeded.
Which reminds me that we haven’t discussed a long-term plan for Jenkins once accessdev is gone.

dale.roberts · 6 July 2023 23:59

httpd is not responding. It appears to be stuck trying to access some files on gdata that I’m not sure it should have access to. /etc/httpd/conf.d/25-accessdev.conf was modified at 3:29am this morning, probably by puppet. There are a huge number of errors in /var/log/httpd/error_log, starting at 3:29am, about processes that could not be killed.

I think this part of the config is the problem:

<Directory "/home/*/*/public_html" >
    Order      Deny,Allow
    Deny       from all
    Allow      from 134.178 124.47.137.7 124.47.159.2
    AuthName   "accessdev"

    Options    Indexes SymlinksIfOwnerMatch IncludesNOEXEC



    Satisfy                     Any
    AuthType                    Basic
    AuthBasicProvider           ldap
    AuthLDAPURL                 "ldaps://ldap.nci.org.au/ou=People,dc=apac,dc=edu,dc=au?uid"
    AuthLDAPGroupAttribute      "memberUid"
    AuthLDAPGroupAttributeIsDN  off

    Require     ldap-group cn=access,ou=Group,dc=apac,dc=edu,dc=au



</Directory>

This is matching with /home/548/sjr548/public_html/, which has symlinks into /g/data. root cannot read from /g/data due to the NFS root-squash option. Can whoever did this please revert the config?

dale.roberts · 7 July 2023 00:07

I take that back, I think the config above has been there for a while. Might be a gdata NFS problem.

Scott · 7 July 2023 00:09

I put in a change yesterday for the auth of rose-bush, I’ll investigate.

Scott · 7 July 2023 00:17

httpd failed to restart properly. The old processes are still present and can’t be killed with sudo kill -9. They’re hanging on to port 443, so a new process can’t start.
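
A process that ignores kill -9 is usually sitting in uninterruptible sleep (state D), where the kernel does not deliver the signal until the blocking call returns. The thread does not say which state these httpd processes were in, so treat that as an assumption. A minimal sketch for listing such processes by reading /proc/<pid>/stat on Linux (the function names are made up for illustration):

```python
from pathlib import Path


def parse_stat(text):
    """Return (pid, comm, state) from the contents of /proc/<pid>/stat."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the LAST ')' rather than on whitespace.
    left, right = text.rsplit(")", 1)
    pid, comm = left.split("(", 1)
    state = right.split()[0]
    return int(pid), comm, state


def uninterruptible(proc_root="/proc"):
    """List (pid, comm) for every process currently in state 'D'."""
    stuck = []
    for stat in Path(proc_root).glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_stat(stat.read_text())
        except (OSError, ValueError):
            continue  # process exited (or was unreadable) during the scan
        if state == "D":
            stuck.append((pid, comm))
    return sorted(stuck)


if __name__ == "__main__":
    for pid, comm in uninterruptible():
        print(pid, comm)
```

A process can pass through state D briefly during normal I/O, so only entries that persist across repeated runs are interesting.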

dale.roberts · 7 July 2023 00:34

Yep, can see that. Tried remounting the /g/data directories containing files opened by stuck processes, no luck there unfortunately.

dale.roberts · 7 July 2023 00:59

I think our culprit is pid 3082; it has a different stack from the rest of the httpd processes. It’s stuck in nfs_idmap_id, and the rest are waiting for it to finish. The config change is a red herring: these processes seem to have been stuck since 9:44am.
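
The thread does not say how the stacks were compared. One way to spot a process whose kernel stack differs from its siblings is to reduce each /proc/<pid>/stack dump to its function names and group identical ones. The sketch below assumes a Linux kernel that exposes /proc/<pid>/stack (reading it normally requires root), and the helper names are hypothetical:

```python
import re
from collections import defaultdict

# A frame line looks roughly like "[<0>] some_function+0x10/0x20 [module]".
FRAME = re.compile(r"\]\s+(\S+)")


def frames(stack_text):
    """Reduce a /proc/<pid>/stack dump to a tuple of function names."""
    names = []
    for line in stack_text.splitlines():
        match = FRAME.search(line)
        if match:
            # Drop the "+offset/size" suffix so equal call chains compare equal.
            names.append(match.group(1).split("+", 1)[0])
    return tuple(names)


def group_by_stack(stacks):
    """Map each distinct call chain to the list of pids that share it."""
    groups = defaultdict(list)
    for pid, text in stacks.items():
        groups[frames(text)].append(pid)
    return dict(groups)


def odd_ones_out(stacks):
    """Return the pids whose call chain is shared with no other process."""
    groups = group_by_stack(stacks)
    return sorted(pids[0] for pids in groups.values() if len(pids) == 1)


if __name__ == "__main__":
    import sys
    from pathlib import Path

    # Usage: sudo python3 this_script.py PID [PID ...]
    dumps = {int(p): Path(f"/proc/{p}/stack").read_text() for p in sys.argv[1:]}
    print(odd_ones_out(dumps))
```

Passing the pids of all the stuck httpd processes would print any whose stack is unique, which is how a process like pid 3082 would stand out.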

Scott · 7 July 2023 01:06

Is there anything we can do here, or do we need to restart the server?

dale.roberts · 7 July 2023 01:26

Nope, needs a reboot.

Scott · 7 July 2023 01:37

Ok, I’ll send out a notice and schedule the reboot

Scott · 7 July 2023 03:19

It’s being slow to come back online

lachlanswhyborn · 12 September 2024 05:55

Closed for archival.
