Looks like accessdev is down again. We were trying to add your umami package to conda and I tried to connect to Jenkins to see if it succeeded.
Which reminds me that we haven’t discussed a long-term plan for Jenkins once accessdev is gone.
httpd is not responding. It appears to be stuck trying to access some files on gdata that I’m not sure it should have access to. /etc/httpd/conf.d/25-accessdev.conf was modified at 3:29am this morning, probably by puppet. There are a huge number of errors in /var/log/httpd/error_log, starting at 3:29am, about processes that could not be killed.
I think this part of the config is the problem:
<Directory "/home/*/*/public_html" >
    Order      Deny,Allow
    Deny       from all
    Allow      from 134.178 124.47.137.7 124.47.159.2
    AuthName   "accessdev"
    Options    Indexes SymlinksIfOwnerMatch IncludesNOEXEC
    Satisfy                     Any
    AuthType                    Basic
    AuthBasicProvider           ldap
    AuthLDAPURL                 "ldaps://ldap.nci.org.au/ou=People,dc=apac,dc=edu,dc=au?uid"
    AuthLDAPGroupAttribute      "memberUid"
    AuthLDAPGroupAttributeIsDN  off
    Require     ldap-group cn=access,ou=Group,dc=apac,dc=edu,dc=au
</Directory>
This is matching with /home/548/sjr548/public_html/, which has symlinks into /g/data. root cannot read from /g/data due to the NFS root-squash option. Can whoever did this please revert the config?
I take that back, I think the config above has been there for a while. Might be a gdata NFS problem.
I put in a change yesterday for the rose-bush auth; I’ll investigate.
httpd failed to restart properly. The old processes are still present and can’t be killed with sudo kill -9. They’re hanging on to port 443, so a new process can’t start.
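For reference, a rough way to list the stuck processes. This is only a sketch, and it assumes the old httpd processes are in uninterruptible sleep (state D in /proc/<pid>/stat), which is the usual reason kill -9 has no effect. The function names are made up for illustration.

```python
from pathlib import Path


def parse_stat_line(line):
    """Return (pid, comm, state) from the text of one /proc/<pid>/stat file."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, _, right = line.rpartition(")")
    pid_text, _, comm = left.partition("(")
    return int(pid_text), comm, right.split()[0]


def uninterruptible(name="httpd", proc=Path("/proc")):
    """List pids of processes called `name` that are currently in state D."""
    stuck = []
    for stat in proc.glob("[0-9]*/stat"):
        try:
            pid, comm, state = parse_stat_line(stat.read_text())
        except (OSError, ValueError, IndexError):
            continue  # the process went away while we were scanning
        if comm == name and state == "D":
            stuck.append(pid)
    return sorted(stuck)


if __name__ == "__main__":
    print(uninterruptible("httpd"))
```

State D is a snapshot, so a busy but healthy process can show up briefly; anything that stays in the list across several runs is the interesting part.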
Yep, can see that. Tried remounting the /g/data directories containing files opened by stuck processes, no luck there unfortunately.
I think our culprit is pid 3082; it has a different stack from the rest of the httpd processes. It’s stuck in nfs_idmap_id, and the rest are waiting for it to finish. The config change is a red herring: these processes seem to have been stuck since 9:44am.
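The "different stack" comparison can be sketched roughly as below: group the processes by kernel stack and report whichever pids do not share the most common one. It assumes the text of each /proc/<pid>/stack has already been collected (reading those files generally needs root). Apart from nfs_idmap_id, the frame names in the example are placeholders, not real output from accessdev.

```python
from collections import defaultdict


def group_by_stack(stacks):
    """Map each distinct stack (tuple of function names) to the pids sharing it."""
    groups = defaultdict(list)
    for pid, trace in stacks.items():
        # Keep only the function name of each frame, dropping the address
        # prefix and the +offset/size suffix, so identical call chains match.
        frames = tuple(
            line.split()[-1].split("+")[0]
            for line in trace.splitlines()
            if line.strip()
        )
        groups[frames].append(pid)
    return dict(groups)


def outliers(stacks):
    """Return the pids whose stack differs from the most common one."""
    groups = group_by_stack(stacks)
    if not groups:
        return []
    common = max(groups, key=lambda frames: len(groups[frames]))
    return sorted(
        pid
        for frames, pids in groups.items()
        if frames != common
        for pid in pids
    )


if __name__ == "__main__":
    # Placeholder traces: only nfs_idmap_id comes from the discussion above.
    example = {
        3080: "[<0>] hypothetical_wait+0x10/0x20\n",
        3081: "[<0>] hypothetical_wait+0x10/0x20\n",
        3082: "[<0>] nfs_idmap_id+0x1/0x2\n",
    }
    print(outliers(example))
```

This only narrows down which process to look at first; it says nothing about why that process is blocked.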
Is there anything we can do here, or do we need to restart the server?
Nope, needs a reboot.
Ok, I’ll send out a notice and schedule the reboot
It’s being slow to come back online
Closed for archival.