Accessdev is down?

Looks like accessdev is down again. We were trying to add your umami package to conda and I tried to connect to Jenkins to see if it succeeded.
Which reminds me that we haven’t discussed a long term plan for Jenkins once accessdev is gone.


httpd is not responding. It appears to be stuck trying to access some files on gdata that I'm not sure it should have access to. /etc/httpd/conf.d/25-accessdev.conf was modified at 3:29am this morning, probably by puppet. There are a huge number of errors in /var/log/httpd/error_log, starting at 3:29am, about processes that could not be killed.
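For the record, here's roughly how I lined up the config change time against the start of the errors (paths as above; the grep pattern is a guess at the log wording, adjust to taste):

```shell
# Compare when the config was last modified with when the errors began.
CONF=/etc/httpd/conf.d/25-accessdev.conf
LOG=/var/log/httpd/error_log
if [ -e "$CONF" ]; then
    stat -c 'config modified: %y' "$CONF"
fi
if [ -e "$LOG" ]; then
    # First "kill" error (with line number) marks the start of the incident.
    grep -n -m1 'kill' "$LOG"
fi
true
```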

I think this part of the config is the problem:

<Directory "/home/*/*/public_html" >
    Order      Deny,Allow
    Deny       from all
    Allow      from 134.178
    AuthName   "accessdev"

    Options    Indexes SymLinksIfOwnerMatch IncludesNOEXEC

    Satisfy                     Any
    AuthType                    Basic
    AuthBasicProvider           ldap
    AuthLDAPURL                 "ldaps://,dc=apac,dc=edu,dc=au?uid"
    AuthLDAPGroupAttribute      "memberUid"
    AuthLDAPGroupAttributeIsDN  off

    Require     ldap-group cn=access,ou=Group,dc=apac,dc=edu,dc=au
</Directory>

This matches /home/548/sjr548/public_html/, which has symlinks into /g/data, and root cannot read from /g/data because of the NFS root_squash option. Can whoever did this please revert the config?
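For anyone following along: root_squash maps root on an NFS client to the anonymous user, so httpd following those symlinks as root gets permission errors even on files the owner can read. A rough sketch, with a hypothetical path standing in for a real symlink target:

```shell
# Hypothetical symlink target under the root_squash'd mount.
F=/g/data/some/project/file
if [ -e "$F" ]; then
    # As root we are squashed to the anonymous uid, so this can fail
    # even though the file's owner reads it fine.
    sudo cat "$F" || echo 'root squashed: Permission denied'
fi
true
```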

I take that back, I think the config above has been there for a while. Might be a gdata NFS problem.

I put in a change yesterday for the rose-bush auth; I'll investigate.

httpd failed to restart properly; the old processes are still present and can't be killed even with sudo kill -9. They're holding on to port 443, so a new process can't start.
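One way to confirm which sockets are still holding 443, without relying on lsof or ss being installed, is to look in /proc/net/tcp directly (ports are stored in hex, so 443 is 01BB; this only shows the socket entries, mapping the inode back to a PID still needs lsof/ss):

```shell
# 443 in hex, as /proc/net/tcp stores ports:
printf '%04X\n' 443
# Entries whose local address column ends in :01BB are the holders.
grep -i ':01BB' /proc/net/tcp /proc/net/tcp6 2>/dev/null || true
```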

Yep, can see that. Tried remounting the /g/data directories containing files opened by stuck processes, no luck there unfortunately.
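For next time, the stuck processes' open files under /g/data can be listed straight from /proc rather than via lsof (needs root to see another user's fds; assumes the processes are still named httpd):

```shell
# For each httpd PID, list its fds and flag any pointing into /g/data.
for pid in $(pgrep -x httpd); do
    ls -l "/proc/$pid/fd" 2>/dev/null | grep '/g/data' \
        && echo "pid $pid has /g/data files open"
done
true
```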

I think our culprit is PID 3082; it has a different stack than the rest of the httpd processes. It's stuck in nfs_idmap_id, and the rest are waiting for it to finish. The config change is a red herring; these processes seem to have been stuck since 9:44am.
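That also explains the unkillable processes: a process blocked in the kernel (D state) ignores SIGKILL until the NFS call returns. The per-process kernel stacks can be compared like this (needs root to read them; assumes the processes are still named httpd):

```shell
# Dump each httpd process's kernel stack; the stuck one shows the
# nfs_idmap frames while its siblings just wait on it.
for pid in $(pgrep -x httpd); do
    printf '== %s ==\n' "$pid"
    cat "/proc/$pid/stack" 2>/dev/null || echo '(unreadable without root)'
done
true
```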

Is there anything we can do here, or do we need to restart the server?

Nope, needs a reboot.

OK, I'll send out a notice and schedule the reboot.

It’s being slow to come back online