Payu and change of project on Gadi

I am unable to check out, set up or run ACCESS-ESM1.6 using payu after changing the project code.

Changing the project code in config.yaml after payu clone and then submitting the run does not produce any error message, but the job gets cancelled immediately. The email message reads:

PBS Job Id: 159733600.gadi-pbs
Job Name: test_pi_proj
Post job file processing error; job 159733600.gadi-pbs on host gadi-cpu-spr-0158

Changing the project code in the shell from p66 to xh94 (using switchproj) before payu clone will not even let me check out the code:

stderr: 'Cloning into ‘/g/data/p66/txz599/ACCESS-ESM1p6/testfolder/test_pi_xh94’…
fatal: cannot copy ‘/g/data/vk83/./apps/base_conda/envs/payu-1.2.1/share/git-core/templates/hooks/prepare-commit-msg.sample’ to ‘/g/data/p66/txz599/ACCESS-ESM1p6/testfolder/test_pi_xh94/.git/hooks/prepare-commit-msg.sample’: Disk quota exceeded

There is plenty of storage under p66. xh94 does not have a gdata allocation, but this should not matter?

Thanks Tilo - this may be a help desk question

Could you include the contents of the .err file from 159733600.gadi-pbs please?

The second error, I suspect, is because xh94 doesn’t have a gdata allocation. My incomplete understanding is that quotas are based on the group ownership of files (which should be the same as the project in that test) rather than the files’ location.
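To see which group a file or directory is owned by, something like the sketch below may help. It only reports ownership; whether quota is actually charged by group is my understanding and not something this checks. The function name `describe_ownership` is made up for illustration. The setgid flag is included because new files created in a setgid directory inherit the directory's group rather than your current project.

```python
import grp
import os
import stat


def describe_ownership(path):
    """Return the owning group name and the setgid flag for a path."""
    info = os.stat(path)
    try:
        group = grp.getgrgid(info.st_gid).gr_name
    except KeyError:
        # The gid has no entry in the group database; fall back to the number.
        group = str(info.st_gid)
    return {
        "path": path,
        "group": group,
        # On a directory, setgid means new files inherit the directory's group.
        "setgid": bool(info.st_mode & stat.S_ISGID),
    }
```

Calling `describe_ownership` on the clone target and on its parent directory would show whether new files there end up owned by p66 or xh94.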

Thanks Anton. There was no error file produced, just the email. I guess it is the same issue that it could not write anything due to the change in project code. Other people were able to run ACCESS-ESM1.6 using project code xh94, so I guess it has something to do with my environment/setup.

Also, I can write data on gdata p66 after switching to xh94. It just doesn’t work with payu.

-rw-r--r-- 1 txz599 xh94 8 Feb 5 12:28 test.txt


Thanks Tilo - does qstat -xf 159733600 reveal anything?

Specifically, do the storage paths look correct, and does the comment reveal anything?

(Alternatively - do payu run and confirm the storage paths are added to the command printed to the terminal)
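If it helps, the storage check can be done mechanically. This is a rough sketch, not part of payu or PBS: it assumes the raw `qstat -xf` output wraps long values onto tab-indented continuation lines, and it assumes the usual Gadi mapping of `/g/data/<project>` to `gdata/<project>` and `/scratch/<project>` to `scratch/<project>`. All function names are made up for illustration.

```python
import re


def parse_qstat_full(text):
    """Parse `qstat -xf` output into a dict, joining wrapped values."""
    fields = {}
    key = None
    for line in text.splitlines():
        if line.startswith("\t") and key is not None:
            # Continuation of the previous value (assumed tab-indented).
            fields[key] += line[1:]
        elif " = " in line:
            key, value = line.strip().split(" = ", 1)
            fields[key] = value
    return fields


def storage_flag_for(path):
    """Map a filesystem path to the storage flag assumed to cover it."""
    match = re.match(r"/(g/data|scratch)/([^/]+)", path)
    if match is None:
        return None  # e.g. /home, which needs no storage flag
    filesystem = "gdata" if match.group(1) == "g/data" else "scratch"
    return f"{filesystem}/{match.group(2)}"


def missing_storage(fields):
    """List (attribute, flag) pairs whose path is not in the job's storage."""
    mounted = set(fields.get("Resource_List.storage", "").split("+"))
    missing = []
    for key in ("Error_Path", "Output_Path"):
        # Values look like "host:/path"; keep only the path part.
        path = fields.get(key, "").split(":", 1)[-1]
        flag = storage_flag_for(path)
        if flag is not None and flag not in mounted:
            missing.append((key, flag))
    return missing
```

An empty list from `missing_storage` means the log paths are covered by the storage the job requested; anything else names the flag that is missing.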

I had to set this up again using xh94 as project in config.yaml and writing on scratch p66 using shortpath.

.err and .out files are empty. Job was only running for about 20 seconds. Output of qstat -xf and email below.

[txz599@gadi-login-09 test_pi_proj]$ qstat -xf 159875585
Job Id: 159875585.gadi-pbs
Job_Name = test_pi_proj
Job_Owner = txz599@gadi-login-09.gadi.nci.org.au
resources_used.cpupercent = 0
resources_used.cput = 00:00:00
resources_used.mem = 0b
resources_used.ncpus = 520
resources_used.vmem = 0b
resources_used.walltime = 00:00:00
job_state = R
queue = normalsr-exec
server = gadi-pbs-01.gadi.nci.org.au
Checkpoint = u
ctime = Thu Feb 5 13:10:09 2026
Error_Path = gadi.nci.org.au:/g/data/p66/txz599/ACCESS-ESM1p6/test_pi_proj/
test_pi_proj.e159875585
exec_host = gadi-cpu-spr-0151/0*104+gadi-cpu-spr-0152/0*104+gadi-cpu-spr-01
55/0*104+gadi-cpu-spr-0156/0*104+gadi-cpu-spr-0161/0*104
exec_vnode = (gadi-cpu-spr-0151:ncpus=104:mem=536870912kb:jobfs=307200kb)+(
gadi-cpu-spr-0152:ncpus=104:mem=536870912kb:jobfs=307200kb)+(gadi-cpu-s
pr-0155:ncpus=104:mem=536870912kb:jobfs=307200kb)+(gadi-cpu-spr-0156:nc
pus=104:mem=536870912kb:jobfs=307200kb)+(gadi-cpu-spr-0161:ncpus=104:me
m=536870912kb:jobfs=307200kb)
group_list = xh94
Hold_Types = n
Join_Path = n
Keep_Files = n
Mail_Points = a
mtime = Thu Feb 5 13:10:47 2026
Output_Path = gadi.nci.org.au:/g/data/p66/txz599/ACCESS-ESM1p6/test_pi_proj
/test_pi_proj.o159875585
Priority = 0
qtime = Thu Feb 5 13:10:09 2026
Rerunable = False
Resource_List.jobfs = 1572864000b
Resource_List.mem = 2748779069440b
Resource_List.mpiprocs = 520
Resource_List.ncpus = 520
Resource_List.nodect = 5
Resource_List.place = free
Resource_List.select = 5:ncpus=104:mpiprocs=104:mem=549755813888:job_tags=n
ormalsr:jobfs=314572800
Resource_List.storage = scratch/xh94+gdata/vk83+scratch/p66+gdata/p66
Resource_List.walltime = 02:30:00
Resource_List.wd = 1
stime = Thu Feb 5 13:10:42 2026
session_id = 2865695
jobdir = /home/599/txz599
substate = 42
Variable_List = PBS_O_HOME=/home/599/txz599,PBS_O_LANG=en_AU.UTF-8,
PBS_O_LOGNAME=txz599,
PBS_O_PATH=/opt/conda/payu-1.2.1/bin:/g/data/vk83/apps/base_conda/envs
/payu-1.2.1/condabin:/home/599/txz599/.local/bin:/home/599/txz599/bin:/
opt/pbs/default/bin:/opt/nci/bin:/opt/bin:/opt/Modules/v4.3.0/bin:/opt/
pbs/default/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin
:/bin,PBS_O_MAIL=/var/spool/mail/txz599,PBS_O_SHELL=/bin/bash,
PBS_O_TZ=:/etc/localtime,PBS_O_INTERACTIVE_AUTH_METHOD=resvport,
PBS_O_HOST=gadi-login-09.gadi.nci.org.au,
PBS_O_WORKDIR=/g/data/p66/txz599/ACCESS-ESM1p6/test_pi_proj,
PBS_O_SYSTEM=Linux,
PYTHONPATH=/g/data/access/projects/access/apps/pythonlib/umfile_utils,
PAYU_PATH=/g/data/vk83/apps/base_conda/envs/payu-1.2.1/bin,
MODULESHOME=/opt/Modules/v4.3.0,
MODULES_CMD=/opt/Modules/v4.3.0/libexec/modulecmd.tcl,
MODULEPATH=/g/data/vk83/modules:/etc/scl/modulefiles:/opt/Modules/modu
lefiles:/opt/Modules/v4.3.0/modulefiles:/apps/Modules/modulefiles,
PBS_NCI_HT=0,
PBS_NCI_STORAGE=scratch/xh94+gdata/vk83+scratch/p66+gdata/p66,
PBS_NCI_IMAGE=,PBS_NCPUS=520,PBS_NGPUS=0,PBS_NNODES=5,
PBS_NCI_NCPUS_PER_NODE=104,PBS_NCI_NUMA_PER_NODE=8,
PBS_NCI_NCPUS_PER_NUMA=13,PROJECT=xh94,PBS_VMEM=2748779069440,
PBS_NCI_WD=1,PBS_NCI_JOBFS=1500mb,PBS_NCI_LAUNCH_COMPATIBILITY=0,
PBS_NCI_FS_GDATA1=0,PBS_NCI_FS_GDATA1A=0,PBS_NCI_FS_GDATA1B=0,
PBS_NCI_FS_GDATA2=0,PBS_NCI_FS_GDATA3=0,PBS_NCI_FS_GDATA4=0,
PBS_O_QUEUE=normalsr,PBS_JOBFS=/jobfs/159875585.gadi-pbs
comment = Job run at Thu Feb 05 at 13:10 on (gadi-cpu-spr-0151:ncpus=104:me
m=536870912kb:jobfs=307200kb)+(gadi-cpu-spr-0152:ncpus=104:mem=53687091
2kb:jobfs=307200kb)+(gadi-cpu-spr-0155:ncpus=104:mem=536870912kb:jobfs=
307200kb)+(gadi-cpu-spr-0156:ncpus=104:mem=53…
etime = Thu Feb 5 13:10:09 2026
umask = 27
run_count = 1
Submit_arguments = -q normalsr -P xh94 -l walltime=9000 -l ncpus=520 -l mem
=2560GB -l jobfs=1500MB -N test_pi_proj -l wd -j n -v PYTHONPATH=/g/dat
a/access/projects/access/apps/pythonlib/umfile_utils,
PAYU_PATH=/g/data/vk83/apps/base_conda/envs/payu-1.2.1/bin,
MODULESHOME=/opt/Modules/v4.3.0,
MODULES_CMD=/opt/Modules/v4.3.0/libexec/modulecmd.tcl,
MODULEPATH=/g/data/vk83/modules:/etc/scl/modulefiles:/opt/Modules/modu
lefiles:/opt/Modules/v4.3.0/modulefiles:/apps/Modules/modulefiles -W um
ask=027 -l storage=gdata/p66+gdata/vk83+scratch/p66 -- /g/data/vk83/./a
pps/conda_scripts/payu-1.2.1.d/bin/launcher.sh /g/data/vk83/./apps/base
_conda/envs/payu-1.2.1/bin/python3.10 /g/data/vk83/apps/base_conda/envs
/payu-1.2.1/bin/payu-run
executable = <jsdl-hpcpa:Executable>/g/data/vk83/./apps/conda_scripts/payu-
1.2.1.d/bin/launcher.sh</jsdl-hpcpa:Executable>
argument_list = <jsdl-hpcpa:Argument>/g/data/vk83/./apps/base_conda/envs/pa
yu-1.2.1/bin/python3.10</jsdl-hpcpa:Argument><jsdl-hpcpa:Argument>/g/da
ta/vk83/apps/base_conda/envs/payu-1.2.1/bin/payu-run</jsdl-hpcpa:Argume
nt>
project = xh94
Submit_Host = gadi-login-09.gadi.nci.org.au

PBS Job Id: 159875585.gadi-pbs
Job Name: test_pi_proj
Post job file processing error; job 159875585.gadi-pbs on host gadi-cpu-spr-0151

Hi Tilo

I am a bit stumped.

  1. Can you try removing PYTHONPATH=/g/data/access/projects/access/apps/pythonlib/umfile_utils? I don’t know where this would be set for you, possibly in .bash_profile or .bashrc. Then open a new session and do a module purge before loading the payu module and trying again.
  2. If that fails, can you try a fresh clone into your home directory and run from there, in case there is some sort of issue with working across physical drives?
  3. If that fails, I think try the help desk to see if there is additional information on what “Post job file processing error” means.
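For step 1, a quick way to find where the variable is set could look like the sketch below. It is only a starting point: it catches plain `PYTHONPATH=...` and `export PYTHONPATH=...` lines and will miss the variable being set indirectly, for example by a module loaded from a startup file. The function names and the default list of startup files are assumptions for illustration.

```python
import re
from pathlib import Path


def find_assignments(text, variable="PYTHONPATH"):
    """Return (line_number, line) pairs that assign `variable` in shell text."""
    pattern = re.compile(rf"^\s*(export\s+)?{re.escape(variable)}=")
    return [
        (number, line.rstrip())
        for number, line in enumerate(text.splitlines(), start=1)
        if pattern.match(line)
    ]


def scan_startup_files(names=(".bashrc", ".bash_profile")):
    """Scan shell startup files in the home directory for assignments."""
    hits = {}
    for name in names:
        path = Path.home() / name
        if path.is_file():
            found = find_assignments(path.read_text(errors="replace"))
            if found:
                hits[str(path)] = found
    return hits
```

Running `scan_startup_files()` returns a dict keyed by file, so an empty dict suggests the variable is being set somewhere else.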

@Aidan also asks if you have any git hooks configured globally?

In my experience it means that PBS can’t write to the directory set by -o or -e, either because the quota is full or because the directory doesn’t exist.
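A small probe along these lines can test both causes from a login node. It is a sketch under the assumption that the probe runs as the same user and project as the job; it says nothing about what PBS itself does when it copies the log files back. The name `log_dir_problem` is made up for illustration.

```python
import os
import tempfile


def log_dir_problem(directory):
    """Return a reason string if a file cannot be written there, else None."""
    if not os.path.isdir(directory):
        return "directory does not exist"
    try:
        # Actually write a byte so that quota errors surface, not just
        # permission errors; the probe file is removed on close.
        with tempfile.NamedTemporaryFile(dir=directory) as probe:
            probe.write(b"x")
            probe.flush()
    except OSError as exc:
        return f"cannot write a file: {exc.strerror}"
    return None
```

A return value of `None` means the write succeeded; a "Disk quota exceeded" message here would point at the same quota problem as the git clone failure.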


Thanks Anton. It looks like removing PYTHONPATH=/g/data/access/projects/access/apps/pythonlib/umfile_utils from my .bashrc did the trick. Not sure why this was causing problems.

Anyway, fresh clone and running under xh94 with output on scratch p66 seems to work now.

Thanks for your help!

Cheers,
Tilo


Good news!

We often find issues with adding anything much to .bashrc (aliases are ok).