I am running mom6_panan-005 with the parallel I/O layout, so I need to collate the output tiles afterwards, but collation is not working correctly. The tiles are collated into one file, but the individual tile files are not deleted, so they are now taking up unnecessary disk space.
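A minimal stdlib Python sketch for measuring how much space the leftover tiles take up. It assumes the tile files sit in a single output directory and are named like the ones in the log below, i.e. the collated file name plus a four-digit tile suffix (for example `19920901.ocean_month.nc.0008`). The function name `leftover_tiles` is hypothetical, and the script only reports; it deletes nothing. The presence of a collated `.nc` file does not by itself show that the file is complete, because the combiner can crash partway through writing it.

```python
import os
import re
import sys
from collections import defaultdict

# Assumed tile naming: "<collated name>.nc.<4-digit tile index>",
# e.g. "19920901.ocean_month.nc.0008".
TILE_RE = re.compile(r"^(?P<base>.+\.nc)\.(?P<tile>\d{4})$")


def leftover_tiles(output_dir):
    """Group uncollated tile files by the collated file they belong to.

    Returns {collated_name: (collated_file_exists, n_tiles, total_tile_bytes)}.
    Report only: nothing is deleted.
    """
    groups = defaultdict(list)
    for name in os.listdir(output_dir):
        m = TILE_RE.match(name)
        if m:
            groups[m.group("base")].append(os.path.join(output_dir, name))

    report = {}
    for base, paths in sorted(groups.items()):
        # Existence only; a crashed combine may leave a partial file.
        collated = os.path.isfile(os.path.join(output_dir, base))
        size = sum(os.path.getsize(p) for p in paths)
        report[base] = (collated, len(paths), size)
    return report


if __name__ == "__main__" and len(sys.argv) > 1:
    for base, (collated, n, size) in leftover_tiles(sys.argv[1]).items():
        status = "collated file present" if collated else "NO collated file"
        print(f"{base}: {n} tiles, {size / 1e9:.2f} GB ({status})")
```

Run it with the output directory as its only argument; the script file name is up to you.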
I use this in config.yaml:
collate:
  restart: true
  mpi: true
  walltime: 6:00:00
  mem: 190GB
  ncpus: 4
  queue: normal
  exe: /g/data/ik11/inputs/access-om2/bin/mppnccombine-fast
I also tried the non-fast combiner:
  exe: /g/data/ik11/inputs/access-om2/bin/mppnccombine
but that didn't work either.
This is one of the error logs:
Currently Loaded Modulefiles:
1) openmpi/4.1.4(default) 2) pbs
payu: error: Thread 1 crashed with error code 255.
Error message:
Copying non-collated variables
Copying contiguous variables
Copying chunked variables
[rank 003] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month.nc.0008
[rank 002] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month.nc.0009
[rank 001] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month.nc.0010
[rank 003] Unaligned or compression change (slow) copy of umo_2d from 19920901.ocean_month.nc.0008
[rank 001] Unaligned or compression change (slow) copy of umo_2d from 19920901.ocean_month.nc.0010
[rank 002] Unaligned or compression change (slow) copy of umo_2d from 19920901.ocean_month.nc.0009
[rank 003] Unaligned or compression change (slow) copy of tauuo from 19920901.ocean_month.nc.0008
[rank 002] Unaligned or compression change (slow) copy of tauuo from 19920901.ocean_month.nc.0009
[rank 001] Unaligned or compression change (slow) copy of tauuo from 19920901.ocean_month.nc.0010
[rank 001] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month.nc.0011
[rank 002] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month.nc.0012
[rank 003] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month.nc.0022
[rank 001] Unaligned or compression change (slow) copy of umo_2d from 19920901.ocean_month.nc.0011
[rank 002] Unaligned or compression change (slow) copy of umo_2d from 19920901.ocean_month.nc.0012
[rank 003] Unaligned or compression change (slow) copy of umo_2d from 19920901.ocean_month.nc.0022
[rank 002] Unaligned or compression change (slow) copy of tauuo from 19920901.ocean_month.nc.0012
[rank 001] Unaligned or compression change (slow) copy of tauuo from 19920901.ocean_month.nc.0011
[rank 003] Unaligned or compression change (slow) copy of tauuo from 19920901.ocean_month.nc.0022
[rank 002] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month.nc.0024
[rank 003] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month.nc.0023
[rank 001] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month.nc.0044
[rank 001] Unaligned or compression change (slow) copy of yq from 19920901.ocean_month.nc.0044
[rank 000] var 3 yq from 3 dims 1 [0,4225648,72057594037928206]
[rank 000] ERROR in HDF5 /home/502/aph502/code/c/mppnccombine-fast/async.c:446
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: ../../src/H5Dio.c line 404 in H5Dwrite_chunk(): can't write unprocessed chunk data
major: Dataset
minor: Write failed
#001: ../../src/H5Dchunk.c line 461 in H5D__chunk_direct_write(): unable to allocate chunk
major: Dataset
minor: Can't allocate space
#002: ../../src/H5Dchunk.c line 6564 in H5D__chunk_file_alloc(): unable to free chunk
major: Dataset
minor: Unable to free object
#003: ../../src/H5MF.c line 1216 in H5MF_xfree(): can't add section to file free space
major: Resource unavailable
minor: Unable to insert object
#004: ../../src/H5MF.c line 665 in H5MF__add_sect(): can't re-add section to file free space
major: Resource unavailable
minor: Unable to insert object
#005: ../../src/H5FSsection.c line 1409 in H5FS_sect_add(): can't insert free space section into skip list
major: Free Space Manager
minor: Unable to insert object
#006: ../../src/H5FSsection.c line 1124 in H5FS_sect_link(): can't add section to non-size tracking data structures
major: Free Space Manager
minor: Unable to insert object
#007: ../../src/H5FSsection.c line 1069 in H5FS_sect_link_rest(): can't insert free space node into merging skip list
major: Free Space Manager
minor: Unable to insert object
#008: ../../src/H5SL.c line 1122 in H5SL_insert(): can't create new skip list node
major: Skip Lists
minor: Unable to insert object
#009: ../../src/H5SL.c line 783 in H5SL_insert_common(): can't insert duplicate key
major: Skip Lists
minor: Unable to insert object
/g/data/ik11/inputs/access-om2/bin/mppnccombine-fast[0x4095a1]
/g/data/ik11/inputs/access-om2/bin/mppnccombine-fast[0x409127]
/g/data/ik11/inputs/access-om2/bin/mppnccombine-fast[0x404311]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x1549e63abd85]
/g/data/ik11/inputs/access-om2/bin/mppnccombine-fast[0x4030ee]
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
payu: error: Thread 4 crashed with error code 255.
Error message:
Copying non-collated variables
Copying contiguous variables
Copying chunked variables
[rank 003] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month_rho2.nc.0008
[rank 001] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month_rho2.nc.0010
[rank 002] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month_rho2.nc.0009
[rank 001] Unaligned or compression change (slow) copy of umo from 19920901.ocean_month_rho2.nc.0010
[rank 003] Unaligned or compression change (slow) copy of umo from 19920901.ocean_month_rho2.nc.0008
[rank 002] Unaligned or compression change (slow) copy of umo from 19920901.ocean_month_rho2.nc.0009
[rank 001] Unaligned or compression change (slow) copy of vmo from 19920901.ocean_month_rho2.nc.0010
[rank 003] Unaligned or compression change (slow) copy of vmo from 19920901.ocean_month_rho2.nc.0008
[rank 002] Unaligned or compression change (slow) copy of vmo from 19920901.ocean_month_rho2.nc.0009
[rank 002] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month_rho2.nc.0022
[rank 001] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month_rho2.nc.0012
[rank 003] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month_rho2.nc.0011
[rank 001] Unaligned or compression change (slow) copy of umo from 19920901.ocean_month_rho2.nc.0012
[rank 002] Unaligned or compression change (slow) copy of umo from 19920901.ocean_month_rho2.nc.0022
[rank 003] Unaligned or compression change (slow) copy of umo from 19920901.ocean_month_rho2.nc.0011
[rank 002] Unaligned or compression change (slow) copy of vmo from 19920901.ocean_month_rho2.nc.0022
[rank 001] Unaligned or compression change (slow) copy of vmo from 19920901.ocean_month_rho2.nc.0012
[rank 003] Unaligned or compression change (slow) copy of vmo from 19920901.ocean_month_rho2.nc.0011
[rank 001] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month_rho2.nc.0024
[rank 003] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month_rho2.nc.0044
[rank 002] Unaligned or compression change (slow) copy of xq from 19920901.ocean_month_rho2.nc.0023
[rank 003] Unaligned or compression change (slow) copy of yq from 19920901.ocean_month_rho2.nc.0044
[rank 000] var 3 yq from 2 dims 1 [0,4225648,72057594037928206]
[rank 000] ERROR in HDF5 /home/502/aph502/code/c/mppnccombine-fast/async.c:446
HDF5-DIAG: Error detected in HDF5 (1.10.5) thread 0:
#000: ../../src/H5Dio.c line 404 in H5Dwrite_chunk(): can't write unprocessed chunk data
major: Dataset
minor: Write failed
#001: ../../src/H5Dchunk.c line 461 in H5D__chunk_direct_write(): unable to allocate chunk
major: Dataset
minor: Can't allocate space
#002: ../../src/H5Dchunk.c line 6564 in H5D__chunk_file_alloc(): unable to free chunk
major: Dataset
minor: Unable to free object
#003: ../../src/H5MF.c line 1216 in H5MF_xfree(): can't add section to file free space
major: Resource unavailable
minor: Unable to insert object
#004: ../../src/H5MF.c line 665 in H5MF__add_sect(): can't re-add section to file free space
major: Resource unavailable
minor: Unable to insert object
#005: ../../src/H5FSsection.c line 1409 in H5FS_sect_add(): can't insert free space section into skip list
major: Free Space Manager
minor: Unable to insert object
#006: ../../src/H5FSsection.c line 1124 in H5FS_sect_link(): can't add section to non-size tracking data structures
major: Free Space Manager
minor: Unable to insert object
#007: ../../src/H5FSsection.c line 1069 in H5FS_sect_link_rest(): can't insert free space node into merging skip list
major: Free Space Manager
minor: Unable to insert object
#008: ../../src/H5SL.c line 1122 in H5SL_insert(): can't create new skip list node
major: Skip Lists
minor: Unable to insert object
#009: ../../src/H5SL.c line 783 in H5SL_insert_common(): can't insert duplicate key
major: Skip Lists
minor: Unable to insert object
/g/data/ik11/inputs/access-om2/bin/mppnccombine-fast[0x4095a1]
/g/data/ik11/inputs/access-om2/bin/mppnccombine-fast[0x409127]
/g/data/ik11/inputs/access-om2/bin/mppnccombine-fast[0x404311]
/lib64/libc.so.6(__libc_start_main+0xe5)[0x149b3d138d85]
/g/data/ik11/inputs/access-om2/bin/mppnccombine-fast[0x4030ee]
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode -1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
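Both crashes above end the same way: the last slow copy reported before the HDF5 error is of `yq` from tile `.0044`, and rank 000 reports `var 3 yq` just before failing in `H5Dwrite_chunk()`. A hypothetical helper, `summarise_crashes`, sketched in stdlib Python under the assumption that the log keeps exactly the line formats shown above, pulls that out per crashed thread:

```python
import re

# Line formats as they appear in the payu / mppnccombine-fast log above.
THREAD_RE = re.compile(r"payu: error: Thread (\d+) crashed with error code (\d+)")
SLOW_COPY_RE = re.compile(
    r"\[rank (\d+)\] Unaligned or compression change \(slow\) copy of (\S+) from (\S+)"
)
HDF5_ERR_RE = re.compile(r"\[rank (\d+)\] ERROR in HDF5")


def summarise_crashes(log_text):
    """For each crashed payu thread, return
    (thread, exit_code, last_copied_variable, last_tile_file),
    taken from the last slow-copy line logged before the HDF5 error."""
    crashes = []
    current = None    # (thread, exit_code) of the crash being read
    last_copy = None  # (variable, tile_file) of the latest slow copy
    for line in log_text.splitlines():
        m = THREAD_RE.search(line)
        if m:
            current = (int(m.group(1)), int(m.group(2)))
            last_copy = None
            continue
        m = SLOW_COPY_RE.search(line)
        if m:
            last_copy = (m.group(2), m.group(3))
            continue
        if HDF5_ERR_RE.search(line) and current is not None:
            var, tile = last_copy if last_copy else (None, None)
            crashes.append((current[0], current[1], var, tile))
            current = None
    return crashes
```

Applied to the log above, this would list thread 1 and thread 4, both exiting with code 255 after copying `yq` from the `.0044` tile of `ocean_month` and `ocean_month_rho2` respectively.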
My run directory is:
/home/142/cs6673/payu/panan_005deg_jra55_ryf_2023_05_17