Note that if runlog: True is set in config.yaml (or not set at all, and defaults to True) then all the necessary configuration files are automatically added to the git repo and checked in when the model is run.
Just sayin’ … use runlog.
Hi Aidan
I’m not sure I understand you. Are you saying that when I run payu run it will automatically create a MOM_layout and add it to the remote repository? Will it add it to the local copy on disk?
I tried using @mmr0’s config repo.
$ payu clone -b expt -B tassie-test https://github.com/mmr0/access-om3-configs.git access-rom3
Cloned repository from https://github.com/mmr0/access-om3-configs.git to directory: /home/548/pag548/access-om3/access-rom3
Created and checked out new branch: expt
laboratory path: /scratch/gb02/pag548/access-om3
binary path: /scratch/gb02/pag548/access-om3/bin
input path: /scratch/gb02/pag548/access-om3/input
work path: /scratch/gb02/pag548/access-om3/work
archive path: /scratch/gb02/pag548/access-om3/archive
Metadata and UUID generation is disabled. Experiment name used for archival: access-rom3
Added archive symlink to /scratch/gb02/pag548/access-om3/archive/access-rom3
Added work symlink to /scratch/gb02/pag548/access-om3/work/access-rom3
To change directory to control directory run:
cd access-rom3
Then I altered config.yaml and commented out the runlog line, but then payu setup fails with
FileNotFoundError: [Errno 2] No such file or directory: '/home/548/pag548/access-om3/access-rom3/MOM_layout'
Oh no, sorry for not being clear. If @mmr0 had runlog turned on then it would automatically add the correct files to the repo when the model is run.
This is one of the reasons it’s a good idea.
I have recently added to the instructions to direct everyone to turn on runlog:
In config.yaml we want to change runlog: false to runlog: true.
This addition was after @mmr0 and @Paul.Gregory had already gone through the instructions (sorry!) but it would be great if you could turn it on now.
@Paul.Gregory I went through the instructions and can get the model to run. The one thing I did differently to you is that I used the more recent (non-dev) executable and the dev-1deg_jra55do_iaf branch. However, it seems unlikely to me that this is the issue, so if you have time I would like to delve a bit more into @Anton’s suggestions above.
If you haven’t already, can you please double check that you have deleted these lines from config.yaml:
- /g/data/vk83/configurations/inputs/access-om3/share/meshes/global.1deg/2024.01.25/access-om2-1deg-ESMFmesh.nc
- /g/data/vk83/configurations/inputs/access-om3/share/meshes/global.1deg/2024.01.25/access-om2-1deg-nomask-ESMFmesh.nc
@anton my understanding is that we don’t use the ocean_mask.nc at model run time but we use it along with hgrid.nc to create the mesh. Would the issue be that we are using an incompatible ocean_mask and hgrid.nc when generating the meshes?
@Paul.Gregory if the fresh version doesn’t work, would it be hard to try running with @mmr0’s hgrid and meshes to see if we can rule these out as the issue?
Both files are used; annoyingly it’s a situation where the same information is captured in two files. ocean_mask.nc is used by MOM, and the mask in ESMFmesh.nc is used by the mediator. They should be the same. In the MOM “cap”, it checks that the two masks are the same, which is where the error is coming from.
As this is a regional domain, we might expect ocean everywhere, so all values of the mask should be 1?
You could plot the mask in the ESMF mesh using something like:
mesh_ds.elementMask.values.reshape(249, 140)
and then compare to the ocean_mask.nc file. And it should be the same everywhere.
OK so I copied @mmr0’s configuration (4x4 decomposition on a domain measuring 70x125 at 0.1 resolution) to another directory and regenerated the default domain (10x10 decomposition on a domain measuring 140x249 at 0.5 resolution).
During the domain decomposition, the notebook generated the following:
Running GFDL's FRE Tools. The following information is all printed by the FRE tools themselves
NOTE from make_solo_mosaic: there are 0 contacts (align-contact)
congradulation: You have successfully run make_solo_mosaic
OUTPUT FROM MAKE SOLO MOSAIC:
CompletedProcess(args='/g/data/ik11/mom6_tools/tools/make_solo_mosaic/make_solo_mosaic --num_tiles 1 --dir . --mosaic_name ocean_mosaic --tile_file hgrid.nc', returncode=0)
cp: './ocean_mosaic.nc' and 'ocean_mosaic.nc' are the same file
cp: './hgrid.nc' and 'hgrid.nc' are the same file
cp ./hgrid.nc hgrid.nc
NOTE from make_coupler_mosaic: the ocean land/sea mask will be determined by field depth from file bathymetry.nc
mosaic_file is grid_spec.nc
***** Congratulation! You have successfully run make_quick_mosaic
OUTPUT FROM QUICK MOSAIC:
CompletedProcess(args='/g/data/ik11/mom6_tools/tools/make_quick_mosaic/make_quick_mosaic --input_mosaic ocean_mosaic.nc --mosaic_name grid_spec --ocean_topog bathymetry.nc', returncode=0)
===>NOTE from check_mask: when layout is specified, min_pe and max_pe is set to layout(1)*layout(2)=100
===>NOTE from check_mask: Below is the list of command line arguments.
grid_file = ocean_mosaic.nc
topog_file = bathymetry.nc
min_pe = 100
max_pe = 100
layout = 10, 10
halo = 4
sea_level = 0
show_valid_only is not set
nobc = 0
===>NOTE from check_mask: End of command line arguments.
===>NOTE from check_mask: the grid file is version 2 (mosaic grid) grid which contains field gridfiles
==>NOTE from get_boundary_type: x_boundary_type is solid_walls
==>NOTE from get_boundary_type: y_boundary_type is solid_walls
==>NOTE from check_mask: Checking for possible masking:
==>NOTE from check_mask: Assume 4 halo rows
==>NOTE from check_mask: Total domain size is 140, 249
_______________________________________________________________________
NOTE from check_mask: The following is for using model source code with version older than siena_201207,
Possible setting to mask out all-land points region, for use in coupler_nml
Total number of domains = 100
Number of tasks (excluded all-land region) to be used is 98
Number of regions to be masked out = 2
The layout is 10, 10
Masked and used tasks, 1: used, 0: masked
1111111111
1111111111
1111111111
1111001111
1111111111
1111111111
1111111111
1111111111
1111111111
1111111111
domain decomposition
14 14 14 14 14 14 14 14 14 14
25 25 25 25 25 25 25 25 25 24
used=98, masked=2, layout=10,10
To chose this mask layout please put the following lines in ocean_model_nml and/or ice_model_nml
nmask = 2
layout = 10, 10
mask_list = 5,7,6,7
_______________________________________________________________________
NOTE from check_mask: The following is for using model source code with version siena_201207 or newer,
specify ocean_model_nml/ice_model_nml/atmos_model_nml/land_model/nml
variable mask_table with the mask_table created here.
Also specify the layout variable in each namelist using corresponding layout
***** Congratulation! You have successfully run check_mask
OUTPUT FROM CHECK MASK:
I then wrote the following code to check the values of ocean_mask.nc and the ESMF mesh files.
from pathlib import Path

import matplotlib.pyplot as plt
import xarray as xr

input_dir = Path('/scratch/gb02/pag548/regional_mom6_configs/tassie-access-om2-forced')

# Load mask files
ocean_mask = xr.open_dataset(f'{input_dir}/ocean_mask.nc')
ESMF_mesh = xr.open_dataset(f'{input_dir}/access-rom3-ESMFmesh.nc')
ESMF_nomask_mesh = xr.open_dataset(f'{input_dir}/access-rom3-nomask-ESMFmesh.nc')

# Reconstruct the ESMF element mask as a 2-D array on the ocean_mask grid
ESMF_mask = xr.DataArray(ESMF_mesh.elementMask.values.reshape(249, 140),
                         dims=['ny', 'nx'],
                         coords={'ny': ocean_mask.ny,
                                 'nx': ocean_mask.nx})

# Plot both masks and their difference side by side
fig, ax = plt.subplots(1, 3, figsize=(15, 4.5))
ocean_mask.mask.plot(ax=ax[0])
ESMF_mask.plot(ax=ax[1])
delta = ocean_mask.mask - ESMF_mask
delta.plot(ax=ax[2])
fig.suptitle('Delta b/w ocean_mask and ESMF_mask')
plt.tight_layout()
The two masks are identical: the delta min/max values are both zero. Additionally, the mask in access-rom3-nomask-ESMFmesh.nc is 1.0 everywhere.
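For completeness, here is a numeric check of those claims (a minimal sketch reusing the datasets opened in the snippet above):

# Confirm the two masks agree everywhere and that the nomask mesh is all ocean.
delta = ocean_mask.mask.values - ESMF_mask.values
print(delta.min(), delta.max())                   # both 0.0 if the masks agree
print(ESMF_nomask_mesh.elementMask.values.min(),
      ESMF_nomask_mesh.elementMask.values.max())  # both 1.0 if nothing is masked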
The payu run task fails with the same error:
FATAL from PE 86: ERROR: ESMF mesh and MOM6 domain masks are inconsistent! - MOM n, maskMesh(n), mask(n) = 306 1 0
FATAL from PE 82: ERROR: ESMF mesh and MOM6 domain masks are inconsistent! - MOM n, maskMesh(n), mask(n) = 81 1 0
Is this related to the output of check_mask from the notebook earlier?
Total number of domains = 100
Number of tasks (excluded all-land region) to be used is 98
Number of regions to be masked out = 2
...
domain decomposition
14 14 14 14 14 14 14 14 14 14
25 25 25 25 25 25 25 25 25 24
used=98, masked=2, layout=10,10
i.e. there are two regions in the domain that don’t align somehow? Do I need to follow the advice in the check_mask output?
To chose this mask layout please put the following lines in ocean_model_nml and/or ice_model_nml
nmask = 2
layout = 10, 10
mask_list = 5,7,6,7
EDIT: I’m guessing not, as this corresponds to the contents of mask_table.2.10x10?
2
10, 10
5,7
6,7
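For reference, here’s how I’m reading that file (a minimal sketch; the field meanings follow my understanding of the usual FMS mask_table convention, so treat them as an assumption):

from pathlib import Path

# Parse mask_table.2.10x10: line 1 = number of masked (all-land) blocks,
# line 2 = processor layout, then one "i,j" pair per masked block.
lines = Path('mask_table.2.10x10').read_text().splitlines()
n_masked = int(lines[0])                                           # 2
layout = tuple(int(v) for v in lines[1].split(','))                # (10, 10)
masked = [tuple(int(v) for v in ln.split(',')) for ln in lines[2:2 + n_masked]]
print(n_masked, layout, masked)                                    # 2 (10, 10) [(5, 7), (6, 7)]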
Ok - that’s interesting! That seems like a bug. My hunch would be to try without any masked blocks in the mask_table.
But we use the AUTO_MASKTABLE option without trouble, so I am not sure.
The notebooks are running a few processes that are not actually needed in the nuopc coupler - I think the mask_table file is something we are able to ignore.
Thanks @anton.
So to remove the masked blocks in mask_table, do I remove the last two lines in the file?
So I change
2
10, 10
5,7
6,7
to
2
10,10
?
Or do I just clear all contents from the mask_table file?
BTW if this doesn’t work (and I note that you’re not too optimistic) I’m happy to try to recompile my own MOM6 executable with debugging flags and debug the MPI process.
I’ve never debugged MPI, but I have lots of experience using gdb and idb with Fortran, so I’ll be able to make good progress once I know how to attach a debugger to the mpirun process.
I’ve also decided to start reading the MOM6 docs from the beginning at Welcome to MOM6’s documentation! — MOM6 0.2a3 documentation
This probably works, I am not sure.
We set AUTO_MASKTABLE = True in MOM_input and don’t have a MOM_layout file. With a MOM_layout file, presumably you could set it there instead?
We might need a MOM person!
Ok, trying to add AUTO_MASKTABLE = True in MOM_input or MOM_layout, with or without the MASKTABLE variable set, produces the following payu error:
ValueError: OCN_modelio pio_root exceeds available PEs (max: 0) in nuopc.runconfig.
Which refers to this section of nuopc.runconfig:
OCN_modelio::
diro = ./log
logfile = ocn.log
pio_async_interface = .false. #not used
pio_netcdf_format = 64bit_offset #not used
pio_numiotasks = -99 #not used
pio_rearranger = 2 #not used
pio_root = 1 #not used
pio_stride = 48 #not used
pio_typename = netcdf #not used, set in input.nml
::
I’ll keep going with compiling MOM6 with debug flags and then attaching a debugger to it.
I might start a separate hive thread.
This check occurs before the model even runs - this is during payu setup. So I don’t see how it can be related to changing the masktable.
It’s a bit concerning that it says the max available PEs is 0.
What are ocn_ntasks / ocn_nthreads / ocn_pestride / ocn_rootpe in the PELAYOUT_attributes section set to?
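A quick way to pull those values out (a minimal sketch, assuming nuopc.runconfig is in the control directory and that the section uses the same Name:: … :: delimiters as the OCN_modelio block above):

# Print the PELAYOUT_attributes block from nuopc.runconfig.
in_block = False
with open('nuopc.runconfig') as f:
    for line in f:
        if line.strip().startswith('PELAYOUT_attributes::'):
            in_block = True
        if in_block:
            print(line.rstrip())
        if in_block and line.strip() == '::':
            break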
If you push your config to GitHub I’ll have a look.
Ok I’ve got @mmr0’s config up and running now. It seems to make progress and actually run, because the following directory now exists:
/scratch/gb02/pag548/access-om3/archive/access-rom3-MR/output000/
However the run fails after a few minutes with
ls: cannot access 'archive/output000/access-om3.cice.r.*': No such file or directory
cal: unknown month name: om3.cice*.????
These grids are specified in config.yaml
- /g/data/vk83/configurations/inputs/access-om3/cice/grids/global.1deg/2024.05.14/grid.nc
- /g/data/vk83/configurations/inputs/access-om3/cice/grids/global.1deg/2024.05.14/kmt.nc
- /g/data/vk83/configurations/inputs/access-om3/cice/initial_conditions/global.1deg/2023.07.28/iced.1900-01-01-10800.nc
In an earlier global MOM6 run (1deg_jra55do_ryf) the following files exist in /scratch/gb02/pag548/access-om3/archive/1deg_jra55do_ryf/output000/
access-om3.cice.1900-01.nc
access-om3.cicem.1900-01.nc
I tried to copy them into
/scratch/gb02/pag548/access-om3/archive/access-rom3-MR/output000/
But I get the same error.
I’m guessing I’ve completed the first stage of the model run because the stdout/stderr files 1deg_jra55do_ia.* are fully written. The set of PBS job files build_intake_ds.sh.* has been created. The stderr from the build_intake_ds.sh.e* file is
Downloading data from 'https://raw.githubusercontent.com/ACCESS-NRI/schema/e9055da95093ec2faa555c090fc5af17923d1566/au.org.access-nri/model/output/file-metadata/1-0-1.json' to file '/home/548/pag548/.cache/pooch/8e3c08344f0361af426ae185c86d446e-1-0-1.json'.
Traceback (most recent call last):
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.01/lib/python3.10/site-packages/urllib3/connection.py", line 203, in _new_conn
    sock = connection.create_connection(
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.01/lib/python3.10/site-packages/urllib3/util/connection.py", line 85, in create_connection
    raise err
  File "/g/data/hh5/public/apps/miniconda3/envs/analysis3-24.01/lib/python3.10/site-packages/urllib3/util/connection.py", line 73, in create_connection
    sock.connect(sa)
OSError: [Errno 101] Network is unreachable
Are there extra NCI projects or ACCESS-NRI permissions I need access to?
Hi @Paul.Gregory
This is sounding promising. Are there netcdf files related to MOM in the output000 folder?
Thanks for bringing this error up as a few people have received this error – it hasn’t been fatal for the model run but it is something we should look into.
We are not running CICE – so the call to CICE outputs shouldn’t be happening – but as we are the first people trying to run without CICE I am wondering if there is something in the workflow that is hardwired to include CICE? @Aidan or @anton might know more – and if it is in the realm of something we can fix in our configurations or whether I should raise this one somewhere?
Note that Monday is a public holiday in Canberra so responses may be delayed.
Ok that’s a good reminder. I’ll double check the inputs to ensure no cice-related processes are active.
That error is because you’re trying to download a file from GitHub and I’m guessing this was on a PBS compute node, which has no internet access.
@CharlesTurner might have some idea if this is expected.
It looks like you’re using the hh5 conda/analysis3 environment.
Are we not using the xp65 environments for the intake catalogue generation?
I’d need to look into the cice issue, but @anton may have some idea.
Yeah, this is an issue that is fixed in more recent versions of access-nri-intake - see Ship schema with package · Issue #185 · ACCESS-NRI/access-nri-intake-catalog · GitHub.
The version in the hh5 environment is quite old now. Using the xp65 environment should fix the issue. I can provide more details on Tuesday if someone doesn’t do so first.
Regarding the CICE error, I suspect the Payu configuration still includes running a userscript after the model completes to postprocess CICE output. But because you have no CICE output, this fails. Again this is easy to fix and I can provide more detail on Tuesday.
Morning.
I tried using payu with the xp65 modules.
$ module use /g/data/xp65/public/modules
$ module load conda/analysis3
Then load payu
$ module use /g/data/vk83/prerelease/modules
$ module load payu/dev
But this generates a PE error.
FATAL from PE 0: time_interp_external 2: time 734872 (20130105.000050 is after range of list 734868-734872(20130101.000000 - 20130105.000000),file=INPUT/forcing_obc_segment_001.nc,field=u_segment_001
in mom_cap.F90
Are there other steps required when using xp65 conda? Here are my loaded modules at runtime.
$ module list
Currently Loaded Modulefiles:
1) pbs 2) singularity 3) conda/analysis3-24.12(access-med:analysis3) 4) payu/dev-20250220T210827Z-39e4b9b(dev)
EDIT: I tried to run again with the standard config and generated this error again. So it looks like I’ve broken something and this PE error is not related to xp65 conda.
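To help narrow this down, a quick check of the OBC forcing file’s time span might be useful (a minimal sketch; it assumes the file is still in the work directory’s INPUT/ folder and that its time coordinate is named time):

import xarray as xr

# Print the first and last times covered by the open-boundary forcing file
# named in the error message.
obc = xr.open_dataset('INPUT/forcing_obc_segment_001.nc')
print(obc['time'].values[0], obc['time'].values[-1])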