Need help: 1/4 MOM ACCESS-ESM1-5 coupled run, crashed!?!?

Hi there,


/scratch/public/ars599/PI-EDC-03

This is ESM1.5 upgraded to 1/4° MOM. I have generated the associated N96-025OCN regridding files under hxy599_13042024:

oasis3_areas_N96_13042024.nc
oasis3_masks_N96_13042024.nc
oasis3_grids_N96_13042024.nc
um_n96_landseamask_gice2n96.v13.nc

However, I got the following error message in PI-EDC-03.e114702017:

forrtl: error (78): process killed (SIGTERM)
Image              PC                Routine            Line        Source
um7.3x             00000000012FCBC4  Unknown               Unknown  Unknown
libpthread-2.28.s  0000152AA9835CF0  Unknown               Unknown  Unknown
libpthread-2.28.s  0000152AA9835180  nanosleep             Unknown  Unknown
libopen-rte.so.40  0000152AA4E0F83A  orte_show_help_no     Unknown  Unknown
libopen-rte.so.40  0000152AA4E0FB74  orte_show_help        Unknown  Unknown
libmpi.so.40.20.2  0000152AA9E2927A  MPI_Abort             Unknown  Unknown
libmpi_mpifh.so    0000152AAA1421FE  Unknown               Unknown  Unknown
um7.3x             0000000001110F0A  mpl_abort_                 55  mpl_abort.F90
um7.3x             000000000110A376  gc_abort_                 135  gc_abort.F90
um7.3x             000000000041A523  ereport_                  391  ereport.f90
um7.3x             00000000005FF563  initial_                 6549  initial.f90
um7.3x             0000000000428DB1  Unknown               Unknown  Unknown
um7.3x             000000000041481E  um_shell_                3930  um_shell.f90
um7.3x             000000000040D968  MAIN__                     40  flumeMain.f90
um7.3x             000000000040D8A2  Unknown               Unknown  Unknown
libc-2.28.so       0000152AA9294D85  __libc_start_main     Unknown  Unknown
um7.3x             000000000040D7AE  Unknown               Unknown  Unknown
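For anyone reading a traceback like this one, here is a minimal Python sketch that keeps only the frames with a known source file and line. It is not part of the model or of this run: the function name `frames_with_source` and the regular expression are assumptions made for illustration, and the sample rows are copied from the traceback above.

```python
import re

# One frame row of the traceback: image, 16-digit hex program counter,
# routine, line, source. The header row does not match because "PC" is not hex.
FRAME = re.compile(r"^(\S+)\s+([0-9A-Fa-f]{16})\s+(\S+)\s+(\S+)\s+(\S+)$")


def frames_with_source(traceback_text):
    """Return (routine, line, source) for frames whose source file is known.

    Hypothetical helper for illustration only.
    """
    frames = []
    for row in traceback_text.splitlines():
        match = FRAME.match(row.strip())
        if match is None:
            continue  # header row, blank line, or anything else
        _image, _pc, routine, line, source = match.groups()
        if source != "Unknown" and line.isdigit():
            frames.append((routine, int(line), source))
    return frames


# Rows copied from the traceback in this post.
sample = """\
Image              PC                Routine            Line        Source
libmpi.so.40.20.2  0000152AA9E2927A  MPI_Abort             Unknown  Unknown
um7.3x             0000000001110F0A  mpl_abort_                 55  mpl_abort.F90
um7.3x             000000000110A376  gc_abort_                 135  gc_abort.F90
um7.3x             000000000041A523  ereport_                  391  ereport.f90
um7.3x             00000000005FF563  initial_                 6549  initial.f90
um7.3x             0000000000428DB1  Unknown               Unknown  Unknown
um7.3x             000000000041481E  um_shell_                3930  um_shell.f90
um7.3x             000000000040D968  MAIN__                     40  flumeMain.f90
"""

for routine, line, source in frames_with_source(sample):
    print(f"{source}:{line} in {routine}")
```

Read from the bottom up, the remaining frames go from `MAIN__` through `um_shell_` and `initial_` into `ereport_`, and then into `gc_abort_` and `mpl_abort_`. That ordering suggests the UM reported an error during initialisation and then aborted MPI itself, so the SIGTERM is probably a consequence rather than the cause, and the actual error message is likely to be elsewhere in the logs.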

The setup is in /scratch/public/ars599/PI-EDC-03, and the run scripts are:

PI-EDC-03
PI-EDC-03.init
PI-EDC-03.fin

Any suggestions?

Thanks.

Arnold

Hi @ars599. If you want ACCESS-NRI support with this, please add the help tag. Also, it will be easier for people to help if they can access your run directory (I get Permission denied).

Thanks Dougie!!

Model outputs under log are sometimes written with mode 644 and sometimes with mode 600, which is hard to predict.

d ~/cylc-run/u-db965/log/job/01210101/coupled/01/
total 2468
drwxr-sr-x 2 ars599 p66    4096 Mar  4 16:20 ./
drwxr-sr-x 3 ars599 p66    4096 Mar  1 10:13 ../
-rwxr-xr-x 1 ars599 p66    6258 Mar  1 10:13 job*
-rw-r--r-- 1 ars599 p66     197 Mar  1 10:16 job-activity.log
-rw-r--r-- 1 ars599 p66 1719672 Mar  1 10:16 job.err
-rw-r--r-- 1 ars599 p66  780408 Mar  1 10:16 job.out
-rw-r--r-- 1 ars599 p66     242 Mar  1 10:16 job.status

Other files might need to be checked too, so it would be better for people who are helping to ask for specific files rather than asking users to change the permissions on all files. For example, the files under work:

d ~/cylc-run/u-db965/work/01210101/coupled/CPL_RUNDIR/
total 725176
drwxr-sr-x 2 ars599 p66      4096 Mar  1 10:11 ./
drwxr-sr-x 6 ars599 p66      4096 Mar  1 10:14 ../
-rw-r----- 1 ars599 p66   8424712 Mar  1 09:53 a2i.nc
-rw-r----- 1 ars599 p66 622161313 Mar  1 09:53 i2a.nc
lrwxrwxrwx 1 ars599 p66        12 Mar  1 10:11 namcouple -> ../namcouple
-rw-r----- 1 ars599 p66 111974996 Mar  1 09:53 o2i.nc

Changing permissions on everything might cause security issues for people like me who are unfamiliar with Linux.

Or would you suggest copying those to a public folder?

Topic tags can be set when composing or editing the topic title. @Aidan has written a nice description of how to use them here. To tag this topic with the help tag:

  1. click on the pencil icon to the right of the topic title
  2. add “help” to the “optional tags” box

Or would you suggest copying those to a public folder?

Yeah, that might be easiest, e.g. /scratch/public?

@ars599 I hope you don’t mind, but I’ve changed the tag to help, since this specific tag triggers the support process on our end.

Could you copy all relevant logs and the diag_table to somewhere public?

@ars599, are you still wanting help with this? If you managed to resolve the issue yourself, could you provide details of the resolution here to help others in the future?

Dear Dougie,

Sorry for the late reply. I had a family emergency and flew back to Taiwan two weeks ago, and I have only just returned. Could you please remove this topic? The issue hasn’t been solved, but I have other issues to fix first, so I might repost this later :slight_smile:

Regards,

Arnold

No worries @ars599. Hope all is well.

We don’t remove topics, but I’ll remove the help tag so that it doesn’t get flagged for ACCESS-NRI support. Please reply here if you find you want help again with this in the future.

Sorry, @dougiesquire, would you be able to help? Many thanks. This time I have copied the files to the public folder, and I hope you can access all of them.

Regards,

Arnold

Hi @ars599. It looks like you’ve edited your original post to change the error, and even the experiment, that you are wanting help with? I thought I could help with your original issue (related to the MOM diag table), but I’m not sure I can help with the new issue. Your subsequent posts are also no longer related to your first post, which makes it quite difficult and confusing for someone trying to help.

Can you please open a new topic for your new issue? I’ll then revert this topic to your original post.

(Also, although the files are now in /scratch/public, the permissions you have set still prevent read access to them.)
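In case it helps with the permissions, here is a minimal Python sketch for finding out which files and directories under a shared folder other users still cannot read. It is only an illustration: the function names `unreadable_by_others` and `open_for_reading` are hypothetical, and it assumes the owner of the files runs it. It checks the "other" read bit on files and the read and execute bits on directories, since a directory needs both to be listed and entered.

```python
import stat
from pathlib import Path


def unreadable_by_others(root):
    """List paths under root that other users cannot read or, for directories, enter.

    Hypothetical helper for illustration only.
    """
    root = Path(root)
    blocked = []
    for path in [root, *sorted(root.rglob("*"))]:
        mode = path.lstat().st_mode
        if stat.S_ISLNK(mode):
            continue  # access through a symlink depends on its target
        if stat.S_ISDIR(mode):
            needed = stat.S_IROTH | stat.S_IXOTH
        else:
            needed = stat.S_IROTH
        if mode & needed != needed:
            blocked.append(path)
    return blocked


def open_for_reading(paths):
    """Add read access for others (plus execute on directories); nothing else changes."""
    for path in paths:
        mode = path.stat().st_mode
        extra = stat.S_IROTH
        if stat.S_ISDIR(mode):
            extra |= stat.S_IXOTH
        path.chmod(stat.S_IMODE(mode) | extra)
```

Calling `unreadable_by_others("/scratch/public/ars599/PI-EDC-03")` would list what is still blocked, and passing that list to `open_for_reading` would open only those entries, which avoids changing the permissions on everything. The parent directories above the folder also need to be accessible to others, and this sketch does not check them.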

Hi @dougiesquire,

Thank you. Can we delete this thread, please? I’ll begin a new topic separately.

Sorry for any confusion.

Cheers,
Arnold