UM RNS Error: ‘glm_um_recon1’ failed with MPI_INIT error and missing PE0 file

ZhangchengPei · 8 January 2026 04:17

Hi Atmos community,

My run failed at the ‘glm_um_recon1’ step while running the UM RNS.

The job.err indicates:

It looks like MPI_INIT failed for some reason; your parallel process is
likely to abort. There are many reasons that a parallel process can
fail during MPI_INIT; some of which are due to configuration or environment
problems. This failure appears to be an internal failure; here’s some
additional information (which may only be relevant to an Open MPI
developer):

ompi_mpi_init: ompi_rte_init failed

--> Returned "Error" (-1) instead of "Success" (0)

*** and potentially your MPI job)
*** An error occurred in MPI_Init
*** on a NULL communicator
*** MPI_ERRORS_ARE_FATAL (processes in this communicator will now abort,
*** and potentially your MPI job)

and job.out reports:

Could not find PE0 output file: pe_output/umgla.fort6.pe000

I previously encountered this bug during the hh5 to xp65 transition, where reverting to an older version of conda/analysis3 (25.05) solved it. However, that fix is no longer working.

Does anyone have suggestions on how to resolve this?

Thanks,

Zhangcheng

Matt_Woodhouse · 8 January 2026 04:55

Hi Zhangcheng, I’ve been running with analysis3 24.09, which seems to be working (having previously had that error).

ZhangchengPei · 9 January 2026 00:17

Hi Matt,

Thanks for the suggestions! It looks like I don’t have version 24.09 available in my conda environment—only 24.07, 24.11, and 25.**. I’ve tested both 24.07 and 24.11, but unfortunately, neither resolved the issue.

Would you mind sharing your branch id? I’d like to compare our setups and see if I can spot any key differences.

Cheers,

Zhangcheng

Matt_Woodhouse · 9 January 2026 05:06

Hi Zhangcheng,

I have updated to xp65 and analysis3-24.09 in a nesting suite simulation that is currently running. I’m also still using cylc7.

I’ve updated the following basis suites to match my running suite, but haven’t tested them. They also include changes to the emissions files and boundary layer nucleation options, changes which you might also like to consider.

u-df869 - glm only nesting suite to generate start dumps

u-df510 - ancillary suite

u-df403 - nesting suite

Matt

Matt_Woodhouse · 9 January 2026 06:04

I think I meant 24.11, not .09

Though I now seem to be getting the error, having just had a suite complete successfully.

ZhangchengPei · 11 January 2026 23:20

It seems that something in the environment has changed, specifically regarding Open MPI. Is anyone familiar with this issue?

lachlanswhyborn · 12 January 2026 03:06

Is there a reason you’re loading the conda/analysis module to run the RNS?

ZhangchengPei · 12 January 2026 05:11

Hi Lachlan,

It appears the RNS requires the ‘pytz’ module to enable model cycling, as Bec mentioned in the post “Using xp65 in UM suites”. I tested this by not loading the conda/analysis environment, which resulted in:

ModuleNotFoundError: No module named 'pytz'
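
A quick way to confirm whether the Python that a task picks up can import pytz is sketched below. It is only a rough check: the helper name have_py_module is hypothetical, and it assumes the interpreter is called python3 and is on PATH in the task environment.

```shell
# Hypothetical helper: succeeds if the given Python module can be imported.
# Assumes `python3` is the interpreter the suite task will use.
have_py_module() {
    python3 -c "import $1" >/dev/null 2>&1
}

if have_py_module pytz; then
    echo "pytz is importable"
else
    echo "pytz is NOT importable with $(command -v python3 || echo 'python3 (not found)')"
fi
```

Running this with and without the conda/analysis module loaded should show which environment is actually providing pytz.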

lachlanswhyborn · 12 January 2026 05:24

The xp65 Conda environment overrides the Open MPI installation that is loaded with module load openmpi/x.y.z (you should be able to see this with echo $OPAL_PREFIX while the conda/analysis module is loaded). You might be able to get around this by unsetting $OPAL_PREFIX (i.e. setting it to an empty string) after loading the Conda environment, but I’m not sure.
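
The check and workaround suggested above could be sketched roughly as follows. This is untested against the RNS: the function name report_opal_prefix is hypothetical, and the module load lines are left as comments because the exact module names and versions depend on your setup.

```shell
# Hypothetical helper: print whether OPAL_PREFIX is currently set.
report_opal_prefix() {
    if [ -n "${OPAL_PREFIX:-}" ]; then
        echo "OPAL_PREFIX is set to: ${OPAL_PREFIX}"
    else
        echo "OPAL_PREFIX is not set"
    fi
}

# Assumption: these would be loaded first in the task environment, e.g.
#   module load openmpi/x.y.z
#   module load conda/analysis3
report_opal_prefix

# Show which mpirun is first on PATH, if any.
command -v mpirun || echo "mpirun not found on PATH"

# Suggested workaround: clear the override after loading the Conda environment.
unset OPAL_PREFIX
report_opal_prefix
```

If the first report shows a path inside the Conda environment, that would be consistent with the environment overriding the Open MPI module.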

ZhangchengPei · 12 January 2026 23:46

Update: I now load python3 instead of the default python2 in PRE_COMMAND and have removed the conda/analysis module. The RNS is now running successfully! Thank you for the suggestions, @Matt_Woodhouse and @lachlanswhyborn.
