Please reply to this topic if you have feedback on the ACCESS-AM3 Alpha. We are primarily looking for feedback on the usability of the build system and configuration, and on the documentation. We are also happy to receive science-related feedback, which we will address later in the release process. Feedback can point out problems you encountered or highlight what worked well.
If your feedback is involved, please make an issue on the configuration repository (see this post if you do not yet have access to this repository).
If you're not sure, reply here and your query can be moved to a GitHub issue if required.
Thanks again for the work. I was able to run the original suite and one with modifications, and I spotted a few issues.
The jobs atmos_main and netcdf_conversion fail irregularly, about once every few years. For example, see /home/563/qg8515/scratch/cylc-run/access-am3-configs/log/job/19911101T0000Z/atmos_main/01 and /home/563/qg8515/scratch/cylc-run/am3-plus4k/log/job/19860601T0000Z/netcdf_conversion/01. The log files do not provide much information. I am not sure whether it is a Gadi problem or whether I output too many variables. In any case, the jobs succeed when rerun.
Unfortunately, the failed jobs do not resubmit themselves, so I have to babysit them and trigger a rerun after they fail. They also do not send an email notification about the failure. I thought I could set execution retry delays to 10*PT1M in the file /home/563/qg8515/roses/access-am3-configs/site/nci_gadi.rc, but that does not work.
I tried to set EXPT_AEROSOLS='aeroclim' in rose-suite.conf to run with climatological aerosols. It again failed without much information (just a segmentation fault). @clairecarouge already helped to look into it, but it is still unresolved. The suite is here: /home/563/qg8515/roses/am3-climaerosol, and the log is here: /home/563/qg8515/scratch/cylc-run/am3-climaerosol/log/job/19820101T0000Z/atmos_main/01. I am looking into it. If you have any ideas, I'm happy to try them.
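For anyone unfamiliar with the notation: Cylc retry-delay settings take a comma-separated list of ISO 8601 durations, and a leading N* repeats a duration N times, so 10*PT1M means up to ten retries spaced one minute apart. A minimal Python sketch of that expansion follows. The function name expand_retry_delays is made up for illustration and is not part of Cylc; it only mimics how the value is read.

```python
import re
from typing import List


def expand_retry_delays(spec: str) -> List[str]:
    """Expand a retry-delay string such as '10*PT1M' into one entry per retry.

    Hypothetical helper for illustration only; it is not a Cylc function.
    """
    delays: List[str] = []
    for item in spec.split(","):
        item = item.strip()
        if not item:
            continue
        # 'N*DURATION' means: repeat DURATION N times.
        match = re.fullmatch(r"(\d+)\s*\*\s*(P\S+)", item)
        if match:
            delays.extend([match.group(2)] * int(match.group(1)))
        else:
            delays.append(item)
    return delays


if __name__ == "__main__":
    print(expand_retry_delays("10*PT1M"))
```

A mixed value such as PT30S, 2*PT5M would expand to one 30-second delay followed by two 5-minute delays.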
FATAL: container creation failed: mount /proc/self/fd/10->/opt/nci/singularity/3.11.3/var/singularity/mnt/session/overlay-images/0 error: while mounting image /proc/self/fd/10: failed to find loop device: could not attach image file to loop device: failed to attach loop device: transient error, please retry: resource temporarily unavailable
It happened to me a few times today. I don't think there is a problem with the suite. It can be annoying, but if you trigger the job again it should work.
I may be wrong, but I think this is for when the process fails to be submitted to the queue, not for when the process itself fails.
The "container creation failed" error is something that I have experienced with all UM suites I've run on Gadi. It seems to be a persistent transient (and annoying! Scott can confirm) error, but not an issue with the individual suite itself.
If you manually re-trigger the job it should usually then run to success. (In my experience this can even sometimes take a few tries).
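The "retry on a transient error" idea can be sketched in Python as below. This is only an illustration under stated assumptions, not how the suite or Cylc handles failures: the function names are made up, the command is a placeholder for whatever launches the container, and the marker string is taken from the error message quoted earlier in this thread.

```python
import subprocess
import time

# Substring copied from the "container creation failed" message in this thread.
TRANSIENT_MARKER = "transient error, please retry"


def is_transient_failure(stderr_text):
    """Return True if the error text looks like the transient loop-device error."""
    return TRANSIENT_MARKER in stderr_text


def run_with_retries(command, max_tries=3, delay_seconds=60,
                     runner=subprocess.run, sleep=time.sleep):
    """Run `command`, retrying only when the failure looks transient.

    Returns the number of attempts used. `runner` and `sleep` are injectable
    so the logic can be exercised without launching a real job.
    """
    for attempt in range(1, max_tries + 1):
        result = runner(command, capture_output=True, text=True)
        if result.returncode == 0:
            return attempt
        if not is_transient_failure(result.stderr):
            # A different error: do not hide it behind retries.
            raise RuntimeError("non-transient failure: " + result.stderr)
        if attempt < max_tries:
            sleep(delay_seconds)
    raise RuntimeError("still failing after %d tries" % max_tries)
```

The point of checking the message before retrying is that a genuine failure (for example a segmentation fault) is reported immediately instead of being retried.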
Cylc mon is a way to monitor and trigger jobs in a terminal (if you have shut down your GUI or don't want to view it in the GUI).
Containers like the xp65 environments use "loop devices" to load. Depending on which node you get put on, there can be a limited number of these loop devices available: some nodes have a couple of hundred, some have only 12.
According to the NCI folks, the loop devices should be created automatically by the container, so it shouldn't matter how many are listed before you load xp65. Something to try is to increase the number of CPUs requested so there are fewer jobs on a single node.
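If you want to see how many loop devices a node lists, a small Python sketch like the one below could be run from within a job. It assumes the devices appear as /dev/loop0, /dev/loop1 and so on, which is the usual Linux naming. The function name is made up for illustration, and, as noted above, the count may not be what actually limits the container.

```python
import glob
import os
import re


def list_loop_devices(dev_dir="/dev"):
    """Return the sorted paths of loopN device nodes found under dev_dir."""
    pattern = os.path.join(dev_dir, "loop[0-9]*")
    return sorted(
        path for path in glob.glob(pattern)
        # Keep only names of the form loop<digits> (skips loop-control).
        if re.fullmatch(r"loop\d+", os.path.basename(path))
    )


if __name__ == "__main__":
    devices = list_loop_devices()
    print("%d loop devices listed on this node" % len(devices))
```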
Hi @MartinDix, I hope you have had a nice start to the new year. I think you could be of great help to us in configuring climatological aerosols for AM3, so I am tagging you here (sorry if you are already busy with all your other duties). Claire pointed me to your post Run with aerosol climatologies, and I assume you managed to run AM3-N96 with climatological aerosols with some changes in aeroclim-new-ancils.
May I ask, would you suggest modifying the alpha release in the same way as you did in the branch aeroclim-new-ancils to run with climatological aerosols? Or is there a simpler way, with the minimum changes necessary, to make it run? Under the NEW_ANCIL_DIR there are only two folders (n216e and n96e), so would it be more complicated for the high-res n512e?
Another question for NRI: as I mentioned to @lachlanswhyborn long ago, the current ancillaries normally extend up to 2014. Could we get NRI support to easily extend the simulation to 2024/25 for a better comparison with recent observations (e.g. Himawari)?
(It seems I am keeping everyone busy during the holiday season, sorry for that ;)
clairecarouge
(Claire Carouge, ACCESS-NRI Land Modelling Team Lead)
@qinggangg We are planning for a beta release of ACCESS-AM3 in the next 6 months, maybe earlier. There is a lot to do before the beta release, so the release date is still very vague.
We will try to implement as much of the feedback from the alpha-release as possible into the next release. At this point, it is hard to tell when any part of the work will be done. With the holidays, we have yet to meet and decide on prioritisation of the tasks for the beta release.
This means I don't know when we will have the time to work on any of this, but we will keep you updated on timelines when they become clearer for us. Keep asking questions, as we might be able to provide temporary solutions.
Thank you @clairecarouge. Sure, that makes sense, and I fully understand. I will keep posting issues as I find them, so that they may be resolved in the beta release or I may receive some suggestions. Feel free to decide your priorities; I will also try to find workarounds myself.