Technical Working Group Meeting, February 2020

Minutes

Date: 27th February, 2020

Attendees:

Aidan Heerdegen (AH) CLEX ANU, Angus Gibson (AG) ANU
Russ Fiedler (RF), Matt Chamberlain (MC) CSIRO Hobart
Rui Yang (RY), Paul Leopardi (PL) NCI
Nic Hannah (NH) Double Precision
Marshall Ward (MW) GFDL

New installed payu version

Version 1.0.7 is now installed in conda/analysis3-20.01 (analysis3-unstable

AH: payu is now 100% gadi compatible. Default cpus/node is now 48 and memory 192GB/node. Python interpreter, short path and manifests are scanned to automatically determined from model config and manifests. Using qsub_flags to manually specify storage flags no longer works, as automatically determined storage flag option is appended and the manually specified one no longer works.

RF: Paul Sandery having issues getting 0.1 deg model working. [AH: turns out it was a typo in config.yam]

AH: No need for the number of cpus in a payu job to be divisible by the number of CPUS in a node. Request however many the job uses, and payu will pad the request to make sure the PBS submission is requesting an integer number of nodes if ncpus is greater than the number in a single node. PL: Rounds up for each model? AH: No, just the total. MW: Will spread models across ranks, so a rank can have different models on it.

AH: Andy Hogg ran out 80 odd submits with the tenth model. Occasional hang, resubmit ok. Might be more stable than raijin.

AH: Navid has MOM6 model that cannot run more than a couple of submits without it crashing with an error that it cannot find the executable. Weird error, let me know if you see anything similar.

NH: Caution with disks and where to put things. Reading input files can be very slow sometimes, or not, and then files not there and turn up later. If executable is missing, running off a disk that is not good? MW: Filesystems are very complicated on gadi? NH: Less certainty of performance with such a different system with data file systems being mounted separately. I’d look at this.

PD: Good place to look if disk has got caught up doing too many tasks. gdata just hangs, saving text file takes a while. Due to being on login node? Get similar delays with interactive job on execute node.

AH: People reporting issues with login delays. Probably a disk issue? Navid’s job is not being run from gdata, but from scratch. Inclined to blame new system of mounting. Could we use jobfs. MW: Like in the old days when we ran on the node? Good luck! AH: Could just do some tests. NH: Concerning if scratch is slow.

AH: Not sure if filesystems are mounted with NFS. MW: That is what we do on gaia, and have tons of problems with mount on demand. Biggest frustration with using GFDL machine. It’s a nightmare. At least NCI have lustre know-how. AH: Used to have a lot of problems with NFS cache errors in the past, files disappearing and reappearing. Does sound similar to Navid’s problem.

MW: Raijin’s filesystem was quite good. Why the change? AH: Security. Commercial in confidence stuff. I think it is overblown. Can’t seen anyone else’s jobs on the queue. Can’t even check it other people are running on the project. Are moving to 2-factor auth also.

What is required to get gadi transition into master for ACCESS-OM2

AH: Andrew Kiss is on personal leave but sent around an email:

re. gadi-transition, we could proceed like so:

– we’ve also been transitioning libaccessom2 to use submodules for its dependencies instead of cmake https://github.com/COSIMA/libaccessom2/issues/29 which would require this commit https://github.com/COSIMA/libaccessom2/tree/53a86efcd01672c655c93f2d68e9f187668159de (not currently in gadi-transition branch)

– get the libaccessom2 tests working https://github.com/COSIMA/libaccessom2/issues/36

– there’s a gadi-transition branch libaccessom2, cice and mom that could be merged into master. They use openMPI4.0.2

– there’s also a gadi-transition branch for all the primary (ie JRA, non-minimal) configurations but the exe paths would need to be updated before merging to master

– the access-om2 gadi-transition branch would then need to be updated to use the correct submodules for model components and configurations. We also want to remove the core and minimal config submodules https://github.com/COSIMA/access-om2/issues/183

also fyi the current gadi build instructions are here

– https://github.com/COSIMA/access-om2/wiki/Getting-started#running-on-gadi

AH: Feels urgent that people can use on gadi. Any comments on Andrew’s email?

PL: Transition to submodules finished? AH: That is on a separate branch. NH: I did that work. Put it in a dev branch. Not intending to be part of gadi transition to have least number additions. AH: Agree if that is the easiest. Master is broken for gadi, so anything that works is an improvement. If there is no feedback can do this offline. Could make a project to be explicit about what is required. NH: Given that gadi-transition does work. Andrew and Andy use it. Wouldn’t hurt to put it in now. Work that PL has done to make sure it does reproduce ticks that box. So ready to go. Able to reproduce if we need to. I’ll merge it and do some interactive testing. Then people can use it and I can do automatic testing.

PL: What branch will it be merged into? A lot of branches in a lot of repos.

NH: Isolate gadi-transition branches and merge into master straight away. Not bother with other development branches at this stage. Want to get something in master that people can use. In future bring everything into dev as discussed, with master staying stable, just bug fixes, until decide to update from dev. I’ll go through the branches and just bring in the gadi transition stuff. PL: So dev will have submodule changes and master will not? NH: For the time being. With previous discussion we’ll be slower moving on master, to make sure it is working. Having dev will allow us to move that more rapidly. People can run off dev at their own risk. AH: Submodules will remain a named feature branch and pulled into dev at some future time. Should discourage having personal development branches on the main repo. If you want to experiment do it on your own fork. Branches on the main repo should be master, dev or named feature to keep it clean and everyone can understand what they mean.

Stack array errors and heap-array option

AH: Apologies minutes from last TWG meeting are not on the COSIMA website. There is an IT issue with the server. We wanted to follow up with stack array errors.

AH: Did ever test on raijin with same compiler? Is there any way we can do comparative test? Use raijin image? Any more from Dale about this stack stuff? PL: Haven’t heard anything. AH: Last meeting some mention of there being a limit on UM stacksize. RY: Already fixed Ilia’s issue. Fixed by making stacksize unlimited. RF: Always run with unlimited stack size. When had problem only fixed by setting heap arrays small or zero. When I went into code and made array allocation from automatic to allocatable the error went away.

MW: If I have an automatic array I get three different heap allocations for three different compilers. RF: This option forces all arrays on to the heap.

AH: This was fixed a while ago Rui? RY: Not clear this is the same problem. Ilia’s issue was the end of 2019 when gadi first on line. Not sure it is the same issue.

BGC Update

AH: Russ forwarded an update to Andy Hogg.

RF: Work was completed on raijin in 2019. BGC code in to MOM and CICE. Required changes in CICE: moving arrays around to different modules due to scope issues which allow optional fields to be sent. Main one is to send 10m winds to ocean, not just the wind stress. Holding off to issue PR until gadi transition done so could go in clearly.

NH: Will be useful for JRA1.4 work.

RF: Hakase will be using it for BGC. Passing algae between ice and ocean components. To add new field, need to add field to code, but don’t have to be passed. Just picked up from namcouple using the flags in OASIS to see if it’s registered.

AH: Can this be the next cab off the rank after gadi-transition, before AKs science tweaks. Not relying on any changes in Andrews branches? RF: Would like to get gadi transition out of the way and then test these changes. Not tested on gadi yet.

How to proceed? Testing?

https://github.com/russfiedler/MOM5/tree/wombat

https://github.com/russfiedler/cice5/tree/wombat

I’ve held off issuing a pull request until the dust settles wrt the gadi transition. There’s a bit of code rearrangement in order to allow optional fields (10m wind speed but this can be extended) to be passed from CICE.

The flags ACCESS-OM-BGC (tested) and ACCESS-ESM (untested) enable compilation of the BGC code. The 10m winds need to be added to the namcouple files and the MOM coupling fields namelist.

Work done on raijin last year. Changes in CICE to move arrays around in modules due to scope issues. Main one is to send 10m winds to ocean. No just wind stress. Holding off until gadi-transition done.

NH: Useful for stuff I’m doing with JRAv1.4.

RF: Hakase will use for BGC, passing algae between ice and ocean components. Have to change code to add fields. Don’t need to hard code as much. Once field in there optional to pass. Using the OASIS flags to see if registered.

JRA55-do counter-rotating cyclones

RF: Fortunately Paul Sandrey’s started in 1988. Last reverse cyclone in 1987. Cafe 60 use whole month window, so washed out on the average.

One of the RYF runs has reverse cyclone (83-84). Tell Kial.

Scaling

PL: Thanks to Marshall for getting me up to speed on scaling tests and sharing scripts. Can reproduce diagrams so can compare between raijin and gadi.

AH: Any more performances numbers? PL: Now in a position to answer questions, just need to know what questions to ask.

AH: ACCESS-OM2-01 currently running around 5K cores, would love to be able to scale to 10K, 20K even better. MW: MOM scaled to 50K. AH: CICE doesn’t scale as well. MW: Any work on CICE distributions? RF: Nope. Would need to be done again at higher core counts. MW: Current one working really well. AH: On NH’s to-do list was to experiment with layouts and load balancing. MW: Alistair is very interesting in load balancing sea ice models. Particularly icebergs. Has some quasi lagrangian code in SIS2 to load balance icebergs. Maybe some ideas will translate or vice versa.

PL: For the moment will just look at MOM and see how it scales at 0.1? AH: Maybe just try doubling everything and see if it scales ok? MW: Used to make those processor heat maps to get the load imbalance of CICE. Would be good to keep an eye on that while working with scaling. Tony Craig (CICE developer) is very interested.

Atmosphere/coupled models

PD: Still using code frozen for CMIP runs. Extending number of runs in ensemble.

AH: People in CLEX are keen to run CM2. PD: Not aware, maybe through someone else, maybe Simon or Martin? CM2 and ESM-1.5 runs have been published under s38 project.

AH: Scott Wales doing an ultra high resolution atmosphere run over Australia, under the STRESS2020 project. PD: Atmosphere only, do you know what resolution? I’ve also done some high res atmosphere only runs. On a project to improve turbulent kinetic energy spectrum in UM. Working on code to put stochastic back scatter into low res N96 (CMIP6) atmosphere. Got some good results injecting turbulent kinetic energy into small scales to improve artificial dissipation associated with semi-lagrangian timestep in UM. To test this is to see how improved N96 results compare to N512 runs using STRESS2020 resources. Working with Jorgen Fredrikson. Should talk to Scott.

AH: At the moment Scott is targeting 400m over Australia. PL: Convection resolving? AH: Planning a 2 day run to simulate Cyclone Debbie. Nested 400m run for Australia, inside BARRA at 2.2km. 10500×13000. PD: We’re going global. MW: How many levels? Same as global? PD: 85. AH: Major problem is running out of memory. MW: More cores should mean less memory. Maybe their Helmholtz server imposes some memory limit on the ranks. AH: Currently waiting for large memory nods to come online.

New FMS

MW: New FMS version coming. Targeting auto tools and getting rid of mkmf. If you’re on MOM5 you can use your frozen version. Completely rewritten IO in FMS. Now a thin wrapper to netCDF. No more magic functions like save_restart, write_restart. They have been replaced by lower level ops to allow model developers to have more control. Not sure MOM5 significance. AH: API compatible? MW: Keep compatible with old API as long as they can. Could dump it in and slowly integrate. Only raising in case you want to do more innovative stuff with IO. PL: Affects MOM6 mainly? MW: MOM6 is one of the main targets. PL: Parallel IO support? MW: Part of the reason. They want parallel IO in atmosphere model which NCAR now uses it. Now an important model. This implements the hooks for that work. RY: MPI-IO still there or be replaced by PIO? MW: It is. RY: Simpler to do one? MW: They’ve sent a patch to get MOM6 working with that now. Doesn’t work currently. Not sure about the progress, but know you were interested in PIO. RF: We’re interested from the ICE point of view. New version of BRAN will need daily inputs in CICE. Performance is terrible as IO is collected on to one processor. MW: FMS will not help CICE, but a test case if PIO is a valid solution.