Technical Working Group Meeting, March 2021

Minutes

Date: 17th March, 2021
Attendees:
  • Aidan Heerdegen (AH) CLEX ANU
  • Andrew Kiss (AK) COSIMA ANU
  • Angus Gibson (AG) RSES ANU
  • Russ Fiedler (RF) CSIRO Hobart
  • Paul Leopardi (PL), Rui Yang (RY) NCI
  • Nic Hannah (NH) Double Precision
  • Peter Dobrohotoff (PD) CSIRO Aspendale
  • Mark Cheeseman (MC) Down Under Geosolutions (DUG)

General forcing perturbation support

AK: Would like to perturb forcing in a model run without having to change the forcing files. Would like to support a linear function applied to the forcing: not the most general approach, but it covers most current cases. Need to decide how to represent the scaling and offset fields. Common to separate them in time and space; the generalisation is a sum of separable components.
AK: Currently implement arbitrary spatiotemporal or constant perturbations. Full spatiotemporal variation is data intensive. Need to work out details of implementation. Is the proposal feasible? Starting with one component, but it should generalise to N components. The calendar is a bit more complicated: currently support experiment and forcing calendars, which can differ, e.g. RYF forcing. May want to tie a perturbation to either the experiment or the forcing calendar. A perturbation should have a time coordinate and only be applied when it matches the current forcing time. Is this feasible?
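A sketch of how one perturbation entry in the forcing JSON file (discussed below) might look under this proposal; the field names and layout are illustrative assumptions, not a settled spec. Each component is a separable scaling or offset, applied as perturbed = forcing * (1 + scaling) + offset at forcing times matching the perturbation's time coordinate:

    {
      "filename": "INPUT/rsds.nc",
      "fieldname": "rsds",
      "perturbations": [
        { "type": "scaling", "dimension": "spatiotemporal",
          "value": "INPUT/storm_damping.nc", "calendar": "forcing" },
        { "type": "offset", "dimension": "constant",
          "value": 2.0, "calendar": "experiment" }
      ]
    }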
JM: Precompute spatial/temporal fields. Are these going to be standardised ramps? Are perturbations going to be simple functions of time and space, reproducibly generated? AK: Not necessarily. EOFs would not work like this. NH: Important to know what the model is forced with, and this complicates it a bit. A file is a dead end, but files are a complete representation, even if you don't know how the files were made. Want to document the creation of forcing files and attach code. Don't want the spec to be too expressive or complex, as that introduces more difficulty in understanding what it does. Sounds like being able to put arbitrary mathematical functions in there would be preferable? JM: Say a temporal perturbation: you have to look up the file, and it has calendar issues. That is different information from saying it is a linear ramp, or a step function every 12 years. That tooling can be put somewhere else in the workflow. This is maybe better; previously you needed a whole new forcing file, which is worse, so this is an improvement.
NH: What can go in there is pretty arbitrary: can damp a single storm or a collection of storms. AK: Manifests will document which files are used, with hashes of the data. True, it is not evident what is happening without looking at the file. Would encourage comments in netCDF attributes with the git commit of the script that made the file, so files carry a reference to what created them.
AH: Maybe insist on a comment field in the JSON file? AK: Encourage people to make it a URL to the commit of the script that made the file. AH: Yes, but also some short descriptive information. We know it won't necessarily be kept up to date, but make it compulsory and make people put something in there. AK: Compulsory means people tend to copy and paste from a previous file, and bad information is worse than none. AH: Allow people to do the right thing; a pithy comment can carry a lot of information. NH: Enforce comments in netCDF files? AH: Most netCDF files have a comment field, but often not a useful one. NH: Allow rather than enforce a comment; make it possible to have one. AK: Should we ignore any non-defined field, so people can put in anything they like? AH: Always a bad idea to let through things that are wrong, as people can think they have defined something when a typo means they haven't. Happens with payu config.yaml files sometimes.
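A minimal sketch of the provenance convention AK encourages: the script that creates a forcing file stamps its own git commit into the file's netCDF comment attribute. The script and repository names are hypothetical; assumes the netCDF4 Python library:

    import subprocess
    import netCDF4

    # Exact commit of the script generating this file
    commit = subprocess.check_output(["git", "rev-parse", "HEAD"]).decode().strip()

    with netCDF4.Dataset("scaling.nc", "a") as ds:
        # URL is hypothetical; lets the file carry a reference to what created it
        ds.comment = ("Generated by make_scaling.py: "
                      "https://github.com/example/forcing-scripts/tree/" + commit)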
PL: What is the description field? NH: For the file as a whole.

DUG scaling with ACCESS-OM2

MC: Running ACCESS-OM2 on a Cascade Lake cluster and a KNL cluster, about 10-15% slower than gadi, probably due to slightly slower Intel Cascade Lake CPUs and a slightly slower MPI interconnect. Very early results. PL: Is that across-the-board speed, or top-end scalability, or both? MC: Both across the board and in individual components. Only running quarter and one degree so far; most results are one degree, and quarter degree has only run a couple of times. Doing that work in Houston, which has had problems recently. AH: How does it work on KNLs? MC: Slower per node; scaling is just as good. OpenMP not as good as expected. Will look at that, as OpenMP is critical to getting good scaling on KNL. PL: With OpenMP, which components? MOM and CICE? MC: Both CICE and MOM. Not done anything deep, just turned it on. NH: Little to no instrumentation, and not much OpenMP code in there. RF: There is in CICE, though not a huge amount: it threads over the blocks. NH: Cool.
MC: Had another guy working on it who has been running it in the last week; he will be back on it next week looking at OpenMP. AH: Anything stopping 0.1 degree? MC: Just getting the data; getting timeouts when transferring. Some guys are looking at it who work on SKA, moving 10+ TB quickly and easily. Will get in touch to try a high-speed transfer trial between DUG and ANU. AH: Would that involve running a client on our side? MC: I think so, just something at the very end; it shouldn't need IT support, but not 100% sure. AH: Very interested in this. MC: If it works we should have the entire 0.1 degree dataset in minutes.
MC: Visualisation guys are interested due to their insight work. Different from their normal data; very excited. AH: Lots of small scales. MC: Interested in path tracing. Made a filter to follow an isopycnal along the Gulf Stream, either POV or from a fixed coordinate position.
NH: Seen the great one on YouTube? An awesome visualisation showing isopycnal surfaces around Antarctica. AH: The NCI one uses iso-surfaces to pick out density classes.
MC: Any simulation or benchmark we could do that would help you guys out? NH: 0.1 degree with daily ocean output. PL: Have you got all the benchmark documentation? That concentrated on not producing output.
AH: Any point pushing the scaling? NH: Focussing on years per submit. Get a year done in 5 hours, whatever it takes; not so interested in how big we can scale it. PL: As individual KNL cores are slower, how good is the top-end scaling, as that is where the speed comes from? Could try AMDs as well; don't have enough for large-scale runs, and as most have GPUs on them the ML guys use them a lot. AH: AMDs may be interesting; Pawsey have AMDs for their new system. Historically AMDs had a big memory bandwidth advantage. Isn't the model cache limited? AK: Didn't Marshall say it was the speed of the RAM? AH: So a memory bandwidth improvement might make a difference.
MC: For memory-transaction-bound calculations it depends on the memory type and the data being moved. AMD and Cascade Lake are faster, depending on the data types. Switching between 32-bit integer and 64-bit real causes a problem with AMD; Intel handles it better. Bioinformatics and genomics hit this hard, and there Intel is better by far. Using the latest Intel compiler. PL: Complicated by alignment and AVX? MC: Maybe Intel preloads the vector units better? An open question we are trying to answer.
PL: Other scaling question: OpenMP vs MPI? More threads vs more cores? NH: Curious about this for CICE. KNL has 4 threads/core, so you need OpenMP to make full use of it. On gadi there is only one FPU per core, so maybe we don't care about threads? PL: Depends on the latency of MPI calls vs what happens in OpenMP threading. MC: For KNL, 2 threads/core is the sweet spot for saturating the vector units; very few codes can use 4 threads/core. AH: Which MPI library? MC: Using OpenMPI 4.1; also have Intel MPI. AH: Never used Intel MPI, so don't know if there is much difference. Can you comment on any difference? MC: Mostly OpenMPI, and mostly optimised for that. Have recently put some time into Intel MPI; some recent Mellanox collectives (hierarchical collectives) don't work as well in newer OpenMPI, due to some of the lower-level driver software. Got a really good guy working on this. Don't see those issues with Intel MPI, but only notice a performance difference on a couple of very specific codes.

PIO

NH: Last time I gave an update on getting async IO working. The PIO library allows synchronous or asynchronous IO servers, and I am trying to get async working. Had to change the OASIS version, which meant changes to the coupling code, and also changes to the handling of communicators. All done; very close to getting it working. Hitting some memory corruption inside the C library within PIO. It is fairly new code, especially the Fortran API, so maybe just running into some bugs; the same thing happened when I first started with PIO. Their test case is very simple: a single model, with a single CPU doing IO. Our case is more complex. In the process of working out what is going on; now have to run with valgrind, as I couldn't find it by code inspection. AH: Make the test case more complex until it falls over? NH: My 1 degree test case is pretty simple and it errors early on. Good idea, might try that; running valgrind on a test case would be much simpler. AH: Can be tricky to make a test case that fails. NH: Run-of-the-mill memory corruption should pop up with a test case. AH: Maybe just adding one more CPU will make it fail.
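For reference, a run under valgrind in an MPI context like this might be launched as below (executable name and rank count are placeholders); --log-file with %p writes one log per process, keeping per-rank output separated:

    mpirun -np 241 valgrind --track-origins=yes --log-file=valgrind.%p.log ./model.exe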
NH: PIO devs good to work with. Get changes in quickly, feel positive about it.

ACCESS-CM2-025

PL: Have accumulated some fixes to CICE. The code for CM2 was a file copy rather than pulled from GitHub, so I don't know where to put my fixes. Currently thinking of creating a branch on my own fork of CICE and putting them there. One example in ice_grid.F90 looks like OpenMP won't work if AUSCOM is defined. Not sure how OpenMP works with CICE. NH: Don't remember that in our version of CICE. Got the entire bundle from Martin Dix as a stand-alone test case for ACCESS-CM2.
PD: There is a repository for CICE; it would be good to create a branch there. Defer to Martin on where to put it. Maybe there is a .svn dir in the tarball? PL: Possibly. In the first instance I will create a branch on my own GitHub fork of CICE. PD: Do an svn info; not sure if it is the same code version as the CMIP runs. PL: Will take it offline.
AH: I am working on CICE harmonisation between OM2 and CM2. My strategy was to look at the subversion history and trace it back to the shared history with the CICE version we have in COSIMA. Got very close, maybe a commit away from linking them. The CICE version used in the CMIP runs has a single large commit from the UKMO, made after the versions diverged. NH cloned from Hailin's repo before this large commit, and a lot of the changes touch similar files. The intention is to make a branch on our CICE repo and a pull request where we can see all the changes; maybe PL could make his changes there. We are making this for ACCESS-CM2-025; not sure if the main ACCESS-CM2 will end up pulling from the same repository in the future.
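As a sketch, that branch-and-pull-request strategy might look like the following, where <shared-ancestor> and <ukmo-commit> are placeholders for the common-history commit and the large UKMO commit, not real references:

    git checkout -b cm2-harmonisation <shared-ancestor>  # branch from the shared history
    git cherry-pick <ukmo-commit>                        # bring in the post-divergence UKMO changes
    git push origin cm2-harmonisation                    # then open a pull request to review the full diff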
NH: Is this everything except the driver? In the drivers folder there are auscom and accesscm drivers; you're not changing those? AH: Not as far as I know. Pulled in a lot of code changes. Wanted something people could look at, to see if the changes affect OM2. Maybe we don't have time to make a version that works for both OM2 and CM2; that could be a separate branch.
PL: Harmonisation worked great for MOM5. Would be great to have CICE harmonised too, but it is much more tightly bound to the atmosphere model, hence the big changes with the GA7 release. Need to make sure CICE is correctly coupled to the UM; CICE is the intermediary between atmosphere and ocean. Not sure what the costs and benefits would be; Dave Bi would know how much effort and what scope. AH: Valuable even having them in the same repo. Can cherry-pick some of the changes NH has made and will make. A 0.25 degree configuration might need some of NH's changes for decent performance. PL: There is a lot of interest in the improvements and bug fixes, but not sure from this distance about the effort required.
NH: We are completely up to date with upstream CICE5. There are half a dozen commits that are good and valuable. Also brought some things in from CICE6: nothing scientific, so less of an issue. PL: Also updating the OASIS coupler? Will that complicate things with the UM? NH: I've updated to OASIS3-MCT 4.0; there should be some performance improvements. Not a lot of time is spent coupling, so not sure how much difference it will make, but it is a bottleneck, so any improvement will have an impact. Upgrading OASIS was flawless except for a warning about unbalanced comms. No changes to namcouple or the API.