Technical Working Group Meeting, July 2017

Minutes

Date: 11th July 2017
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS ANU)
  • Nicholas Hannah (ARCCSS/Double Precision)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Peter Dobrohotoff and Roger Bodman (CSIRO Aspendale)

Reproducibility

  • Peter is looking at reproducibility across resubmit periods. ocean_solo.F90 has changed since old CMIP5 setup. A call to the coupler has been commented out. May have upset forcings? CMIP5 worked. Doesn’t work on CMIP6.
  • Some discussion ensued about when and who might have changed the file.
  • Roger says Martin Dix sees differences in restarts happening in the Red Sea. Nic suggested turning off the Red Sea fix to see if the reproducibility issue goes away.
  • Nic: what is the plan of action for the ocean_solo issues? Peter will send a diff to Nic. Roger will ask Martin about turning off the Red Sea fix.
  • Nic can help Peter with updating their GitHub fork of MOM.
  • Nic wondered if there is a better way to do the Red Sea fix. Use sponges in localised areas? This is the way it is done in other models. Our method is not particularly standard. Specialised code runs over a specific region doing clamping. Maybe do this more generically? Russ: reason it is done that way is to conserve the salt. Restoration doesn’t conserve salt. Nic: open channels? Russ: don’t want to change land masks. Not an issue of channel size, just not enough mixing. They are mixing locations far apart. Can’t do cross-land mixing. Not on the same processor. Tenth and quarter don’t need this fix.
  • Russ: might be an issue about when the Red Sea fix is called. Done so many steps after start of the model, so will see a different history of the model. Maybe should change to a time-based rather than step-based call. Nic reports that Fabio says the Red Sea fix never runs on the first coupled run. After looking at the code Nic says a single 2 day run will never do the salinity fix. 2×1 day runs will do the salinity fix twice. Code is not reproducible.
  • Marshall asked Nic if ACCESS-OM2 models are reproducible? Nic: don’t know, haven’t done that yet.
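
The step-based versus time-based trigger issue raised by Russ can be illustrated with a minimal sketch. This is not the MOM Red Sea fix code: the trigger rule, the hourly time step, and the names `run_segments`, `step_based`, `time_based`, `TRIGGER_STEP` and `PERIOD_HOURS` are all hypothetical, and the sketch does not reproduce the exact behaviour Nic described. It only shows the general class of problem: a counter that restarts with each run segment fires at different model times depending on how the run is split, while a rule keyed to absolute model time does not.

```python
TRIGGER_STEP = 12   # hypothetical: fire on the 12th step
PERIOD_HOURS = 24   # hypothetical: period of the time-based rule


def run_segments(segment_lengths, should_fire):
    """Return the absolute model hours at which the fix fires.

    segment_lengths: run lengths in steps (one step = one hour here).
    should_fire: callable(local_step, absolute_hour) -> bool.
    """
    fired = []
    absolute_hour = 0
    for length in segment_lengths:
        # The local step counter restarts at every resubmit.
        for local_step in range(1, length + 1):
            absolute_hour += 1
            if should_fire(local_step, absolute_hour):
                fired.append(absolute_hour)
    return fired


def step_based(local_step, absolute_hour):
    """Fire a fixed number of steps after the start of each segment."""
    return local_step == TRIGGER_STEP


def time_based(local_step, absolute_hour):
    """Fire at a fixed point in absolute model time, ignoring restarts."""
    return absolute_hour % PERIOD_HOURS == TRIGGER_STEP


single_run = [48]      # one 2 day run
split_run = [24, 24]   # two 1 day runs

print(run_segments(single_run, step_based))  # [12]
print(run_segments(split_run, step_based))   # [12, 36]
print(run_segments(single_run, time_based))  # [12, 36]
print(run_segments(split_run, time_based))   # [12, 36]
```

With the step-based rule the two run layouts apply the fix at different model times, so their restarts cannot match. With the time-based rule both layouts give the same history.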

COSIMA Models

  • Russ fixed heat budget in ACCESS-OM2.
  • Nic used Russ’ offline kd-tree runoff regridding and implemented it online. Without conservation checks it is fast enough to run online. Russ: set up connections and read them in? Nic: build tree once at beginning of run, and tree searched at runoff frequency. Being run on MATM core. Hopefully won’t slow other models as it is doing it in parallel while other models are working.
  • Nic: this runoff regridding might be relevant to the coupled model. Don’t know how runoff works there, but believe it relies on the land/sea masks matching as closely as possible. As they’re different resolutions might still lose some. This technique is guaranteed to get all runoff into the ocean. Also if you change the ocean mask, you have to also change your atmosphere land/sea mask. This would avoid this. Nic asked Peter/Roger if they were 100% certain all runoff goes into the ocean? Peter didn’t know for sure. Roger reported that this was anecdotally a problem. Peter will pass the idea on to Dave.
  • Russ: did you just implement nearest neighbour or spread? Nic: no spread. Russ: will blow up near the Amazon. Nic: doing a conservative remapping onto the fine model grid and will then remap from each land grid cell with runoff. So will not dump all runoff in one location. Aidan suggested the river spread module could be used to redistribute runoff, but Russ said better not to use river spread if it can be avoided (can be across cells and so increase communication, slow model).
  • Aidan explained Andrea Dittus had salinity issues with her coupled chemistry model that were to do with a bad river routing table. Maybe this approach could help?
  • Aidan explained the JRA55 data set as a replacement for CORE II, and how the RYF data was created as a replacement for CORE NYF. There was interest amongst the group in using JRA55.
  • Matt explained CORE II is a weird reanalysis product which is a mish-mash of other products. Some of the component products have ceased so CORE II also ceased.
  • Aidan explained the JRA55 IAF forcing dataset is incompatible with MOM, as it is split into separate years. Aidan developed some rudimentary code to support time formats in the data_table, but this breaks on time interpolation.
  • Nic thinks we should use OpenDAP to overcome this. OpenDAP access via URLs is fully supported by netCDF library. Should work in MOM. Marshall wonders if it would be too slow. Aidan also pointed out that it would require an OpenDAP/THREDDS server which is not publicly facing as JRA55 has limits on redistribution. Nic made an issue for this on MOM5 repo already.
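
The kd-tree runoff remapping discussed above (build the tree once, then search it each time runoff arrives) can be sketched as follows. This is a minimal illustration, not Nic's or Russ' implementation. It assumes planar (x, y) coordinates rather than distances on the sphere, uses plain nearest neighbour with no spreading, and the names `build_kdtree`, `nearest`, `remap_runoff` and the sample cells are hypothetical.

```python
def build_kdtree(points, depth=0):
    """Build a 2-D kd-tree from (x, y, label) tuples as nested tuples."""
    if not points:
        return None
    axis = depth % 2  # alternate splitting on x and y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return (points[mid],
            build_kdtree(points[:mid], depth + 1),
            build_kdtree(points[mid + 1:], depth + 1))


def nearest(node, target, depth=0, best=None):
    """Return (squared_distance, point) for the tree point closest to target."""
    if node is None:
        return best
    point, left, right = node
    d2 = (point[0] - target[0]) ** 2 + (point[1] - target[1]) ** 2
    if best is None or d2 < best[0]:
        best = (d2, point)
    axis = depth % 2
    diff = target[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest(near, target, depth + 1, best)
    # Only search the far side if the splitting plane is closer than the best hit.
    if diff ** 2 < best[0]:
        best = nearest(far, target, depth + 1, best)
    return best


def remap_runoff(tree, land_runoff):
    """Move each land cell's runoff to its nearest ocean cell."""
    ocean_runoff = {}
    for location, amount in land_runoff:
        _, cell = nearest(tree, location)
        ocean_runoff[cell[2]] = ocean_runoff.get(cell[2], 0.0) + amount
    return ocean_runoff


# Hypothetical wet ocean cells: (x, y, label). Tree is built once per run.
ocean_cells = [(0.0, 0.0, "A"), (5.0, 1.0, "B"), (2.0, 6.0, "C")]
tree = build_kdtree(ocean_cells)

# Hypothetical land cells with runoff: ((x, y), amount). Searched every runoff step.
land_runoff = [((1.0, 1.0), 3.0), ((4.0, 2.0), 2.0), ((2.0, 5.0), 1.5)]

print(remap_runoff(tree, land_runoff))  # {'A': 3.0, 'B': 2.0, 'C': 1.5}
```

Because every land cell is assigned to some ocean cell, the total runoff is carried into the ocean regardless of how the two land/sea masks line up. As Russ noted, plain nearest neighbour can concentrate a large river into a single cell, which is why some form of spreading or conservative remapping is still needed on top of this.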

Benchmarking

  • Marshall: NCI needs benchmarking code/config ASAP. Want to package MOM benchmarks. Currently packaging stock MOM-SIS-025. Can’t choose everything. Will dilute scores.
  • Marshall: Is the Hobart THREDDS data ok? Nic: Put up 2-3 years ago. Maybe worth running through it all to make sure it works ok.
  • Can’t use coupled model due to UM licensing.
  • Wants MOM6. Not sure which.
  • Do we want to include ACCESS-OM2? Nic: yes want OM2 tenth. Marshall: restricted by CPU count. Can’t really bench tenth model. 1000 CPUs was too big for Broadwell expansion. 500 was the limit. 1000 might be pushing it.
  • Aidan had a bunch of tenth configs when checking out optimal configurations for production. Will look into tenth layout configs.
  • Roger: looking at N96 benchmark from the Met Office that doesn’t run. How does the Bureau do benchmarking? Marshall: BoM gets vendors to sign confidentiality contracts. That needs lawyers, but NCI might not need to do this.
  • Smallest benchmark. Maybe less than 1000 CPUs.

Actions

New:

  • Aidan to tell TWG about JRA55 location.
  • Aidan investigate tenth degree MOM configs for benchmarks.
  • Possible benchmark configs (everyone)
  • Nic to help Peter get his MOM repo up to date with MOM5 master branch, and then merge changes
  • Look into OpenDAP/THREDDS for use with MOM on raijin (Aidan, Nic, Marshall)

Existing:

  • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to submodule of MOM5 GitHub repo (Marshall). Liaise with Nic on implementation?
  • Test Nic’s access-om model config on OceansAus (All)
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Add new test cases to Jenkins test suite (Nic).
  • Start a new google doc about coupler issues and MATM (Marshall)
  • Ask Dale Roberts about effects of OpenMP for Roger (Marshall)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (Marshall and TWG)
  • Blog post around issues with high core count jobs and mxm mtl (Nic)
  • Do longer runs with Nic’s 1 deg and 0.25 deg ACCESS-OM2-JRA55 configs (Andy and Aidan)
  • Try repeat year forcing with Nic’s configurations (Nic and Andy)
  • Create document outlining options for configuration sharing (?)
  • Test OpenDAP netCDF (Aidan)