Technical Working Group Meeting, June 2017

Minutes

Date: 13th June 2017
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS ANU)
  • Scott Wales (ARCCSS Melbourne Uni)
  • Nicholas Hannah (ARCCSS/Double Precision)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Justin Freeman (BoM)
  • Peter Dobrohotoff and Roger Bodman (CSIRO Aspendale)

COSIMA Workshop

  • Marshall: Good meeting. A couple of main messages:
    1. Must have convergence of source code
    2. System for efficient sharing of model configurations
  • We are well on the way to achieving the first item. The second has never been addressed.

Configuration Sharing

  • Marshall: Should focus our energies on configuration sharing.
  • Marshall: having configurations in OceansAus is close to what we want?
  • Currently, when using payu, configs are saved into free-floating local git repos. This could work for everyone. Should gather config files for runs.
  • Aidan: can have a GitHub organisation and push configurations there, or push to own GitHub accounts and tag repositories to allow search and discovery. Use OceansAus or dedicated GitHub organisation for more canonical configurations.
  • Peter agreed it is ok to have many thousands of configurations. Scott agreed as long as there is a canonical version that is easily identifiable for others to fork from.
  • Nic felt that unless there was a good pay-off most individuals would not make the effort to push their configurations up to shared spaces, or make them visible to others, but there is a pay-off for everyone if configurations are shared.
  • Nic: How do we create a searchable index or database for others to find the stuff? May be a bigger problem than storing data.
  • Marshall: Payu users can be forced to do it. Tracking history may not seem such a big deal, but there is a pay-off for users down the track.
  • Pete: if someone wants to use my config I give them my config and people branch off that. rosie-graph will show relationships between them.
  • Marshall: rose solves a lot of these problems already. We don’t benefit from that. Do we need to just use rose or emulate what it does best?
  • Scott: rose setup: metadata file in each repo branch, hook within repo which reads metadata file and adds to database. Currently there is not a lot of metadata in most branches.
  • Justin: does rose use git? Scott: should be back-end agnostic. Nic: possible solution? Use GitHub for configs, but use rose DB to share and index configurations?
  • Marshall: should we draft a plan?
  • Nic: Just try with rose? Gives us DB which is searchable and viewable. Doesn’t stop anyone from working in the way we want?
  • Marshall: Do we need accessdev to do this? Scott: I think it is just a sqlite DB, so can just be on filesystem. Hooks in GitHub repo won’t work the same way to update it however.
  • Marshall: just start with something searchable? Nic: yeah, see what people are running.
  • Scott: Just start with a simple list. Nic: would want a GitHub repo and commit hash in metadata as a minimum.
  • Justin: make notes about what we think is possible. Common set of functionality will guide us. Rose sounds good, but may be too large a solution? Want something lightweight that won’t get in my way. Don’t know too much about rose. Everyone agrees this is a good idea.
  • Scott: can just have a list of experiments. Don’t need the editor. Will send around link so Justin and others can take a look.
  • Marshall: really need a bunch of git repos and a way to organise them. Justin: metadata would be great, to find things without bothering others.
  • Marshall: how do we share input data?
  • Justin: share through RDS? Jingbo working on this.
  • Nic: Justin said all our data sources should be pointing to URLs. Great idea! Never have to care about where data is coming from. Input through THREDDS URIs. Marshall: too ambitious? If you’re using netCDF interface, the library takes care of network. Should all happen under netCDF. Model thinks it is a file.
  • Aidan: slower? Nic: might not be slower?
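
The "simple list" idea above (a GitHub repo and commit hash as minimum metadata, held in a sqlite DB on the filesystem) could be sketched roughly as follows. This is an illustration only, not rose's schema or an agreed design: the table layout, function names and example URLs are all hypothetical.

```python
import sqlite3

def create_index(conn):
    # Hypothetical schema: one row per (repository, commit) pair.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS configs ("
        " name TEXT NOT NULL,"
        " repo_url TEXT NOT NULL,"
        " commit_hash TEXT NOT NULL,"
        " description TEXT NOT NULL DEFAULT '',"
        " PRIMARY KEY (repo_url, commit_hash))"
    )

def register(conn, name, repo_url, commit_hash, description=""):
    # Re-registering the same repo and commit overwrites the earlier entry.
    conn.execute(
        "INSERT OR REPLACE INTO configs VALUES (?, ?, ?, ?)",
        (name, repo_url, commit_hash, description),
    )
    conn.commit()

def search(conn, term):
    # Substring match on name or description (case-insensitive for ASCII).
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT name, repo_url, commit_hash FROM configs"
        " WHERE name LIKE ? OR description LIKE ? ORDER BY name",
        (pattern, pattern),
    )
    return rows.fetchall()

# Example with made-up entries; a shared index would use a file path
# on a common filesystem instead of ":memory:".
conn = sqlite3.connect(":memory:")
create_index(conn)
register(conn, "1deg_test", "https://example.org/1deg_test.git",
         "0123abc", "1 degree ocean-ice test run")
for row in search(conn, "1deg"):
    print(row)
```

Since hooks in a GitHub repo would not update such a DB automatically, something like `register` would have to be called by the user or by the run tool.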

COSIMA Models

  • Marshall: talking with Peter and Roger at optimisation meetings, seem to be bit-reproducibility issues with CM2. Has Nic ever looked at bit repro in OM2?
  • Nic: have checksums, and make sure they don’t change with code changes under same layout etc. Haven’t noticed any non-reproducibility in a typical short run. As soon as processor layout changes you’ll run into issues.
  • Peter: if run model 3 months and then 1 month at a time, get different answers. A few degrees over a few months. Restarts are not reproducible. CMIP5 worked ok. Now doesn’t work. Maybe need to check that the ice is dumping ok.
  • Marshall: have faith in MOM5 core. Issues with flux exchange reproducibility. Anyone got issues with that?
  • Russ: Fabio Dias and Russ have been working on an energy leakage issue. MOM-SIS conserves, MOM-CICE has small leakage. Is a computational issue. ASCII diagnostics don’t close. Should close to 10^-12 W/m2, only closes to 10^-6 W/m2.
  • Marshall: Has CM2 been subjected to same scrutiny for flux exchange? How far has Dave Bi got?
  • Peter: does UM have this issue? Scott: bit repro is heavily tested. Processor decomp and run length shouldn’t be an issue.
  • Peter: maybe CICE is a bit more vulnerable? Nic: if someone hasn’t made an effort to make it reproducible, odds on that it isn’t. As far as Nic knows no checking has been done on the ACCESS-CM/OM specific boundary code. Suspicious this might be the issue.
  • Roger: reproducibility differences tend to be over the higher latitudes and higher altitudes. A couple of degrees over a few months.
  • Marshall: MOM5 is not reproducible over processor layout changes north of tripole. There is an expensive operation in the MOM5 flux exchange that isn’t turned on by default. Scott: MPI reduce is not generally reproducible unless special steps taken.
  • Nic: repro needs to be tested with correct compiler flags turned on. Normally optimised code will not reproduce. Roger did this, but will double check this for CICE. Should have fp-model precise across all 3 models.
  • Russ: fixed ice salinity issue that Nic had heard of before from Dave Bi. Was a default namelist option that people don’t often change. Nic: will forward the conversation to Roger.
  • Marshall: if we routinely shared configs this would not be an issue.
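
Scott's point that a reduce is not generally reproducible comes down to floating-point addition not being associative: changing the processor layout changes the order of the partial sums. The sketch below illustrates this with a toy decomposition, and shows one common workaround (accumulating in fixed-point integers, where order cannot matter). It is not the MOM5 or coupler implementation; the function names and the `SCALE` value are assumptions made for illustration.

```python
import math

def naive_sum(values):
    # Plain left-to-right accumulation in double precision.
    total = 0.0
    for v in values:
        total += v
    return total

def chunks(values, n_ranks):
    # Split the data into contiguous blocks, one per hypothetical rank.
    size = math.ceil(len(values) / n_ranks)
    return [values[i:i + size] for i in range(0, len(values), size)]

def reduce_naive(values, n_ranks):
    # Each rank sums its block, then the partial sums are combined.
    return naive_sum([naive_sum(c) for c in chunks(values, n_ranks)])

SCALE = 2 ** 40  # assumed resolution: anything below 2**-40 is rounded away

def reduce_fixed_point(values, n_ranks):
    # Convert each value to a scaled integer first. Integer addition is
    # exact, so the result is the same for every layout.
    partials = [sum(round(v * SCALE) for v in c) for c in chunks(values, n_ranks)]
    return sum(partials) / SCALE

values = [1e16, 1.0, -1e16, 1.0]
for n_ranks in (1, 2, 4):
    print(n_ranks, reduce_naive(values, n_ranks), reduce_fixed_point(values, n_ranks))
```

The naive result changes with `n_ranks` while the fixed-point result does not. The cost is extra work per value and a fixed resolution, which is in keeping with such options being expensive and off by default.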

Actions

New:

  • Create document outlining options for configuration sharing (?)

Existing:

  • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
  • Nic to create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
  • Test Nic’s access-om model config on OceansAus (All)
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Add new test cases to Jenkins test suite (Nic).
  • Start a new google doc about coupler issues and MATM (Marshall)
  • Ask Dale Roberts about effects of OpenMP for Roger (Marshall)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (Marshall and TWG)
  • Blog post around issues with high core count jobs and mxm mtl (Nic)
  • Do longer runs with Nic’s 1 deg and 0.25 deg ACCESS-OM2-JRA55 configs (Andy and Aidan)
  • Try repeat year forcing with Nic’s configurations (Nic and Andy)