Technical Working Group Meeting, May 2017

Minutes

Date: 16th May 2017
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen and Andy Hogg (ARCCSS ANU)
  • Scott Wales (ARCCSS Melbourne Uni)
  • Nicholas Hannah (ARCCSS/Double Precision)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Justin Freeman (BoM)
  • Peter Dobrohotoff and Roger Bodman (CSIRO Aspendale)

Updates

  • Roger is having issues with the N95 atmosphere. Marshall says it doesn’t scale past 256 cores. Roger would like to get RTM profiling working. Martin and Peter have got multiple threads working in AMIP. Has spoken to Scott Wales. Something odd happens in the decomposition. Marshall will ask Dale Roberts about the effects of OpenMP. Has chemistry been enabled for OpenMP?

Liaison with COSIMA Management

  • Andy is here to get some feedback about our activities. Andy thinks the TWG is doing a good job of communicating; a goal is to get more communication across COSIMA in general. The management team meets quarterly, with no science talk or detail. We have to be better at merging/converging disparate code, and the TWG is crucial for this.
  • Andy wants a better framework for analysing and post-processing runs, and for accessing others’ outputs. Some work is already under way in this space, but the effort is scattered, with no one leading it. Nic has done some work in the past. We are now using IPython notebooks to share analysis. James Munroe is working on dashboards. Justin asked whether COSIMA will deliver this. Andy: it is not explicitly funded, but it will benefit uptake. We need better ways to serve data.
  • Andy would like more engagement around the COSIMA website. Blog our progress? Marshall: do you have sample topics? Is it legitimate to post updates on issues with library versions, for example? Andy: borderline, maybe just for the TWG in that example. We currently rely on members of the TWG to propagate information back to users. An update on code scaling, for example, might be more relevant. We don’t want to limit people now: encourage as much as possible and filter if required. Peter agreed: get useful information up there, since nothing has been posted over the last year. Suggestion: make the minutes more report-like? Nic thought blog posts are a nice idea, but they need deeper insight to be useful and interesting.
  • Andy wants science side to publish results of runs, and point to data.
  • Andy is also keen for COSIMA to have information about model versions. ACCESS doesn’t have a way of releasing versions and hosting code, and is somewhat hobbled by partner disputes. He would like ACCESS-OM releases on COSIMA. Marshall pointed out that the TWG was set up to address this, but the models were not ready at that stage. Marshall suggested we make a proper plan for a release and discuss it at the COSIMA meeting. Ask students/researchers what they need to get started with a model.
  • What are the expectations of TWG for the COSIMA Meeting? Andy: some already giving talks. Interact and discuss with others. Get to know each other and ambitions and look for synergies.
  • Andy will contact Paul to make sure TWG will have a slot to fill in others on progress.
  • Andy would like an email list for COSIMA announcements.

COSIMA Models

  • Nic: was in Canberra a week ago. Had the tenth-degree model timestepping on fewer than 3000 cores. With more than 3000 cores it didn’t initialise: MCT couldn’t set up its routing tables and would just hang. After discussions, figured out some MPI switches and flags to get it working; the mxm mtl makes it work better. Justin suggested this would make a good blog post. Marshall found MOM6 was failing at 3000 cores too, and the problem went away with the mxm mtl.
  • Andy: MOM-SIS at tenth degree was also failing, in about 30% of runs. Russ has had similar problems. Nic is now running on 6K cores in the ocean.
  • Nic had discussions in Canberra around CICE halo updates. Made 12+ changes to the CICE and MATM code, giving big improvements. For all three model resolutions (1°, 0.25°, 0.1°) the overhead of coupling is 1-2% compared to MOM solo. The remaining overhead is a tiny serial section that interpolates forcing fields onto the ocean grid, about 20 s per model month at quarter degree.
  • Nic: quarter degree with an 1800 s time step should take less than 75 min per model year. Andy: UNREAL! 1 degree is also running super fast, at 50 model years per day. Recompiled the MCT library to squeeze out as much performance as possible.
  • There are now 3 new configs. The 1 deg and 0.25 deg configs could be used. Focussing on the tenth at the moment.
  • Andy: MOM-SIS-025 + CORE takes 70 min; JRA55 adds 30%.
  • In the old config, all models blocked waiting for MATM to read files. Now MATM has already sent everything, which reduces the difference between CORE and JRA55. Nic has not done longer runs as yet. MATM no longer buffers multiple years of output.
  • Agreed Andy should get these configs and do some longer runs.
  • Andy is talking to NCAR about JRA55 forcing. CORE used normal-year forcing (NYF); JRA55 doesn’t have that. Others have used a single year. Our strategy is May-May forcing with a shock at the end of May. Candidate years are 84/85, 91/92 and 03/04. We want to test this at 0.25 deg and 1 deg, and should simply adopt what Nic has done.
  • These runs use MOM-SIS because Andy wants a baseline. Doing MOM-SIS from CORE with WOA13, then repeating with JRA55 RYFs. We want to compare these to Nic’s ACCESS-OM config.
  • Nic and Andy to talk offline and try out a repeat year.
  • Although it is fast, the tenth is inefficient because there is currently no ocean masking. This is the next priority, but it is probably beyond Nic’s current contract.
  • Nic: can I use unmasked restarts? Russ: yes, they just need to be combined.
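As a rough sanity check on the throughput figures quoted above, the arithmetic below converts them into time steps per model year and model years per wall-clock day. It is only a sketch. It assumes a 365-day model calendar and treats the quoted times as wall-clock minutes per model year. The helper names are illustrative and not taken from any COSIMA code.

```python
# Back-of-envelope throughput arithmetic for the figures quoted in these minutes.
# Assumptions: 365-day (no-leap) model calendar; helper names are hypothetical.
SECONDS_PER_DAY = 86_400
DAYS_PER_MODEL_YEAR = 365
MINUTES_PER_DAY = 24 * 60


def steps_per_model_year(dt_seconds: float) -> float:
    """Number of ocean time steps needed to simulate one model year."""
    return DAYS_PER_MODEL_YEAR * SECONDS_PER_DAY / dt_seconds


def years_per_day(wall_minutes_per_year: float) -> float:
    """Model years completed per wall-clock day."""
    return MINUTES_PER_DAY / wall_minutes_per_year


def minutes_per_year(model_years_per_day: float) -> float:
    """Wall-clock minutes needed per model year."""
    return MINUTES_PER_DAY / model_years_per_day


# Quarter degree: 1800 s time step, under 75 min per model year.
steps_025 = steps_per_model_year(1800)
seconds_per_step_025 = 75 * 60 / steps_025
years_per_day_025 = years_per_day(75)

# One degree: 50 model years per day.
minutes_per_year_1deg = minutes_per_year(50)

print(f"0.25 deg: {steps_025:.0f} steps/year, "
      f"{seconds_per_step_025:.3f} s/step, {years_per_day_025:.1f} years/day")
print(f"1 deg: {minutes_per_year_1deg:.1f} min per model year")
```

Under these assumptions the quarter-degree target works out to 17520 steps per model year. That is roughly a quarter of a second of wall time per step, or at least about 19 model years per day. The 1 degree figure corresponds to under half an hour per model year.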

COSIMA Workshop

  • Marshall: can we agree to transfer to CM2, so that we are on a common version of CICE?
  • Andy: will the code we release in OM2 be different in ocean and/or ice? Can we manage it in one codebase? Marshall: should be possible. Set aside time to discuss this at meeting.
  • Discuss moving to common CICE repo for all.
  • Marshall: does Justin need some info from us on the OM config? Nic and Justin will liaise.
  • Justin won’t be at the COSIMA Meeting; does he want us to cover anything? Justin: he is staying up to date with what we’re doing and will be engaging much more in future.

Updates on previous actions

  • Nic has updated the OceansAus repo to Peter’s CICE. Can Peter look at the code and check it?
  • Russ has been doing a lot of clicking for bathymetry. Aus and PNG done. Need help.

Actions

New:

  • Ask Dale Roberts about effects of OpenMP for Roger (Marshall)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (Marshall and TWG)
  • Contact Paul Spence about TWG speaking slot at meeting (Andy)
  • Prepare slides for TWG presentation at COSIMA meeting, and present (Aidan and Marshall)
  • Email list for COSIMA announcements (Aidan)
  • Blog post around issues with high core count jobs and mxm mtl (Nic)
  • Do longer runs with Nic’s 1 deg and 0.25 deg ACCESS-OM2-JRA55 configs (Andy and Aidan)
  • Try repeat year forcing with Nic’s configurations (Nic and Andy)

Existing:

  • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to a submodule of the MOM5 github repo (Marshall). Liaise with Nic on implementation?
  • Test Nic’s access-om model config on OceansAus (All)
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Add new test cases to Jenkins test suite (Nic).
  • Start a new google doc about coupler issues and MATM (Marshall)