Technical Working Group Meeting, August 2017

Minutes

Date: 8th August 2017
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS ANU)
  • Nicholas Hannah (ARCCSS/Double Precision)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Peter Dobrohotoff and Roger Bodman (CSIRO Aspendale)

COSIMA Models

  • Nic: Toy model is good for yes/no type tests
  • Invite James Munroe to the TWG meetings. He could be of assistance getting a test suite together.
  • Nic: using Issues interface on github has been very helpful. Hasn’t written emails and has answers to problems. Steve and Russ have been super helpful. Marshall: good that users have been using Issues. Nic: every time I go to write an email, I ask: can I make this a github issue? Dave Bi, Siobhan have useful input. Marshall: how do we get them onto github too? Russ: Arnold has used it. Hopefully Dave and Siobhan will jump on to it also.
  • Peter: using a different MOM. Nic and Peter to meet to sort this out. Roger: can cice be on the same repo? Nic + Peter will liaise.
  • Invite Arnold to these meetings.
  • Marshall: Bit repro issues. Surprised ACCESS-CM2 has bit reproducibility problems. Nic found 4 issues that affected this. Payu was missing restarts. Also found a couple of issues where there was a different code branch for first coupling time step. Red Sea and another one. Added checksums to all coupling fields. And tested all of these. Now restarts ok from restarts, but after 3 or 4 time steps starts to diverge. Probably still some small issue from restarts. Wasn’t just one problem. Some of these were Nic’s issue with restarts and payu. Some were particular to Nic’s setup. When it is solved can talk to CM guys. The general approach is useful though. Concentrate on coupling fields.
  • Pete: combing log files and looking for checksums and numbers to compare between the runs.
  • Nic: once I get through all this I can talk to Peter about the method and use it to work on Peter’s issues.
  • Marshall: was Red Sea confirmed to be a repro issue? Did Russ fix it? Russ: Aspendale code was fixed. Marshall: was fixed in Aspendale but not main repo? Yes. Marshall: Arnold knew about that. Peter: we had the fix in that case and hadn’t shared it.

CMIP6

  • Peter talked to David Karoly. Wondering about spin-up for CMIP6. 500-1000yr spinup. Is it possible to spin up ocean first by forcing an Ocean/Ice model with JRA? Then add CM when it is ready. Ice issues might make big difference to stratification. Aidan: model drift issues won’t help. Marshall: can you stabilise stratification with OM run? Russ: deep stuff takes thousands of years to get into steady state. Marshall: ask this at a MOM meeting? Russ: ask Ocean modellers.
  • What is CMIP schedule? Peter + Roger: about six months behind. Start production run early next year. Still working on configuration. Coupled cable model running now. The reproducibility issue has turned into several issues which need to be nailed down.
  • Marshall: when do you feel like you need to fix source/versions? Roger: delaying until the end of the year. How long would a 500 year 1 deg MOM spin up take? Nic: 0.5 hour/year. 50 years/day without queuing issues. Have to take crashing into account. Marshall: does 1 deg crash? Nic: probably not.
  • Nic: created issue recently, wanted to run 50 years in a single submit. Memory leaks limit how long you can run. Maybe only 3 years.
  • Aidan: Should add multi-year runs per submission ability to payu.
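
The spin-up numbers quoted above can be checked with a few lines of arithmetic. This is only a back-of-envelope sketch: it takes Nic's 0.5 hour/year estimate, the 500 year lower bound and the "maybe only 3 years" per submit limit at face value, and ignores queue time and crashes. The variable names are illustrative only.

```python
import math

HOURS_PER_MODEL_YEAR = 0.5   # Nic's estimate for 1 deg MOM
SPINUP_YEARS = 500           # lower end of the 500-1000 yr spin-up range
YEARS_PER_SUBMIT = 3         # possible limit per submit due to memory leaks

# Throughput with no queuing: model years completed per wall-clock day
years_per_day = 24 / HOURS_PER_MODEL_YEAR

# Total wall-clock time for the spin-up, in days
wall_days = SPINUP_YEARS * HOURS_PER_MODEL_YEAR / 24

# Number of separate submissions if each one can only run a few years
submissions = math.ceil(SPINUP_YEARS / YEARS_PER_SUBMIT)

print(f"{years_per_day:.0f} model years per day")
print(f"{wall_days:.1f} days of wall time for {SPINUP_YEARS} years")
print(f"{submissions} submissions at {YEARS_PER_SUBMIT} years each")
```

The throughput works out slightly under the quoted 50 years/day, and the submission count shows why multiple runs per submission in payu would help.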

Bathymetry

  • Aidan: how do you deal with non-advective cells? Russ: it is potholes with no advective velocity possible. If you allow cells to fill if they’re too thin, can create cells that have no velocities.
  • Russ to add his code to OceansAus repo.

New HPC

  • Marshall: Tender for new machine. Understand current limits of codes, and if new machine will work, and what we need to get more performance. Convinced MOM is a RAM bound code. Vectorisation is not making a difference. Want a machine with more RAM bandwidth, not more vectorisation. Away from KNL and SkyLake, towards IBM power and AMD.
  • Peter: Met Office XC40 can run coupled model with 48*24 processors. They are 32 processor Broadwell nodes. Marshall: maybe running more threads? Roger: they run 2 threads.
  • Marshall: bring errors that stop it working. Roger: ok, will get some info together.
  • Marshall: there is an incorrect understanding of message passing and halos. MPI messages in MOM5 are healthy. Get GB/s bandwidth, even corners. Problems are related to library or load imbalance, or maybe CPU throttling. We are doing a reasonable job of MPI. Faster interconnect may be useful. Broadwell is 10-15% faster, as it has faster interconnect.

Actions

New:

  • Invite Arnold Sullivan and James Munroe to TWG meetings.
  • Add feature request to payu: multiple runs per submission
  • Ask MOM Ocean meeting about 1000yr OM spin-up possibility
  • Russ to add all his ocean bathymetry code to OceansAus repo.

Existing:

  • Aidan investigate tenth degree MOM configs for benchmarks.
  • Possible bench-mark configs (everyone)
  • Nic to help Peter get his MOM repo up to date with MOM5 master branch, and then merge changes
  • Look into OpenDAP/THREDDS for use with MOM on raijin (Aidan, Nic, Marshall)
  • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
  • Test Nic’s access-om model config on OceansAus (All)
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Add new test cases to Jenkins test suite (Nic).
  • Start a new google doc about coupler issues and MATM (Marshall)
  • Ask Dale Roberts about effects of OpenMP for Roger (Marshall)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (Marshall and TWG)
  • Blog post around issues with high core count jobs and mxm mtl (Nic)
  • Do longer runs with Nic’s 1 deg and 0.25 deg ACCESS-OM2-JRA55 configs (Andy and Aidan)
  • Try repeat year forcing with Nic’s configurations (Nic and Andy)
  • Create document outlining options for configuration sharing (?)

Technical Working Group Meeting, July 2017

Minutes

Date: 11th July 2017
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS ANU)
  • Nicholas Hannah (ARCCSS/Double Precision)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Peter Dobrohotoff and Roger Bodman (CSIRO Aspendale)

Reproducibility

  • Peter is looking at reproducibility across resubmit periods. ocean_solo.F90 has changed since old CMIP5 setup. A call to the coupler has been commented out. May have upset forcings? CMIP5 worked. Doesn’t work on CMIP6.
  • Some discussion ensued about when and who might have changed the file.
  • Roger says Martin Dix sees differences in restarts happening in Red Sea. Nic suggested turning off the Red Sea fix and seeing if the reproducibility issue goes away.
  • Nic: what is plan of action of ocean_solo issues? Peter will send a diff to Nic. Roger will ask Martin about turning off Red Sea Fix.
  • Nic can help Peter with updating their GitHub fork of MOM.
  • Nic wondered if there is a better way to do the Red Sea fix? Use sponges in localised areas? This is the way it is done in other models. Our method is not particularly standard. Specialised code runs over specific region doing clamping. Maybe do this more generically? Russ: reason it is done that way is to conserve the salt. Restoration doesn’t conserve salt. Nic: open channels? Russ: don’t want to change land masks. Not issue of channel size, just not enough mixing. They are mixing locations far apart. Can’t do cross-land mixing. Not on the same processor. Tenth and quarter don’t need this fix.
  • Russ: might be an issue about when Red Sea fix is called. Done so many steps after start of the model, so will see a different history of the model. Maybe should change to time rather than step based call. Nic reports that Fabio says the Red Sea fix never runs on first coupled run. After looking at code Nic says a single 2 day run will never do salinity fix. 2×1 day runs will do salinity fix twice. Code is not reproducible.
  • Marshall asked Nic if ACCESS-OM2 models are reproducible? Nic: don’t know, haven’t done that yet.
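
Russ's point about step-based versus time-based calls can be illustrated with a toy example. This is a hypothetical sketch, not the actual Red Sea fix code: the function names, the hourly time step and the trigger values are all assumptions, and it does not reproduce the specific behaviour Nic found. It only shows that a trigger counted in steps since the start of each run fires at different model times depending on how the run is split, while a trigger tied to model time does not.

```python
def fix_times_step_based(segment_lengths, trigger_step):
    """Model hours at which a fix fires when it is called a fixed number of
    steps after the start of each run segment (counter resets on restart)."""
    fired = []
    model_hour = 0
    for length in segment_lengths:
        for step in range(1, length + 1):
            model_hour += 1
            if step == trigger_step:
                fired.append(model_hour)
    return fired


def fix_times_time_based(segment_lengths, interval_hours):
    """Model hours at which a fix fires when it is tied to model time,
    regardless of where the restarts fall."""
    fired = []
    model_hour = 0
    for length in segment_lengths:
        for _ in range(length):
            model_hour += 1
            if model_hour % interval_hours == 0:
                fired.append(model_hour)
    return fired


single_run = [48]        # one 2 day run, hourly steps (assumed)
split_runs = [24, 24]    # two 1 day runs covering the same period

print(fix_times_step_based(single_run, 16), fix_times_step_based(split_runs, 16))
print(fix_times_time_based(single_run, 16), fix_times_time_based(split_runs, 16))
```

With the step-based trigger the split runs apply the fix at a different set of model times from the single run, so the two histories cannot match bit for bit. The time-based trigger gives the same schedule either way.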

COSIMA Models

  • Russ fixed heat budget in ACCESS-OM2.
  • Nic used Russ’ offline kd-tree runoff regridding and implemented it online. Without conservation checks it is fast enough to run online. Russ: set up connections and read them in? Nic: build tree once at beginning of run, and tree searched at runoff frequency. Being run on MATM core. Hopefully won’t slow other models as it is doing it in parallel while other models working.
  • Nic: this runoff regridding might be relevant to coupled model. Don’t know how runoff works there, but believe it relies on the land/sea masks matching as closely as possible. As they’re different resolutions might still lose some. This technique is guaranteed to get all runoff into ocean. Also if you change ocean mask, you have to also change your atmosphere land/sea mask. This would avoid this. Nic asked Peter/Roger if they were 100% certain all runoff goes into ocean? Peter didn’t know for sure. Roger reported that this was anecdotally a problem. Peter will pass idea on to Dave.
  • Russ: did you just implement nearest neighbour or spread? Nic: no spread. Russ: will blow up near the Amazon. Nic: doing a conservative remapping onto fine model grid and will then remap from each land grid cell with runoff. So will not dump all runoff in one location. Aidan suggested river spread module could be used to redistribute runoff, but Russ said better not to use river spread if can be avoided (can be across cells and so increase communication, slow model).
  • Aidan explained Andrea Dittus had salinity issues with her coupled chemistry model that were to do with a bad river routing table. Maybe this approach could help?
  • Aidan explained the JRA55 data set, as a replacement for CORE II and how the RYF data was created as a replacement for CORE NYF. There was interest amongst the group at using JRA55.
  • Matt explained CORE II is a weird reanalysis product which is a mish-mash of other products. Some of the component products have ceased so CORE II also ceased.
  • Aidan explained the JRA55 IAF forcing dataset is incompatible with MOM, as it is split into separate years. Aidan developed some rudimentary code to support time formats in the data_table, but this breaks on time interpolation.
  • Nic thinks we should use OpenDAP to overcome this. OpenDAP access via URLs is fully supported by netCDF library. Should work in MOM. Marshall wonders if it would be too slow. Aidan also pointed out that it would require an OpenDAP/THREDDS server which is not publicly facing as JRA55 has limits on redistribution. Nic made an issue for this on MOM5 repo already.
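
The kd-tree runoff approach described above (build the tree once, search it at runoff frequency, move every bit of land runoff to the nearest ocean cell) can be sketched as follows. This is a minimal illustration, not Nic's or Russ's code: it uses a small hand-rolled 2-D kd-tree, treats coordinates as planar (a real implementation would need distances on the sphere), and the cell positions and runoff values are made up.

```python
def build_kdtree(points, depth=0):
    """Build a 2-D kd-tree from a list of (x, y, cell_index) tuples."""
    if not points:
        return None
    axis = depth % 2                      # alternate splitting on x and y
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return {
        "point": points[mid],
        "axis": axis,
        "left": build_kdtree(points[:mid], depth + 1),
        "right": build_kdtree(points[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return (squared_distance, point) for the tree point closest to target."""
    if node is None:
        return best
    point, axis = node["point"], node["axis"]
    dist2 = (point[0] - target[0]) ** 2 + (point[1] - target[1]) ** 2
    if best is None or dist2 < best[0]:
        best = (dist2, point)
    diff = target[axis] - point[axis]
    near, far = ("left", "right") if diff < 0 else ("right", "left")
    best = nearest(node[near], target, best)
    if diff * diff < best[0]:             # far side could still hold a closer point
        best = nearest(node[far], target, best)
    return best


def remap_runoff(tree, land_cells, n_ocean):
    """Move the runoff of every land cell onto its nearest ocean cell."""
    ocean_runoff = [0.0] * n_ocean
    for (x, y), runoff in land_cells:
        _, (_, _, ocean_index) = nearest(tree, (x, y))
        ocean_runoff[ocean_index] += runoff
    return ocean_runoff


# Hypothetical wet ocean cells (x, y, index) and land cells ((x, y), runoff)
ocean_points = [(0.0, 0.0, 0), (1.0, 0.0, 1), (2.0, 1.0, 2)]
land_cells = [((0.2, 0.1), 5.0), ((1.9, 0.8), 3.0), ((1.1, 0.2), 0.0)]

tree = build_kdtree(ocean_points)         # built once at the start of the run
ocean_runoff = remap_runoff(tree, land_cells, len(ocean_points))
print(ocean_runoff)
```

Because every land cell is assigned to some ocean cell, the total runoff is preserved by construction, which is the property Nic highlighted. As Russ noted, a pure nearest neighbour mapping like this has no spreading, so large rivers would need the extra remapping step Nic described.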

Benchmarking

  • Marshall: NCI needs benchmarking code/config ASAP. Want to package MOM benchmarks. Currently packing stock MOM-SIS-025. Can’t choose everything. Will dilute scores.
  • Marshall: Is the Hobart THREDDS data ok? Nic: Put up 2-3 years ago. Maybe worth running through it all to make sure it works ok.
  • Can’t use coupled model due to UM licensing.
  • Wants MOM6. Not sure which.
  • Do we want to include ACCESS-OM2? Nic: yes want OM2 tenth. Marshall: restricted by CPU count. Can’t really bench tenth model. 1000 CPUs was too big for Broadwell expansion. 500 was the limit. 1000 might be pushing it.
  • Aidan had a bunch of tenth configs when checking out optimal configurations for production. Will look into tenth layout configs.
  • Roger: looking at N96 benchmark from MetOffice that doesn’t run. How does Bureau do benchmarking? Marshall: BoM gets vendors to sign confidentiality contracts. Need lawyers but NCI might not.
  • Smallest benchmark. Maybe less than 1000CPUs.

Actions

New:

  • Aidan to tell TWG about JRA55 location.
  • Aidan investigate tenth degree MOM configs for benchmarks.
  • Possible bench-mark configs (everyone)
  • Nic to help Peter get his MOM repo up to date with MOM5 master branch, and then merge changes
  • Look into OpenDAP/THREDDS for use with MOM on raijin (Aidan, Nic, Marshall)

Existing:

  • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
  • Test Nic’s access-om model config on OceansAus (All)
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Add new test cases to Jenkins test suite (Nic).
  • Start a new google doc about coupler issues and MATM (Marshall)
  • Ask Dale Roberts about effects of OpenMP for Roger (Marshall)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (Marshall and TWG)
  • Blog post around issues with high core count jobs and mxm mtl (Nic)
  • Do longer runs with Nic’s 1 deg and 0.25 deg ACCESS-OM2-JRA55 configs (Andy and Aidan)
  • Try repeat year forcing with Nic’s configurations (Nic and Andy)
  • Create document outlining options for configuration sharing (?)
  • Test OpenDap netcdf (Aidan)

Technical Working Group Meeting, June 2017

Minutes

Date: 13th June 2017
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS ANU)
  • Scott Wales (ARCCSS Melbourne Uni)
  • Nicholas Hannah (ARCCSS/Double Precision)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Justin Freeman (BoM)
  • Peter Dobrohotoff and Roger Bodman (CSIRO Aspendale)

COSIMA Workshop

  • Marshall: Good meeting. A couple of main messages:
    1. Must have convergence of source code
    2. System for efficient sharing of model configurations
  • First item we are well on the way to achieve. Second has never been addressed.

Configuration Sharing

  • Marshall: Should focus our energies on configuration sharing.
  • Marshall: having configurations in OceansAus is close to what we want?
  • Currently when using payu configs are saved into local git repos, free floating. This could work for everyone. Should gather config files for runs.
  • Aidan: can have a GitHub organisation and push configurations there, or push to own GitHub accounts and tag repositories to allow search and discovery. Use OceansAus or dedicated GitHub organisation for more canonical configurations.
  • Peter agreed it is ok to have many thousands of configurations. Scott agreed as long as there is a canonical version that is easily identifiable for others to fork from.
  • Nic felt that unless there was a good pay off most individuals would not make the effort to push their configurations up to shared spaces, or make them visible to others, but there is a pay-off for everyone if configurations are shared.
  • Nic: How do we create a searchable index or database for others to find the stuff? May be bigger problem than storing data.
  • Marshall: Payu users can be forced to do it. Tracking history may not seem such a big deal, but there is a pay off for users down the track.
  • Pete: if someone wants to use my config I give them my config and people branch off that. rosie-graph will show relationships between them.
  • Marshall: rose solves a lot of these problems already. We don’t benefit from that. Do we need to just use rose or emulate what it does best?
  • Scott: rose setup: metadata file in each repo branch, hook within repo which reads meta data file and adds to database. Currently there is not a lot of meta data in most branches.
  • Justin: does rose use git? Scott: should be back-end agnostic. Nic: possible solution? Use GitHub for configs, but use rose DB to share and index configurations?
  • Marshall: should we draft a plan?
  • Nic: Just try with rose? Gives us DB which is searchable and viewable. Doesn’t stop anyone from working in the way we want?
  • Marshall: Do we need accessdev to do this? Scott: I think it is just a sqlite DB, so can just be on filesystem. Hooks in GitHub repo won’t work the same way to update it however.
  • Marshall: just start with something searchable? Nic: yeah, see what people are running.
  • Scott: Just start with a simple list. Nic: would want a GitHub repo and commit hash in meta data as a minimum.
  • Justin: make notes about what we think is possible. Common set of functionality will guide us. Rose sounds good, but may be too large a solution? Want something that won’t get in my way and lightweight. Don’t know too much about rose. Everyone agrees this is a good idea.
  • Scott: can just have a list of experiments. Don’t need the editor. Will send around link so Justin and others can take a look.
  • Marshall: really need a bunch of git repos and a way to organise them. Justin: metadata would be great, to find things without bothering others.
  • Marshall: how do we share input data?
  • Justin: share through RDS? Jingbo working on this.
  • Nic: Justin said all our data sources should be pointing to URLs. Great idea! Never have to care about where data is coming from. Input through THREDDS URIs. Marshall: too ambitious? If you’re using netCDF interface, the library takes care of network. Should all happen under netCDF. Model thinks it is a file.
  • Aidan: slower? Nic: might not be slower?
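
The "simple list" idea discussed above, with a GitHub repo and commit hash as the minimum metadata, could look something like the sketch below. Everything here is hypothetical: the record fields beyond repo and commit, the organisation URL, the experiment names and the hashes are placeholders for illustration, not an agreed schema.

```python
from dataclasses import dataclass, field


@dataclass
class ExperimentRecord:
    """One entry in a shared list of experiment configurations."""
    name: str
    repo_url: str          # where the payu configuration lives
    commit: str            # exact commit the run was made from
    model: str
    tags: list = field(default_factory=list)


def search(index, term):
    """Case-insensitive search over name, model and tags."""
    term = term.lower()
    return [
        rec for rec in index
        if term in rec.name.lower()
        or term in rec.model.lower()
        or any(term in tag.lower() for tag in rec.tags)
    ]


# Placeholder entries only; URLs and hashes are invented
INDEX = [
    ExperimentRecord("1deg_jra55_ryf", "https://github.com/example-org/1deg_jra55_ryf",
                     "0a1b2c3", "ACCESS-OM2", ["JRA55", "RYF", "1deg"]),
    ExperimentRecord("025deg_core_nyf", "https://github.com/example-org/025deg_core_nyf",
                     "4d5e6f7", "MOM-SIS", ["CORE", "NYF", "0.25deg"]),
]

for rec in search(INDEX, "jra55"):
    print(rec.name, rec.repo_url, rec.commit)
```

A flat list like this could later be loaded into a sqlite DB or a rose-style database without changing how people store their configurations.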

COSIMA Models

  • Marshall: talking with Peter and Roger at optimisation meetings, seem to be bit-reproducibility issues with CM2. Has Nic ever looked at bit repro in OM2?
  • Nic: have checksums, and make sure they don’t change with code changes under same layout etc. haven’t noticed things aren’t reproducible in a typical short run. As soon as processor layout changes you’ll run into issues.
  • Peter: if run model 3 months and then 1 month at a time, get different answers. A few degrees over a few months. Restarts are not reproducible. CMIP5 worked ok. Now doesn’t work. Maybe need to check that the ice dumping ok.
  • Marshall: have faith in MOM5 core. Issues with flux exchange reproducibility. Anyone got issues with that?
  • Russ: Fabio Dias and Russ have been working on an energy leakage issue. MOM-SIS conserves, MOM-CICE has small leakage. Is a computational issue. ASCII diagnostics don’t close. Should close to 10^-12 W/m2, but only closes to 10^-6 W/m2.
  • Marshall: Has CM2 been subjected to same scrutiny for flux exchange. How far has Dave Bi got?
  • Peter: does UM have this issue? Scott: bit repro is heavily tested. Processor decomp and run length shouldn’t be an issue.
  • Peter: maybe CICE is a bit more vulnerable? Nic: if someone hasn’t made an effort to make it reproducible, odds on that it isn’t. As far as Nic knows no checking has been done on the ACCESS-CM/OM specific boundary code. Suspicious this might be the issue.
  • Roger: reproducibility tends to be over the higher latitudes and higher altitudes. A couple of degrees over a few months.
  • Marshall: MOM5 is not reproducible over processor layout changes north of tripole. There is an expensive operation in the MOM5 flux exchange that isn’t turned on by default. Scott: MPI reduce is not generally reproducible unless special steps taken.
  • Nic: repro needs to be tested with correct compiler flags turned on. Normally optimised code will not reproduce. Roger did this, but will double check this for CICE. Should have -fp-model precise across all 3 models.
  • Russ: fixed ice salinity issue that Nic had heard of before from Dave Bi. Was a default namelist option that people don’t often change. Nic: will forward the conversation to Roger.
  • Marshall: if we routinely shared configs this would not be an issue.
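
Scott's remark that an MPI reduce is not generally reproducible comes down to floating point addition not being associative: changing the processor layout changes the order in which partial sums are combined. The toy below mimics that with plain Python lists standing in for ranks. It is a sketch only; the values are contrived to make the effect obvious, the chunking assumes the array divides evenly between ranks, and `math.fsum` is used merely as an example of an order-independent (and more expensive) summation, not as what MOM5 or FMS does.

```python
import math


def naive_sum(values):
    """Plain left-to-right floating point sum."""
    total = 0.0
    for v in values:
        total += v
    return total


def layout_sum(values, nranks):
    """Sum as if the array were split into contiguous chunks over nranks
    ranks, each rank summing locally before the partials are combined."""
    chunk = len(values) // nranks
    partials = [naive_sum(values[r * chunk:(r + 1) * chunk]) for r in range(nranks)]
    return naive_sum(partials)


values = [1e16, 1.0, -1e16, 1.0]

one_rank = layout_sum(values, 1)
two_ranks = layout_sum(values, 2)
exact = math.fsum(values)                 # correctly rounded, order independent

print(one_rank, two_ranks, exact)
print(math.fsum(reversed(values)) == exact)
```

The two layouts give different answers for the same data, while the exactly rounded sum does not depend on ordering. This is why checksums only stay fixed under the same layout unless a special reproducible reduction is switched on.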

Actions

New:

  • Create document outlining options for configuration sharing (?)

Existing:

  • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
  • Test Nic’s access-om model config on OceansAus (All)
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Add new test cases to Jenkins test suite (Nic).
  • Start a new google doc about coupler issues and MATM (Marshall)
  • Ask Dale Roberts about effects of OpenMP for Roger (Marshall)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (Marshall and TWG)
  • Blog post around issues with high core count jobs and mxm mtl (Nic)
  • Do longer runs with Nic’s 1 deg and 0.25 deg ACCESS-OM2-JRA55 configs (Andy and Aidan)
  • Try repeat year forcing with Nic’s configurations (Nic and Andy)

Technical Working Group Meeting, April 2017

Minutes

Date: 11th April 2017
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS)
  • Nicholas Hannah (ARCCSS/Double Precision)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Justin Freeman and Mirko Velic (BoM)

Updates on previous actions

  • Nic has transferred MOM code repos to a new organisation MOM-ocean
  • Official MOM website URL is now mom-ocean.science
  • MOM5 input files have been moved to a new location, administered by Paola Petrelli (ARCCSS CMS team) portal.sf.utas.edu.au/thredds/catalog/. Licensing has been confirmed as GPL by Stephen Griffies.

    COSIMA Workshop

    • Agreement that we want to present some summary of the work of the TWG to date. Important to show what we’ve done: Model sharing, Grid sharing, Important infrastructure activities.
    • Discussion around what to cover in the presentation: What are the most important projects? What is most interesting? What do we want to work on?
    • Some ideas for important topics:
      1. Coherent sharing of experiments with others, as is possible with Rose/cylc
      2. Performance is always important, can motivate others to support our work
      3. Coherence around OASIS.
    • Nic will present a 1/10th coupled model update and results. Should have something decent by then
    • Need to raise issue of the coupler, present some options and get some feedback from the stakeholders. Justin is concerned about wave coupling in future. Inform wider group about what we’re thinking. Judge appetite for changing tools.
    • OASIS is not well liked, is difficult to use and performs poorly.
    • If someone has concerns about the coupler/MATM then we need to back that up with numbers. Marshall will start a new google discussion doc for this
    • Marshall has CM2 numbers that are difficult to understand. Probably not worth presenting to COSIMA
    • Aidan not presenting at present. Russ might talk about grid stuff. Matt might talk about some of the technical aspects of 0.25 deg run. Maybe parameterisation of bulk formulas.
    • There was a general feeling the TWG was functioning well and no significant change required. Matt suggested a face to face meet-up might be worthwhile. Mirko pointed out we need some metrics to show to others that TWG is a worthwhile venture and why we’re so pleased with ourselves

    Updates

    • Nic has had 3 week break. Visited GFDL. Presented his coupler at a coupling workshop. Talked to OASIS, MCT, ESMF and YAC (Yet Another Coupler). ESMF are building 2nd order conservative support into their remapping weight software which is good for us.
    • Matt: has been working with GFDL coupled model for Terry O’Kane. Using ACCESS 1 deg grid in CM2 with AM2 atmosphere. Several 100 years. Decadal forecasting. Atmosphere is same as CM2.1. Maybe 2.5 deg. Would be nice to have higher resolution, but for forecasting need ensembles. 10-40 ensembles. UM is too slow for this sort of work. There is some interest in benchmarking for ocean structure, maybe against CM3. Maybe then go to 0.25 ocean, AM3 atmosphere model. Then put through data assimilation system. Then put in forecast mode. Look for predictive skill.

    COSIMA Models

    • Not made much progress on ACCESS-OM2-01. Technically coupled. Now trying to speed it up for multi-year runs. Currently running 600s timestep. Still in January.
    • Not currently JRA55 forced. Can’t talk about tenth ACCESS-OM2 performance yet.
    • CICE has some shocks on spin-up. Was aiming for 600s, but the current operational time step is 450s. Pulling back to 450s will be a lot easier. Goal to make 600s the production time step. Russ can sometimes run 720s or even 900s for this tenth simulation.
    • Matt pointed out it is difficult to start a simulation with CICE when initial conditions have no sea ice. Better to create initial conditions from World Ocean Atlas.
    • Matt is now running COREII IAF. Sometimes blows up, just drops time step for a month, and if no issue, doesn’t try to diagnose problem, just goes back to normal time step. Matt is running 600s with a tenth MOM/SIS.
    • Aidan talked about issues with the current ACCESS-OM2-025+JRA55, discovered the scripts for MOM5 are not using the same OASIS build as the other components. Trying to work out why the model is running so slowly.

    Actions

    New:

    • Start a new google doc about coupler issues and MATM (Marshall)

    Existing:

    • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
    • Add Peter’s CICE5.1 config to OceansAus github repo (Nic and Peter)
    • Port MOM5 build system to cmake (Aidan)
    • Push updated MATM code with JRA-55 support to OceansAus github (Aidan)
    • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
    • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
    • Test Nic’s access-om model config on OceansAus (All)
    • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
    • Move to CICE5 on OceansAus repo (Nic).
    • Add new test cases to Jenkins test suite (Nic).

    Technical Working Group Meeting, March 2017

    Minutes

    Date: 14th March 2017
    Attendees:

    • Aidan Heerdegen, Scott Wales (ARCCSS, Acting Chair)
    • Nicholas Hannah (ARCCSS/Double Precision)
    • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
    • Peter Dobrohotoff (CSIRO Aspendale)

    Updates on previous actions

    • MATM update: Nic will do a code review and document probably beginning of April.
    • CICE5.1 on repo: This will be part of the ACCESS-OM-0.1 development when this code is incorporated into the model
    • Matt has 1/10th data, and diagnostics were ok. Needed them to explore how heat is going into the ocean, and how that is affected by bulk formula and forcing fields. Could be useful analyses for changing to JRA55.

    COSIMA Models

    • Nic time-stepping ACCESS-OM-0.1 with core forcing. Will not be working on this again for about a month. Is time stepping with CORE forcing CICE 4.1. Will add in cice5 and JRA55. Took about as long as he thought, but there were a number of issues that required fixing.
    • Nic wants to cleanup coupling code, remove cruft and commented out code. Also clean up date handling. There are twenty places where you have to change time step for example. Similar for run time. Need to set up regression tests to allow code changes without breaking stuff. In MOM and CICE namelists there are duplicate fields. Need code changes. Maybe C-style macros are a simple approach?
    • Nic has developed some tools for generating model inputs required for ACCESS-OM. For example, creating the cice grid out of ocean grid. Using ESMF for creating grids and restarts. Started a wiki to document steps to develop config.

    Updates

    • Russ is working on some mixing models, more complex than KPP. GOTM was patched on to MOM. Have to advect around a couple of extra tracers but doesn’t seem to impact speed much. Supposed to be a lot better way of mixing.
    • Russ has been having issues with number of retries time outs with PBS, occurs in first few lines of code. Aidan suggested noting error message and PBS ids and passing on to NCI Help, as this is often the side effect of a bad node.
    • Nic also having issues with tenth. Crashes, with lots of MPI output and no error code. A lot of the processors were sitting in an MPP_global call. ACCESS-OM coupling code was making it fall over. Raijin doesn’t like MPP_global operations as implemented by FMS. Due to implementation issues. To globally distribute an array in FMS you use MPP_global, which uses multiple MPI_Sends, rather than MPI_broadcast.
    • Peter: status is similar to last month’s update.
    • Russ has found issues with MLD diagnostic in MOM. If only have 3 levels returns zero. Strange issues near coast, getting zeroes at those locations.
    • Russ will be working on bathymetry for COSIMA 1/10th configuration. Maybe need different bathymetry for climate and BRANS type runs? Aidan was adamant the goal was to have the same bathymetry for all COSIMA models
    • Russ is running his 1/10th degree simulation with a 900s time step, whereas Aidan’s configuration is 450s due to instability in Arctic
    • JRA-55 is now being housed centrally. All those who wish to use it can they please make their requirements (versions, update frequency) known to the ARCCSS CMS team, and Paola Petrelli in particular

    MOM5 Repo Move

    • Aidan has had some preliminary discussion with Stephen Griffies and Alistair Adcroft about having a new “official” home for the MOM5 source code repository. They favour a separate repo with the community supported model, and infrequent but regular updates of the “Official GFDL” repo version, which could be used by those needing a badge of officialness.
    • Nic felt that there was little point in having an “official” repo, especially if it just creates more work for little actual gain.
    • Nic was in favour of a standalone MOM github organisation, as it is bigger than any one of the groups that use it. It could also host all the older versions of MOM also. This was considered a good approach. If all partner organisations (COSIMA, ARCCSS, BoM, CSIRO, GFDL?) then gave this github organisation their support, similar to the CICE development model, this could tick the boxes for those that required an endorsed software product.

    Actions

    New:

    • Nic to organise transferring MOM code repos to a new organisation MOM-ocean

    Existing:

    • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
    • Add Peter’s CICE5.1 config to OceansAus github repo (Nic and Peter)
    • Port MOM5 build system to cmake (Aidan)
    • Push updated MATM code with JRA-55 support to OceansAus github (Aidan)
    • Get licensing for MOM5 input files (Marshall)
    • Work on hosting MOM5 input files on NCI THREDDS server (Marshall, Aidan)
    • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
    • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
    • Test Nic’s access-om model config on OceansAus (All)
    • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
    • Move to CICE5 on OceansAus repo (Nic).
    • Add new test cases to Jenkins test suite (Nic).

    Technical Working Group Meeting, February 2017

    Minutes

    Date: 14th February 2017
    Attendees:

    • Marshall Ward (NCI, Chair)
    • Aidan Heerdegen, Scott Wales (ARCCSS)
    • Nicholas Hannah (ARCCSS/Double Precision)
    • Justin Freeman and Mirko Velic (BoM)
    • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
    • Peter Dobrohotoff (CSIRO Aspendale)

    COSIMA Models

    • Nic is working on a COSIMA configuration with 1/10th degree ocean. Currently setting up all the files for a coupled ice/ocean. Using the existing ARCCSS 1/10th degree MOM5 ocean, but there is no 1/10th degree CICE. Have not previously coupled 1/10th MOM5 with OASIS. Plan is to use JRA-55 forcing.
    • OASIS won’t work as it has in the past, creating remapping weights and regridding forcings on to a 1/10th grid: it is too slow. It currently takes 2-3 hours for the 0.25 deg grid. OASIS uses SCRIP for regridding, which has limitations: it is uniprocessor, and conservative remapping is not accurate on tripolar grids (the error has not been quantified). Not too bad for 1 or 0.25 deg.
    • New method is to use ESMF. Doesn’t do 2nd order conservative remapping. First order conservative ok for now. Maybe revisit. Nic has set up test and evaluation cases to compare SCRIP and ESMF.
    • OASIS isn’t that great at regridding/remapping. If we’re using another tool for remapping, what is OASIS doing for us? Nic has written an OASIS replacement: TANGO. Simple and basic as possible. Uses ESMF. Two step process: generate regridding weights, and then distribute fields. Might be faster than OASIS, but performance isn’t a big issue for this.
    • Nic is writing a model configuration development toolchain? Supports different models. Documented here.
    • Justin agreed that OASIS is poorly designed, difficult to work with and not well liked. Suggestions to replace it have met with resistance from BoM. Might be able to push something under the TWG umbrella.
    • Aidan unhappy with MATM. Code has been changed ad-hoc to support new forcing datasets. Nic is happy to rewrite MATM if Andy Hogg wants him to. TANGO supports python/fortran/C++. Maybe re-write MATM in python? Not performance critical. Aidan not a fan of introducing a new dependency on python.
    • Aidan: working on ACCESS-OM2-025+JRA55. Problems with generating iced restart files.
    • Marshall: Working on CM2 with Peter’s help. ROSE suite needs some work, but in a reasonable state. Just profiling. Slight bug in OASIS restart file generation. Model would hang occasionally. Just a random hang. Due to a bug in MOM that Marshall had fixed. Source code had been reverted to a version that was not patched. Dave Bi had earlier described a bug that might be the same one?
    • Marshall has Profiler working in all 3 sub models. Some prelim numbers.
    • Need to update MOM source inside CM2. Use main trunk version and see what blows up. Marshall and Peter to work offline on getting their version up to date.
    • Do we need an ACCESS branch on the MOM repo? Definitely want everyone on the same repo, as there are issues with some people using out of date code. Can we have more rigorous tagging of bug fixes, for example? This would allow bug fixes to be incorporated in other versions. Nic thought that now we have better communication this may not be such a problem.
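
    The two-step process noted above for TANGO (generate regridding weights once, then use them to distribute each field) can be sketched as follows. This is only an illustration of the general idea, not TANGO or ESMF code: the weights are hand-written for a tiny 1D case, and `apply_weights` is a hypothetical helper name.

```python
# Step 1 (normally done once, offline, by a regridding tool): the weights.
# Each entry is (destination index, source index, weight). These values are
# made up for a 2-cell source grid and a 3-cell destination grid covering
# the same interval, using first-order conservative (overlap) weights.
weights = [
    (0, 0, 1.0),               # dst cell 0 lies entirely inside src cell 0
    (1, 0, 0.5), (1, 1, 0.5),  # dst cell 1 straddles both src cells equally
    (2, 1, 1.0),               # dst cell 2 lies entirely inside src cell 1
]


def apply_weights(weights, src_field, n_dst):
    """Step 2: apply precomputed weights to one source field."""
    dst_field = [0.0] * n_dst
    for dst, src, w in weights:
        dst_field[dst] += w * src_field[src]
    return dst_field


src_field = [10.0, 20.0]
dst_field = apply_weights(weights, src_field, n_dst=3)

# Conservation check: integrals over the same domain (length 4) should match.
src_integral = sum(v * 2.0 for v in src_field)        # src cells have width 2
dst_integral = sum(v * (4.0 / 3.0) for v in dst_field)  # dst cells have width 4/3
```

    Because the weights do not depend on the field values, they only need to be generated once per pair of grids; the cheap second step is repeated for every field and time step.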

    Updates

    • Peter is working on ACCESS-CM2 for CMIP6. Currently focussing on global atmosphere. MetOffice are atmosphere only, and incorporate into their coupled model. GA7.1 is next target for ACCESS-CM2. Not currently with CABLE. Prelim testing with GA7.1. PI will wait for UM10.6 for correct aerosol coding. Martin is focussing on 10.6. Won’t have a version of CM2 with 10.6 for a while. CABLE group working with 10.5, and always difficulty with incorporating CABLE in UM versions. CSIRO still doing present day in 10.3.
    • Russ found a resolution error when packing data that ended up destroying a 20 month assimilation run. Will be a complete spin up when restarted
    • Justin has been doing a lot of interpolation. Puddles natural neighbour library. Been using it a lot. Very flexible and nice library
    • Matt has provided WOMBAT code to Paul Spence so he can recompile with new libraries and with flags appropriate for new broadwell hardware on raijin.
    • Since shutdown Matt has also recompiled. Asks for 850 cpus, 6-7 hours / year. When processor count > 1.5K crashes too often.
    • Marshall: NCI troubleshot the 1.8.4 problems. MPI reduction operations no longer work in 1.8.4.
    • Marshall: openmpi 2.0 is slower than 1.10.2.
    • Aidan reiterated: don’t use -O3 with Intel MPI.
    • Justin: BoM internal meeting regarding COSIMA project. All signed off. Developments over 1-2 year timeframe. Wavewatch 3 coupled into COSIMA code. Stefan and Mirko will be looking at that. Would like some feedback from TWG about assessing it, and getting back to the community. MOM+Wavewatch coupled initially as a technical demo. Wavewatch 3 will be part of COSIMA. GFDL is interested in Wavewatch. Stephen Griffies was curious. People have coupled Wavewatch with MOM in the past, so know it is possible. Maybe OASIS-MCT, or OASIS3. These were climate scale runs. Mark Hannah’s postdoc did this work.
    • Marshall: about github source for MOM under Breakaway Labs. Only custodian is Nic. Need some shared ownership of these codes? Move to GFDL? Justin: as an organisation BoM will look at the code more favourably if it is sitting under GFDL.

    Actions

    New:

    • Marshall will be away, Aidan to organise next meeting.
    • Update MOM source inside CM2 (Marshall).
    • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.

    Existing:

    • Add Peter’s CICE5.1 config to OceansAus github repo (Nic and Peter)
    • Port MOM5 build system to cmake (Aidan)
    • Push updated MATM code with JRA-55 support to OceansAus github (Aidan)
    • Get licensing for MOM5 input files (Marshall)
    • Work on hosting MOM5 input files on NCI THREDDS server (Marshall, Aidan)
    • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
    • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
    • Test Nic’s access-om model config on OceansAus (All)
    • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
    • Move to CICE5 on OceansAus repo (Nic).
    • Add new test cases to Jenkins test suite (Nic).
    • Aidan to provide Matt with location of tenth model test data. Check if capturing all the diagnostics Matt might be interested in.
    • Matt to provide Marshall with some test cases for the Xeon Phi tests, maybe 1 deg configurations.

    Technical Working Group Meeting, October 2016

    Minutes

    Date: 11th October 2016
    Attendees:

    • Marshall Ward (NCI, Chair)
    • Aidan Heerdegen (ARCCSS)
    • Nic Hannah (ARCCSS/Breakaway Labs)
    • Justin Freeman and Mirko Velic (BoM)
    • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
    • Peter Dobrohotoff (CSIRO Aspendale)

    Code submissions

    • Mirko submitted major refactoring update to the nudging code for MOM. Three different options depending on namelist. One just sponge, one does nudging, another does adaptive nudging. Added instantaneous update from datafile. Wanted to reproduce the MOM4 behaviour. Tested, and now works. Was broken previously.
    • Can merge, but we need some testing on other coverage. Currently have a dozen test cases. Not sure any touch this, but will run them anyway. Justin suggests they provide a test case which covers some of these sections.
    • Nic asked that if possible functional and formatting changes be separate commits, as it makes approving pull requests much easier
    • Maybe not merge yet, but get testing working to cover this. Justin will look at adapting an existing MOM test case for this purpose.
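
    For readers unfamiliar with the terms used above: nudging is commonly implemented as a relaxation of a model field toward a reference field over some timescale, and a sponge applies that relaxation only in selected regions. The sketch below shows that generic idea and could serve as the starting point for a small test. It is not Mirko's MOM code; `nudge`, `tau` and `mask` are hypothetical names, and the adaptive option is not covered.

```python
def nudge(field, target, dt, tau, mask=None):
    """One explicit relaxation step: field += mask * (dt / tau) * (target - field).

    mask is 1.0 where relaxation is applied and 0.0 where it is not.
    With mask=None the relaxation is applied everywhere.
    """
    if mask is None:
        mask = [1.0] * len(field)
    return [f + m * (dt / tau) * (t - f) for f, t, m in zip(field, target, mask)]


temp = [10.0, 12.0, 14.0]  # model values (illustrative)
obs = [11.0, 12.0, 13.0]   # reference values read from a data file (illustrative)

# Relax everywhere with a 1 day timescale and a 1 hour time step.
nudged = nudge(temp, obs, dt=3600.0, tau=86400.0)

# Sponge-like behaviour: relax only in the last cell.
sponged = nudge(temp, obs, dt=3600.0, tau=86400.0, mask=[0.0, 0.0, 1.0])
```

    A test case covering the nudging code would assert on properties like these: cells outside the mask are untouched, and cells already equal to the reference do not change.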

    Exchange grids and smoothing

    • Justin was talking to Paul Sandery about exchange grids. An issue with tiling as a result of remapping. Was asking about how Russ implemented smoothing.
    • If you take interim fields you end up with a horrible pattern with convergence of winds with 1st or 2nd order remapping, due to the discontinuity. Russ wrote some code that does 2D smoothing within the surface boundary condition. It bypasses the exchange grid and uses the flux exchange to native grids options(?). GFDL apply an interpolation when they read in via data override, so the data override can be used to interpolate to the finer grid, and this can be controlled.
    • This is only a problem with conservative remapping with exchange grids.
    • Nic didn’t think this was a problem with standard MOM-SIS runs, but Russ said it should still be visible in the fluxes with coarse (1deg) forcing fields.
    • With ACCESS high res ocean the fields and fluxes are extremely blocky, so Nic smooths on the ice grid, before it comes into the ocean grid, on a tile by tile basis.
    • If you want local conservation, cannot get around this. In ACCESS can use linear interpolation and then post-process to get global conservation. Doesn’t work with local conservation.
    • Marshall suggested we have some test cases that don’t run the model but test coupling and fields
    • These effects most often seen when there is a big difference in resolution between model fields and input fields. Look at wind stress fields. Maybe some of the barotropic fields, height and definitely convergence in barotropic restart file.
    • Paul’s runs do not use conservative remapping. Don’t see the horrible features with some of the other schemes.
    • Nic: do we need a central document discussing this?
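
    The ACCESS approach mentioned above (linear interpolation, then post-processing to recover global conservation) can be illustrated in one dimension. This is a sketch under assumed details, not the ACCESS implementation: both grids are uniform over the same unit domain, the correction is a single multiplicative scale factor (so the field is assumed to be positive), and all function names are hypothetical.

```python
def integral(values):
    """Integral over a unit-length domain divided into equal cells."""
    return sum(values) / len(values)


def linear_interp(src, n_dst):
    """Linearly interpolate cell-centre values onto n_dst equal cells."""
    n_src = len(src)  # needs at least 2 source cells
    out = []
    for j in range(n_dst):
        x = (j + 0.5) / n_dst                  # destination cell centre
        pos = x * n_src - 0.5                  # position in source index space
        pos = min(max(pos, 0.0), n_src - 1.0)  # clamp beyond the end centres
        i = min(int(pos), n_src - 2)
        frac = pos - i
        out.append((1.0 - frac) * src[i] + frac * src[i + 1])
    return out


def enforce_global_conservation(src, dst):
    """Rescale dst so that its global integral matches that of src."""
    scale = integral(src) / integral(dst)
    return [v * scale for v in dst]


coarse = [0.0, 1.0, 4.0]  # illustrative coarse forcing values
smooth = linear_interp(coarse, 4)
corrected = enforce_global_conservation(coarse, smooth)
```

    The interpolated field is smooth but its integral differs slightly from the source; the rescaling restores the global total. Individual cells no longer match the source cell they overlap, which is why this does not give local conservation.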

    OceanMaps 4

    • Justin is trying to prototype OceanMaps 4. Picking up on Paul Sandery’s work. He has been using MOM5-SIS and using bulk fluxes to link the models. Would like to standardise, or make these things available. Not sure how it connects to linkage project.
    • Nic felt it was good to know what Paul does. So far no code changes?

    FMS

    • Aidan got a query from Dave Hutchinson, asking if latest version of FMS was included in the code on MOM5 repo. Marshall has updated FMS in the master branch to Ulm, but not to Verona, the latest version.
    • Move FMS to a submodule of MOM5 rather than manually included inline
    • Goal is for Rui Yang (NCI) to work on parallel netCDF in MOM5

    Model release naming and definition

    • Still an issue
    • Nic has put an access-om model on OceansAus. Has version controlled input files and code. Can be downloaded, compiled and run.

    CICE

    • ACCESS-OM models are using CICE4.?, but Peter is using 5.1.
    • There are many bug fixes and performance improvements in the version of CICE Nic has been working on that would be beneficial to Peter.
    • Peter is working on a refactoring of CICE5.1
    • We should align our work to the same version of CICE.
    • First step is for Peter’s version of CICE5.1 to be hosted on OceansAus and development work to be based from that so we can work together. Some discussion about the best way to do this.

    Actions

    • Justin and Mirko work up test cases to cover the nudging code and give them to Nic.
    • Nic to add new test cases to Jenkins test suite.
    • Aidan to add mom-ocean.org and mom-ocean.org.au to uptime monitoring service (Uptime Robot).
    • Add Peter’s CICE5.1 config to OceansAus github repo
    • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
    • Marshall to move FMS to submodule of MOM5 github repo. Liaise with Nic on implementation?
    • Others test Nic’s access-om model config on OceansAus

    Technical Working Group Meeting, September 2016

    Minutes

    Date: 6th September 2016
    Attendees:

    • Marshall Ward (NCI, Chair)
    • Aidan Heerdegen (ARCCSS)
    • Justin Freeman and Mirko Velic (BoM)
    • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
    • Peter Dobrohotoff (CSIRO Aspendale)

    General Discussion

    • Aidan described latest tenth model comparison runs, between GFDL50 and KDS75. Models are running well at an ocean time step of 450s, with excellent throughput on raijin. At 600s there is an instability related to a topographic feature off the northern tip of Severny Island. Some modification of the topography is required. Matt might be interested in looking at heat uptake and transport between the models.
    • Matt looking at surface parameterisations. Bulk formula etc. They have some warm biases that seem to be due to the choice of some of these parameters. Looking to optimise parameters to fix this.
    • Peter is working on ACCESS-CM2. Working on UM 10.* model. Running Jules currently, but will run CABLE. Have run UM8.5/GA6 for 200+ years. Confusingly, this was also called ACCESS-CM2. 350 yrs on 0.25 ocean on GA6. All 10.* versions are using the rose+cylc run architecture.

    Model release strategy

    • Need MOM releases tagged.
    • Marshall and Aidan in favour of MOM having a release strategy (slightly separate issue).
    • Justin felt that the COSIMA model is MOM+CICE+OASIS. Don’t want to tag MOM with COSIMA release names.
    • Can use a sub-module approach, bring in specific model revisions.
    • When available, Justin would like Nic’s latest model definition repo to be communicated to the TWG.

    New NCI hardware test

    • Marshall has done some very preliminary testing of MOM with a Knights Landing test cluster (~4k cores, 64 core / node Xeon Phi, 92GB/node). 1.3GHz cores. Faster interconnect (EDR v FDR). Supports AVX512 instruction set, so potentially double the number of floats/clock cycle. These are also lower power and cheaper, so could get many more cores than a traditional CPU architecture.
    • MOM is running, and it was very easy to do. Raijin binaries run fine, as does MPI.
    • Old binaries work fine. Ran 2.4x slower than raijin. As you would expect from clock speed alone.
    • Only ran 960 cores.
    • AVX512 enabled binary throws floating point errors a lot.
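
    The "double the number of floats/clock cycle" estimate above follows from register width. A small sketch of the arithmetic, assuming the comparison is against 256-bit AVX registers (the minutes do not state the baseline) and ignoring everything else that affects real throughput:

```python
def lanes(register_bits, float_bits):
    """Number of floating point values that fit in one vector register."""
    return register_bits // float_bits


avx_doubles = lanes(256, 64)     # assumed baseline: 256-bit AVX, 64-bit floats
avx512_doubles = lanes(512, 64)  # AVX512: 512-bit registers
ratio = avx512_doubles / avx_doubles
```

    This is a peak figure per vector instruction only. It says nothing about how much of MOM actually vectorises, so it is an upper bound rather than an expected speed-up.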

    Actions

    • Aidan to provide Matt with location of tenth model test data. Check if capturing all the diagnostics Matt might be interested in.
    • Matt to provide Marshall with some test cases for the Xeon Phi tests, maybe 1 deg configurations.
    • Marshall and Aidan to look at COSIMA model release. Liaise with Nic Hannah.

    First meeting of the technical working group

    Minutes

    Date: 2nd August 2016
    Attendees:

    • Marshall Ward (NCI, Chair)
    • Aidan Heerdegen (ARCCSS)
    • Nic Hannah (ARCCSS/Breakaway Labs)
    • Justin Freeman (BoM)
    • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)

    Agreed to have monthly videoconference meetings.

    COSIMA model definitions on website

    • Who is the audience? Agreed should pitch it at novice researcher/student with no links to existing community
    • Should start by using ourselves as the audience, define them for our use and expand from there
    • First test case should be the ACCESS-OM-025 (Quarter degree MOM+CICE+OASIS-mct)
    • CICE is currently housed on CWS github repo
    • Agreed we need consolidated location for model code. Nic suggested a git repo pulling in model components using submodules. This makes it clean to install and allows for control of versions via explicit commits or tags
    • COSIMA name is taken on github, so agreed to use existing OceansAus group
    • Minimal technical detail on website, instead point to github repo

    Input data

    • Data store for model input files – decided to use the ua8 Ramadda location which is currently used for MOM5 test cases
    • Need data versioning

    Benchmarking

    • Need example outputs, and timings. Direct users to run the model and compare their outputs and timings to those supplied to verify model integrity and performance
    • Need continuous integration to test against these outputs. Nic currently running a dozen test cases on a Jenkins server for MOM5. Can add these tests too.
    • Essential to allow upgrading of model components with confidence they have not affected model integrity and performance.
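
    The verification step described above (run the model, then compare outputs and timings against the supplied reference values) could look something like the following. This is a sketch only: the diagnostic names, checksum values, timing tolerance and the `compare_run` helper are all hypothetical and do not correspond to an existing COSIMA tool.

```python
def compare_run(reference, result, timing_tolerance=0.2):
    """Return a list of problems found; an empty list means the run matches."""
    problems = []
    for name, ref_sum in reference["checksums"].items():
        got = result["checksums"].get(name)
        if got != ref_sum:
            problems.append(f"{name}: checksum {got} != reference {ref_sum}")
    # Allow the run to be somewhat slower than the reference before flagging it.
    limit = reference["walltime_s"] * (1.0 + timing_tolerance)
    if result["walltime_s"] > limit:
        problems.append(f"walltime {result['walltime_s']}s exceeds {limit}s")
    return problems


# Illustrative values only.
reference = {"checksums": {"temp": "a1b2", "salt": "c3d4"}, "walltime_s": 600.0}
good_run = {"checksums": {"temp": "a1b2", "salt": "c3d4"}, "walltime_s": 640.0}
bad_run = {"checksums": {"temp": "ffff", "salt": "c3d4"}, "walltime_s": 900.0}

good_problems = compare_run(reference, good_run)
bad_problems = compare_run(reference, bad_run)
```

    The same comparison could be run by a continuous integration job after each component upgrade, with a non-empty problem list failing the build.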

    Tools/Utilities

    • Desirable to also have related tools and utilities on the OceansAus github site
    • Concerns that the site could collect crufty tools, so make some standard for what constitutes a repo: utility, support and documentation?

    Model Support

    • How do we support these models? Decided the existing MOM mailing list would suffice for general user questions. Stephen Griffies has been interested in supporting a MOM-CICE configuration. Also github issues can be used for specific code issues.

    Mission Statement

    • Desirable to have a general mission statement about the aims of the TWG
    • Any additional invitees? Perhaps Mirko from the BoM? Open invitation to anyone who wants to attend and contribute

    General technical discussion

    • Aidan described progress with improving vertical grid in MOM-SIS 0.1 deg config
    • Justin working on making his real time flythrough data-vis software available on raijin. Maybe use Russ’ BRAN2015 reanalysis dataset with recent El Nino
    • Matt is adding sea-ice to the OFAM grid and tuning parameters to reduce SST biases. Turning off neutral physics helped.
    • Ben Evans wants a WOMBAT test case, perhaps use the 0.25 deg runs Matt did with Paul Spence?
    • Marshall has updated FMS and he and Nic have used Jenkins testing to find and fix bugs. Close to being added to master branch of MOM5

    COSIMA website

    • A lot of enthusiasm for contributing blog posts to website from technical perspective.

    Actions

    • Aidan to add Nic to OceansAus group
    • Nic to create initial ACCESS-OM-025 repo with submodules for MOM5/CICE/OASIS-mct
    • Justin volunteered to test out initial model config as “new user”
    • Marshall to write draft mission statement