Technical Working Group Meeting, February 2017

Minutes

Date: 14th February 2017
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen, Scott Wales (ARCCSS)
  • Nicholas Hannah (ARCCSS/Double Precision)
  • Justin Freeman and Mirko Velic (BoM)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Peter Dobrohotoff (CSIRO Aspendale)

COSIMA Models

  • Nic is working on a COSIMA configuration with a 1/10th degree ocean and is currently setting up all the files for a coupled ice/ocean model. It uses the existing ARCCSS 1/10th degree MOM5 ocean, but there is no 1/10th degree CICE. 1/10th degree MOM5 has not previously been coupled with OASIS. The plan is to use JRA-55 forcing.
  • The approach used in the past, with OASIS creating remapping weights and regridding the forcing onto the model grid, won’t work at 1/10th degree: it is too slow. It currently takes 2-3 hours for the 0.25 deg grid. OASIS uses SCRIP for regridding, which has limitations: it is uniprocessor, and its conservative remapping is not accurate on tripolar grids (the error has not been quantified). It is not too bad at 1 and 0.25 deg.
  • The new method is to use ESMF. ESMF doesn’t do 2nd order conservative remapping, but first order conservative is OK for now; may revisit. Nic has set up test and evaluation cases to compare SCRIP and ESMF.
  • OASIS isn’t that great at regridding/remapping. If we’re using another tool for remapping, what is OASIS doing for us? Nic has written an OASIS replacement, TANGO, designed to be as simple and basic as possible. It uses ESMF and is a two-step process: generate regridding weights, then distribute fields. It might be faster than OASIS, but performance isn’t a big issue here.
  • Nic is writing a model configuration development toolchain? Supports different models. Documented here.
  • Justin agreed that OASIS is poorly designed, difficult to work with and not well liked. Suggestions to replace it have met with resistance from BoM. Might be able to push something under the TWG umbrella.
  • Aidan is unhappy with MATM: the code has been changed ad hoc to support new forcing datasets. Nic is happy to rewrite MATM if Andy Hogg wants it. TANGO supports Python/Fortran/C++, so maybe rewrite MATM in Python? It is not performance critical. Aidan is not a fan of introducing a new dependency on Python.
  • Aidan: working on ACCESS-OM2-025+JRA55. Problems with generating iced restart files.
  • Marshall is working on CM2 with Peter’s help. The ROSE suite needs some work, but is in a reasonable state; currently just profiling. There was a slight bug in OASIS restart file generation, and the model would hang occasionally at random. This was due to a bug in MOM that Marshall had already fixed, but the source code had been reverted to an unpatched version. Dave Bi had earlier described a bug that might be the same one?
  • Marshall has the profiler working in all 3 submodels, with some preliminary numbers.
  • Need to update MOM source inside CM2. Use main trunk version and see what blows up. Marshall and Peter to work offline on getting their version up to date.
  • Do we need an ACCESS branch on the MOM repo? We definitely want everyone on the same repo, as there are issues with some people using out-of-date code. Can we have more rigorous tagging of bug fixes, for example, which would allow bug fixes to be incorporated into other versions? Nic thought that now we have better communication this may not be such a problem?
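To illustrate the two-step process described for TANGO, the following is a minimal 1D sketch of first-order conservative remapping, assuming cells are intervals given by their edge coordinates. Weights are computed once from the fractional overlap of each destination cell with each source cell, then applied to any number of fields as a sparse matrix-vector product. The function names are hypothetical; this is not TANGO, ESMF or SCRIP code, and real model grids need 2D cell overlaps on the sphere.

```python
def conservative_weights(src_edges, dst_edges):
    """Step 1 (hypothetical helper): return (dst_index, src_index, weight) triples.

    Each weight is the fraction of a destination cell covered by a source
    cell. With source values constant per cell (first order), the
    destination value is the overlap-weighted mean of the source values.
    """
    weights = []
    for d in range(len(dst_edges) - 1):
        d_lo, d_hi = dst_edges[d], dst_edges[d + 1]
        for s in range(len(src_edges) - 1):
            overlap = min(d_hi, src_edges[s + 1]) - max(d_lo, src_edges[s])
            if overlap > 0:
                weights.append((d, s, overlap / (d_hi - d_lo)))
    return weights


def apply_weights(weights, src_field, n_dst):
    """Step 2 (hypothetical helper): sparse matrix-vector product, dst = W @ src."""
    dst = [0.0] * n_dst
    for d, s, w in weights:
        dst[d] += w * src_field[s]
    return dst


# Coarsen four half-width cells onto two unit cells.
w = conservative_weights([0.0, 0.5, 1.0, 1.5, 2.0], [0.0, 1.0, 2.0])
print(apply_weights(w, [1.0, 3.0, 5.0, 7.0], 2))  # [2.0, 6.0]
```

The integral is preserved (0.5 × (1 + 3 + 5 + 7) = 8 = 1 × 2 + 1 × 6). Because the weights depend only on the grids, they can be generated once and reused for every forcing field, which is what makes the second step cheap.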

Updates

  • Peter is working on ACCESS-CM2 for CMIP6, currently focussing on the global atmosphere. The Met Office work on the atmosphere only, and incorporate it into their coupled model. GA7.1 is the next target for ACCESS-CM2, not currently with CABLE. Preliminary testing with GA7.1 is under way. PI will wait for UM10.6 for correct aerosol coding. Martin is focussing on 10.6, but there won’t be a version of CM2 with 10.6 for a while. The CABLE group is working with 10.5, and there is always difficulty incorporating CABLE into new UM versions. CSIRO is still doing present day in 10.3.
  • Russ found a resolution error when packing data that ended up destroying a 20-month assimilation run. It will need a complete spin-up when restarted.
  • Justin has been doing a lot of interpolation using the Puddles natural neighbour library, which he finds very flexible and nice.
  • Matt has provided WOMBAT code to Paul Spence so he can recompile with new libraries and with flags appropriate for the new Broadwell hardware on raijin.
  • Since the shutdown Matt has also recompiled. Runs ask for 850 CPUs and take 6-7 hours per year. With processor counts above 1.5K the model crashes too often.
  • Marshall: NCI troubleshot the 1.8.4 problems: MPI reduction operations no longer work in 1.8.4.
  • Marshall: OpenMPI 2.0 is slower than 1.10.2.
  • Aidan reiterated: don’t use -O3 with Intel MPI.
  • Justin: BoM had an internal meeting regarding the COSIMA project; all signed off, with developments over a 1-2 year timeframe. Wavewatch 3 will be coupled into the COSIMA code, and Stefan and Mirko will be looking at that. Justin would like some feedback from the TWG on assessing it and getting it back to the community. MOM+Wavewatch will be coupled initially as a technical demo. Wavewatch 3 will be part of COSIMA. GFDL is interested in Wavewatch; Stephen Griffies was curious. People have coupled Wavewatch with MOM in the past, so it is known to be possible, maybe using OASIS-MCT or OASIS3. These were climate-scale runs. Mark Hannah’s postdoc did this work.
  • Marshall raised the github source for MOM under Breakaway Labs: the only custodian is Nic. Do we need some shared ownership of these codes? Move to GFDL? Justin: as an organisation, BoM will look at the code more favourably if it sits under GFDL.
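The minutes do not say how the data were packed. Assuming CF-style packing of floats into 16-bit integers via scale_factor and add_offset (an assumption for illustration), the sketch below shows how the choice of scale_factor sets both the resolution (worst-case rounding error of about scale_factor/2) and the representable range, so a poorly chosen value can silently discard information. The function names are hypothetical.

```python
def pack(values, scale_factor, add_offset):
    """Hypothetical CF-style packing to int16: packed = round((x - offset) / scale)."""
    packed = []
    for x in values:
        p = round((x - add_offset) / scale_factor)
        # int16 range; values outside it cannot be represented at this scale
        if not -32768 <= p <= 32767:
            raise OverflowError(f"{x} does not fit in int16 with these parameters")
        packed.append(p)
    return packed


def unpack(packed, scale_factor, add_offset):
    """Inverse of pack: x = packed * scale + offset (exact only to within scale/2)."""
    return [p * scale_factor + add_offset for p in packed]


# Two distinct values collapse to the same packed integer at scale 0.001.
print(pack([15.0001, 15.0004], 0.001, 0.0))  # [15000, 15000]
```

With a scale of 0.001 and no offset the representable range is only about ±32.8, so finer resolution always costs range; both need checking against the actual field before packing.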

Actions

New:

  • Marshall will be away, Aidan to organise next meeting.
  • Update MOM source inside CM2 (Marshall).
  • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.

Existing:

  • Add Peter’s CICE5.1 config to OceansAus github repo (Nic and Peter)
  • Port MOM5 build system to cmake (Aidan)
  • Push updated MATM code with JRA-55 support to OceansAus github (Aidan)
  • Get licensing for MOM5 input files (Marshall)
  • Work on hosting MOM5 input files on NCI THREDDS server (Marshall, Aidan)
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
  • Test Nic’s access-om model config on OceansAus (All)
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Move to CICE5 on OceansAus repo (Nic).
  • Add new test cases to Jenkins test suite (Nic).
  • Aidan to provide Matt with location of tenth model test data. Check if capturing all the diagnostics Matt might be interested in.
  • Matt to provide Marshall with some test cases for the Xeon Phi testing, maybe 1 deg configurations.

Technical Working Group Meeting, December 2016

Minutes

Date: 13th December 2016
Attendees:

  • Marshall Ward (NCI, Chair)
  • Nicholas Hannah (ARCCSS/Double Precision)
  • Justin Freeman and Mirko Velic (BoM)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Peter Dobrohotoff (CSIRO Aspendale)

Updates

  • Justin met with Gary Brassington to discuss project goals (wrt MOM?)
  • What is ACCESS-OM repo in OceansAus, should BoM use it?
  • Need to sort out the input data hosting
  • Get KDS75 to Justin
  • Russ is looking at the KDS75 layers. Considering a much shallower coastline in topog.nc?

CSIRO/CICE

  • Andy Hogg insists we must use CICE 5
  • Nic is working to update, but it is not a priority
  • Peter will pass to Nic, to host at OceansAus (Marshall volunteered to work on this during ACCESS-CM profiling)

Profiling

  • Parallel NetCDF4 (pHDF5) implementation in MOM 5
  • Working, but performance is very slow (Since meeting: Performance is now comparable to MOM 5, other issues to sort out)
  • Vectorisation issues in MOM 5:
    1. 40×40 grid tiles too large to fit in L1, performance is mem bound (either higher cache or RAM)
    2. Small tiles (e.g. 6×6) fit in L1 but performance continues to be low
    3. gdb trace of assembly shows frequent jumps outside of small loops, constraining performance
  • Justin suggested long tiles in the x-direction; Nic suggested a 32-bit (single precision) representation
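A back-of-envelope sketch of why 40×40 tiles overflow L1 while 6×6 tiles fit. It assumes a 32 KiB L1 data cache per core (as on raijin's Sandy Bridge and Broadwell processors), a halo width of 2 and three double precision arrays touched per loop; the halo width and array count are illustrative assumptions, not measured MOM 5 values.

```python
L1D_BYTES = 32 * 1024  # assumed 32 KiB L1 data cache per core


def tile_bytes(nx, ny, n_arrays, halo=2, bytes_per_value=8, nz=1):
    """Bytes touched by one loop over an nx-by-ny tile including halo points."""
    return (nx + 2 * halo) * (ny + 2 * halo) * nz * n_arrays * bytes_per_value


def fits_in_l1(nx, ny, n_arrays, **kwargs):
    """True if the loop's working set fits in the assumed L1 data cache."""
    return tile_bytes(nx, ny, n_arrays, **kwargs) <= L1D_BYTES


for nx, ny, nbytes in [(40, 40, 8), (6, 6, 8), (40, 40, 4)]:
    size = tile_bytes(nx, ny, 3, bytes_per_value=nbytes)
    print(nx, ny, nbytes, size, fits_in_l1(nx, ny, 3, bytes_per_value=nbytes))
```

Under these assumptions a 40×40 double precision tile touches 46,464 bytes and spills out of L1, whereas the same tile in single precision touches 23,232 bytes, which is one way to read Nic's 32-bit suggestion. As item 2 notes, fitting in L1 alone did not restore performance.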

MOM6

  • Nic has implemented MOM 6 automated testing and bug tracking
  • Example tests: Output invariance to field transpose and rotations, esp. wrt arithmetic associativity
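The following sketch illustrates why rotation and transpose invariance tests are sensitive to arithmetic associativity. It uses a hypothetical four-neighbour sum: left-to-right addition changes bitwise when the grid is rotated by 90 degrees, while pairing opposite neighbours does not, because IEEE floating point addition is commutative but not associative. This illustrates the idea only and is not MOM 6's test code.

```python
def naive_sum(e, n, w, s):
    """Left-to-right sum ((e + n) + w) + s; result depends on neighbour order."""
    return ((e + n) + w) + s


def paired_sum(e, n, w, s):
    """Sum opposite neighbours first. Rotations and transposes only swap
    terms within a pair or swap the two pairs, and floating point addition
    is commutative, so the result is bitwise invariant."""
    return (e + w) + (n + s)


def rotate90(e, n, w, s):
    """Neighbour values after rotating the grid 90 degrees clockwise."""
    return n, w, s, e


def transpose(e, n, w, s):
    """Neighbour values after swapping the x and y axes."""
    return n, e, s, w


vals = (0.1, 0.2, 0.3, 0.0)
print(naive_sum(*vals), naive_sum(*rotate90(*vals)))    # differ in the last bit
print(paired_sum(*vals) == paired_sum(*rotate90(*vals)))  # True
```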

Actions

New:

  • Get MOM01 KDS75 config to Justin (?)
  • Update MOM source inside CM2 (Marshall).

Existing:

  • Add Peter’s CICE5.1 config to OceansAus github repo (Nic and Peter)
  • Port MOM5 build system to cmake (Aidan)
  • Push updated MATM code with JRA-55 support to OceansAus github (Aidan)
  • Get licensing for MOM5 input files (Marshall)
  • Work on hosting MOM5 input files on NCI THREDDS server (Marshall, Aidan)
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
  • Test Nic’s access-om model config on OceansAus (All)
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Move to CICE5 on OceansAus repo (Nic).
  • Add new test cases to Jenkins test suite (Nic).
  • Aidan to provide Matt with location of tenth model test data. Check if capturing all the diagnostics Matt might be interested in.
  • Matt to provide Marshall with some test cases for the Xeon Phi testing, maybe 1 deg configurations.

Technical Working Group Meeting, November 2016

Minutes

Date: 8th November 2016
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen, Scott Wales (ARCCSS)
  • Nick Hannah (ARCCSS/Breakaway Labs)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Peter Dobrohotoff (CSIRO Aspendale)

COSIMA Models

  • Peter reiterated ACCESS-OM2 uses CICE5.
  • COSIMA Ocean models should share code with ACCESS Coupled models. Need to upgrade ACCESS-OM2-025 to CICE5.
  • Dave Bi expressed interest to run ACCESS-OM models in future. Also happy to have CICE5 hosted on OceansAus github repo and work from there. Need to check licensing and merge codebases. Nic and Peter will work together to do this.
  • Aidan felt there needed to be an understood workflow for developing directly from the OceansAus CICE repo: develop in branches and pull to master?
  • Peter uses Rose/cylc and can pull code automatically from any repo.
  • Aidan tested ACCESS-OM-025 config on the repo. Worked well out of the box, but made some cosmetic changes to instructions to make it easier to use.
  • Aidan has also ported this configuration to use Marshall’s payu run tool, and ran with JRA-55 forcing for Ocean Heat uptake project with Susan Wijffels.
  • Nic concerned that MATM will not work with JRA-55 without code changes. Russ assisted Fabio in making some changes to MATM, will talk to Aidan offline about those.
  • Discussion about the flat file structure of the current COSIMA model. No strong feelings either way, so keep it flat.
  • Discussion about making more general changes to the model code to improve input/output specification to make them easier to run. Marshall not keen on this idea for other codes because of deviating too much from “standard”, but MATM is “ours” so should be modified as much as we like. Aidan suggested “light touch” modifications, utilising optional namelist or pre-processor options as a compromise solution.
  • Discussion about what platforms we should support. Want it to be used in as many places as possible, so work towards that goal. First step is for Aidan to complete cmake support for MOM. Use this as a test for extension to other models.
  • Need a robust way of hosting the input files, which currently run to many GB. There are licensing issues. Talk to GFDL about their license, and liaise with data people to host the data set through the NCI THREDDS server. Maybe grab as much data as possible from host institutions to avoid licensing issues.
  • Need a useable solution in the interim until licensing and hosting sorted out.

Actions

New:

  • Add Peter’s CICE5.1 config to OceansAus github repo (Nic and Peter)
  • Port MOM5 build system to cmake (Aidan)
  • Push updated MATM code with JRA-55 support to OceansAus github (Aidan)
  • Get licensing for MOM5 input files (Marshall)
  • Work on hosting MOM5 input files on NCI THREDDS server (Marshall, Aidan)

Existing:

  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
  • Test Nic’s access-om model config on OceansAus (Others)
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Move to CICE5 on OceansAus repo (Nic).
  • Add new test cases to Jenkins test suite (Nic).
  • Aidan to provide Matt with location of tenth model test data. Check if capturing all the diagnostics Matt might be interested in.
  • Matt to provide Marshall with some test cases for the Xeon Phi testing, maybe 1 deg configurations.

Technical Working Group Meeting, October 2016

Minutes

Date: 11th October 2016
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS)
  • Nick Hannah (ARCCSS/Breakaway Labs)
  • Justin Freeman and Mirko Velic (BoM)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Peter Dobrohotoff (CSIRO Aspendale)

Code submissions

  • Mirko submitted major refactoring update to the nudging code for MOM. Three different options depending on namelist. One just sponge, one does nudging, another does adaptive nudging. Added instantaneous update from datafile. Wanted to reproduce the MOM4 behaviour. Tested, and now works. Was broken previously.
  • Can merge, but we need some test coverage of the new code. Currently have a dozen test cases; not sure any touch this code, but will run them anyway. Justin suggests they provide a test case which covers some of these sections.
  • Nic asked that, where possible, functional and formatting changes be made in separate commits, as it makes approving pull requests much easier.
  • Maybe not merge yet, but get testing working to cover this. Justin will look at adapting an existing MOM test case for this purpose.
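For reference, a generic sketch of Newtonian relaxation ("nudging") toward a target field, dT/dt = -(T - T_target)/tau, where a spatially varying relaxation rate that is zero outside a boundary zone gives a sponge layer. It uses the exact exponential solution over one time step, which is an assumption for illustration; it is not Mirko's implementation, and adaptive nudging is not covered.

```python
import math


def relax(field, target, inv_tau, dt):
    """Relax field toward target over one step of dt seconds (hypothetical helper).

    inv_tau is a per-point relaxation rate in 1/s; zero means no nudging at
    that point, so a mask of rates defines a sponge. The update is the exact
    solution of dT/dt = -(T - T_target) * inv_tau, so it is stable for any dt.
    """
    return [t_obs + (t - t_obs) * math.exp(-rate * dt)
            for t, t_obs, rate in zip(field, target, inv_tau)]


# Point 0 is outside the sponge; point 1 relaxes with a 1-day timescale.
print(relax([10.0, 10.0], [20.0, 20.0], [0.0, 1.0 / 86400.0], 86400.0))
```

After one e-folding time the nudged point has closed about 63% of the gap to the target, while the unmasked point is untouched.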

Exchange grids and smoothing

  • Justin was talking to Paul Sandery about exchange grids. An issue with tiling as a result of remapping. Was asking about how Russ implemented smoothing.
  • If you take the interim fields, you end up with a horrible pattern in the wind convergence with 1st or 2nd order remapping, due to discontinuities. Russ wrote some code that does 2D smoothing within the surface boundary condition. It bypasses the exchange grid and uses the flux exchange to native grids options(?). GFDL apply an interpolation when they read in via data override, so you can use data override to interpolate to the finer grid and control this.
  • This is only a problem with conservative remapping with exchange grids.
  • Nic didn’t think this was a problem with standard MOM-SIS runs, but Russ said it should still be visible in the fluxes with coarse (1deg) forcing fields.
  • With ACCESS high res ocean the fields and fluxes are extremely blocky, so Nic smooths on the ice grid, before it comes into the ocean grid, on a tile by tile basis.
  • If you want local conservation, cannot get around this. In ACCESS can use linear interpolation and then post-process to get global conservation. Doesn’t work with local conservation.
  • Marshall suggested we have some test cases that don’t run the model but test coupling and fields.
  • These effects most often seen when there is a big difference in resolution between model fields and input fields. Look at wind stress fields. Maybe some of the barotropic fields, height and definitely convergence in barotropic restart file.
  • Paul’s runs do not use conservative remapping. Don’t see the horrible features with some of the other schemes.
  • Nic: do we need a central document discussing this?
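A sketch of the global conservation fix mentioned above: after a non-conservative (e.g. linear) interpolation, add a uniform correction so that the area-weighted integral on the destination grid matches the source. An additive correction is assumed here because fluxes can change sign; as noted in the discussion, this restores global but not local conservation. The function name is hypothetical.

```python
def global_conservation_fix(dst, dst_area, src, src_area):
    """Shift dst uniformly so its area-weighted total equals the source total."""
    src_total = sum(v * a for v, a in zip(src, src_area))
    dst_total = sum(v * a for v, a in zip(dst, dst_area))
    # Spread the global imbalance evenly per unit area over the destination grid
    correction = (src_total - dst_total) / sum(dst_area)
    return [v + correction for v in dst]


# Source integral is 8; the interpolated field integrates to 9.
print(global_conservation_fix([1.0, 2.0, 3.0], [1.0, 1.0, 2.0], [1.0, 3.0], [2.0, 2.0]))
# [0.75, 1.75, 2.75]
```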

OceanMaps 4

  • Justin is trying to prototype OceanMaps 4. Picking up on Paul Sandery’s work. He has been using MOM5-SIS and using bulk fluxes to link the models. Would like to standardise, or make these things available. Not sure how it connects to linkage project.
  • Nic felt it was good to know what Paul does. So far no code changes?

FMS

  • Aidan got a query from Dave Hutchinson, asking if latest version of FMS was included in the code on MOM5 repo. Marshall has updated FMS in the master branch to Ulm, but not to Verona, the latest version.
  • Move FMS to a submodule of MOM5 rather than manually included inline
  • Goal is for Rui Yang (NCI) to work on parallel netCDF in MOM5

Model release naming and definition

  • Still an issue
  • Nic has put an access-om model on OceansAus. Has version controlled input files and code. Can be downloaded, compiled and run.

CICE

  • ACCESS-OM models are using CICE4.?, but Peter is using 5.1.
  • There are many bug fixes and performance improvements in the version of CICE Nic has been working on that would be beneficial to Peter.
  • Peter is working on a refactoring of CICE5.1
  • We should align our work to the same version of CICE.
  • First step is for Peter’s version of CICE5.1 to be hosted on OceansAus and development work to be based from that so we can work together. Some discussion about the best way to do this.

Actions

  • Justin and Mirko to work up test cases to cover the nudging code and give them to Nic.
  • Nic to add new test cases to Jenkins test suite.
  • Aidan to add mom-ocean.org and mom-ocean.org.au to uptime monitoring service (Uptime Robot).
  • Add Peter’s CICE5.1 config to OceansAus github repo
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Marshall to move FMS to submodule of MOM5 github repo. Liaise with Nic on implementation?
  • Others test Nic’s access-om model config on OceansAus

Technical Working Group Meeting, September 2016

Minutes

Date: 6th September 2016
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS)
  • Justin Freeman and Mirko Velic (BoM)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Peter Dobrohotoff (CSIRO Aspendale)

General Discussion

  • Aidan described latest tenth model comparison runs, between GFDL50 and KDS75. Models are running well at an ocean time step of 450s, with excellent throughput on raijin. At 600s there is an instability related to a topographic feature off the northern tip of Severny Island. Some modification of the topography is required. Matt might be interested in looking at heat uptake and transport between the models.
  • Matt looking at surface parameterisations. Bulk formula etc. They have some warm biases that seem to be due to the choice of some of these parameters. Looking to optimise parameters to fix this.
  • Peter is working on ACCESS-CM2, on the UM 10.* model. Running Jules currently, but will run CABLE. Has run UM8.5/GA6 for 200+ years; confusingly, this was also called ACCESS-CM2. 350 yrs with the 0.25 ocean on GA6. All UM 10.* versions use the rose+cylc run architecture.

Model release strategy

  • Need MOM releases tagged.
  • Marshall and Aidan in favour of MOM having a release strategy (slightly separate issue).
  • Justin felt that the COSIMA model is MOM+CICE+OASIS. Don’t want to tag MOM with COSIMA release names.
  • Can use a sub-module approach, bring in specific model revisions.
  • When available, Justin would like Nic’s latest model definition repo to be communicated to the TWG.

New NCI hardware test

  • Marshall has done some very preliminary testing of MOM with a Knights Landing test cluster (~4k cores, 64 core / node Xeon Phi, 92GB/node). 1.3GHz cores. Faster interconnect (EDR v FDR). Supports AVX512 instruction set, so potentially double the number of floats/clock cycle. These are also lower power and cheaper, so could get many more cores than a traditional CPU architecture.
  • MOM is running, and it was very easy to do. Raijin binaries run fine, as does MPI.
  • Old binaries work fine. Ran 2.4x slower than raijin. As you would expect from clock speed alone.
  • Only ran 960 cores.
  • The AVX512-enabled binary frequently throws floating point errors.

Actions

  • Aidan to provide Matt with location of tenth model test data. Check if capturing all the diagnostics Matt might be interested in.
  • Matt to provide Marshall with some test cases for the Xeon Phi testing, maybe 1 deg configurations.
  • Marshall and Aidan to look at COSIMA model release; liaise with Nic Hannah.

First meeting of the technical working group

Minutes

Date: 2nd August 2016
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS)
  • Nic Hannah (ARCCSS/Breakaway Labs)
  • Justin Freeman (BoM)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)

Agreed to have monthly videoconference meetings.

COSIMA model definitions on website

  • Who is the audience? Agreed it should be pitched at a novice researcher/student with no links to the existing community
  • Should start by using ourselves as the audience, define them for our use and expand from there
  • First test case should be the ACCESS-OM-025 (Quarter degree MOM+CICE+OASIS-mct)
  • CICE is currently housed on CWS github repo
  • Agreed we need a consolidated location for model code. Nic suggested a git repo pulling in model components using submodules. This makes it clean to install and allows control of versions via explicit commits or tags
  • COSIMA name is taken on github, so agreed to use existing OceansAus group
  • Minimal technical detail on website, instead point to github repo

Input data

  • Data store for model input files – decided to use the ua8 Ramadda location which is currently used for MOM5 test cases
  • Need data versioning

Benchmarking

  • Need example outputs, and timings. Direct users to run the model and compare their outputs and timings to those supplied to verify model integrity and performance
  • Need continuous integration to test against these outputs. Nic currently running a dozen test cases on a Jenkins server for MOM5. Can add these tests too.
  • Essential to allow upgrading of model components with confidence they have not affected model integrity and performance.
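A minimal sketch of the comparison users (or the Jenkins server) could make against supplied reference outputs and timings. It assumes outputs are compared bitwise via a checksum of the double precision values and timings with a 10% tolerance; the function names and the tolerance are hypothetical choices, not an agreed procedure.

```python
import hashlib
import struct


def field_checksum(values):
    """SHA-256 of a list of doubles; any change in any bit changes the digest."""
    data = struct.pack(f"<{len(values)}d", *values)
    return hashlib.sha256(data).hexdigest()


def check_run(values, walltime_s, ref_checksum, ref_walltime_s, time_tolerance=0.10):
    """Return a list of problems; an empty list means the run matches the reference."""
    problems = []
    if field_checksum(values) != ref_checksum:
        problems.append("output differs from reference")
    if walltime_s > ref_walltime_s * (1.0 + time_tolerance):
        problems.append(f"walltime {walltime_s:.0f}s exceeds reference by more than {time_tolerance:.0%}")
    return problems


ref = field_checksum([1.0, 2.0])
print(check_run([1.0, 2.0], 105.0, ref, 100.0))  # []
```

A bitwise check is strict by design: it flags any change in answers, which is what is needed when upgrading components, while timings need a tolerance because wall time varies between runs.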

Tools/Utilities

  • Desirable to also have related tools and utilities on the OceansAus github site
  • Concern that the site will collect crufty tools, so set some standard for what constitutes a repo: utility, support and documentation?

Model Support

  • How do we support these models? Decided the existing MOM mailing list would suffice for general user questions. Stephen Griffies has been interested in supporting a MOM-CICE configuration. Also github issues can be used for specific code issues.

Mission Statement

  • Desirable to have a general mission statement about what the aims of the TWG are
  • Any additional invitees? Perhaps Mirko from the BoM? Open invitation to anyone who wants to attend and contribute

General technical discussion

  • Aidan described progress with improving vertical grid in MOM-SIS 0.1 deg config
  • Justin is working on making his real time flythrough data-vis software available on raijin. Maybe use Russ’ BRAN2015 reanalysis dataset with the recent El Nino
  • Matt is adding sea-ice to the OFAM grid and tuning parameters to reduce SST biases. Turning off neutral physics helped.
  • Ben Evans wants a WOMBAT test case, perhaps use the 0.25 deg runs Matt did with Paul Spence?
  • Marshall has updated FMS and he and Nic have used Jenkins testing to find and fix bugs. Close to being added to master branch of MOM5

COSIMA website

  • A lot of enthusiasm for contributing blog posts to website from technical perspective.

Actions

  • Aidan to add Nic to OceansAus group
  • Nic to create initial ACCESS-OM-025 repo with submodules for MOM5/CICE/OASIS-mct
  • Justin volunteered to test out initial model config as “new user”
  • Marshall to write draft mission statement