Technical Working Group Meeting, September 2016

Minutes

Date: 6th September 2016
Attendees:

  • Marshall Ward (NCI, Chair)
  • Aidan Heerdegen (ARCCSS)
  • Justin Freeman and Mirko Velic (BoM)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)
  • Peter Dobrohotoff (CSIRO Aspendale)

General Discussion

  • Aidan described the latest tenth-degree model comparison runs, between GFDL50 and KDS75. The models are running well at an ocean time step of 450s, with excellent throughput on raijin. At 600s there is an instability related to a topographic feature off the northern tip of Severny Island, so some modification of the topography is required. Matt might be interested in comparing heat uptake and transport between the models.
  • Matt is looking at surface parameterisations (bulk formulae etc.). They have some warm biases that seem to be due to the choice of some of these parameters, and are looking to optimise the parameters to fix this.
  • Peter is working on ACCESS-CM2, using the UM 10.* model. It is currently running JULES, but will run CABLE. They have run UM8.5/GA6 for 200+ years; confusingly, this was also called ACCESS-CM2. They have also run 350 years on the 0.25° ocean with GA6. All 10.* versions use the rose+cylc run architecture.
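
To put the two ocean time steps from Aidan's update in context, the following is a minimal sketch of the step-count arithmetic. It assumes the cost of a single time step does not depend on its length and ignores I/O and coupling overheads; the function name is illustrative only and does not come from MOM.

```python
SECONDS_PER_DAY = 86_400


def steps_per_day(dt_seconds: int) -> int:
    """Number of ocean time steps needed to advance one model day."""
    if dt_seconds <= 0 or SECONDS_PER_DAY % dt_seconds != 0:
        raise ValueError("time step must be positive and divide a day evenly")
    return SECONDS_PER_DAY // dt_seconds


steps_450 = steps_per_day(450)  # the time step that currently runs stably
steps_600 = steps_per_day(600)  # the time step that hits the instability

# Fraction of time steps saved per model day if 600s could be made stable
# (assumption: equal cost per step).
saving = 1 - steps_600 / steps_450

print(f"450s: {steps_450} steps/day, 600s: {steps_600} steps/day")
print(f"Potential reduction in steps: {saving:.0%}")
```

Under these assumptions, fixing the topography so that 600s is stable would cut the number of ocean steps per model day by a quarter.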

Model release strategy

  • Need MOM releases tagged.
  • Marshall and Aidan in favour of MOM having a release strategy (slightly separate issue).
  • Justin felt that the COSIMA model is MOM+CICE+OASIS, so does not want to tag MOM with COSIMA release names.
  • Can use a sub-module approach to bring in specific model revisions.
  • When available, Justin would like Nic’s latest model definition repo to be communicated to the TWG.

New NCI hardware test

  • Marshall has done some very preliminary testing of MOM on a Knights Landing test cluster (~4k cores, 64-core Xeon Phi per node, 92GB/node) with 1.3GHz cores and a faster interconnect (EDR vs FDR). It supports the AVX512 instruction set, so potentially double the number of floats per clock cycle. These processors are also lower power and cheaper, so could get many more cores than with a traditional CPU architecture.
  • MOM is running, and it was very easy to do. Raijin binaries run fine, as does MPI.
  • The old binaries ran 2.4x slower than on raijin, as you would expect from clock speed alone.
  • Only ran on 960 cores.
  • The AVX512-enabled binary frequently throws floating point errors.
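
The "double the number of floats per clock cycle" remark above can be illustrated with a short sketch of vector register widths. It assumes the comparison is against 256-bit AVX registers on the existing raijin nodes (an assumption, not stated in the minutes) and counts only elements per register, not achieved performance.

```python
def lanes(register_bits: int, element_bits: int) -> int:
    """Number of elements that fit in one vector register."""
    return register_bits // element_bits


DOUBLE_BITS = 64   # double-precision float
AVX_BITS = 256     # assumed vector width on the existing nodes
AVX512_BITS = 512  # vector width of the AVX512 instruction set

avx_doubles = lanes(AVX_BITS, DOUBLE_BITS)
avx512_doubles = lanes(AVX512_BITS, DOUBLE_BITS)

print(f"AVX:    {avx_doubles} doubles per register")
print(f"AVX512: {avx512_doubles} doubles per register")
print(f"Ratio:  {avx512_doubles / avx_doubles:.0f}x")
```

This is only the theoretical ceiling per vector instruction; code that does not vectorise well would see little of it.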

Actions

  • Aidan to provide Matt with the location of the tenth-degree model test data, and to check whether it captures all the diagnostics Matt might be interested in.
  • Matt to provide Marshall with some test cases for the Xeon Phi tests, maybe 1 deg configurations.
  • Marshall and Aidan to look at COSIMA model release; liaise with Nic Hannah.