Technical Working Group Meeting, December 2017

Minutes

Date: 14th December 2017
Attendees:

  • Marshall Ward (Chair) (NCI)
  • Aidan Heerdegen, Andrew Kiss (ARCCSS/ARCClEx ANU)
  • Fanghua Wu (National Climate Center, China Meteorological Administration, Visitor ANU)
  • James Munroe (Memorial University of Newfoundland, Visitor ANU)
  • Russ Fiedler (CSIRO Hobart)

 

Output file metadata indexing

  • MAS database at NCI. POSIX info. ncdump blob. nodal style. Can put index on netcdf files and search by them.
  • James did a similar thing for COSIMA cookbook running in user space. James has had no action on MAS DB so far.
  • Currently NCI is POSIX crawling hh5. James: need to switch on netcdf for certain directories.
  • James: Ben pitched MAS as a great innovation. Maybe Andy needs to formally ask Ben for this?
  • What can MAS deliver that existing DB cannot? James: stopped developing DB because of MAS. SQLite was 40-50K vars/files. Spinup of 1 deg model has 1M+ variables/metadata. SQLite already 1-2GB. Only scales to 1M rows. Can’t deploy postgres without admin access. Could host one, but should live on NCI resource. Makes sense to use MAS.
  • James: just a user role in DB and switching on netcdf indexing — should be fine. Marshall will follow up with NCI MAS bods to make sure this happens soon.
  • Andrew was concerned about whether this will have ongoing support. Use of MAS in other high-profile projects (Geoscience Australia, for example) means this is a critical piece of infrastructure.
  • James: can we just access their schema? Want to open source, not sure how. James: NCI has confluence, do they have bitbucket license? Marshall: no.
  • Need mom.out copied to hh5 also, to be able to index important info with f90nml. Russ: logfile has just namelist info.
  • Andrew: any equivalent for CICE and MATM? Maybe not? Andrew: need to make CICE and MATM print out namelists.
  • Marshall: get Ben/Andy to endorse official use of MAS by CoE.
  • Marshall: do we need to add attributes to files to accommodate this? James: does the executable spit out a version string? No. Marshall: his build script puts in a version string. Russ: version part of FMS? Russ: Marshall took version out when moved to oom version of FMS. See Issue #31 on GitHub (can’t find issue Russ refers to). Russ: already have a version.c
  • James: CSIRO wants some of the automated processing for decadal prediction. Can we apply to both?
  • James: make a MOM module? Marshall: make codebase a submodule of payu
  • Aidan talks about reproducible builds using spack. Reproducible builds require a package manager so that it can find and know about all the components of the build.
  • Marshall will put hashes in executable in MOM.
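
The file/variable index discussed above could be sketched as follows. This is a minimal illustration using SQLite from the Python standard library, not the MAS schema or the COSIMA cookbook database: the table names (ncfiles, ncvars), the helper functions and the example paths are all hypothetical, and the records are passed in by hand rather than read from real netCDF files.

```python
import sqlite3


def build_index(conn, records):
    """Store (path, variable name, units) records in two hypothetical tables."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ncfiles "
        "(id INTEGER PRIMARY KEY, path TEXT UNIQUE)"
    )
    conn.execute(
        "CREATE TABLE IF NOT EXISTS ncvars "
        "(file_id INTEGER REFERENCES ncfiles(id), name TEXT, units TEXT)"
    )
    # Index on variable name so lookups by variable do not scan every row.
    conn.execute("CREATE INDEX IF NOT EXISTS ncvars_name ON ncvars (name)")
    for path, name, units in records:
        conn.execute("INSERT OR IGNORE INTO ncfiles (path) VALUES (?)", (path,))
        (file_id,) = conn.execute(
            "SELECT id FROM ncfiles WHERE path = ?", (path,)
        ).fetchone()
        conn.execute(
            "INSERT INTO ncvars (file_id, name, units) VALUES (?, ?, ?)",
            (file_id, name, units),
        )
    conn.commit()


def files_with_variable(conn, name):
    """Return the sorted paths of all files that contain the named variable."""
    rows = conn.execute(
        "SELECT f.path FROM ncfiles f JOIN ncvars v ON v.file_id = f.id "
        "WHERE v.name = ? ORDER BY f.path",
        (name,),
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Made-up records standing in for metadata harvested from model output.
    conn = sqlite3.connect(":memory:")
    build_index(
        conn,
        [
            ("output000/ocean.nc", "temp", "degrees K"),
            ("output000/ocean.nc", "salt", "psu"),
            ("output001/ocean.nc", "temp", "degrees K"),
        ],
    )
    print(files_with_variable(conn, "temp"))
```

In practice the records would come from crawling the output directories, and the scaling limits noted above are the reason for moving this kind of index to a hosted database.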

COSIMA Models

  • Andrew: tenth degree runs: salinity crashes in the Arctic. Recent crash: MPI Abort error code 111. Resubmit? Use Broadwell.
  • Andrew: has added regional runoff caps. Tighter caps in Arctic rivers.
  • Paul Spence issue with regional outputs, had incorrect bounds. Might affect in future. High temporal resolution in small regions.
  • Russ: something weird happened a while ago. Mixing velocity and tracer grids in a single file? At least for regional output. Mixing u and t grids? Aidan to look into it.
  • Migrating to FMS submodule. When Marshall updated to oom one of the open boundary cases broke. Took 2-3 weeks of scientific coding to fix.
  • Russ looking at CM2.5 and new FMS. AM4 has been released.
  • Marshall: will make FMS a submodule. This works for decadal prediction people who will need this work done in any case.
  • COSIMA will do JRA55 IAF tenth run.

Wednesday meetings next year, 11.30am.

Actions

New:

  • CICE and MATM need to output namelists for metadata crawling (no-one assigned)
  • Get Ben/Andy to endorse provision of MAS to CoE (no-one assigned)
  • Make MOM (and other models) emit GitHub version hash (Marshall)
  • Collation errors on regional outputs (Aidan)
  • Move FMS to submodule of MOM5 GitHub repo (Marshall). Liaise with Nic on implementation?
  • Follow up with NCI MAS people (Marshall)

Existing:

  • Send link to spinup diagnostics spreadsheet to Russ (Andrew Kiss)
  • Nic add MPI barrier before ice halo updates timer to check if slow timing issues are just ice load imbalances that appear as longer times due to synchronisation.
  • Test Andy’s 5 year config with different netcdf library versions to check MATM error is not just a library issue (Aidan)
  • Check current sea surface salinity restoring smoothing (Aidan)
  • Russ to add all his ocean bathymetry code to OceansAus repo.
  • Nic to help Peter get his MOM repo up to date with MOM5 master branch, and then merge changes
  • Look into OpenDAP/THREDDS for use with MOM on raijin (Aidan, Nic, Marshall)
  • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Add new test cases to Jenkins test suite (Nic).
  • Start a new google doc about coupler issues and MATM (Marshall)
  • Ask Dale Roberts about effects of OpenMP for Roger (Marshall)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (Marshall and TWG)
  • Blog post around issues with high core count jobs and mxm mtl (Nic)
  • Create document outlining options for configuration sharing (?)