{"id":417,"date":"2017-08-24T14:57:34","date_gmt":"2017-08-24T04:57:34","guid":{"rendered":"http:\/\/cosima.org.au\/?p=417"},"modified":"2017-08-24T15:02:27","modified_gmt":"2017-08-24T05:02:27","slug":"technical-working-group-meeting-august-2017","status":"publish","type":"post","link":"https:\/\/cosima.org.au\/index.php\/2017\/08\/24\/technical-working-group-meeting-august-2017\/","title":{"rendered":"Technical Working Group Meeting, August 2017"},"content":{"rendered":"<h2>Minutes<\/h2>\n<p>Date: 8th August\u00a02017<br \/>\nAttendees:<\/p>\n<ul>\n<li>Marshall Ward (NCI, Chair)<\/li>\n<li>Aidan Heerdegen (ARCCSS ANU)<\/li>\n<li>Nicholas Hannah (ARCCSS\/Double Precision)<\/li>\n<li>Russ Fiedler and Matt Chamberlain (CSIRO Hobart)<\/li>\n<li>Peter Dobrohotoff and Roger Bodman (CSIRO Aspendale)<\/li>\n<\/ul>\n<h3>COSIMA Models<\/h3>\n<ul>\n<li>Nic: Toy model is good for yes\/no type tests<\/li>\n<li>Invite James Munroe to the TWG meetings. He could be of assistance getting a test suite together.<\/li>\n<li>Nic: using\u00a0the Issues interface on github has been very helpful. Hasn&#8217;t written emails and has answers to problems. Steve and Russ have been super helpful. Marshall: good that users have been using Issues. Nic: every time I go to write an email, I ask: can I make this a github issue? Dave Bi, Siobhan have useful input. Marshall: how do we get them onto github too? Russ: Arnold has used it. Hopefully Dave and Siobhan will jump on to it also.<\/li>\n<li>\n<div>Peter: using a different MOM. Nic and Peter to meet to sort this out.\u00a0Roger: can CICE be on the same repo?\u00a0Nic + Peter will liaise.<\/div>\n<\/li>\n<li>\n<div>Invite Arnold to these meetings.<\/div>\n<\/li>\n<li>Marshall: Bit repro issues. Surprised ACCESS-CM2 has bit reproducibility problems. Nic found 4 issues that affected this. Payu was missing restarts. Also found a couple of issues where there was a different code branch for the first coupling time step. Red Sea and another one.
Added checksums to all coupling fields. And tested all of these. Now runs restart OK, but after 3 or 4 time steps they start to diverge. Probably still some small issue from restarts.\u00a0Wasn&#8217;t just one problem. Some of these were Nic&#8217;s issue with restarts and payu. Some were particular to Nic&#8217;s setup.\u00a0When it is solved can talk to CM guys. The general\u00a0approach is useful though. Concentrate on coupling fields.<\/li>\n<li>\n<div>Peter: combing log files and looking for checksums and numbers to compare between the runs.<\/div>\n<\/li>\n<li>\n<div>Nic: once I get through all this I can talk to Peter about the method and use it to work on Peter&#8217;s issues.<\/div>\n<\/li>\n<li>\n<div>Marshall: was Red Sea confirmed to be a repro issue? Did Russ fix it? Russ: Aspendale code was fixed. Marshall: was fixed in Aspendale but not main repo? Yes. Marshall: Arnold knew about that.\u00a0Peter: we had the fix in that case and hadn&#8217;t shared it.<\/div>\n<div><\/div>\n<\/li>\n<\/ul>\n<h3>CMIP6<\/h3>\n<ul>\n<li>Peter talked to David Karoly. Wondering about spin-up for CMIP6. 500-1000yr spin-up. Is it possible to spin up the ocean first by forcing an Ocean\/Ice model with JRA?\u00a0Then add CM when it is ready. Ice issues might make a big difference to stratification. Aidan: model drift issues won&#8217;t help. Marshall: can you stabilise stratification with an OM run? Russ: deep stuff takes thousands of years to get into steady state. Marshall: ask this at a MOM meeting? Russ: ask Ocean modellers.<\/li>\n<li>What is the CMIP schedule? Peter + Roger: about six months behind. Start production run early next year. Still working on configuration. Coupled CABLE model running now. Reproducibility issue has become more issues\u00a0which need to be nailed down.<\/li>\n<li>Marshall: when do you feel like you need to fix source\/versions? Roger: delaying until the end of the year. How long would a 500 year 1 deg MOM spin-up take?
Nic: 0.5 hour\/year. 50 years\/day without queuing issues. Have to take crashing into account.\u00a0Marshall: does 1 deg crash? Nic: probably not.<\/li>\n<li>Nic: created issue recently, wanted to run 50 years in a single submit. Memory leaks limit how long you can run. Maybe only 3 years.<\/li>\n<li>Aidan: Should add multi-year runs per submission ability to payu.<\/li>\n<\/ul>\n<h3>Bathymetry<\/h3>\n<ul>\n<li>Aidan: how do you deal with non-advective cells? Russ: they are potholes with no advective velocity possible. If you allow cells to fill if they&#8217;re too thin, can create cells that have no velocities.<\/li>\n<li>Russ to add his code to OceansAus repo.<\/li>\n<\/ul>\n<div>\n<h3>New HPC<\/h3>\n<\/div>\n<ul>\n<li>Marshall: Tender for new machine. Understand current limits of codes, and if new machine will work, and what we need to get more performance. Convinced MOM is a RAM bound code. Vectorisation is not making a difference. Want a machine with more RAM bandwidth, not more vectorisation. Away from KNL and Skylake, towards IBM POWER and AMD.<\/li>\n<li>Peter: Met Office XC40 can run coupled model with 48*24 processors. They are 32-processor\u00a0Broadwell nodes. Marshall: maybe running more threads? Roger: they run 2 threads.<\/li>\n<li>Marshall:\u00a0bring errors that stop it working. Roger: ok, will get some info together.<\/li>\n<li>Marshall: incorrect Message Passing and halo understanding. MPI messages in MOM5 are healthy. Get GB\/s bandwidth, even corners. Problems are related to library or load imbalance, or maybe CPU throttling. We are doing a reasonable job of MPI. Faster interconnect may be useful.
Broadwell is 10-15% faster, as it has faster interconnect.<\/li>\n<\/ul>\n<h3>Actions<\/h3>\n<p>New:<\/p>\n<ul>\n<li>Invite Arnold Sullivan and James Munroe to TWG meetings.<\/li>\n<li>Add feature request to payu: multiple runs per submission<\/li>\n<li>Ask MOM Ocean meeting about 1000yr OM spin-up possibility<\/li>\n<li>Russ to add all his ocean bathymetry code to OceansAus repo.<\/li>\n<\/ul>\n<p>Existing:<\/p>\n<ul>\n<li>Aidan to investigate tenth degree MOM configs for benchmarks.<\/li>\n<li>Possible benchmark configs (everyone)<\/li>\n<li>Nic to help Peter get his MOM repo up to date with MOM5 master branch, and then merge changes<\/li>\n<li>Look into OPeNDAP\/THREDDS for use with MOM on raijin (Aidan, Nic, Marshall)<\/li>\n<li>Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.<\/li>\n<li>Nic to create a discussion document (on COSIMA?) to document current approaches and strategies for the future<\/li>\n<li>Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?<\/li>\n<li>Test Nic&#8217;s access-om model config on OceansAus (All)<\/li>\n<li>Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.<\/li>\n<li>Add new test cases to Jenkins test suite (Nic).<\/li>\n<li>Start a new google doc about coupler issues and MATM (Marshall)<\/li>\n<li>Ask Dale Roberts about effects of OpenMP for Roger (Marshall)<\/li>\n<li>Make a proper plan for model release \u2014 discuss at COSIMA meeting.
Ask students\/researchers what they need to get started with a model (Marshall and TWG)<\/li>\n<li>Blog post around issues with high core count jobs and mxm mtl (Nic)<\/li>\n<li>Do longer runs with\u00a0Nic&#8217;s 1 deg and 0.25 deg ACCESS-OM2-JRA55 configs (Andy and Aidan)<\/li>\n<li>Try repeat year forcing with Nic&#8217;s configurations (Nic and Andy)<\/li>\n<li>Create document outlining options for configuration sharing (?)<\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Minutes Date: 8th August\u00a02017 Attendees: Marshall Ward (NCI, Chair) Aidan Heerdegen (ARCCSS ANU) Nicholas Hannah (ARCCSS\/Double Precision) Russ Fiedler and Matt Chamberlain (CSIRO Hobart) Peter Dobrohotoff and Roger Bodman (CSIRO Aspendale) COSIMA Models Nic: Toy model is good for yes\/no type tests Invite James Munroe to the TWG meetings. He could be of assistance getting&hellip;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[4,3],"_links":{"self":[{"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/posts\/417"}],"collection":[{"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/comments?post=417"}],"version-history":[{"count":2,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/posts\/417\/revisions"}],"predecessor-version":[{"id":419,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/posts\/417\/revisions\/419"}],"wp:attachment":[{"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/media?parent=417"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cosima.org.au\/index.p
hp\/wp-json\/wp\/v2\/categories?post=417"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/tags?post=417"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}