{"id":687,"date":"2019-05-22T16:26:56","date_gmt":"2019-05-22T06:26:56","guid":{"rendered":"http:\/\/cosima.org.au\/?p=687"},"modified":"2019-05-22T16:26:56","modified_gmt":"2019-05-22T06:26:56","slug":"technical-working-group-meeting-may-2019","status":"publish","type":"post","link":"https:\/\/cosima.org.au\/index.php\/2019\/05\/22\/technical-working-group-meeting-may-2019\/","title":{"rendered":"Technical Working Group Meeting, May 2019"},"content":{"rendered":"<div>\n<h2>Minutes<\/h2>\n<p>Date: 15th May, 2019<br \/>\nAttendees:<\/p>\n<ul>\n<li>Aidan Heerdegen (AH) CLEX, Andrew Kiss (AK) \u00a0COSIMA, ANU<\/li>\n<li>Marshall Ward (MW)\u00a0GFDL<\/li>\n<li>Russ Fiedler (RF), Matt Chamberlain(MC) CSIRO Hobart<\/li>\n<li>Nic Hannah \u00a0(NH) Double Precision<\/li>\n<li>Rui Yang (RY) NCI<\/li>\n<\/ul>\n<\/div>\n<h3>Agenda<\/h3>\n<div><\/div>\n<div>&#8211; Follow up on migrating FMS to an external library<\/div>\n<div>&#8211; WOMBAT in harmonised MOM update and testing<\/div>\n<div>&#8211; Tenth load balancing<\/div>\n<div>&#8211; CICE IO bound in high core counts<\/div>\n<div><\/div>\n<h3>CICE IO bound in high core counts<\/h3>\n<div><\/div>\n<div>AK: Runs with new CICE executables NH compiled a while ago. Performance slowdown with compression level 5. Tested with level1 few % larger in size, 2500s -&gt; 1800s for IO time. 1300s without compression. Compresses well with low value because a lot of missing data with ice.<\/div>\n<div><\/div>\n<div>NH: Went from netCDF3 to netCDF4. Might be worth trying no compression. AK: have a run with compression level zero. RF:\u00a0Does impact on\u00a0walltime. MOM is waiting. Usually have CICE waiting on MOM, but when\u00a0outputting is the other way. MW: Compressing MOM before, now both? NH: Compressing and daily output an issue. AH: What is the chunking? RF: Uses default. AH: Some libraries chose weird values for time value? RF: No funny business, all sensible. RF: All these point to point gather, maybe not efficient. 
MW: Do you know where the time is taken? RF: Slowdown, but not sure of the split between gather and write. NH: Breaking new ground, daily output and running at scale, and unusual tile distribution. Increases the COMMS to gather. So many different new things. MW: On\u00a0sectrobin still? AH: 10% of total runtime.<\/div>\n<div><\/div>\n<div>NH: With MOM do all this with post-processing to get model performance as good as possible.\u00a0Anything we do slowing model as a whole, should post-process.\u00a0Didn\u2019t\u00a0think about that option when put change in. If slowing down as a whole, back out change and work out post-processing step. AK: Half the data in daily files is static. Totally unnecessary. Made issue to maybe output static\u00a0data to a file once. RF: Aggregate daily files to\u00a0monthly? AK: Slows down output from model. Less compressible? RF: Highly correlated, will compress easily. AH:\u00a0How much extra wait time? RF: The whole write time. AK: 25 or 18% in MOM runtime. AH: Monthly output issue disappears? RF: Yes. RY: CICE write to single file? RF: Yes, through one processor. RY: Can we do it like MOM, each processor writes data to its own\u00a0file? NH: Yes, good idea, but more complicated than MOM. CICE tiles are not located close to each other in space. RF: Could use PIO interface. Not compatible with centrally installed netCDF libraries. Bugs in\u00a0version of HDF. Need OpenMPI &gt; 1.10.4 and netCDF &gt; 4.6.1. MW: PIO good candidate, RY can\u00a0help. CICE developers\u00a0looking into this? Stayed in touch with them? NH: Look at CICE6\u00a0GitHub. RF: Looked, but no active development on IO in any fundamental way.<\/div>\n<div><\/div>\n<div>NH: If we did\u00a0decide to go that way, good\u00a0opportunity to feed that back to CICE\u00a0community.<\/div>\n<div><\/div>\n<div>MW: NCAR as a developer of PIO, keen to get it into other models. If CICE is on their radar might get some\u00a0feedback there. RY: MOM has IO layer a bit like PIO. 
MW: Not a good idea to use PIO in MOM6.<\/div>\n<div><\/div>\n<div>RY: Tried PIO in MOM and found it was not a good candidate. MW: Yeah, MOM6 was already doing something like that.<\/div>\n<div><\/div>\n<div>RY: Parallel compression will be supported in future in netCDF.<\/div>\n<div><\/div>\n<div>RY: Been experimenting with my own version of library and got some positive results.<\/div>\n<div><\/div>\n<div>End\u00a0result: take compression out, take out\u00a0static fields. Post processing. Is anyone using daily fields? RF: We\u2019re interested in daily ice fields. Using data assimilation. MW: Shorter runs though? RF: 20 years.<\/div>\n<div><\/div>\n<div>NH: Instead of\u00a0writing individual daily files, should write to a single file, static fields\u00a0won\u2019t be replicated, maybe benefit from some netCDF buffering. AH: Big code change? NH: Not sure. AK: Has a file naming convention for different frequencies. Frequency part of filename. NH: Saying could already output daily into monthly files? AK: No, filename encodes\u00a0time and frequency. Doesn\u2019t seem to write repeatedly to any of its output files. AH: Define unlimited dimension.<\/div>\n<div><\/div>\n<div>NH: Make a\u00a0GitHub issue. If high priority could get some time. MW: Make the issue in the CICE repo,\u00a0inform them what we\u2019re\u00a0doing. They mentioned an NCAR\u00a0community board.<\/div>\n<div><\/div>\n<div>AH: Make a\u00a0namelist option and recompile? Compression level as option?<\/div>\n<div><\/div>\n<div><\/div>\n<h3>Tenth load balancing<\/h3>\n<div>AK: RF suggested a smaller core count of 799.\u00a0Doesn\u2019t change\u00a0wall time which is a win. How low can we go? RF: Worked out a few more configs. Slight change of tile size, 720 would be ok. 36&#215;36 or 40&#215;30. Running some quick tests with tool under \/short\/v45\/masking.\u00a0Run and output\u00a0masks and where tiles get located. Also number of processors\/blocks you need. 
AH: Put code on COSIMA\u00a0GitHub? RF: Just a quick little thing. AH: Yes, but useful.<\/div>\n<div><\/div>\n<div>AH: Down from 1380. Big win. Total\u00a0core count? AK: Not sure. RF: Total just over 5000. AH: Still\u00a0running on normalbw? AK: Yes. AH: Wait on normal crazy. RF: Look at skylake? Usually empty. RY: Yes new nodes, not large total core count. AK: Get 6mo\/submit\u00a0without daily outputs. Daily over by 30\/45mins with ice. dt=600s.<\/div>\n<div><\/div>\n<div>NH: If\u00a0no-one else\u00a0to fix, assign NH to issue.<\/div>\n<div><\/div>\n<h3>WOMBAT<\/h3>\n<div><\/div>\n<div>RF: Got Matear up to speed. Ran a few tests. One or two bugs yet to be fixed. A couple of fields that weren\u2019t coming through from OASIS properly. Was the\u00a0ice field, wasn\u2019t coming through correctly. Got it going with external fields\u00a0forcing it. Figured out changes to get it running\u00a0properly with full ACCESS mode. Running some test cases\u00a0after bugs fixed. MC: Now\u00a0running with calculated gas\u00a0exchange coefficients. RF: The way it was originally written the way fields were ingested into MOM. MC:\u00a0Using the same wind field in BGC and wind mixing? RF: Yes, all\u00a0together. MC: Level of the wind? In ACCESS-ESM was getting lowest atmospheric wind. MC: CICE will\u00a0send a 10m wind through OASIS? RF: Not FMS coupler, this is just OASIS 10m wind. MC: ACCESS-ESM case?<\/div>\n<div><\/div>\n<div>AH: Hakase\u00a0could be used as a guinea pig.\u00a0Any of these changes affect ACCESS-CM2? RF:\u00a0Shouldn\u2019t. AH: Do we need to do any bit repro tests? RF:\u00a0Shouldn\u2019t change anything.<\/div>\n<div><\/div>\n<h3>Migrating FMS to an external library<\/h3>\n<div>AH: I put my hand up to do the change and test.<\/div>\n<div><\/div>\n<div>MW: FMS updated to Xanadu a couple of weeks ago. AH: So a good time to try it\u00a0out. MW: Already tried it,\u00a0put some MOM patches in to fix some issues. 
AH: On the GFDL FMS repo? MW: They have opted not to take the parallel netCDF using MPI IO patch RY and I worked on. Have set up a branch with parallel IO, and Xanadu has been merged into that branch. May want to use branch with parallel netCDF extensions. Ongoing conversation about this. They may merge it in. Can use what you want. Your call as to what to use.<\/div>\n<div><\/div>\n<div>RF: Any whitespace issues? MW: FMS and MOM6 live on different planets. They don\u2019t interact much. Don\u2019t collaborate with FMS guys.<\/div>\n<div><\/div>\n<div>MW: Alistair getting miffed at the red buttons on the Jenkins server. He\/I will look at some GFDL-independent solution. Happy for NH to be involved as much or as little as he wants. NH: They should be more blue than red. MW: Happened in March due to checksumming? NH: Bitrot, Jenkins is fragile. Scott often fixes it. Good idea, happy to\u00a0help in any way. May be easier to set up on raijin. Does one qsub and runs them all under one\u00a0sub. MW:\u00a0slurm is sort of designed to do that. NH: slurm is awesome. MW: slurm is better. NH: Like it a lot more. MW:\u00a0Good for running multiple\u00a0jobs per submission. Blurs the line between MPI and scheduler. Some sort of meta-scheduling. Place jobs on ranks within the request. 
AH: More flexibility.<\/div>\n<div><\/div>\n<h3>Actions<\/h3>\n<ul>\n<li>Update MOM build to use external FMS library (CMake) &#8211; AH<\/li>\n<li>Finish WOMBAT integration &#8211; RF<\/li>\n<li>Make CICE compression issues &#8211; AK<\/li>\n<\/ul>\n<div><\/div>\n<div><\/div>\n","protected":false},"excerpt":{"rendered":"<p>Minutes Date: 15th May, 2019 Attendees: Aidan Heerdegen (AH) CLEX, Andrew Kiss (AK) \u00a0COSIMA, ANU Marshall Ward (MW)\u00a0GFDL Russ Fiedler (RF), Matt Chamberlain(MC) CSIRO Hobart Nic Hannah \u00a0(NH) Double Precision Rui Yang (RY) NCI Agenda &#8211; Follow up on migrating FMS to an external library &#8211; WOMBAT in harmonised MOM update and testing &#8211; Tenth&hellip;<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[6],"tags":[4,3],"_links":{"self":[{"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/posts\/687"}],"collection":[{"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/comments?post=687"}],"version-history":[{"count":1,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/posts\/687\/revisions"}],"predecessor-version":[{"id":688,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/posts\/687\/revisions\/688"}],"wp:attachment":[{"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/media?parent=687"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/categories?post=687"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/cosima.org.au\/index.php\/wp-json\/wp\/v2\/tags?post=687"}],"curies":[{"name"
:"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}