Technical Working Group Meeting, December 2018

Minutes

Date: 11th December 2018
Attendees:

  • Marshall Ward (MW) (Chair) NCI
  • Aidan Heerdegen (AH) and Andy M Hogg (AMH) CLEX, Andrew Kiss (AK) COSIMA, ANU
  • Russ Fiedler (RF), Matt Chamberlain (MC) CSIRO Hobart
  • Nic Hannah (NH) Double Precision

COSIMA Models

Profiling

MW: Been profiling CICE; Score-P profiling doesn’t work, so I have been timing each time step. Anomalously long time spent at step 72. AH: Could it be the atmosphere being updated? JRA55 is 3-hourly. Not sure of the timestep. MW: Seem to have lost my logs. Not sure of the best way to handle it.

CM2 Harmonisation update

AH: Peter has been testing the release candidate. Russ supplied a diag_table which just outputs fields for the first 2 time steps, which is really good for seeing code issues. Russ found some bugs introduced by me: a couple of logic errors with preprocessor flags and the omission of a couple of lines that got lost in translation. Confident the latest update has squashed all the bugs. MW: Not old bugs? AH: Did find some old issues. Russ found a stuffed iceberg file. RF: Not related, but it is something they were using for CMIP6. AH: Did find some old bugs; had to emulate the lack of reproducibility from the Red Sea salinity fix timing bug to be able to closely reproduce CM2 output. Put in a flag to do the wrong thing, the same as theirs; will remove it before merging. MW: I thought the Red Sea fix had been changed to be faster but not reproducible. RF: That’s right, but that’s not the issue. This has to do with timing. Aidan fixed it, but the fix is not compatible with what they are using. AH: Just need something that reproduces CM2 output.

Narrator: The new way of doing the salt fix will reproduce over time steps, but is not bit reproducible with the old algorithm. We don’t see that effect in these tests.

AH: Peter has a test suite which is the old CM2, and a copy which uses the updated MOM. He compiles the new code manually and runs the two suites side by side. Both use Russ’ diag_table. Just find out which fields don’t match. Most are the same; a few are different and seem to be affected by the same issue. Once we’re good for a few time steps, maybe look at them after a few months. RF: Once chaos starts, it is hard to say. As long as nothing gross is happening, unless there is something further on with coupling. AH: Yes, look after a month and check it looks close. MW: Not trying to be bit reproducible? AH: Just want to fix my bugs. RF: Make sure you’re getting the same forcing fields. You can see out in the open ocean there is hardly any change, just noise, which means we’re close. Saw the outline of where the forcing field is supposed to be: the bug in the forcing field data showed up, which indicated the issue. AH: Once we’ve confirmed it is fixed, will merge the PR and then move on to ESM.
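The side-by-side check described above (run both suites with the same short diag_table, then list the fields that don’t match) can be sketched as follows. This is a minimal illustration, not Peter’s actual test suite: `compare_fields` is a hypothetical helper, and it assumes each run’s diagnostics for one time step have already been read into a dict mapping field name to a flat list of values (reading the netCDF output is left out).

```python
import math

# Hypothetical sketch: not the CM2 test suite, just the comparison logic.


def compare_fields(reference, candidate, rtol=0.0, atol=0.0):
    """Return the sorted names of fields that differ between two runs.

    reference, candidate: dict of field name -> flat list of floats for the
    same diagnostic time step (assumed in-memory layout).
    With the default tolerances only exactly equal values count as matching.
    """
    mismatched = []
    for name in sorted(set(reference) | set(candidate)):
        ref = reference.get(name)
        new = candidate.get(name)
        if ref is None or new is None or len(ref) != len(new):
            mismatched.append(name)  # field missing from one run, or reshaped
            continue
        # math.isclose is False whenever either value is NaN, so NaNs are flagged.
        if not all(math.isclose(a, b, rel_tol=rtol, abs_tol=atol)
                   for a, b in zip(ref, new)):
            mismatched.append(name)
    return mismatched


old_cm2 = {"temp": [1.0, 2.0], "salt": [35.0, 34.9], "eta_t": [0.1, 0.2]}
harmonised = {"temp": [1.0, 2.0], "salt": [35.0, 34.8], "eta_t": [0.1, 0.2]}
print(compare_fields(old_cm2, harmonised))  # ['salt']
```

Loosening `rtol`/`atol` for later output matches RF’s point that, once chaos sets in, the question becomes whether anything gross is happening rather than exact agreement.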

MW: Will the CM2 code remain in step with the MOM5 code? RF: CSIRO Aspendale is not doing much code development at the moment. AH: Peter is pulling directly from his GitHub repo, but once it is harmonised they will pull directly from the MOM5 repo. They will want to have a tag and pull from the tag. RF: Yes, they will want frozen versions. AH: Should have some automated testing: if we find a bug, we should be able to update the CM2 code and confirm it doesn’t change important answers.

AH: Short answer: Lots of progress. I made lots of bugs and Russ found them. Thanks Russ. NH: Yes thanks Russ.

Model reproducibility and payu bug

NH: Working on documentation: wiki, tech report and model paper. Would like to do more. The wiki doc is easier as a brain dump. Made sure ACCESS-OM2 Jenkins tests are passing. Takes time; something always seems to go wrong. Six tests passing and useful. Repro test working and now reproducing across restarts. It wasn’t working due to 1. the payu bug, 2. the Red Sea fix and 3. compiling with repro.
NH: Doing 2 runs, with and without the payu bug, at 1 and 0.25 degree. Doing 4 years as individual 1-year submits, to make sure the bug is not too serious. The way the coupling field restarts are done is not good. The ocean has to write out a restart for CICE (o2i.nc). The copy of this restart file was missing. We had it in the past, but the refactor with libaccessom2 and the change of payu model driver didn’t carry it over. This means the forcing fields the ice model receives for the first coupling step of each new submit are from the beginning of the run, not from the end of the previous run. The ice model is getting the wrong forcing for the first 3 hours.
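The missing step is carrying the ocean-to-ice coupling restart (o2i.nc) from the end of one submit into the next. A minimal sketch of that logic, under stated assumptions: the helper names `archive_o2i` and `stage_o2i` and the directory layout are hypothetical and are not the payu driver’s real code. The design choice worth noting is that when a previous restart directory exists, a missing o2i.nc is treated as an error rather than silently falling back to the experiment’s initial file, since that fallback is what fed the ice model stale forcing.

```python
import shutil
from pathlib import Path

# Hypothetical sketch: function names and layout are illustrative, not payu's.
FNAME = "o2i.nc"  # ocean -> ice coupling restart written by the ocean


def archive_o2i(work_dir, restart_dir):
    """At the end of a run, keep the ocean's o2i.nc with the other restarts."""
    restart_dir = Path(restart_dir)
    restart_dir.mkdir(parents=True, exist_ok=True)
    shutil.copy2(Path(work_dir) / FNAME, restart_dir / FNAME)


def stage_o2i(work_dir, input_dir, prev_restart_dir=None):
    """Before a run, put the right o2i.nc in the work directory.

    First run (no previous restart): use the experiment's initial file.
    Later runs: use the file archived at the end of the previous run, and
    fail loudly if it is missing instead of reusing the initial one.
    """
    if prev_restart_dir is None:
        src = Path(input_dir) / FNAME
    else:
        src = Path(prev_restart_dir) / FNAME
        if not src.is_file():
            raise FileNotFoundError(f"{src} missing from previous restart")
    shutil.copy2(src, Path(work_dir) / FNAME)
    return src
```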
MW: Has it been fixed? Are all runs affected? AK: Yes, fixed now. What is the scope of runs affected? Only since YATM? NH: Yes. If your run uses YATM it will have this problem; that is around the time the bug was introduced. We restructured how config.yaml was organised and created the libaccessom2 driver, and the bug came in at that point. MW: Used to have an oasis driver that did that. NH: A restart repro test existed but was failing for other reasons and not being kept up to date. If that test had been passing and then started failing, it would have been noticed. Doing a post mortem to see if there is anything significant on a 5 year run. Gut feeling: just in the ice. RF: It will just be the SST that it sees. Significant if running a month at a time; yearly not so important. Also depends what was in the initial coupling field. NH: Initial field is correct, probably January. RF: Didn’t get updated for changes to landmasks? NH: Land has been eliminated, so not necessary. NH: For any run which is a multiple of 1 year, the problem is smaller. AH: Quarter and 1 degree aren’t that affected; tenth is most affected. NH: Could do 1 month 1 degree runs. AH: Good idea. Don’t forget about the runspersub option: could do 50 in a single submit. MW: The payu restart flag now works as well. Could be useful for testing reproducibility. NH: This could be a problem in other cases as well. The existing restart is based on a specific time. It may be correct for the specific model it was created for. RF: Should be matched to the initial condition, with the correct fields. MW: This is a cold start? NH: Needs to be created each time based on the start time of your forcing. AH: Write code into the model to read in the IC and write it back out to the coupling fields? NH: Something like that might be good.
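The observations that yearly submits suffer less than monthly ones, and that the tenth degree (run only a couple of months per submit) is most affected, follow from simple arithmetic: if only the first coupling interval of each submit sees stale forcing, the affected fraction of ice-model time shrinks with submit length. A back-of-envelope sketch, assuming a 3-hour stale interval (from the description above) and 30-day months:

```python
# Assumptions: only the first ~3 h coupling interval of each submit is stale,
# and a month is taken as 30 days. Back-of-envelope only.
HOURS_AFFECTED = 3.0


def affected_fraction(submit_length_days):
    """Fraction of ice-model time forced with stale o2i fields."""
    return HOURS_AFFECTED / (submit_length_days * 24.0)


for label, days in [("1 month", 30), ("2 months", 60), ("1 year", 365)]:
    print(f"{label:>8}: {100 * affected_fraction(days):.3f}% of ice-model time")
```

This gives roughly 0.4%, 0.2% and 0.03% respectively. It only measures how often the stale forcing occurs, not how much it matters; as RF notes, that also depends on what the initial coupling field contains.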
AK: Bunch of fields: SST, SSS, SS velocity, SS slope, frazil ice formation energy. RF: SST and SSS are the only ones not zero in a cold start. AK: Replace with the initial condition for the entire experiment? NH: There is a single file in the ACCESS-OM2 input directory that all experiments use. NH: Could diff that against what it should have been. MC: That is the cold start bug, not so important. Is the warm start bug fixed? NH: Yes, fixed in the latest version, 0.11.2. AK: People aren’t using that? MW: No, because it was broken. Now fixed. AH: Arguably should delete payu versions with the known warm start bug. Or back port the fix? MW: Don’t have a framework to back port the fix. AH: How many versions are affected? NH: Put in a warning message/assert that stops and doesn’t let it load. MW: Happy to delete old versions. Some people use specific payu versions. Easy to put warnings in module files. Can also delete old ones. Not a huge problem.
AH: Figure out which payu versions are affected and make a decision based on that. MW: Only those with libaccessom2. AH: Don’t delete straight away. Turn off modules first and see if there are people affected. AK: Could be people not using ACCESS-OM2. AH: Yes, but they can use new versions. Need to make sure people are not using buggy code. AK: Possibly move to new space. AH: Yes, but might not be necessary. MW: May be impossible to back port fixes; the driver might not be functional. No problem doing backports, but not sure how.
AH/MW: Might not need to back port, should:
  1. Confirm payu/0.11.2 working correctly
  2. Set as default version
  3. Determine which payu versions affected
  4. Turn off affected modules in the modulefile and issue a message about the bug, which module to load, and to email climate_help if users still have issues
  5. When users complain, assess individual cases
  6. If necessary move payu module to non-app path
  7. Delete old versions?
2 week time frame.
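Step 3 (determining which payu versions are affected) reduces to a version-range test once the first version with the libaccessom2 driver is known. That version is not recorded in these minutes, so it is a parameter below, and the values in the example are made up; only 0.11.2 as the fixed version comes from the discussion. `parse_version` and `affected_versions` are hypothetical helpers and assume plain numeric `X.Y.Z` version strings.

```python
# Hypothetical sketch: assumes purely numeric dotted versions like "0.11.2".


def parse_version(version):
    """'0.11.2' -> (0, 11, 2), so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def affected_versions(installed, first_bad, fixed="0.11.2"):
    """Installed versions in [first_bad, fixed), i.e. with the warm-start bug."""
    lo, hi = parse_version(first_bad), parse_version(fixed)
    return sorted((v for v in installed if lo <= parse_version(v) < hi),
                  key=parse_version)


# Made-up module list and first affected version, for illustration only.
installed = ["0.7.0", "0.9.1", "0.10.3", "0.11.2", "0.9.0"]
print(affected_versions(installed, first_bad="0.9.0"))  # ['0.9.0', '0.9.1', '0.10.3']
```

Comparing tuples of integers rather than strings matters here: as strings, "0.10.3" sorts before "0.9.0".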
MW: People shouldn’t be encouraged not to specify module versions.
MW: Make sure 0.11.2 is working correctly. Works for NH and AH. AK is a good test for it as he is running. AK: Not running at the moment. Can we use the old mppnccombine with payu/0.11.2? AH: Yes. MW: Use whichever you want. AH: Works better for 1 deg in any case.
MW: Added a restart directory feature: run 0 uses the restart and resets counters back to zero. AK: Had been copying stuff. MW: I’ve been symlinking and other hideous things. AK: Documents what you did better. AH: Used to have problems with drivers trying to delete symlinks when cleaning up restart directories.
AH: Will finish the manifest this week. Chatted with Marshall and am reimplementing it a bit differently. Will make NH’s job a lot easier. The run config has all the files; just need to clone and run. NH: Awesome.
NH: Want any post-mortem or checking on the tenth model for the payu bug? Could do some short 1 month runs. AK: Not sure what we would do with the information. Diagnosis without treatment. Interesting from an academic viewpoint. Planning to do a longer re-run with other changes, and it will be fixed in that. Interesting to see a couple of months and see the scope of the issue. Is it negligible? Maybe tell people. AH: Choose a worse case: southern summer? NH: Ok, might do that.

OpenMPI

MW: Been using OpenMPI/3.0.3. Working well. Speeds the same as 1.10. Uses UCX by default. Turn off all flags, except error aggregation if you want. Can try 3.1.3, but had some issues. Likely the version on the next machine.
AH: Test on Jenkins with new OpenMPI? MW: Good idea
MW confirmed that using hyperthreading option in payu is harmless (might even be on by default).

COSIMA Models

Bathymetry

RF: Wanted to get rid of the Ob river? 1150 looks good. Need an inlet to keep runoff in the correct place. See GitHub issue. Plot shows 0.25 degree cell size is cut off.
AMH: Need to get rid of the Ob. Russ’ plot at 1150m looks good; maybe smooth out corners. RF: Have to look at index space: straight edges, no inlet, things like that. Depth is the minimum depth, 10m; it is a lot shallower in actuality. AK: Only real reason to keep it is to have the runoff in the right place. Had to smooth to stop the model crashing. Main reason to keep is to make sure runoff is mapped correctly. AMH: Where is runoff coming from? Take it too far up and it might get remapped to the wrong embayment. That’s why I like the minimal change. It is stable. AK: Yes, since Russ’ fix that stops salinity dropping below zero with ice formation. AH: If your map had water at depth zero, as opposed to land, then you can follow the water along until depth is > 0. Say this is water, use it for remapping but not for the model. AK: Need a separate file? AH: Not necessarily. Remapping uses its own logic anyway. AK: Remapping takes no account of topography. NH: Could make the distance function smarter: use a directional weight, something like AH suggested, or take into account topography. AK: Go downslope.
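AH’s suggestion (mark cells as water for remapping but not for the model, then follow that water until reaching model ocean) amounts to a breadth-first search restricted to connected water. A minimal sketch, assuming a small 2-D depth grid where `None` is land, `0.0` marks remap-only water such as a filled-in inlet, and positive values are model ocean; `route_runoff` is hypothetical and is not the ACCESS-OM2 remapping code. It ignores zonal periodicity and the tripolar fold, and measures distance in index steps rather than along the sphere.

```python
from collections import deque

# Hypothetical sketch of runoff routing through connected water only.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def route_runoff(depth, start):
    """Return (j, i) of the model-ocean cell that receives runoff at start.

    Searches outward from `start` through water cells only (depth not None),
    so runoff cannot jump across land into a disconnected embayment.
    Returns None if no model-ocean cell (depth > 0) is reachable.
    """
    rows, cols = len(depth), len(depth[0])
    queue = deque([start])
    seen = {start}
    while queue:
        j, i = queue.popleft()
        if depth[j][i] is not None and depth[j][i] > 0:
            return (j, i)
        for dj, di in NEIGHBOURS:
            jj, ii = j + dj, i + di
            if (0 <= jj < rows and 0 <= ii < cols and (jj, ii) not in seen
                    and depth[jj][ii] is not None):
                seen.add((jj, ii))
                queue.append((jj, ii))
    return None


depth = [
    [None, None, None, None],
    [None, 0.0,  0.0,  12.0],   # remap-only inlet leading east to ocean
    [5.0,  None, None, 30.0],   # (2, 0) is closer to (1, 1) but not connected
]
print(route_runoff(depth, (1, 1)))  # (1, 3)
```

A plain nearest-wet-point search would send this runoff to (2, 0), the kind of wrong-embayment mapping AMH is worried about. AK’s "go downslope" idea could be tried by replacing the plain queue with one prioritised by depth.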
RF: The other problem was Southampton Island. Just taking out the inlet was sufficient. AMH: Keep the island separated from the mainland? RF: Yes. Hasn’t been causing problems? AK: No. AMH: Will leave cells smaller than 1150m. AK: Yes, but not too bad. Also an abrupt change in spacing. RF: Yes, the tripolar grid has a discontinuity. AH: Cut off at 1150m; what was it before? AK: 880m. All crashes I had with ice remap errors were less than 1100m. Those can be eliminated by closing channels. AMH: Worried about Southampton. AK: Never had issues there. Will be getting new constraints. Had to put damping on Kara Strait, and had issues with a seamount off the tip of Severny. AMH: Ok, keep it at 1150m and see.
AK: In quarter degree Baffin Island is attached to Canadian mainland. Tenth has much more open water. A lot of it extremely shallow (less than 100m), so unlikely important for sea water transport, but likely important for ice transport. AMH: And therefore fresh water transport. AH: Who will do this? RF: Planning to do it today or tomorrow. AMH: Awesome, thanks.

Profiling

AMH: Getting different numbers between IAF and RYF due to AK needing more ice time steps in the IAF case. He can’t run with ndtd=2, so the load is imbalanced towards CICE. ndtd=2 with minimal. AK: Time difference is due to the value of ndtd. Ruth is still getting bad departure points with minimal; she reduces the ocean time step for a single submit. I increased ndtd instead. AMH: This has caused a load imbalance. Not the same as the optimisation that NH targeted; NH used ndtd=2 in the optimisation. AK is using 50% more time.
MW: What optimisation? AMH: When NH looking at load balancing. AK using 50% more time steps, and taking 50% more time.
NH: Now have a rebalanced tenth minimal with ndtd=3. With the bathymetry changes might not need it. AH: Hold off on that until AK can tell if we need it. AK: May still on occasion need to reduce time step every 5 or 10 years, preferable to ndtd=3. IAF variability means can’t guarantee it will work with every year.
MW: OASIS timing issue. Struggling to define main loop time. Looking at 1 deg, outputting time of every time step. Not literally useful due to overhead. AH: Give you scaling? MW: Not sure.
MW: timing between 170-200ms per step. Step 32 get a big number. 36s in one, 72s in the other. Is it just waiting? Doing IO? Maybe some sort of OASIS thing happening to bootstrap. Get infrequent huge time steps. Run again and don’t get them. Going to remove the largest timestep. Anyone know what is causing this?
NH: What are you profiling? MW: Just the coupling step. Reporting the coupling code.
MW: Does it do a lot of IO on that first coupling step? NH: Yes it does on the first step. What about CICE diagnostics? Are they printing to ice_diag.d. Should be consistent. If it goes away?
RF: CICE does IO through one PE, so does a global collective. MW: Could be IO and MPI collective issues. Not sure if this is legitimate timing or not?
NH: Not sure what the bigger picture is, but find targeting specific routines to look at load imbalance. NH: definitely look into CICE diagnostics.
MW: Timing is so inconsistent. AH: Run a bunch and use the minimum. Turn off all diagnostics. AH: For the paper, MOM scales well. Need to say something about CICE scaling. Doesn’t need to be the final word. MOM gives some leeway and these are the best configurations …
NH: Happy to help. Can do more fine grained stuff. Do some counting. MW: I like Score-P, but it dies with CICE.
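MW’s plan to drop the largest step and AH’s suggestion to repeat runs and keep the minimum can be combined into a simple robust summary. A sketch, assuming per-step wall-clock times (in seconds) have already been collected for each repeated run; `summarise_step_times` is a hypothetical helper. The median within a run, after dropping the slowest steps, suppresses one-off stalls like the 36 s and 72 s steps; the minimum across runs keeps the least-perturbed repeat.

```python
import statistics

# Hypothetical sketch: summarising repeated per-step timings.


def summarise_step_times(runs, drop_largest=1):
    """Robust per-step time from repeated runs.

    runs: list of lists of per-step wall-clock times (seconds), one per run.
    Drops the `drop_largest` slowest steps of each run, takes the median of
    the rest, then returns the minimum of those medians across runs.
    """
    medians = []
    for times in runs:
        kept = sorted(times)[:len(times) - drop_largest]
        medians.append(statistics.median(kept))
    return min(medians)


run_a = [0.18, 0.19, 0.20, 36.0]
run_b = [0.17, 0.18, 72.0, 0.19]
print(summarise_step_times([run_a, run_b]))  # 0.18
```

This deliberately hides the outliers; whether those steps are legitimate I/O or collective waits, as RF and NH suggest, is worth answering before discarding them.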

Grid scale noise

RF: Chris Chapman problem with submesoscale stuff (see issue). There is a smoothing feature in submeso, but it says it doesn’t reproduce. Think I found a bug. It does smoothing of the mixed layer. Possible to put the mixed layer into rock with smoothing; there doesn’t seem to be any check. Might get some others to look at it. If they agree we might be able to fix it and reduce the checkerboarding. AK: Is this in MOM6? Also in MOM5? RF: There is a namelist parameter; it says not to use it because it is not repro, but really because it is buggy. No reason it shouldn’t reproduce.
MW: Is this filtering a numerical mode? AK: KPP purely numerical, so adjacent columns can decouple. RF: Will point out the code and see if people agree. AK: Get it fixed and it could be good to put in for the next tenth degree run.
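The missing check can be illustrated with a toy smoother: average the mixed-layer depth over wet neighbours only, and cap the result at the local column depth so it can never extend into rock. This is a hypothetical sketch on a small 2-D list; `smooth_mld` is not MOM5’s submesoscale code, and the 5-point stencil is an assumption.

```python
# Hypothetical sketch: a masked, depth-capped mixed-layer smoother.
NEIGHBOURS = ((1, 0), (-1, 0), (0, 1), (0, -1))


def smooth_mld(mld, depth):
    """One pass of a 5-point mean of mixed-layer depth (metres).

    Land cells (depth <= 0) are left unchanged and excluded from averages;
    the smoothed value is capped at the local column depth.
    """
    rows, cols = len(mld), len(mld[0])
    out = [row[:] for row in mld]
    for j in range(rows):
        for i in range(cols):
            if depth[j][i] <= 0:
                continue
            total, count = mld[j][i], 1
            for dj, di in NEIGHBOURS:
                jj, ii = j + dj, i + di
                if 0 <= jj < rows and 0 <= ii < cols and depth[jj][ii] > 0:
                    total += mld[jj][ii]
                    count += 1
            # Without this cap a shallow shelf cell next to deep mixing would
            # get a mixed layer deeper than the sea floor.
            out[j][i] = min(total / count, depth[j][i])
    return out


depth = [[40.0, 500.0, 500.0]]
mld = [[40.0, 200.0, 200.0]]
print(smooth_mld(mld, depth))
```

Uncapped, the shelf cell here would get (40 + 200) / 2 = 120 m of mixed layer over 40 m of water.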

Technical Working Group Meeting, November 2018

Minutes

Date: 13th November 2018
Attendees:

  • Marshall Ward (MW) (Chair) NCI
  • Aidan Heerdegen (AH) CLEX, and Andrew Kiss (AK) COSIMA, ANU
  • Russ Fiedler (RF), Matt Chamberlain (MC) CSIRO Hobart
  • Nic Hannah (NH) Double Precision

Payu update

MW: payu is now python3 compatible. Can be run from a .local install. No longer uses modules. Tagged 0.11.1. AH: will also install into conda environment on raijin. NH: sounds great. Have used in conda python27 environment. Good to have python3.

MW: Want to get this to a position where I can leave it to others to support. Might get GFDL interested in it. Have time to wrap up a lot of things.

AK: How rigid is the 3 digit output in archives? MW: Only a print statement. Should work with higher numbers. AH: Won’t list nicely. MW: Had meant to add a format option.
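MW’s point that the three digits are only a print format, and AH’s that higher numbers won’t list nicely, can both be seen in a few lines. The `width` argument stands in for the format option MW mentions; `output_dir_name` and `natural_key` are hypothetical helpers, not payu’s API. Zero padding keeps names fixed-width, and a natural sort key keeps listings in run order once run numbers outgrow the padding.

```python
import re

# Hypothetical sketch: archive directory naming and ordering.


def output_dir_name(n, width=3):
    """Archive directory name with zero padding, e.g. output007."""
    return f"output{n:0{width}d}"


def natural_key(name):
    """Sort key that compares embedded integers numerically."""
    return [int(tok) if tok.isdigit() else tok for tok in re.split(r"(\d+)", name)]


names = [output_dir_name(n) for n in (998, 999, 1000)]
print(names)                           # ['output998', 'output999', 'output1000']
print(sorted(names))                   # ['output1000', 'output998', 'output999']
print(sorted(names, key=natural_key))  # ['output998', 'output999', 'output1000']
```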

MW: Try out the new payu, want to make it the new one.

TWG Organisation

AH: Do we need to decide on a new chairman? MW: Happy to resign straight away. MW: Considering going to GFDL in February to bootstrap remote working. AH: Nic did you want to take over? NH: No, convinced it’s not a good idea. AH: Sort something out next meeting. Last one of the year?

OM2/CM2 MOM5 Harmonisation

AH: Peter Dobrohotoff sends his apologies, cannot make meeting.

AH: Shame, as PD tested the harmonised MOM5 but used incorrect namelist options. Losing some momentum, as would like to have that checked off so we could start the ESM harmonisation.

MC: Wrong namelist options? RF: Didn’t have correct namelist options to include my new mixing scheme.

AH: They were also concerned about background mixing, in case that was having an effect. Could point them to the relevant PR/Issue on GitHub with all the plots and documentation showing it was working correctly. This is a valuable resource and a good way of working.

AH: Richard Matear contacted me and wanted to know status of ESM harmonisation and what he could do to help it progress.

ESM MOM5 Harmonisation

ESM is using a version of MOM5 updated to the beginning of the year with WOMBAT added by MC. It was decided not to continue with ESM harmonisation until CM2 has bedded down, as it requires some of the code changes from the CM2 harmonisation.

MW: Is cylc suite used for testing using the MOM repo compilation? AH: I believe Peter is currently turning off the automatic compilation in the suite and using the repo compilation script on the command line to create the MOM executable. MW: I think improving/streamlining and harmonising the cylc suite is as important as harmonising the code. AH: I would have liked a test suite that incorporates this, but this is the way Peter has been comfortable working. Can’t progress ESM until have the all clear that CM2 is working correctly.

MW: Richard & Matt are using the ESM model? Not using GFDL stack? MC: GFDL in decadal project. This harmonisation will get WOMBAT in the MOM5 master branch, which has long been a goal. Decadal project not using UM. Research effort there is data assimilation and reanalysis that they’re running, rather than updating the model. Won’t hear much from them. When AH gets the WOMBAT code in, please contact us. Have experience using payu runs at 0.25 deg. Will fix you up with files and help when it comes to testing. AH: I will mostly be relying on others to run stuff. MC: Not running under payu currently. Out of the loop at the moment. Did run one of Kial’s runs. I need help running with payu. AH: Yes we can do that together. Holger is trying to get payu to run with ESM, making slow progress. MW: Who is supporting the ESM? Surprised Richard will contact AH. AH: Someone told him it was part of this code harmonisation. MC: Richard’s interest is WOMBAT in MOM5. AH: ESM will be the CLEX coupled model. MW: Who will be responsible for ESM? AH: Tilo is doing a CMIP6 submission with ESM. CLEX will be wanting to use all of Tilo’s runs to spin off their own experiments. At that point CLEX CMS will support the model with payu on NCI HPC systems. MC: Tilo is doing the work of the equivalent of the entire CM2 team to get ESM working. We support Tilo somewhat, and there are others who are no longer formally part of the team but contribute. Richard has interests in this space also. AH: Tilo can benefit from work we’re doing. Scott found a 10% improvement in UM speed. MC: Speed/efficiency not a priority right now. Focus on land model, forcing etc. Not #1 priority. AH: If they’re open to that input, they can still get the benefit even if it isn’t their focus.

MC: Have to go to another meeting.

AH: Do we need to move the meeting? MW: Happy to move to another day if it works for people. AH: Doodle poll? RF: Next year. MW: Ok, do something next year.

Minimal 0.1 MOM

AH: Working really well. Would like to get it to run a little faster so she could get 2 months per submit, which would improve her throughput a lot. Does NH have any ideas to speed it up? Seems like the MOM model has 10 minutes of spare time. Is it CICE bound? NH: That might be the initialisation time. AH: Don’t MOM timings take into account initialisation? NH: Not until recently; Marshall fixed it. MW: Clocks weren’t showing MPI initialisation time, just MOM initialisation time. AH: Could that be 10 minutes? MW: I would be surprised. Would guess 5 mins, less than 10. Big model, spends a lot of time on field exchange.

AH: Not compiled with AVX2. Will that help? MW: Did an AVX and AVX2 test. Could see the difference, but it wasn’t large enough to bother making non-compatible binaries. NH: Slack me the path and I can take a look. Haven’t spent a lot of time trying to optimise it. MW: Can sometimes improve time by changing the layout. GFDL tried very long tiles, which means halo updates are only north/south. NH: She might have an older config. I switched to Sandybridge for efficiency. MW: I think you can get a 7-8% speed up going from AVX to AVX2. AH: I advised Ruth to use Broadwell, as the memory requirements made for better throughput. AH: I suggested she have higher diag_steps, but RF pointed out that global scalars mean she is doing daily global MPI calls. RF said it doesn’t necessarily have to be this way? Could it be changed in FMS? RF: Yeah, all the diagnostic code. In many cases the time average can be commuted with the area average. Rather than doing an MPP sum or MPP global sum every timestep, can do local sums on the local process and call MPP sum when you need to. Could rewrite the MOM code to do a cumulative average and do an instantaneous output; sort of fudge an average. MW: Wouldn’t make much difference to speed. RF: Depends how much those global sums are hurting you, if at all; it might be that a single global sum acts as a synchronisation point and doesn’t really matter. MW: I told NCI MOM didn’t do collectives because they were so fast they didn’t show up in the profile, so unlikely to help a lot. RF: MOM collectives are very simple: if not doing bitwise stuff, just taking a collective of one number. MW: Caveat: only tested at 0.25 deg, so can’t know for sure it is the same at 0.1 deg. Should do it because it will eventually start to bite. Could do a profile?
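RF’s suggestion to commute the time average with the area average can be checked on toy data: accumulate each process’s area-weighted sum locally every step and perform one reduction at output time, instead of a global sum every step. The sketch below simulates processes as Python lists (no MPI); the helper names are hypothetical and it assumes cell areas do not change in time. The two results agree to rounding, but because the order of additions differs they are not bitwise identical in general, which matters if bit reproducibility is required.

```python
# Hypothetical sketch: per-step global reduction vs one deferred reduction.
# Assumes fixed cell areas; processes are simulated as Python lists.


def mean_reduce_every_step(fields, areas):
    """Global area-weighted mean per step (one collective each), then time mean.

    fields[t][p]: values held by process p at step t; areas[p]: cell areas.
    """
    total_area = sum(sum(a) for a in areas)
    step_means = []
    for step in fields:
        # In MOM this sum over processes would be an mpp_sum every step.
        s = sum(sum(v * a for v, a in zip(vals, areas[p]))
                for p, vals in enumerate(step))
        step_means.append(s / total_area)
    return sum(step_means) / len(step_means)


def mean_reduce_once(fields, areas):
    """Accumulate weighted sums locally; reduce once per output interval."""
    local_acc = [0.0] * len(areas)
    for step in fields:
        for p, vals in enumerate(step):
            local_acc[p] += sum(v * a for v, a in zip(vals, areas[p]))
    total_area = sum(sum(a) for a in areas)
    # The single reduction: sum of the per-process accumulators.
    return sum(local_acc) / (total_area * len(fields))


areas = [[1.0, 2.0], [3.0]]        # process 0 has two cells, process 1 one
fields = [[[1.0, 2.0], [3.0]],      # step 0
          [[2.0, 2.0], [1.0]]]      # step 1
print(mean_reduce_every_step(fields, areas), mean_reduce_once(fields, areas))
```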

AH: AK only does 2 months/submit, so maybe we would all be better running a minimal config? MW: doesn’t bode well for exascale. AH: So many constraints with PBS etc. AK: Optimised for model+machine+queue constraints not for the model on its own. NH: Could just bump all the cpus by 10%? AH: You’ve got a nice sweet spot there, first try AVX2 and see if we can get the speed up we need.

MW: Vectorisation can help. AH: Would bigger tiles help? MW: So much time is spent moving in and out of L1 cache that it doesn’t make much difference. AH: Has Broadwell got bigger caches? MW: Bigger L3, but 12 more cores. NH: Is she using a 600s timestep without crashing? AH: Had an ice remap crash after a couple of years. AK: Unclear if that will happen again. Doing RYF, and I found once the crashes started happening they kept happening. RF: Using the latest bathymetry? AK: Yes. RF: Any difference? AK: No idea. NH: I think it is generally more stable.

MW: Can play with the barotropic halo. The barotropic solver has a halo of 10 so it can do its work every 10 steps. Might be able to get some speed up by playing with that. AH: Put the path to Ruth’s control directory on the TWG Slack channel.

NCRIS ACCESS meeting

NCRIS is doing a scoping study to see if it is feasible for a team of 15-20 people to support ACCESS modelling in Australia, which would be used for submission for funding from NCRIS. The meeting was to get feedback to help write the submission.

Some discussion of the experience of the meeting.

Calendar issues

RF: MOM uses the Proleptic Gregorian calendar type, but does not use the correct calendar attribute when outputting the file; it sets it as Gregorian instead. So, when using days since 01/01/0001, there is a jump in October 1582 depending on which calendar is used. Get a 2 day offset for IAF files because of this incorrect calendar attribute. Found the python netCDF interface uses udunits calendars and has problems. Had to force it to proleptic gregorian to read dates correctly. Big issue when dealing with daily data. Output files need to be fixed. Could change the calendar attribute to Proleptic Gregorian or change the units to be days since a year after 1582. MW: GFDL use since 1900? RF: Yes, as this is what Ferret uses.
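The 2-day offset comes from the reference date itself: in the mixed Julian/Gregorian calendar that CF calls `gregorian` (or `standard`), 0001-01-01 is a Julian-calendar date, which falls 2 days earlier than 0001-01-01 in the proleptic Gregorian calendar. A file that counts days in proleptic Gregorian but is labelled `gregorian` therefore decodes 2 days early for any post-1582 date. A stdlib-only sketch (the helper names are hypothetical; it deliberately handles only dates from the 1582 reform onwards, where the two calendars agree on every date):

```python
from datetime import date

# Hypothetical sketch: CF "days since 0001-01-01" under two calendar labels.
ORDINAL_TO_JDN = 1721425      # Julian Day Number = date.toordinal() + this
REFORM = date(1582, 10, 15)   # first day of the Gregorian calendar


def julian_calendar_jdn(year, month, day):
    """Julian Day Number of a date given in the Julian calendar."""
    a = (14 - month) // 12
    y = year + 4800 - a
    m = month + 12 * a - 3
    return day + (153 * m + 2) // 5 + 365 * y + y // 4 - 32083


def epoch_jdn(calendar):
    """JDN of the reference date 0001-01-01 under a CF calendar name."""
    if calendar == "proleptic_gregorian":
        return date(1, 1, 1).toordinal() + ORDINAL_TO_JDN
    if calendar in ("gregorian", "standard"):
        return julian_calendar_jdn(1, 1, 1)  # mixed calendar: Julian before 1582
    raise ValueError(f"unsupported calendar: {calendar}")


def encode(d, calendar):
    """'days since 0001-01-01' for a post-reform date."""
    if d < REFORM:
        raise ValueError("sketch only handles dates from 1582-10-15 onwards")
    return d.toordinal() + ORDINAL_TO_JDN - epoch_jdn(calendar)


def decode(n_days, calendar):
    """Inverse of encode; the result must also be a post-reform date."""
    d = date.fromordinal(epoch_jdn(calendar) + n_days - ORDINAL_TO_JDN)
    if d < REFORM:
        raise ValueError("sketch only handles dates from 1582-10-15 onwards")
    return d


n = encode(date(1958, 1, 1), "proleptic_gregorian")  # count written to the file
print(decode(n, "gregorian"))  # 1957-12-30: a reader trusting the label is 2 days early
```

Moving the reference date past 1582, as RF and MW suggest (for example days since 1900-01-01), removes the ambiguity, because the two calendars agree on every date after the reform.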

AH: Had a lot of date issues using python. It uses the numpy date library, which has a limited date range due to its nanosecond resolution. We often have to do date offsets anyway, so probably don’t see this issue as much. Should we put proleptic gregorian into MOM? MW: Shouldn’t we change the start date? RF: That is the easiest thing. There is a lot of broken software that doesn’t treat these calendars correctly. MW: Should tell GFDL about this. RF: Looked at the code and made some changes, but not uploaded. AK: Is MOM using the correct dates, as with coupling to CICE etc.? RF: Works ok internally. AH: Arguably a bug if they’re using proleptic and not using the correct attribute. RF: Yes. CMIP6 accounts for this: it checks for dates before 1582 and requires using proleptic gregorian. Future runs should use a reference date of some later year.

RF: Getting a huge number of messages from restoring files starting at year 0000. Restoring files are on a time modulo axis and were created from Ferret, which automatically treats any file with a start year of zero or one as modulo. However, year zero does not exist, so this is incorrect. Just need to change that attribute in the restoring files; it won’t make any difference to operation but will save a huge number of warning messages. MW: I get 482,000 lines of errors. I would be very happy if this is fixed. MW: Someone should change those fields. RF: I don’t have access. MW: Should go and edit the public forcing fields. What specific files? AH: If you’re talking about the ACCESS-OM2 configs, NH has the most ability to change them. MW: salt_sfc_restore? RF: Yes, and temperature, chlorophyll. Anything seasonal, any restoring. MW: Anything that says “months since 0000”? AH: Yes, change to 0001. RF: Anything that uses that date (year zero) can be changed. AK: Anything that isn’t JRA and isn’t multiyear? RF: Maybe runoff? Do we use the JRA runoff? The problem really is the stuff MOM reads directly, like sponges. NH: I am happy to look into this. I might be the only one with access; hope not. Have been thinking about this for a while. Changed the OASIS code as well to ensure DEBUG_LEVEL zero does not output anything; it was outputting thousands of lines. There is also an Andy Hogg GitHub issue, and this was the next one on my list. MW: So far I can only find salt restore and ssw shortwave. RF: Shouldn’t be using that. Should be using the GFDL formulation, which reads in chl.nc.
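The attribute change agreed here is a one-line string rewrite per file. A sketch of the string logic, assuming CF-style units of the form `<unit> since YYYY-MM-DD...`; `fix_year_zero` is a hypothetical helper. Applying it to a real file would go through a netCDF library (netCDF4-python, for example, exposes variable attributes such as `units` as Python attributes), which is left out here.

```python
import re

# Hypothetical sketch: '<unit> since 0000-...' with the year as one to four zeros.
_YEAR_ZERO = re.compile(r"^(\w+ since )0{1,4}(-\d{1,2}-\d{1,2}.*)$")


def fix_year_zero(units):
    """Move a reference year of 0 (which does not exist) to year 1.

    Other units strings are returned unchanged.
    """
    m = _YEAR_ZERO.match(units.strip())
    if m is None:
        return units
    return f"{m.group(1)}0001{m.group(2)}"


print(fix_year_zero("months since 0000-01-01 00:00:00"))
```

The modulo behaviour is unaffected, since Ferret treats a start year of one the same way as zero.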

AK: Some files in those ACCESS-OM2 input tarballs aren’t used. Should they be removed? NH: Posted on Slack about this? AK: Yes, but not sure they aren’t used by someone. NH: Bit messy how this is done. Should really just have a bunch of files and experiments grab what they need. Would save a lot of space. Currently versioning sets of files rather than individual files.
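Versioning individual files rather than whole sets can be sketched as a content-addressed store: each file is stored once under the SHA-256 of its contents, and each input set becomes a small manifest mapping file names to hashes, so files shared between sets cost no extra space. This is a hypothetical illustration (`store_files` and the layout are assumptions), not the manifest implementation AH is working on.

```python
import hashlib
import shutil
from pathlib import Path

# Hypothetical sketch: content-addressed storage of input files.


def file_digest(path, chunk_size=1 << 20):
    """SHA-256 of a file's contents, read in chunks to bound memory use."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk_size), b""):
            h.update(block)
    return h.hexdigest()


def store_files(paths, store_dir):
    """Copy files into a content-addressed store; return name -> digest."""
    store_dir = Path(store_dir)
    store_dir.mkdir(parents=True, exist_ok=True)
    manifest = {}
    for path in paths:
        digest = file_digest(path)
        target = store_dir / digest
        if not target.exists():  # identical content is stored only once
            shutil.copy2(path, target)
        manifest[Path(path).name] = digest
    return manifest
```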

mppnccombine-fast

AK: Some issues on GitHub. RF: Been discussing this with AH and Scott Wales. An attribute needs to be removed. AH: Biggest issue is regional outputs having incorrect dimensioning. Which has been fixed. Also fixed the unlimited dimension getting squashed. Also another issue with passing too many files on the command line due to an MPI issue. Requires a change to payu as globbing is now done internally so any glob needs to be quoted. It’s on my list of tasks.

MW: Original tool used a pattern? AH: Didn’t implement that in mppnccombine-fast, maybe we should? MW: Stopped doing that in payu to support some coupled FMS codes where tiles didn’t start at zero, but could go back to the old way. AH: Does using the pattern work with masked configs when tiles are missing? I can’t recall. MW: Not sure.
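Expanding the glob inside the tool rather than in the shell (which is why the glob must now be quoted) also lets the tool order tiles by number without assuming they start at zero or are contiguous, which would cover both the coupled-FMS case MW mentions and masked configurations with missing tiles. A sketch, assuming FMS-style tile names ending in a numeric suffix such as `ocean.nc.0003`; `tile_files` is hypothetical and not mppnccombine-fast’s code.

```python
import glob
import re

# Hypothetical sketch: internal glob expansion with numeric tile ordering.
_TILE_SUFFIX = re.compile(r"\.(\d+)$")


def tile_files(pattern):
    """Expand `pattern` internally and return tile files in tile-number order.

    Tiles may start at any number and may have gaps (masked-out tiles), so
    ordering uses the numeric suffix rather than a 0..N-1 range.
    Files without a numeric suffix (e.g. an already combined file) are skipped.
    """
    tiles = []
    for path in glob.glob(pattern):
        m = _TILE_SUFFIX.search(path)
        if m:
            tiles.append((int(m.group(1)), path))
    return [path for _, path in sorted(tiles)]
```

On the command line the pattern would be passed quoted (for example `'ocean.nc.*'`) so that the shell does not expand it first.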

ACCESS-OM2 disk usage

AH: AK and I went through some of the 0.1 deg output directories and found we could get significant space savings in the ice diagnostics. AK: Ice outputs are not compressed, and daily data is in individual files, half of which is grid data. Can get an 8-fold decrease in size. Out of 20TB of total data we can save 12TB. AH: Want to make a post processing script to run this automatically. AK: Yes, also delete all the zero length log files. AH: This was to clean up for archiving. MW: payu should do this; maybe it is not looking in the right places. NH: Fortran has an option when closing a file to delete it if empty, so looking into that. Also some CICE logs just have one line at the top with exactly the same text. AH: Yes, we found those; matched on the same number of bytes and deleted them. MW: If payu isn’t deleting zero length files, it is not sweeping through submodels. AH: A lot tidier after cleaning. AK: Yes, an hour well spent.
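If zero-length files survive because cleanup does not sweep through the submodel directories, a post-processing step can walk the whole output tree. A sketch, assuming log files can be identified by suffix (the suffix list is a hypothetical choice); `remove_empty_files` is not part of payu. Symlinks are skipped so that links into restart or input directories are never removed.

```python
import os

# Hypothetical sketch: recursive removal of empty log files.


def remove_empty_files(root, suffixes=(".log", ".out", ".err")):
    """Delete zero-length files with the given suffixes anywhere under root.

    os.walk descends into every subdirectory, e.g. one per submodel.
    Returns the sorted list of removed paths.
    """
    removed = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if (name.endswith(suffixes) and not os.path.islink(path)
                    and os.path.getsize(path) == 0):
                os.remove(path)
                removed.append(path)
    return sorted(removed)
```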

COSIMA Update

Over the last day we have completed initial spinup runs of the ACCESS-OM2 model.
The model has been run at 3 different resolutions, as listed below:

ACCESS-OM2 = [MOM5.1 + CICE5.1 + OASIS3-MCT + YATM + JRA55v13-do]

ACCESS-OM2:

  • 1° resolution, 50 levels
  • 252 cores
  • 48 yrs/day – 160 SU/yr
  • 5 InterAnnual Forcing (IAF) cycles complete
  • Numerous Repeat Year Forcing (RYF) cases

ACCESS-OM2-025:

  • 0.25° resolution, 50 (KDS) levels
  • 1824 cores
  • 16 yrs/day – 2800 SU/yr
  • 5 IAF cycles complete
  • Additional IAF and RYF cases run

ACCESS-OM2-01:

  • 0.1° resolution, 75 (KDS) levels
  • 5744 cores
  • 2.2 yrs/day – 63 kSU/yr (provided dt=600 sec)
  • Minimal config with 2064 cores, ~1 yr/day
  • 40-year RYF spinup with variable parameters, tweaks, date fixes during spinup.
  • Single IAF run from 1985 to 2017
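The cost figures above can be roughly cross-checked from core count and throughput. This sketch assumes 1 SU per core-hour, which is an assumption and not stated in the update; `su_per_model_year` is a hypothetical helper.

```python
# Assumption: 1 SU per core-hour. Rough cross-check only.


def su_per_model_year(cores, model_years_per_day, su_per_core_hour=1.0):
    """Rough cost: cores x wall-clock hours needed for one model year."""
    return cores * (24.0 / model_years_per_day) * su_per_core_hour


configs = {
    "ACCESS-OM2 (1 deg)": (252, 48),
    "ACCESS-OM2-025": (1824, 16),
    "ACCESS-OM2-01": (5744, 2.2),
    "ACCESS-OM2-01 minimal": (2064, 1.0),
}
for name, (cores, ypd) in configs.items():
    print(f"{name:22s} ~{su_per_model_year(cores, ypd):8.0f} SU/yr")
```

Under this assumption the estimate reproduces the 0.25° (2736 vs 2800 SU/yr) and 0.1° (about 62.7 k vs 63 kSU/yr) figures to within a few percent. The 1° estimate (126 SU/yr) is below the quoted 160 SU/yr, so other costs presumably enter there. The minimal 0.1° configuration comes out near 50 kSU/yr at about 1 yr/day.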

For a full account of today’s COSIMA meeting presentation, see the COSIMA Update slides.

Technical Working Group Meeting, October 2018

Minutes

Date: 16th October 2018
Attendees:

  • Marshall Ward (MW) (Chair), Rui Yang (RY), NCI
  • Aidan Heerdegen (AH) and Andrew Kiss (AK), CLEX ANU
  • Russ Fiedler (RF), Matt Chamberlain (MC), CSIRO Hobart
  • Nic Hannah (NH) Double Precision
  • Peter Dobrohotoff (PD), CSIRO Aspendale

TWG Organisation

MW: Have taken a position at GFDL, starting in 3-6 months. Need a new TWG Chairman. Need to organise meetings. Not much communication with other working groups. AH: Anyone who is interested, think about it; we can decide at a subsequent meeting.

MW: As I am leaving, there is no one left at NCI following ocean model development. NCI will appoint a new person, but RY is attending for some knowledge transfer.

OM2/CM2 MOM5 Harmonisation

AH: there is a cm2_release_candidate branch on MOM5 repository. Contains all substantive code changes from Hailin’s fork on Peter’s repo.

AH: Need a rose suite to support the MOM5 compile script. Might get Scott Wales to help make the suite. MW: I might be able to help. AH: Used the original MOM_compile script? MW: Not sure. AH: Currently pulls in a build script from a totally different svn branch.

PD: Yes MOM5 in git repo. One of the directories (exp) has the same build script as you’re using AH. PD: I cloned your repository, copied over compile script and environment file, pressed go and it compiled. Problem at link time. Don’t have an opinion about build script being in repo. Rose suites do “blossom”. Ok to compile from command line at the moment. Can consult with AH offline.

MW: Are AH and RF happy with the code changes themselves? AH: Last set of changes are that crucial. Steve Griffies would have liked more atomic changes. Need to run, see if it is different, if it is, figure out how different and if it is important.

AH: Next harmonisation target is ACCESS-ESM-1.5, adding WOMBAT BGC. This will go into the main MOM5 repo. In theory will also be in the CM2 version of MOM5. It won’t be turned on, but we should check that it doesn’t make a difference to CM2 results.

AH: Seems straightforward, as MC had already put WOMBAT BGC into MOM5, but there have been some changes since then. MC: Need to pull 3 years of MOM5 changes into my own branch. RF: ocean_sbc is what hooks into WOMBAT. The components we’ve added, like 10m winds and sea ice coverage, are what WOMBAT wants. What we’ve got there now is compatible, except that WOMBAT assumes the 10m winds aren’t masked and uses sea ice coverage to do the masking. MC: Yep. RF: The way we do it, they are already masked, so we might need a change to WOMBAT, or a flag. MC: Does multiple masking matter? RF: If it’s multiplying by ice fraction, we don’t want to multiply a second time. MC: Around the fringes? RF: No difference in the open ocean or with full ice coverage. RF: Pretty close to correct. Changed the interfaces. A lot of things in ocean_model can be kept in ocean_sbc. I can work on it with Aidan.
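
RF’s point about masking twice can be illustrated with a small sketch. It assumes, for illustration only, that masking means scaling a wind-dependent flux by the open-water fraction (1 - ice fraction); the function names are hypothetical and this is not WOMBAT’s code. Applying the mask twice gives (1 - f)² instead of (1 - f): identical at f = 0 and f = 1, different in partially ice-covered cells.

```python
def masked_flux(flux_open_ocean: float, ice_fraction: float) -> float:
    """Scale a surface flux by the open-water fraction (1 - ice_fraction).
    Hypothetical masking rule, used here only for illustration."""
    return flux_open_ocean * (1.0 - ice_fraction)


def double_masked_flux(flux_open_ocean: float, ice_fraction: float) -> float:
    """Apply the open-water mask twice, as would happen if already-masked
    winds were masked again by the BGC code."""
    return masked_flux(masked_flux(flux_open_ocean, ice_fraction), ice_fraction)


for f in (0.0, 0.25, 0.5, 0.75, 1.0):
    single = masked_flux(1.0, f)
    double = double_masked_flux(1.0, f)
    print(f"ice fraction {f:4.2f}: single {single:.4f}, double {double:.4f}")
```

The two agree in the open ocean and under full ice, so the error is confined to the ice edge, which is why a flag or a small WOMBAT change should be enough.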

MW: Is the only time pressure when it is adopted in CLEX? AH: No. Some people would like this to be in the ACCESS-ESM-1.5 CMIP runs. I don’t know what the political situation is like. MC: Tilo is anxious to get control runs going ASAP. If there is a change to a stable version he will run with it. Catia and Fabio are anxious to get extra diagnostics in for their experiments, but that is not central to the ESM effort. Tilo will start as soon as he has his carbon cycle stuff fixed. MW: Pressure point on RF? RF: I’ll look at it. Just need to throw in a couple of the hooks into WOMBAT, but I think they’re there. Should be straightforward.

AH: Made a PR; the link is on the TWG slack channel. Cherry-picked the commits that seemed necessary. If you make code changes, please pull down the latest code before submitting them. Can delete your fork if necessary and start again. RF: Yes, done that a few times.

MW: Harmonisation on track? AH: Holger is working on payu version for ACCESS-ESM-1.5. MW: CLEX specific? PD: CLEX is picking up ESM as climate model. We are all working in the same direction. Lots of non-CMIP science coming out of these models. Shouldn’t dismiss payu as something we don’t care about.

COSIMA Models

NH: Running a minimal 0.1 degree config on around 2K cores. Maybe not the actual minimum, but a decent compromise. Good efficiency. With dt=600s, around 5 KSU/month. The model is well balanced: the ice model is not slowing things down and is only using 350 cores. MW: sectrobin? NH: Yes, but it probably doesn’t matter.

NH: Thanks for the heads-up about the NCAR tripolar efficiency fix for CICE. RF: Surprised it makes a difference at low core counts. NH: Not sure it does, just wanted everyone to know it is now in the code. NCAR say they have checked they get identical results; confirmed no difference. One month in 2.5 hours with dt=600s. Can’t squeeze in 2 months/run. AK: What diagnostics? NH: Just monthly. Same as AK’s, changed daily to monthly, just in ice. AK: Currently have 3D daily prognostic fields. NH: Might slow things down a bit. Because this config is small it is nicely balanced: with so much work in each ice PE, there is more chance they are balanced. Using 8 blocks per core. AK: ndtd=3? NH: No, trying ndtd=2 to begin with, and it seems to be going OK.
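
The cost and throughput figures NH quotes are consistent with each other. A quick check, assuming a flat charge of one service unit (SU) per core-hour and treating the job as exactly 2000 cores (both assumptions of this sketch):

```python
def service_units(cores: int, walltime_hours: float, su_per_core_hour: float = 1.0) -> float:
    """Service units charged for a job, assuming a flat rate per core-hour."""
    return cores * walltime_hours * su_per_core_hour


# Figures quoted in the minutes for the minimal 0.1 degree configuration.
cores = 2000            # "around 2K cores"
hours_per_month = 2.5   # "one month in 2.5 hours with dt=600s"

ksu_per_month = service_units(cores, hours_per_month) / 1000
print(f"{ksu_per_month:.1f} KSU per model month")
```

This reproduces the quoted 5 KSU/month; a different charging rate would scale the result linearly.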

NH: Currently crashing off the tip of Severny Island. High velocities at the tip. Crashing after 14 submits (months). Surprised it took so long to crash. Done some work smoothing the bathymetry. Doesn’t seem to have helped, so now trying Rayleigh damping. RF: What month? NH: October. RF: Is there ice there? NH: Don’t think so. RF: Had a look at other months. A jet of warm salty water comes up from the south along the coast. Those seamounts are there. NH: Almost completely levelled them. Still a dip. Cleared seamounts before and in the dip. Velocities are very high there, the highest velocities that far north by a long way. Wondering if it is an extreme situation. AH: I tried the truncate_velocity option north of a certain latitude. Didn’t work, had a temperature or salinity blow-up, so don’t bother. MW: Usually a no-no. AH: Had the same issue with MOM-SIS-01 with CORE-II NYF: same crash, same time every year. RF: Interesting that it is the same problem with a different bathymetry. AH: Severny Island pokes a long way north, so any flow coming that direction gets funnelled along the coast. Could stop the crashes with Rayleigh damping at depth in a small area NE of the seamounts. Steve is not happy with this as a solution, but one small spot places an ocean timestep limit on the whole global model. I think we should use Rayleigh damping if it stops this. RF, NH: Agreed.

AK: Got the same crashes in the same location when I attempted a 600s timestep, so wound it back. Put Rayleigh drag in Kara Strait. NH: Yes, I have those. AK: Can give some idea of the scale of drag required. Also, that drag might be pushing more water around Severny Island. AH: You already have Rayleigh drag in your model? NH: Yes, all of AK’s additions. Understand some of the frustration with this model. Small config, easier to run and test. Want to push the timestep as far as possible. AK: Sounds like a good strategy, though I am concerned by oscillations in the vorticity field in the shallow area south of Bering Strait. Some sort of numerical glitch. Goes away with a 450s timestep. Seem to get stuff like this when the timestep is pushed up. AH: Any idea where it is coming from? AK: Not sure which terms/equations are involved. Dispersion gets worse as CFL gets higher. Not sure. NH: Should explore some of these things, as MOM-SIS-01 was running at 600s, right? AH: Yes, with Rayleigh damping. AK: Fanghua was using MOM-SIS-01 with this bathymetry and couldn’t go higher than 450s. Added damping and did a lot of work to track down issues. AH: Bathymetry has changed since then? AK: Yes, there was a problem with ocean where there shouldn’t have been. NH: Didn’t realise Fanghua used the same bathymetry. AK: Similar. Would have had one full of potholes.
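
The localised Rayleigh damping discussed above adds a linear drag term du/dt = -u/τ only inside a small masked region, leaving the rest of the global model untouched. A minimal sketch of that idea on a toy 2x2 grid, not MOM5’s implementation; the grid, mask and damping timescale τ are all hypothetical. It uses the exact exponential decay of the linear term, which stays stable for any dt/τ.

```python
import math


def rayleigh_damp(u, mask, dt, tau):
    """Apply linear (Rayleigh) drag du/dt = -u/tau for one step of length dt,
    only where mask is True. Uses the exact exponential solution of the
    linear drag term, so the update is stable for any dt/tau."""
    factor = math.exp(-dt / tau)
    return [
        [ui * factor if mi else ui for ui, mi in zip(urow, mrow)]
        for urow, mrow in zip(u, mask)
    ]


# Hypothetical velocities (m/s); damping applied in one cell only.
u = [[1.5, 2.0],
     [0.5, 3.0]]
mask = [[False, True],
        [False, False]]

u_new = rayleigh_damp(u, mask, dt=600.0, tau=86400.0)  # 600 s step, 1 day timescale
print(u_new)
```

Because the drag only acts where the mask is set, a single troublesome spot can be controlled without lowering the timestep everywhere, which is the trade-off AH argues for.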

RF: Anyone used new bathymetry I made? Couple of cells filled also, but mostly partial cells. In bathymetry directory, added about a month ago. NH: Will try it.

NH: Want to get recent CICE changes into 6K PE model using one of AK’s restarts. Crashing with ice remap transport errors. MW: Include tripole changes? NH: yes. Also sectrobin code change (also doesn’t change answers). Experimenting with sectrobin and blocks to get a more efficient setup. MW: That is what I am running and trying to understand. If I do a git pull from yours will I expect crashes? NH: Crashes not due to code, just model instability. Tested that code doesn’t change answers. MW: Will try that.

AH: Which is the correct bathymetry file? After some discussion, it turns out the new file is:

/g/data3/hh5/tmp/cosima/bathymetry/topog_05_09_2018_1m_partial.nc

AK: To overcome ice crashes like that, use ndtd=3 to give the ice more time. NH: You haven’t had an ice remap crash since using this? AK: Correct. It is a CFL issue: ice moving more than one grid cell per timestep. NH: Ice is going unrealistically fast, 35 m/s. MW: How does it do this? AH: Instability? NH: Yes. AK: Is the sea surface slope high? RF: The slope is diagnosed, derived assuming geostrophy; we are not passing the slope from the ocean model. If you do, you get a checkerboard unless it is smoothed.
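
AK’s remedy can be seen with a rough Courant number estimate. Our understanding is that CICE’s ndtd splits each thermodynamic step into ndtd dynamics/advection subcycles, so the distance ice travels per advection step is u·dt/ndtd. This sketch assumes the ice step equals the 600 s ocean step and uses a hypothetical 10 km grid spacing; only the 35 m/s speed comes from the minutes.

```python
def advective_cfl(speed_m_s: float, dt_s: float, dx_m: float, ndtd: int = 1) -> float:
    """Courant number for ice advection when the thermodynamic step dt is
    split into ndtd dynamics/advection subcycles."""
    return speed_m_s * (dt_s / ndtd) / dx_m


speed = 35.0     # unrealistically fast ice quoted in the minutes (m/s)
dt = 600.0       # assumed ice timestep (s)
dx = 10_000.0    # hypothetical grid spacing (m)

for ndtd in (1, 2, 3):
    cfl = advective_cfl(speed, dt, dx, ndtd)
    status = "ok" if cfl < 1.0 else "exceeds one cell per step"
    print(f"ndtd={ndtd}: CFL={cfl:.2f} ({status})")
```

Under these assumptions ndtd=2 is marginal and ndtd=3 keeps the ice within one cell per advection step, consistent with AK no longer seeing remap crashes; the underlying instability producing 35 m/s ice still needs fixing.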

AH: Is ratio of PEs in minimal model same as for large model? NH: In 1/10 ratio is about 1:4 ice:ocean. Minimal model it is 1:5.

RF: Bugfixes found in CICE6 should be back-ported. They were using the wrong mask in the EVP solver for updating the halos, which stops bit reproducibility. NH: I saw that bug list and know where they are. Will bring them across. RF: Also found different types in the u and t masks (one logical, one 0/1).

MW: Latest profiling shows EVP taking most of the time, and in particular EVP halos. Wonder if these bugs have any effect? RF: Purely a masking issue. Could be the cause of the strange stuff at the tripolar join. Only 5 lines of code. MW: Huge patch? NH: No, not messy. This is not a big code change.

AH: CM2 uses an old version of CICE5 with UM hooks etc. How serious an issue is this before back-porting to the CM2 version? MW: No time to go into that too far.

NH: Since CICE6 is just incremental improvement of CICE5, maybe we should use that in future?

Miscellaneous

MW: Ben is arranging a meeting with Team Leaders in this space, set for Nov 7. NH to be contacted? NH: I think I am going. MW: Discussing infrastructure needs for the next 10 years. Would be good to have a consistent view on what is required. Meeting at a high level. MW: RY and I are going.

AH: Doing another payu training session for CLEX, covering mppnccombine-fast, file tracking and ACCESS-OM2 configs: how to get them and what to do with them. Anyone at CSIRO interested?

MW: Will go over more profiling info on slack.

MW: Will merge the latest payu versions. Can run without patching the python version. AH: Yes, can also run in a conda environment, which maybe ticks the portability box for NH.

AH: people on payu/dev should move to payu/0.10.

PD: A COSIMA meeting where the harmonised code is delivered. Amazing! Well done.

Actions

New:

  • Check ACCESS-ESM-1.5 PR / WOMBAT integration (RF, AH)
  • Backport CICE6 bugs into CICE5 (NH)
  • Forward training email to PD (AH)

Existing:

  • Create even 5 blocks per PE map for CICE (RF)
  • Update model name list and other configurations on OceansAus repo (AK)
  • Shared google doc on reproducibility strategy (AH)
  • Pull request for WOMBAT changes into MOM5 repo (AH, RF)
  • Compare out OASIS/CICE coupling code in ACCESS-CM2 and ACCESS-OM2 (RF)
  • After FMS moved to submodule, incorporate MPI-IO changes into FMS (MW)
  • Incorporate WOMBAT into CM2.5 decadal prediction codebase and publish to Github (RF)
  • Move FMS to submodule of MOM5 github repo (MW)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (MW and TWG)
  • Blog post around issues with high core count jobs and mxm mtl (NH)
  • Look into OpenDAP/THREDDS for use with MOM on raijin (AH, NH)
  • Add RF ocean bathymetry code to OceansAus repo (RF)
  • Add MPI barrier before ice halo updates timer to check if slow timing issues are just ice load imbalances that appear as longer times due to synchronisation (NH).
  • Redo SSS restoring with patch smoothing (AH)
  • Get Ben/Andy to endorse provision of MAS to CoE (no-one assigned)
  • CICE and MATM need to output namelists for metadata crawling (AK)
  • Provide 1 deg RYF ACCESS-OM-1.0 config to MC (AK)
  • Update ACCESS-OM2 model configs (AK)
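
The action item to add an MPI barrier before the ice halo update timer rests on a simple argument: a timer wrapped around a blocking collective also measures time spent waiting for the slowest rank, so load imbalance can masquerade as slow halo updates. An idealised model of that effect (not CICE code; arrival times and collective cost are hypothetical):

```python
def timed_collective(arrival_s, collective_s, barrier_first):
    """Idealised per-rank timer readings around a blocking collective such
    as a halo update. arrival_s[i] is when rank i reaches the call.

    Without a barrier, each rank's timer starts on arrival and stops when the
    collective completes, which cannot happen before the slowest rank
    arrives. With a barrier before the timer, the waiting happens inside the
    barrier and the timer sees only the collective itself.
    Returns (timer readings, time spent waiting in the barrier)."""
    last = max(arrival_s)
    waits = [last - t for t in arrival_s]
    if barrier_first:
        return [collective_s for _ in arrival_s], waits
    return [w + collective_s for w in waits], [0.0 for _ in arrival_s]


arrivals = [1.0, 1.2, 2.0]   # hypothetical: rank 2 has the most ice work
for barrier in (False, True):
    timers, waits = timed_collective(arrivals, 0.05, barrier)
    print(f"barrier={barrier}: timers={[round(t, 3) for t in timers]}, "
          f"barrier waits={[round(w, 3) for w in waits]}")
```

If the halo timers drop to a uniform small value once the barrier is added, the original "slow halo" timings were load imbalance rather than communication cost.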

COSIMA 2018 Report

Aims & Goals

The third meeting of the Consortium for Ocean Sea Ice Modelling in Australia (COSIMA) was held in Canberra on 7-8 May 2018. This annual COSIMA workshop aims to:

  • Establish a community around ocean-sea ice modelling in Australia;
  • Discuss recent scientific advances in ocean and sea ice research in a forum that is inclusive and model-agnostic, particularly including observational programs;
  • Agree on immediate next steps in the COSIMA model development plan; and
  • Develop a long-term vision for Australian scientific advances in this area.

Participants

The 2018 workshop was our largest workshop yet, with 30 talks and 49 participants.

Attendees included:

Gary Brassington (Bureau of Meteorology), Matt Chamberlain (CSIRO), Chris Chapman (CSIRO), Fabio Dias (UTAS/CSIRO), Prasanth Divakaran (Bureau of Meteorology), Peter Dobrohotoff (CSIRO), Catia Domingues (UTAS), Matthew England (UNSW), Russ Fiedler (CSIRO), Annie Foppert (CSIRO), Leela Frankcombe (UNSW), Bishakhdatta Gayen (ANU), Angus Gibson (ANU), Stephen Griffies (NOAA/GFDL), Nicholas Hannah (COSIMA), Aidan Heerdegen (ANU/CLEX), Petra Heil (AAD & ACE CRC), Andy Hogg (ANU), Ryan Holmes (UNSW), Shane Keating (UNSW), Andrew Kiss (ANU), Vassili Kitsios (CSIRO), Veronique Lago (UNSW), Clothilde Langlais (CSIRO), Andrew Lenton (CSIRO), Kewei Lyu (CSIRO), Jie Ma (CSIRO), Simon Marsland (CSIRO), Paige Martin (University of Michigan), Josue Martinez Moreno (ANU), Richard Matear (CSIRO), Laurie Menviel (UNSW), Mainak Mondal (ANU), Ruth Moorman (ANU), Adele Morrison (ANU), Terry O’Kane (CSIRO), Peter Oke (CSIRO), Ramkrushnbhai Patel (UTAS), Paul Sandery (CSIRO), Abhishek Savita (UTAS-CSIRO), Kate Snow (NCI), Paul Spence (UNSW), Kial Stewart (ANU/UNSW), Veronica Tamsitt (UNSW/CSIRO), Mirko Velic (Bureau of Meteorology), Marshall Ward (NCI), Luwei Yang (IMAS, UTAS), Rui Yang (NCI), Jan Zika (UNSW)

Status

The workshop was structured to focus on scientific questions on Day 1, particularly in the first two sessions. In these sessions, topics ranged from Antarctic shelf processes to oceanic convection, and from reversibility of the Earth system to frictional drag. The final session on Day 1 focussed more on technical issues, including assessment of the optimisation status of existing models. On Day 2, talks focussed more on strategic issues, including an outline of Bluelink, ACCESS, CAFE and coastal programs. These strategic talks transitioned to small-group discussions (see synthesis below). The workshop finished with a tutorial on the COSIMA Cookbook framework for model analysis.

The Australian landscape in ocean-sea ice research involves a number of interleaving programs, each of which was represented at this workshop. The figure below outlines the linkages between these programs:

By way of explanation:

ACCESS-CM2/-ESM1.5 will be Australia’s input to CMIP6, and use MOM5 and CICE at 1°.

CAFE is the decadal prediction system in development, which uses MOM5.

ARCCSS/CLEX, ARC CoE programs, use high-resolution ocean-sea ice models for process studies.

Bluelink/OFAM is the ocean forecasting and reanalysis system which will adopt ACCESS-OM2-01 in future versions.

CSHOR is the Centre for Southern Hemisphere Oceanographic Research; it focuses on observational studies but we hope to establish two-way interactions with this program.

Coastal Modelling includes the Australian coastal oceanography community, as well as Antarctic nearshore programs within AAD and ACE-CRC.

A major theme of the workshop was to review the status of the ACCESS-OM2 model which is the focus of COSIMA. In short, we have had success with model releases at 1° and 0.25° resolution – these models are now actively being used for scientific runs, and are available for download and use by the community. They include a recent upgrade to the file-based atmosphere (YATM) and new JRA55-do forcing datasets. The 0.1° version of the model has progressed significantly in the last year; there are outstanding tasks to evaluate model output and further optimise the model configuration.

The COSIMA Cookbook tutorial was attended by about a third of participants, and some progress was made. The aim of this tutorial was to entrain more active users to the system and encourage input from those users. The Cookbook is similar in style to the analysis system being developed for CAFE and it may be possible to merge elements of each framework at some stage in the future.

Program

Where available, talk files are linked from the presenter’s name.

Monday 7 May
10:00 Arrival & Morning tea
10:30 Session 1 (Chair – Andy Hogg)
Stephen M Griffies (NOAA/GFDL): Understanding and projecting global and regional sea level: More reasons to include refined ocean resolution in global climate models
Andrew Kiss (ANU): Overview of the ACCESS-OM2 model suite
Andrew Lenton (CSIRO): Ocean Reversibility in ACCESS-ESM
Catia Domingues (UTAS): Global and spatial temporal changes in upper-ocean thermometric sea level
Fabio Dias (UTAS/CSIRO): Mean and seasonal states of the ocean heat and salt budgets in ACCESS-OM2
Adele Morrison (ANU): Circumpolar Deep Water transport towards Antarctica driven by dense water export
Jan Zika (UNSW): Getting an ocean model to obey: Prescribing and perturbing exact fluxes of heat and fresh water
12:30 Lunch
13:30 Session 2 (Chair – Clothilde Langlais)
Petra Heil (AAD & ACE CRC): ACCESS-OM2-01 sea ice
Paul Sandery (CSIRO): Sea-ice data assimilation and forecasting using an Ensemble Transform Kalman Filter
Paul Spence (UNSW): Does the Southern Ocean have sleep apnea?
Veronique Lago (UNSW): Impact of projected amplification of Antarctic meltwater on Antarctic Bottom Water formation
Ryan Holmes (UNSW): Numerical Mixing in the COSIMA Models
Luwei Yang (IMAS, UTAS): The impacts of bottom frictional drag on the sensitivity of the Southern Ocean circulation to changing wind
Vassili Kitsios (CSIRO): Stochastic subgrid turbulence parameterisation of eddy-eddy, eddy-topographic, eddy-meanfield and meanfield-meanfield interactions
Matt Chamberlain (CSIRO): Using transport matrices to probe circulation in ocean models
15:30 Afternoon tea
16:00 Session 3 (Chair – Petra Heil)
Nicholas Hannah (COSIMA): ACCESS-OM2 Software Development
Marshall Ward (NCI): ACCESS-OM2 performance analysis
Rui Yang (NCI): Parallel IO in MOM5
Angus Gibson (ANU): Towards an adaptive vertical coordinate in MOM6
Jie Ma (CSIRO): Investigating interannual-decadal variability of Indian Ocean temperature transport in an eddy-resolving model
Paige Martin (University of Michigan): Frequency-domain analysis of energy transfer in an idealized ocean-atmosphere model
17:30 Close
19:00 Workshop dinner (Debacle, 24 Lonsdale St, Braddon)
Tuesday 8 May
9:00 Session 4 (Chair – Andrew Kiss)
Andy Hogg (ANU): Are we Redi for 0.25° ocean-climate models?
Kial Stewart (ANU): The Repeat Year Forcing for JRA55-do
Terry O’Kane (CSIRO): Coupled data assimilation and ensemble initialization with application to multi-year ENSO prediction
Gary Brassington (Bureau of Meteorology): Ocean forecasting status and outlook
Peter Oke (CSIRO): Bluelink activities and plans
Matthew England (UNSW): A proposal for future projection simulations using COSIMA ocean-ice models
Richard Matear (CSIRO): CSIRO Decadal Climate Forecasting, update of the project’s progress
Simon Marsland (CSIRO): Preparing ACCESS for CMIP6
Clothilde Langlais (CSIRO): Downscaling towards the coast – a perspective on where the coastal modelling group would like to go
11:00 Morning tea
11:30 Discussion: COSIMA planning and strategy
13:00 Lunch
14:00 Strategy and planning summary
14:30 COSIMA Cookbook tutorial
16:00 Close

Synthesis of Discussion

Tuesday afternoon included discussions of present and future needs and directions of the COSIMA community, via breakout sessions on the topics Sea Ice, Coastal / Forecasting, Coupled Modelling, Process Modelling, Biogeochemistry, and Technical. The overall threads of these discussions are summarised here.

Open and accessible code, configurations, output and analysis

Transparency, accessibility and reproducibility of model code development, run configurations and output data were named as priorities by many groups. Nic Hannah’s proposed REDB (Reproducible Experiment Database, http://redb.io) was widely supported as a means to tie together and curate the source code, configurations, output and analysis of model experiments. Using consistent shared codebases was also a priority. Containerisation was suggested as a method to make experiments self-contained. Extension of the database to include idealised experiments was also suggested.

Model evaluation

There is a need for more model evaluation against observations. Several groups highlighted the importance of better integration of observations for model validation and a desire for this functionality to be better supported in the COSIMA Cookbook. Comparison of CICE to SIS-1 at 1 and 1/4 deg was also suggested.

Technical validation is also needed – e.g. BGC, bit reproducibility, broadened test suite, regression testing. Model performance and stability priorities include: resolve crashes, balance load, MPI benchmarks and stress testing.

Usability

Suggestions included a glossary for beginners, an online portal for control runs, and to minimise difficulty of running new model configurations. Standardised output files and naming conventions would facilitate analysis. Improved functionality and versatility of the COSIMA Cookbook was also suggested.

Documentation was a priority for many, in particular an ACCESS-OM2 documentation paper, but also open/evolving documentation as the models develop.

Parameter selection was also a concern for many: how to choose appropriate parameters (e.g. for ice or BGC), how to assess model sensitivity to parameters, and how to document why parameters were chosen or altered. Data assimilation was suggested as a way to improve ice parameter selection, including assimilation of under-ice observations (e.g. temperature). BGC was suggested as a way to constrain the dynamics.

It was pointed out that the payu run management software underpins model runs, yet formal funding for its continued development is presently lacking.

Model enhancements

Suggestions for enhanced modelling capability included: interannual forcing, WOMBAT BGC, coupling to an atmosphere model, 1-way nesting, coupling to wavewatch, explicit tides, wet/dry cells.

Community coordination, synergies and strategy

Suggestions included a streamlined process for providing community feedback and deciding on priorities, and for community involvement in developing the BGC component. It was also suggested to foster engagement with atmosphere and sea ice specialists, and have a more formalized ice group. The technical team is also seeking more input from scientists, especially regarding sea ice.

Regarding modelling strategy, it was suggested to have intelligent model diversity (not too many versions), a consensus on standard perturbation experiments, and to decide on resources to commit to MOM5 vs. MOM6.

Summary of Priority Tasks

The following list of tasks was identified as a priority for the near term. Volunteers to lead or assist with tasks are much appreciated.

  1. IAF Runs: With the addition of YATM, we now have the facility to run Interannual Forcing (IAF) runs from the JRA55-do forcing dataset in ACCESS-OM2. Once YATM has been tested, we will conduct IAF runs at all resolutions, starting with 1°.
  2. Model Documentation: Production of a model documentation paper is a high priority for the coming months. This will be achieved by:
    1. Writing a larger technical documentation report (https://github.com/OceansAus/ACCESS-OM2-1-025-010deg-report) that will be stripped down to feed into a paper; and
    2. Inviting community evaluation of existing model output.
  3. Model evaluation and analysis: We propose the COSIMA Cookbook as a framework for users to contribute model analyses. In particular, we encourage observational comparisons with existing model output, and also encourage users to submit bug reports and feature requests via https://github.com/OceansAus/cosima-cookbook/issues
  4. WOMBAT: In the coming months we will look to implement the WOMBAT biogeochemistry model (already running in MOM5) into the ACCESS-OM2 framework.
  5. Capability gaps: The COSIMA community has been able to leverage expertise from a number of different programs. However, our community as a whole remains subcritical in several areas, including sea ice modelling and atmospheric dynamics.
  6. REDB: Nic Hannah proposed a new system for tracking simulations and the output data. This system was identified by many discussion groups as a potential solution to some of our collaboration roadblocks. We will investigate the viability of such a system.
  7. MOM6: Plan is to begin transition to MOM6, building up experience in the latter half of 2018.

Recommendations for COSIMA 2019 workshop

  • Institute a James Munroe award for contributions to COSIMA
  • Extend to a 2.5-day workshop to allow more time for discussion (not extra talks)

Technical Working Group Meeting, June 2018

Minutes

Date: 12th June 2018
Attendees:

  • Marshall Ward (MW) (Chair), NCI
  • Aidan Heerdegen (AH) and Andrew Kiss (AK), CLEX ANU
  • Russ Fiedler (RF), Matt Chamberlain (MC), CSIRO Hobart
  • Peter Dobrohotoff (PD), CSIRO Aspendale
  • Justin Freeman (JF) BoM Melbourne
  • Nic Hannah (NH) Double Precision

TWG Meeting

JF: Would be able to attend more regularly if there were a calendar invite, which would enable him to schedule the meeting. How do we integrate calendars for Justin?

COSIMA Models

AK: Bathymetry error in tenth model in Cumberland Sound, Baffin Island. Causes model blow ups.
RF: Yes, blast it out. Russ will do it today. AH: Do we need any changes to restart/input files? RF: If eta_t is below zero, might have to set it to zero. Otherwise it will complain about penetrating rock.
AK: Tenth very unstable over the weekend.
MW: A longjmp error means the backtrace is failing. Memory got so severely corrupted that it can’t be properly debugged.
“nearest_index array must be monotonically increasing error”
AK: Sweep and resubmit and works.
AK: More errors since turned on diagnostics for Adele. RF: are these globals? MW: could be FMS bugs because MPI is being strained and things are out of order.
AK: Daily outputs in a regional area: temp, salt, uhrho_et, vhrho_nt, rho_dzt. RF: That spews output from a lot of processors, as regional outputs do not use io_layout, so every affected processor outputs data. AK: Only doing it for 2 years and then turning it off. It has slowed things down, and timing has become erratic. RF: Some processors are not outputting the field; not sure why it should make it unstable.
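
RF’s point is that, without io_layout, every processor whose tile overlaps the output region writes its own piece. A small sketch counting those processors for an evenly decomposed grid; the 3600 x 2700 grid, the 80 x 60 processor layout and the region indices are all hypothetical, and land-masked tiles (which would not exist) are ignored.

```python
def tiles_overlapping(nx, ny, px, py, i0, i1, j0, j1):
    """Return the (ti, tj) indices of processor tiles that overlap the
    inclusive index box [i0, i1] x [j0, j1] on an nx-by-ny grid split evenly
    into px-by-py tiles (assumes nx % px == 0 and ny % py == 0)."""
    tile_w = nx // px
    tile_h = ny // py
    ti_range = range(i0 // tile_w, i1 // tile_w + 1)
    tj_range = range(j0 // tile_h, j1 // tile_h + 1)
    return [(ti, tj) for ti in ti_range for tj in tj_range]


# Hypothetical decomposition: 45 x 45 point tiles.
writers = tiles_overlapping(3600, 2700, 80, 60, 1000, 1400, 500, 800)
print(f"{len(writers)} processors write output for this region")
```

Even a modest regional box touches tens of processors, each doing its own daily I/O, which is one plausible source of the erratic timing AK reports.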
AK: Put up as an issue.
AH: What is the current model config for the tenth, and its performance? AK: 4.5K cores on MOM, 2K on CICE. Runs with a 450s timestep, 1.5 hr/month. Now running at 400s. The crash in Baffin Island goes away with a shorter timestep.
AH: Try to get the tenth running faster; ice is no longer holding back the timestep. AK: Was running at 540s before the Baffin Island issues.
MW: netCDF4 v4.4 has FPE turned on. Built by a different person. Historically we always had FPE disabled. AK: 4.2.1.1 in MOM, 4.3.2 in CICE, 4.4.1 in matm. OASIS uses the default. AK: Waiting for the yatm build to be signed off. Ben M suggested we should be using openmpi/1.10.7 (optionally with debug). A number of bugs were fixed between 1.10.2 and 1.10.7.
AK: Want to try out the orange layout with CICE. Currently 2000 cores with no land masking and 1 block/processor. Could be run a lot cheaper. Currently MOM-bound. Should be able to run well below 2000 cores. Waiting for yatm to be sorted out. Trying some frankenstein builds and back-porting to matm.
AK: Timing is very inconsistent. RF: eta_and_pbot_diagnose has a collective: it does a global sum. Somewhere it has hung, and everything depends on this function. MW: Could be load imbalance in CICE.
MW: MPI_Comm_split hangs or fails intermittently.
AK: No stock runs since looking at runtime. MW: Thinks his profiling was wrong because of a lack of ice. AK: Looking at the load imbalance, there is ice. MW: Ran from rest for 10 days, so CICE would normally do work that wasn’t captured. MW: Tried to redo profiles and all runs stopped working. Shocking.
MW: Moved on to yatm. Putting scorep into yatm had issues, so have not redone the profiles with realistic ice.
AK: Will spin off a run with no diagnostics as a point of comparison. MW: At dt=300s, 100s/day seemed reproducible. Andrew’s is 50% slower; maybe more stuff happening. One of two issues that need to be resolved: first, the CICE-bound results are different; second, the MOM slowdown. Matching MOM-SIS is an important goal.
MC: How much longer is the spin-up running? When will you switch to IAF? AK: Will switch to IAF ASAP. Andy is running RYF at quarter degree. Then Paul Spence will run IAF at quarter degree. Currently 34 years of spin-up with the 84/85 repeat year. MC: Will you start from year zero? AK: There are biases in RYF, so not sure if we should spin off from this run. Might depend on how many years we have to get done.
MC: Will there be multiple cycles of IAF? AK: Depends. MC: Start at WOA or from the RYF spin-up?
AH: For the model documentation paper there will be the standard 5 x IAF (JRA55) protocol for 1 degree and 0.25 degree. The MOM meeting discussed strategy for 0.1 degree. Andy Hogg thought the tenth was just too expensive to run this protocol and might have to run only one cycle of IAF, or maybe spin up with RYF and then run IAF from 85 onwards. Whatever was done would be repeated in a second quarter degree run to provide a point of comparison between the different resolutions.
RF: interested from 93 once the satellites go up.
JF: wanted to get up to speed. Looks over minutes when they come out, very useful. Mirko has been doing some runs. Will try and join in regularly. BoM will take up ACCESS-OM2 when up to speed. Will be OceanMaps version, used for forecasting.
AH: Andy running KDS50 for 0.25 deg for RYF spin up. Found KDS75 too unstable.
JF: Mirko is testing COSIMA models in the back end and getting up to speed with what we’ve done. Need the 75 level (COSIMA) grid. Will do some hindcast runs and compare with OceanMaps. Don’t have experience with the sea ice model, so don’t know how it will affect forecasting. Need to look at the ice parameterisation, and also at data assimilation. Will talk to Russ and Matt. At some point will be able to contribute back; will work from the GitHub repo, using the same codebase.
AK: run parameters and namelists on git repo are a long way out of date. JF: can we make sure these are updated. AK: Still in a state of flux. Still bedding down YATM configurations. Will do best.

ACCESS OM2/CM2 Code Harmonisation

AH: What is the other significant code difference in CM2 that Russ wanted to reimplement? RF: The wave mixing scheme. It gets added into KPP and comes via the CVMix package. Two ways to implement: 1. 10m winds come in via sbc; 2. calculate them empirically in MOM6. Russ has implemented this scheme under the CM2.5 framework and run it for a while. Had to put in a limiter because it caused too much mixing. Dave reckoned it didn’t make a difference. Haven’t looked at the most recent results. Running with the CM2.5 coupled model.
RF: There is also another scheme Russ wants to implement, slightly different to ACCESS-CM2. Both schemes are already in MOM6. One of them is in CVMix; that is what Dave has implemented in MOM5, taking the routine out of CVMix and plopping it into the KPP module to give enhanced mixing. Also need the 10m wind information to come in, which needs changes in the surface flux code. Russ has done this: same thing, just a change in the way the winds get through. Not sure why ACCESS-CM2 didn’t see a difference.
RF: Occasionally get massive mixing coefficients in KPP so put in a limiter.
RF: will put code changes into master branch. AH: when you have done this I can pull into CM2 and can test. RF: Griffies wants it in MOM5.
PD: Followed along in the slack channel. Not sure about all the technical details. Big difference after 10 days between the harmonised code and the CM2 codebase. Has this been solved? How far along are we with this? Spinups will not have harmonised code if we don’t have a frozen version soon. ESM and CM2 groups want to know how close we are. We haven’t helped much to this point. How can I contribute?
PD: Copied the suite and ran it. Thought I was tracking down a bug, but couldn’t find the preprocessed source files. MW: Do we run cpp? I get the right source code lines and don’t see .f90 files. No, we don’t … which is why Peter couldn’t find them.
RF: Why was the Red Sea fix timing different? Does the CM code have a fix? AH: Might be because my fix uses relative time, not absolute model time. RF: The timing of the fix should have an absolute origin. AH: I’ll check.
AH: I don’t think there is that much more to go for the harmonisation.
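
AH’s hypothesis about relative versus absolute time can be illustrated with a hypothetical periodic fix. If the schedule is based on time elapsed since the start of the current run segment, its phase resets at every restart, so a run split into segments fires at different model times than a continuous run; scheduling on absolute model time does not have this problem. The interval, timestep and segment lengths below are invented for illustration, and this is not the MOM5 code.

```python
def fix_times_absolute(start_s, n_steps, dt_s, interval_s):
    """Model times (s) at which a periodic fix fires when scheduled on
    absolute model time: whenever model time is a multiple of interval_s."""
    times = []
    for n in range(1, n_steps + 1):
        t = start_s + n * dt_s
        if t % interval_s == 0:
            times.append(t)
    return times


def fix_times_relative(start_s, n_steps, dt_s, interval_s):
    """Same fix scheduled on time elapsed since the start of the current run
    segment, so its phase resets at every restart."""
    times = []
    for n in range(1, n_steps + 1):
        elapsed = n * dt_s
        if elapsed % interval_s == 0:
            times.append(start_s + elapsed)
    return times


dt, interval, day = 3600, 36000, 86400   # hypothetical 1 h step, 10 h interval
continuous = fix_times_absolute(0, 48, dt, interval)
abs_segments = fix_times_absolute(0, 24, dt, interval) + fix_times_absolute(day, 24, dt, interval)
rel_segments = fix_times_relative(0, 24, dt, interval) + fix_times_relative(day, 24, dt, interval)
print("continuous:       ", continuous)
print("absolute, 2 runs: ", abs_segments)
print("relative, 2 runs: ", rel_segments)
```

The absolute schedule matches the continuous run regardless of where restarts fall, which is why RF suggests the timing should have an absolute origin.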
PD: when can I run harmonised MOM?
RF: when I can find some time to put in there. Now we have a way forward. Hopefully in a week or two.
PD: will put runs on ASAP. If harmonised code not ready, won’t be in spinups.
AH: Will liaise with Peter and tell him as soon as something is ready.
MW: If there are differences, what do they use? AH: They will use the MOM5 repo as far as I know.

Actions

New:

  • Edit tenth bathymetry to remove Cumberland Sound (RF)
  • Create calendar invites to TWG Meeting (AH)
  • Update model name list and other configurations on OceansAus repo (AK)
  • Check Red Sea fix timing is absolute, not relative (AH)

Existing:

  • Shared google doc on reproducibility strategy (AH)
  • Follow up with Andy Hogg regarding shared codebase (MW)
  • MW liaise with AK about tenth model hangs (AK, MW)
  • Pull request for WOMBAT changes into MOM5 repo (MC, MW)
  • Compare out OASIS/CICE coupling code in ACCESS-CM2 and ACCESS-OM2 (RF)
  • After FMS moved to submodule, incorporate MPI-IO changes into FMS (MW)
  • Incorporate WOMBAT into CM2.5 decadal prediction codebase and publish to Github (RF)
  • Profile ACCESS-OM2-01 (MW)
  • Move FMS to submodule of MOM5 github repo (MW)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (MW and TWG)
  • Blog post around issues with high core count jobs and mxm mtl (NH)
  • Look into OpenDAP/THREDDS for use with MOM on raijin (AH, NH)
  • Add RF ocean bathymetry code to OceansAus repo (RF)
  • Add MPI barrier before ice halo updates timer to check if slow timing issues are just ice load imbalances that appear as longer times due to synchronisation (NH).
  • Nudging code test case (RF)
  • Redo SSS restoring with patch smoothing (AH)
  • Get Ben/Andy to endorse provision of MAS to CoE (no-one assigned)
  • CICE and MATM need to output namelists for metadata crawling (AK)
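The action item on adding an MPI barrier before the ice halo update timer can be illustrated with a toy model. No MPI is involved, and the arrival times are hypothetical. Without a barrier, ranks that arrive early also time their wait for the slowest (overloaded) rank, so a load imbalance looks like a slow halo update. With a barrier, that wait moves into the barrier and the timer measures only the halo update itself.

```python
def recorded_halo_times(arrival, halo_cost, barrier):
    """Time each rank's halo-update timer records in a toy model.

    arrival:   wall-clock time (ms) at which each rank reaches the timer
    halo_cost: time (ms) the halo update takes once every rank has arrived
    barrier:   True if ranks synchronise before the timer starts
    """
    last = max(arrival)
    if barrier:
        # Waiting for slow ranks happens inside the barrier, outside the timer.
        return [halo_cost for _ in arrival]
    # Without a barrier, early ranks also record the time spent waiting
    # for the slowest rank, which looks like a slow halo update.
    return [(last - t) + halo_cost for t in arrival]


def barrier_wait_times(arrival):
    """Time (ms) each rank spends waiting in the barrier instead."""
    last = max(arrival)
    return [last - t for t in arrival]


arrival = [0, 200, 1000]  # ms; rank 2 is the overloaded one
```

With `halo_cost=100`, the timers record 1100, 900 and 100 ms without a barrier, and 100 ms on every rank with one. In the barrier case, the 1000 and 800 ms waits show up in the barrier time instead.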

COSIMA 2018 Workshop Details

The 2018 COSIMA workshop will be held on 7 & 8 May at the Australian Centre for China in the World on ANU Campus.

You can download the latest draft of the COSIMA Workshop Program (updated 3rd May). The program includes instructions for uploading your talk, guidelines on how to contribute to our discussion and some preparatory homework for anyone attending the COSIMA Cookbook Tutorial.

Please contact Andy Hogg or Andrew Kiss if you have any queries.

Sea-ice working group: Notes 20180228

MEETING NOTES COSIMA SEA-ICE WORKING GROUP MEETING:

28 Feb 2018, 10:00 - 11:30am
Meeting held on Zoom.

Present: Andrew Kiss, Adele Morrison, (anyone else at ANU?), Andrew Hogg, Siobhan O'Farrell, Simon Marland,
         Fabio Diaz, xxx (1 other PhD student at CSIRO Aspendale), Will Hobbs, Petra Heil and from 11am Paul Spence.
Apologies: Nicolas Hannah, Matt E.

=======================================================
1) Recap of the AMOS presentation on COSIMA results so far.
ACCESS-OM2 suite (Andrew Kiss)
Based on MOM5, CICE5, OASIS3, repeat-year atmospheric forcing.
Tripolar grid 1/10°, 75 vertical levels
Slow to run: 17 hours/yr; 5559 PEs; dt = 450 s   --> eddy resolving
Tripolar grid 1°, 75 vertical levels
              6 mins/yr;  252 PEs; dt = 5400 s
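For scale, the timings above can be converted to timesteps and core-hours per model year. This is a quick sketch that assumes a 365-day model year and takes core-hours as wall-clock hours multiplied by PEs, ignoring queue time. The PE breakdown comes from the CICE config in section 3.

```python
SECONDS_PER_YEAR = 365 * 86400  # assumes a 365-day model year

# name: (timestep dt in s, wall-clock hours per model year, PEs), from the notes
CONFIGS = {
    "1/10 deg": (450, 17.0, 5559),
    "1 deg": (5400, 6 / 60, 252),
}


def steps_per_year(dt):
    """Number of model timesteps in one model year."""
    return SECONDS_PER_YEAR // dt


def core_hours_per_year(wall_hours, pes):
    """Core-hours = wall-clock hours x PEs (ignores queue time)."""
    return wall_hours * pes


for name, (dt, wall_hours, pes) in CONFIGS.items():
    print(f"{name}: {steps_per_year(dt)} steps/yr, "
          f"{core_hours_per_year(wall_hours, pes):.1f} core-hours/yr")

# PE layout from the CICE config section: CICE + MOM + MATM.
assert 1200 + 4358 + 1 == CONFIGS["1/10 deg"][2]
```

The 1/10° case takes 70080 steps and about 94500 core-hours per model year. The 1° case takes 5840 steps and about 25 core-hours, which makes the 1/10° case roughly 3750 times more expensive per model year.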
------
Previous issue: 
Negative salinity: Russ Fiedler fixed.
         If salinity locally is less than 5 PSU, then salinity is taken from the
         surrounding cells.
------
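The notes describe the fix only as "salinity comes from the surrounding". Below is a minimal sketch of one reading of that, which may not match Russ' actual code. Any cell below the 5 PSU threshold is replaced by the mean of its 4-connected neighbours that are above the threshold. The function name and the grid handling are hypothetical. Note that a plain replacement like this does not conserve salt.

```python
S_MIN = 5.0  # PSU threshold quoted in the notes


def fill_low_salinity(s, s_min=S_MIN):
    """Replace cells with s < s_min by the mean of valid 4-neighbours.

    s is a 2D list of salinities (PSU). Neighbours are read from the
    original field, so the result does not depend on loop order. Cells
    with no valid neighbour are left unchanged.
    """
    ny, nx = len(s), len(s[0])
    out = [row[:] for row in s]
    for j in range(ny):
        for i in range(nx):
            if s[j][i] >= s_min:
                continue
            neigh = []
            for dj, di in ((-1, 0), (1, 0), (0, -1), (0, 1)):
                jj, ii = j + dj, i + di
                if 0 <= jj < ny and 0 <= ii < nx and s[jj][ii] >= s_min:
                    neigh.append(s[jj][ii])
            if neigh:
                out[j][i] = sum(neigh) / len(neigh)
    return out


# Usage: a single negative-salinity cell surrounded by ocean water.
field = [[34.0, 34.0, 34.0],
         [34.0, -1.0, 33.0],
         [34.0, 34.0, 34.0]]
filled = fill_low_salinity(field)
```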
Sea-ice thickness (1-month average): shows neat DKPs in Arctic.
=======================================================
2) Simon M: 
Fabio is working on: restoring 1°C: NCEP/NCAR reanalysis rather than CORE runoff.
=======================================================
3) ACCESS-OM2 CICE config

Current CICE config in the coupled ocean/sea-ice mode:
4 ice layers plus snow
5 thickness categories
mushy ice TD
classic EVP Dynamics
melt ponds
JRA55-do V1.3 1984/85 repeat-year forcing, 0.5625°, 3-hourly
SSS restoring to WOA 2013 V2
1200 PUs for CICE + 4358 PUs for MOM + 1 for MATM
--> No polynyas in 1/10° even though JRA has good katabatics (Adélie Land)
     --> look at LHF + net ice production
---
Ice volume in 1/10° in Arctic is too high: in excess of 30 Mio km³
  PIOMAS comparison
--- 
Issues: 
* spin up
* compatible physical parameters in MOM & CICE   <-- not working
- TC using mushy ice TD but Bitz & Lipscomb is xxx (? cannot read my notes)
- EVP or EAP? --> EVP seems to do ok producing DKPs  --> Use revised EVP.
=======================================================
4) What to use for validation? 
Simon M: Tamura's polynya ice production.
=======================================================
5) Icepack has been released (Petra)
Can it be ported to NCI?
=======================================================
6) Timing of COSIMA configs? 
Nic H. is still using/working on MOM5.
Hence MOM6 work is delayed. 
 But Los Alamos are doing MOM (?? which version) with CICE6.
=======================================================
7) Next steps: 
Start MOM6 with SIS2 for ease!!!
 1) Timing of advance and retreat   --> Siobhan OF: That is controlled by JRA forcing.
 2) Ice motion?  AVHRR data to compare  <-- Follow this one up.
=======================================================
8) Various:
* John Spence: 
Compare namelists with those used by others.
PH: Check with what the Arctic high-res folks use. 

* SOF: European update: LIM2/3, Gelato (CERFACS and UKMet).
  --> ACCESS: Where to couple?  <-- new UKMet model: 2019 to do runs.
                                     --> C-grid (NEMO?)
=======================================================
=======================================================
Next meeting: 
At COSIMA workshop, Canberra, 08/09 May 2018
=======================================================

ACCESS-OM2 Update

Over the last few months, COSIMA folks have been working hard on releasing our ACCESS-OM2 suite of models. The current status is that we have now completed a 500 year spinup for 3 different cases using the JRA55-do (Tsujino et al., 2017, personal communication) forcing dataset. Some preliminary results can be seen in the figures below. We are also spinning up a CORE-NYF comparison case. For a more complete analysis have a look in the COSIMA Cookbook.

Plans in the coming weeks are to finalise spinups of our 0.25° case (ACCESS-OM2-025), and to begin running our flagship 0.1° simulation, ACCESS-OM2-01.

 

Technical Working Group Meeting, November 2017

Minutes

Date: 14th November 2017
Attendees:

  • Aidan Heerdegen (Chair), Andrew Kiss (ARCCSS ANU)
  • Fanghua Wu (National Climate Center, China Meteorological Administration, Visitor ANU)
  • James Munroe (Memorial University of Newfoundland, Visitor ANU)
  • Nicholas Hannah, Anthony (Double Precision)
  • Russ Fiedler and Matt Chamberlain (CSIRO Hobart)

COSIMA Models

  • Discussion around publicising 1/10th model spin up, in case interested parties would like diagnostics saved.
  • Bluelink are interested in a full JRA55-do IAF-style spin up, and would want 15-20 years of daily full 3D U, V, T, S and eta fields from it. This is what they need to construct ensembles/climatologies.
  • Nic looking into ACCESS-OM2-01 performance issues. Lots of time spent in ice coupling field halo updates. These are done in serial, so they hold up the ocean. Definite target for optimisation. Should OASIS fill the halos when it does the coupling step? Russ disagrees: OASIS shouldn’t know anything about what goes on in the models. Gridding uses block trains, a 1:1 mapping between grids; doing this would give a 1:many mapping. The grids are no longer identical once halo information is included, which might break optimisations. When Russ looked at 1/4 deg, the hold-up was due to synchronisation just before that. Not sure about 1/10th. Want a barrier just before calling the clock before the halo update, to see whether it is a synchronisation issue or the actual time taken by halo distribution. 5 halo distributions are being done, and heaps more in CICE itself. Nic: load imbalance between ice processors? Russ: yes, my hunch. Load imbalances change a lot with resolution and processor layout. Nic: a problem is doing halo updates without considering where the field is used. Russ: agree. Velocities need updating, not sure about tracers.
  • Fanghua has been running the new tenth bathymetry with the MOM-SIS-KDS75 config. With JRA55 RYF forcing the time step is now 450s (from 150s initially). Runoff data is now a problem, with very low salinities in the Arctic at about 7m depth, even with a 150s timestep. Created new runoff data, spread further into the ocean, but still have issues. Russ saw very high salinities in the Arctic (Laptev Sea). Might be brine rejection from sea ice forming from an ice-free start. Suggests decreasing the salinity restoring timescale from the current 60 days to 10 days or even 1 day, to get the model through the initialisation. Andrew suggested the issue could be resolved with an initial sea ice climatology. There were issues with these files and they have not been used for a long time. A recent poster to the MOM users Google group has identified some of the problems.
  • Nic’s online runoff redistribution may help, as it is possible to specify maximum runoff per cell, which can help in these areas with very large runoff. Would require ACCESS-OM2-01 config.
  • Nic currently working on getting ACCESS-OM2-01 working with Russ’ new bathymetry. Had a couple of attempts. Getting close, various technical glitches with masks and so on.
  • Andy Hogg has a MATM issue when running ACCESS-OM2-1deg for more than 4 years at a time. There is an error on the netCDF open call, which comes from the HDF layer. Nic ran valgrind, found a bunch of errors, and so recommends everyone update their MATM, but this did not fix the 5 year issue. Determined this was not a memory error, but an HDF library error. Russ suggested using some HDF library calls to try to determine why the crash occurred. Also try different versions of the netCDF library.
  • Nic suggested we could change MATM to make fewer file open calls. Aidan has a new payu feature that allows multiple runs per PBS submission, so decided this is not a priority, as MATM needs a complete rewrite.
  • Regridding. Nic: need to choose which interpolation schemes to use for which fields. 2nd order conservative for everything? Russ: velocity should not be conservative; momentum is not conserved. Patch for velocities, T and S. Will give smooth flux fields. Nic: 2nd order conservative will be very smooth. Russ: do whatever is cheaper for T, S. U, V should be as smooth as possible. Patch should be 1st order cons, possibly 2nd order. AK: 1st order cons is piecewise constant (bad for wind stress curl). Is 2nd order piecewise linear? So similar to bilinear. Need to go to patch for smoother. Russ: tried 2nd order cons, saw problems at corners, nodes and edges with wind stress curl. Coarse to fine gives artefacts. Patch should work. AK: half of the fields are fluxes. Those should be conservative (2nd order ideally). The remaining fields are not fluxes; don’t see a strong argument for conservative. Is there an issue with using different interpolation schemes for different fields? Will the bulk formula at fine scale be an issue? Russ: will get jumps in some of the calculated fields. Quantities like T, S should be done with patch, to end up with smooth fluxes. AK: does the surface stress bulk formula take atmospheric stability into account? Any drag coefficient? Russ: it does. Looks at a profile, figures out a profile. AK: uses SST and 10m T to determine stability? MC: Yes. Say a warm atmosphere sits over a cold surface: that’s stable, so the air would slide over. Daytime, warm surface, near-neutral stability, so not so sensitive. Possible for temperature and humidity to have a small effect on the drag coefficient. AK: if we use a different interpolation method for 10m winds/T, will it cause issues? Russ: small jump in sensible heat maybe? Just go with patch or bilinear for all scalar quantities. For velocity go with patch. How will it take into account rotation in the tripolar grid? Presume it is handled well? AK: only an issue with velocity. Checked with current forcing fields and was ok. Will check new fields the same way.
  • AK: Final decision:
    • patch (the smoothest available) for u_10, v_10
    • 2nd order conservative for fluxes (rain, rdls, rsds, runoff_all and snow)
    • patch or bilinear for non-flux scalars (q_10, slp, t_10); suggest trying patch and only using bilinear if performance with patch is poor
  • Nic: what does MOM-SIS do? Aidan: Thought Steve said bicubic, used to use bilinear but wasn’t smooth enough. Smoother the better.
  • AK: Should WOA salinity restoring fields be smoothed in the same way? Nic: What do we currently do? Bilinear? Aidan did it. Russ: not a big issue if salinity restoring is not too strong.
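Two points from the regridding discussion can be seen in a 1D toy: AK's remark that 1st order conservative remapping is piecewise constant, and Russ' observation of coarse-to-fine artefacts in wind stress curl. This is a sketch only. It assumes fine cells nest exactly inside coarse cells and compares 1st order conservative remapping with plain linear interpolation. Patch and 2nd order conservative remapping are not implemented here. Finite differences of the remapped field stand in for the derivatives that enter wind stress curl.

```python
def conservative_first_order(coarse, ratio):
    """1st order conservative, coarse to fine: each fine cell inherits
    the value of the coarse cell containing it (piecewise constant).
    Assumes `ratio` fine cells nest exactly in each coarse cell."""
    return [v for v in coarse for _ in range(ratio)]


def linear_interp(coarse, ratio):
    """Linear interpolation between coarse cell centres onto fine cell
    centres; values beyond the outermost centres are held constant."""
    n = len(coarse)
    fine = []
    for k in range(n * ratio):
        x = (k + 0.5) / ratio - 0.5  # fine centre in coarse-index units
        if x <= 0:
            fine.append(coarse[0])
        elif x >= n - 1:
            fine.append(coarse[-1])
        else:
            i = int(x)
            w = x - i
            fine.append((1 - w) * coarse[i] + w * coarse[i + 1])
    return fine


def differences(f):
    """Finite differences between neighbouring fine cells."""
    return [b - a for a, b in zip(f, f[1:])]


# A uniform gradient on 3 coarse cells, remapped onto 4x finer cells.
coarse = [0.0, 1.0, 2.0]
cons = conservative_first_order(coarse, 4)
lin = linear_interp(coarse, 4)
```

For this uniform gradient, the conservative field has zero difference inside each coarse cell and jumps of 1.0 at the two coarse-cell edges. The linearly interpolated field has differences of at most 0.25, spread smoothly across the grid. The conservative version does preserve the coarse-cell means exactly, which is why it remains the choice for the flux fields.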

Task follow ups

  • Should be using GFDL FMS code directly. Would work better to collaborate with GFDL. Use same code, submit bug reports easily.
  • Once we have FMS as submodule, use all pre/post processing code from GFDL. Make MOM5 leaner, easier to keep updated. Russ: what is the latest FMS version? Aidan: don’t know, and it is hard to tell. Russ: noticed there are new features, like new diagnostic output options, e.g. RMS on the fly, statistics. So things like diag_manager has been updated. Could be some other powerful tools.
  • Aidan: Currently huge step to upgrade. Small step, but could be really good. Not sure how Marshall did it, but not simple.
  • Nic has updated the access-om2 repo structure. Every single test case/experiment is in its own repo. Makes it easier for users to grab a config without worrying about other configs and source code. OceansAus now has more experiment repos. Aidan: Andy has an issue with git clashes with multiple runs in a single repo. This will fix it.
  • Blog posts?

Actions

New:

  • Will have a December meeting. Tue 12th.
  • Determine if COSIMA intend to do IAF JRA55 spinup of tenth model (Aidan)
  • Send link to spinup diagnostics spreadsheet to Russ (Andrew Kiss)
  • Nic add MPI barrier before ice halo updates timer to check if slow timing issues are just ice load imbalances that appear as longer times due to synchronisation.
  • Test Andy’s 5 year config with different netcdf library versions to check the MATM error is not just a library issue (Aidan)
  • Check current sea surface salinity restoring smoothing (Aidan)

Existing:

  • Russ to add all his ocean bathymetry code to OceansAus repo.
  • Nic to help Peter get his MOM repo up to date with MOM5 master branch, and then merge changes
  • Look into OpenDAP/THREDDS for use with MOM on raijin (Aidan, Nic, Marshall)
  • Nic to present MATM code re-write proposal to TWG for feedback before sign-off. Will then be presented to Andy Hogg for approval.
  • Nic create a discussion document (on COSIMA?) to document current approaches and strategies for future
  • Move FMS to submodule of MOM5 github repo (Marshall). Liaise with Nic on implementation?
  • Work up test cases to cover the nudging code (Justin, Mirko) and supply them to Nic.
  • Add new test cases to Jenkins test suite (Nic).
  • Start a new google doc about coupler issues and MATM (Marshall)
  • Ask Dale Roberts about effects of OpenMP for Roger (Marshall)
  • Make a proper plan for model release — discuss at COSIMA meeting. Ask students/researchers what they need to get started with a model (Marshall and TWG)
  • Blog post around issues with high core count jobs and mxm mtl (Nic)
  • Create document outlining options for configuration sharing (?)