Data available: 0.1° 1958-2018 ACCESS-OM2 IAF run

Announcement:

25TB of model output data from a 1958-2018 spinup run with COSIMA’s ACCESS-OM2-01 0.1-degree global coupled ocean – sea ice model is now available for anyone to use (see conditions below) on this path at NCI:

/g/data/ik11/outputs/access-om2-01/01deg_jra55v140_iaf

You will need to be a member of the ik11 group for access – apply at https://my.nci.org.au/mancini/project-search if needed.

The 01deg_jra55v140_iaf spinup was run under interannually-varying JRA55-do v1.4.0 forcing from 1 Jan 1958 to 31 Dec 2018, starting from rest with World Ocean Atlas 2013 v2 climatological temperature and salinity. The run configuration is based on that used for Kiss et al. (2020) but has many improvements which will be documented soon.

There are many outputs available for the entire run, with additional outputs also enabled for the later years (see below for details).
MOM5 ocean model outputs are saved under self-explanatory filenames in
/g/data/ik11/outputs/access-om2-01/01deg_jra55v140_iaf/output*/ocean/*.nc
and CICE5 outputs are in
/g/data/ik11/outputs/access-om2-01/01deg_jra55v140_iaf/output*/ice/OUTPUT/*.nc
(there are too many files to list with ls, so narrow it down, e.g. by including the year, e.g. *2000*.nc)
We recommend using the COSIMA Cookbook to access and analyse this data: https://github.com/COSIMA/cosima-cookbook.

In addition, from 1 Jan 1987 to 31 Dec 2018 we have daily-mean 3d temp, salt, u, v and wt data. However, this is currently stored at
/scratch/x77/aek156/access-om2/archive/01deg_jra55v140_iaf/output*/ocean/ocean-3d-*-1-daily*.nc
and not yet available on /g/data/ik11 (to access this you will need to be a member of project x77 – apply here). It amounts to 51TB and we are considering ways to reduce this storage requirement, for example by restricting the geographical or depth or time range, reducing numerical precision or vertical resolution, and/or averaging over longer time intervals. If you have an interest in this daily 3d data please let us know what form of data reduction is compatible with your needs.

Annual restarts (on 1 Jan each year) are also available at
/g/data/ik11/outputs/access-om2-01/01deg_jra55v140_iaf
for anyone who may wish to re-run a segment with different diagnostics or branch off a perturbation experiment.
Summary details of each submitted run are tabulated (and searchable) here and the model configuration used for the spinup is here.

Conditions of use:
We request that users of this or other COSIMA model code or output data:

    1. consider citing Kiss et al. (2020) [doi.org/10.5194/gmd-13-401-2020]
    2. include an acknowledgement such as the following:
      The authors thank the Consortium for Ocean-Sea Ice Modeling in Australia (COSIMA; www.cosima.org.au), for making the ACCESS-OM2 suite of models available at github.com/COSIMA/access-om2. Model runs were undertaken with the assistance of resources from the National Computational Infrastructure (NCI), which is supported by the Australian Government.
    3. let us know of any publications which use these models or data so we can add them to our list

Details of model outputs available at /g/data/ik11/outputs/access-om2-01/01deg_jra55v140_iaf
You may find this partial list of diagnostics useful for decoding the MOM diagnostic names.

  • 1 Jan 1958 to 31 Dec 2018
    • MOM ocean data
      • Daily mean 2d bottom_temp_max, bottom_temp, frazil_3d_int_z, mld, pme_river, sea_level_max, sea_level, sfc_hflux_coupler, sfc_hflux_from_runoff, sfc_hflux_pme, surface_salt, surface_temp_max, surface_temp, u_surf, v_surf, vorticity_z_surf
      • Monthly mean 3d age_global, buoyfreq2_wt, diff_cbt_t, dzt, pot_rho_0, pot_rho_2, pot_temp, salt, temp_xflux_adv, temp_yflux_adv, temp, tx_trans, ty_trans_nrho_submeso, ty_trans_rho, ty_trans_submeso, ty_trans, u, v, vert_pv, wt
      • Monthly mean 2d bmf_u, bmf_v, ekman_we, eta_nonbouss, evap_heat, evap, fprec_melt_heat, fprec, frazil_3d_int_z, lprec, lw_heat, melt, mh_flux, mld, net_sfc_heating, pbot_t, pme_net, pme_river, river, runoff, sea_level_sq, sea_level, sens_heat, sfc_hflux_coupler, sfc_hflux_from_runoff, sfc_hflux_pme, sfc_salt_flux_coupler, sfc_salt_flux_ice, sfc_salt_flux_restore, surface_salt, surface_temp, swflx, tau_x, tau_y, temp_int_rhodz, temp_xflux_adv_int_z, temp_xflux_ndiffuse_int_z, temp_yflux_adv_int_z, temp_yflux_ndiffuse_int_z, tx_trans_int_z, ty_trans_int_z, wfiform, wfimelt
      • Monthly mean squared 3d u, v
      • Monthly max 2d mld
      • Monthly min 2d surface_temp
      • Daily snapshot scalar eta_global, ke_tot, pe_tot, rhoave, salt_global_ave, salt_surface_ave, temp_global_ave, temp_surface_ave, total_net_sfc_heating, total_ocean_evap_heat, total_ocean_evap, total_ocean_fprec_melt_heat, total_ocean_fprec, total_ocean_heat, total_ocean_hflux_coupler, total_ocean_hflux_evap, total_ocean_hflux_prec, total_ocean_lprec, total_ocean_lw_heat, total_ocean_melt, total_ocean_mh_flux, total_ocean_pme_river, total_ocean_river_heat, total_ocean_river, total_ocean_runoff_heat, total_ocean_runoff, total_ocean_salt, total_ocean_sens_heat, total_ocean_sfc_salt_flux_coupler, total_ocean_swflx_vis, total_ocean_swflx
    • CICE sea ice data
      • Daily mean 2d aice, congel, dvidtd, dvidtt, frazil, frzmlt, hi, hs, snoice, uvel, vvel
      • Monthly mean 2d aice, alvl, ardg, congel, daidtd, daidtt, divu, dvidtd, dvidtt, flatn_ai, fmeltt_ai, frazil, frzmlt, fsalt, fsalt_ai, hi, hs, iage, opening, shear, snoice, strairx, strairy, strength, tsfc, uvel, vvel
  • 1 Jan 1987 to 31 Dec 2018 only
    • MOM ocean data
      • monthly mean 3d bih_fric_u, bih_fric_v, u_dot_grad_vert_pv
      • daily mean 3d salt, temp, u, v, wt (not yet available on ik11 – see above)
    • CICE sea ice data
      • daily mean 2d aicen, vicen
  • 1 Jan 2012 to 31 Dec 2018 only
    • MOM ocean data
      • monthly snapshot 2d sea_level
      • monthly snapshot 3d salt, temp, u, v, vert_pv and vorticity_z

cheers
Andrew

COSIMA Model Output Collection

An increasingly important aspect of model simulations is to be able to share our data. Over the last few years we have been working on methods to routinely publish our most important simulations. This publication process is designed to allow any users, worldwide, to be able to pick up our model output and test hypotheses against our results. It will also allow journal publications to be able to cite our model output.

Currently we have  5 different datasets within the headline COSIMA Model Output Collection, which can be found here:

 http://dx.doi.org/10.4225/41/5a2dc8543105a

For users with NCI access this data is housed under the cj50 project.

We are planning to add new datasets in the coming months.

Technical Working Group Meeting, March 2020

Minutes

Date: 18th March, 2020
Attendees:
  • Aidan Heerdegen (AH) CLEX ANU
  • Matt Chamberlain (MC) CSIRO Hobart
  • Rui Yang (RY), Paul Leopardi (PL) NCI
  • Nic Hannah (NH) Double Precision
  • Marshall Ward (MW) GFDL

Scalability of ACCESS-OM2 on gadi

(Paul’s report is attached at the end)

PL: Looking at scaling. Started with ACCESS-OM2, but went to testing MOM5 directly with MOM5-SIS. Using POM25, global 0.25 model with NYF forcing. The model MW developed for testing scaling prior to ACCESS-OM2. Had to add specify min_thickness in ocean_topog_nml.

PL: Tested the scaling of 960/1920/3840/7680/15360, with no masking. Scales well up to some point between 7680 and 15360.

PL: Tested effect of vectorising options (AVX2/AVX512/AVX512-REPRO). Found no difference in runtime with 15360 cores. MW: Probably communication bound at the CPU count. Repro did not change time.

MW: Never seen significant speed up from vectorisation. Typically only a few percent improvement. Code is RAM bound, so cannot provide enough data to make use of vectorisation. Still worth working toward a point where we can take advantage of vectoeisatio.

PL: Had one “slow” run outlier out of 20 runs. Ran 20% slower. Ran on different nodes to other jobs, not sure if that is significant. MW: IO can cause that. AH: Andy Hogg also had some slow jobs due to a bad node. AK: Job was 20x slower. Also RYF runs become consistently slower a few weeks ago. MW: OpenMPI can prepend timestamps in front of output, can help to identify issues.

PL: Getting some segfaults in ompi_request_wait_completion, caused by pmpi_wait and pmpi_bcast. Both called from the coupler. NH: Could be a bad bit of memory in the buffer, and if it tries to copy it can segfault. PL: Thinking to run again using valgrind, but would require compiling own version of valgrind wrapper for OpenMPI 4.0.2. Would be easier to Intel MPI, but no-one else has use this. Saw some cases similar when searching which were associated with UCX, but sufficiently different to not be sure. These issues are with highest core count. MW: Often see a lot of problems at high core counts. NH: Finding bugs can be a never ending bug. Use time wisely to fix bugs that affect people. MW: Quarter degree at 15K cores would have very small tile sizes. Could be the source of the issue. AH: This is not a configuration that we would use, so it is not worth spending time chasing bugs.

PL: Next testing target is 0.1 degree, but not sure which configuration and forcing data to use. Will not use MOM5-SIS, but will use ACCESS-OM2 for direct comparison purposes. AK: Configurations used in the model description paper have not been ported to gadi. Moving on to a new iteration. Andy Hogg is running a configuration that is quite similar, but moving to new configurations with updated software and forcing. Those are not quite ready.

PL: Need a starting configuration for testing. Want to confine to scalability testing and compiler flags. NH: ACCESS-OM2 is setup to be well balanced for particular configurations. Can’t just double CPUs on all models as load imbalance between submodels will dominate any other performance changes. Makes it a problematic config for clean configurations for things like compiler flags. MW: Useful approach was to check scalability of sub-model components independently. Required careful definition of timers to strategically ignore coupling time. MOM was easy, CICE was more difficult, but work with Nic’s timers helped a lot. Try to time the bits of code that are doing computation and separate from code that waits on other parts. Coupled model is a real challenge to test. Figure out what timers we used and trust those. Can reverse engineer from my old scripts.

PL: Should do MOM-SIS scalability work? MW: Easier task, and some lessons can be learned, but runtime will not match between MOM-SIS and ACCESS-OM2. Would be more of a practice run. PL: Maybe getting out of scope. Would need 0.1 MOM-SIS config. RY: Yes we have that one. If PL wanted to run ACCESS-OM2-01 is there a configuration available? AK: Andy Hogg’s currently running configuration would work. PL: Next quarter need to free up time to do other things.

MW: Might be valuable to get some score-p or similar numbers on current production model. Useful to have a record of those timings to share. Scaling test might be too much, but a profile/timing test is more tractable. RY: Any issues with score-p? Overhead? MW: Typical, 10-20%, so skews numbers but get in-depth view. Can do it one sub-model at a time. Had to hack a lot scripts, and get NH to rewrite some code to get it to work. score-p is always done at compile time. Doesn’t affect payu. Try building MOM-SIS with score-p, then try MOM within ACCESS-OM2. Then move on to CICE and maybe libaccessom2. PL: Build script does include some score-p hooks. MW: Even without score-p MOM has very good internal timers. Not getting per-rank times. score-p is great for measuring load imbalance. AH: payu has a repeat option, which repeats the same time, which removes variability due to forcing. Need to think about what time you want to repeat as far as season. AK: CICE has idealised initial ice, evolves rapidly. MW: My earlier profile runs had no ice, which affects performance. MW: Not sure it is huge, maybe 10-20%, but not huge.

MW: Overall surprised at lack of any speed up with vectorisation, and lack of slow-down with repro. PL: Will verify those numbers with 960 core config.

AH: Surprised how well it scaled. Did it scale that well on raijin? MW: The performance scaling elbow did show up lower. AH: 3x more processors per node has an effect? MW: Yes, big part of it. AH: 0.1 scaled well on raijin, so should scale better on gadi. 1/30th should scale well. Only bottleneck will be if the library can handle that many ranks.

NH: If repro flags don’t change performance that is interesting. Seem to regularly have a “what trade off does repro flags have?”, would be good to avoid. MW: Probably best to have an automated pipeline calculating these numbers. NH: People have an issue with fp0 flag. MW: Shouldn’t affect performance. NH: Make sure fp0 is in there. MW: Agree 100%.

ACCESS-OM2 update

AH: Do we have a gadi compatible master branch on gadi? AK: No, not currently. NH: At a previous TWG meeting I self-assigned getting master gadi compatible. Merged all gadi-transition branches and tested, seemed to be working ok. Subsequent meeting AK said there were other changes required, so stopped at that point. gadi-transition branches still exist, but much has already been merged and tested on a couple of configurations. Have since moved to working on other things.

NH: Close if AK has all the things he wants into gadi-transition branch. Previous merge didn’t include all the things AK wanted in there. Happy to spend more time on that after finishing JRA55 v1.4 stuff.

JRA55-do v1.4 update

NH: Made code changes in all the models, but have not checked existing experiments are unchanged with modified code.

NH: v1.4 has a new coupling field, ice calving. Passing this through to CICE as a separate field. In CICE split into two fields, liquid water flux and a heat flux. MOM in ACCESS-CM2 already handles both these fields. Just had to change preprocessor flags to make it work for ACCESS-OM2 as well.

NH: Lots of options. Possible to join liquid and solid ice at atmosphere and becomes the same as we have now. Can join in CICE and have a water flux but not a heat flux.

Strange MOM6 error

AH: A quick update with Navid’s error. Made a little mpi4python script to run before payu to check status of nodes, and all but root node had a stale version of the work directory. Like it hadn’t been archived. Link to executable was gone, but everything else was there. Reported to NCI, Ben Menadue does not know why this is happening. Also tried a delay option between runs and this helped somewhat, but also had some strange comms errors trying to connect to exec nodes. Will next try turning off all input/output can find in case it is a file lock error. Have been told Lustre cannot be in this state.

MW: In old driver do a lot of moving directories from work to archive, and then relabelling. Is it still moving directories around to archive them? Maybe replace with hard copy of directory to archive. MOM6 driver is the MOM5 driver, so maybe all old drivers are doing this. Definitely worth understanding, but a quick fix to copy rather than move.

NH: Filesystem and symbolic links might be an issue MW: Maybe symbolic links are an issue on these mounted filesystems. AH: There was a suggestion it might be because it was running on home which is NFS mounted, but that wasn’t the problem. MW: Often with raijin you just got the same nodes back when you resubmit, so maybe some sort of smart caching.

 

Scalability of ACCESS-OM2 on Gadi – Paul Leopardi 18 March 2020

 

 

Technical Working Group Meeting, February 2020

Minutes

Date: 27th February, 2020
Attendees:
  • Aidan Heerdegen (AH) CLEX ANU, Angus Gibson (AG) ANU
  • Russ Fiedler (RF), Matt Chamberlain (MC) CSIRO Hobart
  • Rui Yang (RY), Paul Leopardi (PL) NCI
  • Nic Hannah (NH) Double Precision
  • Marshall Ward (MW) GFDL

New installed payu version

Version 1.0.7 is now installed in conda/analysis3-20.01 (analysis3-unstable

AH: payu is now 100% gadi compatible. Default cpus/node is now 48 and memory 192GB/node. Python interpreter, short path and manifests are scanned to automatically determined from model config and manifests. Using qsub_flags to manually specify storage flags no longer works, as automatically determined storage flag option is appended and the manually specified one no longer works.

RF: Paul Sandery having issues getting 0.1 deg model working. [AH: turns out it was a typo in config.yam]

AH: No need for the number of cpus in a payu job to be divisible by the number of CPUS in a node. Request however many the job uses, and payu will pad the request to make sure the PBS submission is requesting an integer number of nodes if ncpus is greater than the number in a single node. PL: Rounds up for each model? AH: No, just the total. MW: Will spread models across ranks, so a rank can have different models on it.

AH: Andy Hogg ran out 80 odd submits with the tenth model. Occasional hang, resubmit ok. Might be more stable than raijin.

AH: Navid has MOM6 model that cannot run more than a couple of submits without it crashing with an error that it cannot find the executable. Weird error, let me know if you see anything similar.

NH: Caution with disks and where to put things. Reading input files can be very slow sometimes, or not, and then files not there and turn up later. If executable is missing, running off a disk that is not good? MW: Filesystems are very complicated on gadi? NH: Less certainty of performance with such a different system with data file systems being mounted separately. I’d look at this.
PD: Good place to look if disk has got caught up doing too many tasks. gdata just hangs, saving text file takes a while. Due to being on login node? Get similar delays with interactive job on execute node.
AH: People reporting issues with login delays. Probably a disk issue? Navid’s job is not being run from gdata, but from scratch. Inclined to blame new system of mounting. Could we use jobfs. MW: Like in the old days when we ran on the node? Good luck! AH: Could just do some tests. NH: Concerning if scratch is slow.
AH: Not sure if filesystems are mounted with NFS. MW: That is what we do on gaia, and have tons of problems with mount on demand. Biggest frustration with using GFDL machine. It’s a nightmare. At least NCI have lustre know-how. AH: Used to have a lot of problems with NFS cache errors in the past, files disappearing and reappearing. Does sound similar to Navid’s problem.
MW: Raijin’s filesystem was quite good. Why the change? AH: Security. Commercial in confidence stuff. I think it is overblown. Can’t seen anyone else’s jobs on the queue. Can’t even check it other people are running on the project. Are moving to 2-factor auth also.

What is required to get gadi transition into master for ACCESS-OM2

AH: Andrew Kiss is on personal leave but sent around an email:
re. gadi-transition, we could proceed like so:
– we’ve also been transitioning libaccessom2 to use submodules for its dependencies instead of cmake https://github.com/COSIMA/libaccessom2/issues/29 which would require this commit https://github.com/COSIMA/libaccessom2/tree/53a86efcd01672c655c93f2d68e9f187668159de (not currently in gadi-transition branch)
– get the libaccessom2 tests working https://github.com/COSIMA/libaccessom2/issues/36
– there’s a gadi-transition branch libaccessom2, cice and mom that could be merged into master. They use openMPI4.0.2
– there’s also a gadi-transition branch for all the primary (ie JRA, non-minimal) configurations but the exe paths would need to be updated before merging to master
– the access-om2 gadi-transition branch would then need to be updated to use the correct submodules for model components and configurations. We also want to remove the core and minimal config submodules https://github.com/COSIMA/access-om2/issues/183
also fyi the current gadi build instructions are here
AH: Feels urgent that people can use on gadi. Any comments on Andrew’s email?
PL: Transition to submodules finished? AH: That is on a separate branch. NH: I did that work. Put it in a dev branch. Not intending to be part of gadi transition to have least number additions. AH: Agree if that is the easiest. Master is broken for gadi, so anything that works is an improvement. If there is no feedback can do this offline. Could make a project to be explicit about what is required. NH: Given that gadi-transition does work. Andrew and Andy use it. Wouldn’t hurt to put it in now. Work that PL has done to make sure it does reproduce ticks that box. So ready to go. Able to reproduce if we need to. I’ll merge it and do some interactive testing. Then people can use it and I can do automatic testing.
PL: What branch will it be merged into? A lot of branches in a lot of repos.
NH: Isolate gadi-transition branches and merge into master straight away. Not bother with other development branches at this stage. Want to get something in master that people can use. In future bring everything into dev as discussed, with master staying stable, just bug fixes, until decide to update from dev. I’ll go through the branches and just bring in the gadi transition stuff. PL: So dev will have submodule changes and master will not? NH: For the time being. With previous discussion we’ll be slower moving on master, to make sure it is working. Having dev will allow us to move that more rapidly. People can run off dev at their own risk. AH: Submodules will remain a named feature branch and pulled into dev at some future time. Should discourage having personal development branches on the main repo. If you want to experiment do it on your own fork. Branches on the main repo should be master, dev or named feature to keep it clean and everyone can understand what they mean.

Stack array errors and heap-array option

AH: Apologies minutes from last TWG meeting are not on the COSIMA website. There is an IT issue with the server. We wanted to follow up with stack array errors.
AH: Did ever test on raijin with same compiler? Is there any way we can do comparative test? Use raijin image? Any more from Dale about this stack stuff? PL: Haven’t heard anything. AH: Last meeting some mention of there being a limit on UM stacksize. RY: Already fixed Ilia’s issue. Fixed by making stacksize unlimited. RF: Always run with unlimited stack size. When had problem only fixed by setting heap arrays small or zero. When I went into code and made array allocation from automatic to allocatable the error went away.
MW: If I have an automatic array I get three different heap allocations for three different compilers. RF: This option forces all arrays on to the heap.
AH: This was fixed a while ago Rui? RY: Not clear this is the same problem. Ilia’s issue was the end of 2019 when gadi first on line. Not sure it is the same issue.

BGC Update

AH: Russ forwarded an update to Andy Hogg.
RF: Work was completed on raijin in 2019. BGC code in to MOM and CICE. Required changes in CICE: moving arrays around to different modules due to scope issues which allow optional fields to be sent. Main one is to send 10m winds to ocean, not just the wind stress. Holding off to issue PR until gadi transition done so could go in clearly.
NH: Will be useful for JRA1.4 work.
RF: Hakase will be using it for BGC. Passing algae between ice and ocean components. To add new field, need to add field to code, but don’t have to be passed. Just picked up from namcouple using the flags in OASIS to see if it’s registered.
AH: Can this be the next cab off the rank after gadi-transition, before AKs science tweaks. Not relying on any changes in Andrews branches? RF: Would like to get gadi transition out of the way and then test these changes. Not tested on gadi yet.
How to proceed? Testing?
I’ve held off issuing a pull request until the dust settles wrt the gadi transition. There’s a bit of code rearrangement in order to allow optional fields (10m wind speed but this can be extended) to be passed from CICE.
The flags ACCESS-OM-BGC (tested) and ACCESS-ESM (untested) enable compilation of the BGC code. The 10m winds need to be added to the namcouple files and the MOM coupling fields namelist.
Work done on raijin last year. Changes in CICE to move arrays around in modules due to scope issues. Main one is to send 10m winds to ocean. No just wind stress. Holding off until gadi-transition done.
NH: Useful for stuff I’m doing with JRAv1.4.
RF: Hakase will use for BGC, passing algae between ice and ocean components. Have to change code to add fields. Don’t need to hard code as much. Once field in there optional to pass. Using the OASIS flags to see if registered.

JRA55-do counter-rotating cyclones

RF: Fortunately Paul Sandrey’s started in 1988. Last reverse cyclone in 1987. Cafe 60 use whole month window, so washed out on the average.
One of the RYF runs has reverse cyclone (83-84). Tell Kial.

Scaling

PL: Thanks to Marshall for getting me up to speed on scaling tests and sharing scripts. Can reproduce diagrams so can compare between raijin and gadi.
 AH: Any more performances numbers? PL: Now in a position to answer questions, just need to know what questions to ask.
AH: ACCESS-OM2-01 currently running around 5K cores, would love to be able to scale to 10K, 20K even better. MW: MOM scaled to 50K. AH: CICE doesn’t scale as well. MW: Any work on CICE distributions? RF: Nope. Would need to be done again at higher core counts. MW: Current one working really well. AH: On NH’s to-do list was to experiment with layouts and load balancing. MW: Alistair is very interesting in load balancing sea ice models. Particularly icebergs. Has some quasi lagrangian code in SIS2 to load balance icebergs. Maybe some ideas will translate or vice versa.
PL: For the moment will just look at MOM and see how it scales at 0.1? AH: Maybe just try doubling everything and see if it scales ok? MW: Used to make those processor heat maps to get the load imbalance of CICE. Would be good to keep an eye on that while working with scaling. Tony Craig (CICE developer) is very interested.

 Atmosphere/coupled models

 PD: Still using code frozen for CMIP runs. Extending number of runs in ensemble.
AH: People in CLEX are keen to run CM2. PD: Not aware, maybe through someone else, maybe Simon or Martin? CM2 and ESM-1.5 runs have been published under s38 project.
AH: Scott Wales doing an ultra high resolution atmosphere run over Australia, under  the STRESS2020 project. PD: Atmosphere only, do you know what resolution? I’ve also done some high res atmosphere only runs. On a project to improve turbulent kinetic energy spectrum in UM. Working on code to put stochastic back scatter into low res N96 (CMIP6) atmosphere. Got some good results injecting turbulent kinetic energy into small scales to improve artificial dissipation associated with semi-lagrangian timestep in UM. To test this is to see how improved N96 results compare to N512 runs using STRESS2020 resources. Working with Jorgen Fredrikson. Should talk to Scott.
AH: At the moment Scott is targeting 400m over Australia. PL: Convection resolving? AH: Planning a 2 day run to simulate Cyclone Debbie. Nested 400m run for Australia, inside BARRA at 2.2km. 10500×13000. PD: We’re going global. MW: How many levels? Same as global? PD: 85. AH: Major problem is running out of memory. MW: More cores should mean less memory. Maybe their Helmholtz server imposes some memory limit on the ranks. AH: Currently waiting for large memory nods to come online.

New FMS

MW: New FMS version coming. Targeting auto tools and getting rid of mkmf. If you’re on MOM5 you can use your frozen version. Completely rewritten IO in FMS. Now a thin wrapper to netCDF. No more magic functions like save_restart, write_restart. They have been replaced by lower level ops to allow model developers to have more control. Not sure MOM5 significance. AH: API compatible? MW: Keep compatible with old API as long as they can. Could dump it in and slowly integrate. Only raising in case you want to do more innovative stuff with IO. PL: Affects MOM6 mainly? MW: MOM6 is one of the main targets. PL: Parallel IO support? MW: Part of the reason. They want parallel IO in atmosphere model which NCAR now uses it. Now an important model. This implements the hooks for that work. RY: MPI-IO still there or be replaced by PIO? MW: It is. RY: Simpler to do one? MW: They’ve sent a patch to get MOM6 working with that now. Doesn’t work currently. Not sure about the progress, but know you were interested in PIO. RF: We’re interested from the ICE point of view. New version of BRAN will need daily inputs in CICE. Performance is terrible as IO is collected on to one processor.  MW: FMS will not help CICE, but a test case if PIO is a valid solution.

COSIMA 2019 Report

This report summarises the fourth meeting of the Consortium for Ocean Sea Ice Modelling in Australia (COSIMA), held in Canberra on 3-4 September 2019. Shweta Sharma has provided a more informal (and entertaining) report here.

Aims & Goals

The annual COSIMA workshop aims to:

  • Maintain and grow the established community around ocean-sea ice modelling in Australia;
  • Discuss recent scientific advances in ocean and sea ice research in a forum that is inclusive and model-agnostic, particularly including observational programs;
  • Agree on immediate next steps in the COSIMA model development plan; and
  • Develop a long-term vision for ocean-sea ice model development to support Australian researchers.

Participants


Attendees included Alberto Alberello (U Adelaide), Christopher Bladwell (UNSW), Fabio Boeira Dias (UTAS/CSIRO), Gary Brassington (BOM), Matt Chamberlain (CSIRO), Navid Constantinou (ANU), Prasanth Divakaran (BOM), Kelsey Druken (NCI), Matthew England (UNSW), Ben Evans (NCI), Hakase Hayashida (IMAS, UTAS), Petra Heil (AAD & AAPP), Andy Hogg (ANU), Ryan Holmes (UNSW), Maurice Huguenin (UNSW), Yi Jin (CSIRO), Andrew Kiss (ANU), Andreas Klocker (UTAS), Qian Li (UNSW), Kewei Lyu (CSIRO), Simon Marsland (CSIRO), Josue Martinez Moreno (ANU), Richard Matear (CSIRO), Ruth Moorman (ANU), Adele Morrison (ANU), Eric Mortenson (CSIRO), Jemima Rama (ANU), Paul Sandery (CSIRO), Abhishek Savita (UTAS/IMAS/CSIRO), Callum Shakespeare (ANU), Shweta Sharma (UNSW), Callum Shaw (ANU), Taimoor Sohail (ANU), Paul Spence (UNSW), Kial Stewart (ANU), Veronica Tamsitt (UNSW), Mirko Velic (BOM), Nick Velzeboer (ANU), Jingbo Wang (NCI), Xuebin Zhang (CSIRO), Xihan Zhang (ANU), Aihong Zhong (BOM), plus those who attended via video conference.

Program

Tuesday 3rd September

Session 1 (Chair – Navid Constantinou)

Andrew Kiss (ANU): ACCESS-OM2 update
Simon Marsland (CSIRO): ACCESS and CMIP6
Hakase Hayashida (IMAS, UTAS): Preliminary results of biogeochemistry simulation with ACCESS-OM2 and plans for OMIP-BGC and IAMIP
Ben Evans (NCI): Addressing the next HPC challenges for Climate and Weather

Session 2 (Chair – Andreas Klocker)

Veronica Tamsitt (UNSW): Lagrangian pathways and residence time of warm Circumpolar Deep Water on the Antarctic continental shelf
Ruth Moorman (ANU): Response of Antarctic ocean circulation to increased glacial meltwater
Kewei Lyu (CSIRO): Southern Ocean heat uptake and redistribution in theoretical framework and model perturbation experiments
Fabio Boeira Dias (UTAS/CSIRO): High-latitude Southern Ocean response to changes in surface momentum, heat and freshwater fluxes under 2xCO2 concentration

Session 3 (Chair – Simon Marsland)

Xuebin Zhang (CSIRO): Dynamical downscaling of climate changes with OFAM3
Matt Chamberlain (CSIRO): Multiscale data assimilation in Bluelink Reanalysis
Paul Sandery (CSIRO): A data assimilation framework for ocean-sea-ice prediction
Prasanth Divakaran (Bureau of Meteorology): OceanMAPS 3.3 Developments

Wednesday 4th September

Session 4 (Chair – Veronica Tamsitt)

Ryan Holmes (UNSW):  Atlantic ocean heat transport enabled by Indo-Pacific heat uptake and mixing
Eric Mortenson (CSIRO): Decoupling of carbon and heat uptake rates of the global ocean over the 21st century
Christopher Bladwell (UNSW): Diahaline transport in global ocean models
Abhishek Savita (UTAS, IMAS, CSIRO): Uncertainty in the estimation of global and regional ocean heat content since 1970
Gary Brassington (BOM): Comparison of ACCESS-OM2-01 to other models and observations

Session 5 (Chair – Qian Li)

Xihan Zhang (ANU): Gulf Stream separation in ACCESS-OM2
Alberto Alberello (U Adelaide): Impacts of winter cyclones on sea ice dynamics
Petra Heil (AAD): Sea ice in the ACCESS-OM2-01: Exploring near-coastal processes

COSIMA Discussion (Chair – Paul Spence)

Open discussion highlighted a number of potential avenues for work in the near-term, as well as some suggestions for directions that could be included in a future COSIMA funding bid.

Near-term Priorities

  • Running the COSIMA cookbook on the VDI is becoming untenable, and recent improvements in the cookbook have not been widely adopted. This should be a priority, possibly with a tutorial session at the CLEx Annual Workshop?
  • Start investigating coupled data assimilation for parameter estimation, especially for sea ice.
  • Start serious perturbation experiments with ACCESS-OM2-01, potentially including:
    • SAMx (RYF forcing with SAM Extreme Years).
    • Adding katabatic winds?
    • Tropical mixing and the AMOC.
    • Influence of the Amundsen Sea Low on the Southern Ocean.
    • Turbulent Kinetic Energy and winds in the Southern Ocean.
  • OMIP2 contribution for CMIP6.
  • Improve communication of COSIMA achievements.
  • Begin work on nesting regional MOM6 models.

Longer term suggestions

  • Better connection with Paleo community.
  • Improve links with the wave community.
  • Start running ensemble simulations?
  • Do we need to move to CICE6?
  • Capacity to run future scenarios based on coupled model output.

It was agreed that COSIMA V will be held in 2020, hosted by Xuebin Zhang in Hobart.

Additional discussion points are given here.

Awards

The COSIMA Most Selfless Contributor Awards for 2017, 2018 and 2019 were presented in absentia to

  • 2017 James Munroe
  • 2018 Marshall Ward
  • 2019 Russ Fiedler (pictured)

in appreciation of their tireless efforts which have greatly improved the software used by the COSIMA community.

Technical Working Group Meeting, August 2019

Minutes

Date: 14th August, 2019
Attendees:

  • Aidan Heerdegen (AH) CLEX ANU, Angus Gibson (AG) RSES ANU, Andrew Kiss (AK)  COSIMA ANU
  • Russ Fiedler (RF), Matt Chamberlain (MC) CSIRO Hobart
  • Rui Yang (RY) NCI
  • Marshall Ward (MW) GFDL
  • Nich Hannah (NH), Double Precision
  • James Munroe (JM), COSIMA

PIO work with CICE

NH: PIO code in CICE not as complete or thorough as netCDF code. Nothing to suggest it won’t work. Relies on NCAR PIO library, and a CESM utility library. Dependencies which are not part of CICE. Built PIO dependency on raijin, ran into CESM dependency. Can either remove dependency or remove code.
NH: Initially thought to use the MOM approach. Tile and collate. Russ’ comments encouraged to try PIO. Will be supported in future and will be supported in CICE6. Nothing working, but will soon test with 1 degree.
RF: Real bottleneck with high freq output. Worth a go. Attempt to put this into FMS by Hartnett. AH: Different to parallel netCDF? NH: PIO is wrapper around parallel netcdf. Written by NCAR to simplify parallel netcdf. Another layer. On GitHub, continuing to be maintained. RY: Wrapper that does work to match computing to IO domain. Not so useful for MOM5 as it has io_layout already.
MW: Harntnett motivated by FE3 (forecast model) rather than ocean. Not sure what project even involved in.
NH: Big test is handling interesting CICE layout, difference between cartesian grid and PE layout. MW: PIO will support explicit decomposition and other approaches.
NH: Parallel netCDF version on raijin only links with OpenMPI3.0. RY: New machine launched soon. OpenMPI 1.* will be dropped. No new software depending on 1. MW: OpenMPI 2 is not good. Should use 3.
NH: Probably have to test this with OpenMPI 3.0 RY: 3.1.3. Switch everything to that. Good test for new machine. AH: Working now? RY: My fault. Used unmatched openMPI library. Everything looks fine. OpenMPI 2/3/4 with Intel 19. All working. 1 deg & 0.25 deg working. Tenth not working. MW: I was able to run tenth with 3.1.2/3.1.3.
MW: One of the intel compilers broke MOM. A compiler bug with types in types.
AH: Should  start an issue for testing RY: Will email MW directly. RY: Not a MOM bug.
MW: Tried MOM-SIS tenth? Good test. RY: From earlier this year do have this working. This is testing for new machine, so ACCESS-OM2.

OMIP date restart protocol

RF: Talked to Griffies. GFDL take ensemble approach. Run for N years using true dates. At finish reset back to start date with correct calendar. Storing new stuff in different directory. End up with 5 sequences of 55 years. All dates are correct. No issues with leap years going wrong. Think this is the best way to go.
AK: Came to conclusion that this was right way to go, mostly due to leap year issue. Problem is, can we get the model to do that, but Maurice and Ryan had issues. Issue with CICE getting the correct date. CICE has a flag “use_restart_dates”. Suggested set this to false, and set the dates in access_restart.nml, but CICE is not picking up dates. Looks like libaccessom2 is not passing them on to CICE. Some confusion about exactly what they have done. Some instructions on Wiki for restarting, from restarting IAF from RYF at tenth, but doesn’t work for other people. NH: I’ll look at it. AK: Will send issue. NH: Didn’t realise it was happening. CICE date handling is not great.
AH: Downside with ensemble, difficult to get metrics across the whole time series. RF: Need extra meta-data added in. Maybe which cycle you’re in. An extra variable which gives the actual number of days since the start of the run. Down with post-processing. Might be able to concatenate files using extra meta-data. AH: Always have issues with missing leap years if it spans a century. But only daily is an issue. AK: Cookbook do something. MC: Pretend it is no leap? JM: Data looking at as time series? AH: Extra metadata, say offset day is a good idea. RF: Add buffer in netCDF file so don’t need copies. mppnccombine can add padding. usually done with nccreate, make sure the header has some space. hbuf?

Strategy for CICE updates for flexibly adding fields

RF: Way CICE drivers work, variables you want are either hard coded, or muck around with pre-processing to compile them in and out. Wondering if anyone looked at doing it on the fly. Using error codes coming back when setting up variables, so have flexible number of variables passed in and out. Would like this to pass total wind speed, to harmonise code. Also Hakase wants it for some BGC stuff. Phytoplankton through to the ice. So specify the variables, work out if they’re there or not.
NH: Would want the exe to handle configuration with different sets of coupling fields. Sometimes include total wind speed, sometimes not. RF: would know complete set, if not there skip it. Currently have to be hard wired in, or make another driver. NH: Way to do it, start with superset in namcouple, and code would exclude certain variables. RF: Maybe if variable not in namcouple, return an error code, but ignore error. NH: Shouldn’t be too hard to do. NH: OASIS does return error codes that could be used. Either abort or return error code. If aborting could change that. AH: Restart fields? NH: Should do behind the scenes.

Paths for JRA55-do forcing files. Some changes to support v1.4

AH: JRA55-do not part of Input4MIPs, part of CMIP6. Have to use the copy that is CMIP6. Encodes all the metadata in filename, consequently doesn’t currently work with YATM. Circumvented by creating symbolic links that worked with YATM. When I did this couldn’t reproduce. Not sure if this is actually an issue with the fields being different or not.
AH: Tried to use testing framework NH developed for this using jenkins. The historical test that tests against known checksums doesn’t seem to actually compare them. Not sure if that is intentional. Would like to use framework, as NH has done a great job with it.
MH: MOM6 has diag_mediator, supports CMOR name alongside internal model name. Porting to MOM5 is a big task, but idea is good and saved them a lot of work. Could create a thin wrapper to translate to CMOR name if that helps. AK: How integrate with YATM? MW: Don’t know. At FMS level, so only help with 1 model (MOM). AK: YATM access the JRA files. So libaccessom2 change. AH: Looked at YATM code. Generates filename form date. Input4MIPS has current year and next year, so would require code changes. Might just be easier to create a file with date->filename mapping? AH: Possible to do. Would need to add a token for year+1. Possible to do. Probably best to do it that way.
AK: Also need code changes with v1.4. Solid and liquid runoff are separate. What to do with solid runoff? Griffies either use iceberg model, or melt them and add them to runoff. Take account latent heat of fusion? Assuming solid runoff is at zero, which could be a problem. Put in a request to download v1.4. Scripts they have should automatically download it, but not. MW: Think GFDL only has v1.3.
MW: Fields go to end of 2017, is 2018 downloaded? Looking in wrong place? Looking in ua8. AK: Should look in qv56. AK: qv56 up to feb 2018. AH: If not automatically downloading, we should ask. What does the OMIP protocol say about end date? AK: JRA55 can find out about 2018. RF: It is specified, but would like latest for ongoing runs.

Testing FMS merge

AH: Putting FMS in as a sub-repo. Just needs testing. If it reproduces checksums for a month we’re sure it is ok? Is that sufficient?
NH: When Marshall upgraded FMS, went through every MOM test. Including 0.25. Can’t recall how strict we were. AH: Testing framework still there? NH: It is there. Because it never gets used, might be rotted a bit. Can give Jenkins URL of PR and it would do it. We should work together to get that working.

New NCI HPC hardware announcement

RY: System by end of the year. 2 phases, install new machine with Cascade Lake nodes. Short period gabi and raijin run simultaneously. After that skylake and broadwell will be merged with new machine and SandyBridge nodes removed. 100 GPU installed. 16 skylake k-80 nodes. PBS pro again. Storage and network infiniband. 200GB/s transfer speed. OS is CentOS 8. AH: Trying to figure out total core count for new machine. Do you know what core count will be? RY: Not clear on exact number. Can check with system guys if they know the exact number. If 32 cores/node, 150+K processors. AH: Will runtimes be extended for new machine. Find 5 hours too low for high core count jobs. Reduces flexibility. RY: Queue time limits are per project. Quite flexible. Contact NCI help. AH: Have asked for time limit changes in past, but usually time limited. RY: Have been asked by other users, not sure about the policy. Good time to ask and get a better policy for the new machine.

New Repeat Year Spinup

We are currently undertaking a new spinup simulation using our highest resolution ocean-sea ice model: ACCESS-OM2-01. Our goal is to run at least 50 years of this simulation in the third quarter of 2019, using the  1990-91 Repeat Year Forcing strategy (that we call RYF9091). After the first month of the quarter, we have managed to complete 23 years of the simulation.

Early indications are that temperature and ice biases are reduced in the new simulation (compared with our previous RYF8485 spinup) but that many large scale circulation metrics, such as ACC transport) are largely unchanged.

Globally Averaged temperature from RYF9091 spinup.
ACC Transport from RYF9091 spinup

For those who are interested, this simulation is being stored in /g/data3/hh5/tmp/cosima/access-om2-01/01deg_jra55v13_ryf9091. Timeseries describing the simulation, periodically updated, can be found here, and additional diagnostics are available upon request.

COSIMA IV, ANU, 3-4 Sept 2019

Announcing that our 4th Annual COSIMA workshop will be held at ANU on Tuesday 3rd and Wednesday 4th September.

The workshop will feature our regular mix of talks and discussions, covering a mix of physical oceanography, sea ice, model development and technical advances. Our workshops are not restricted to ACCESS-OM2/MOM users – we welcome contributions from all who consider themselves part of the COSIMA community.

Registration is now closed, but you can join via Zoom – see below.

We will begin with morning tea at about 10am on the first morning (3 Sept), with the first talk at about 10:30.

Click here for the workshop program (updated 1st Sept).

The workshop report provides slides from the presentations and a summary of the discussion. Shweta’s post on the workshop is here.

Joining via Zoom
The workshop will be streamed via Zoom. When joining via zoom, please mute your mic and hide your video. Unfortunately we probably won’t have an ability to take questions from remote participants.

Connection details:

Join from a PC, Mac, iPad, iPhone or Android device: https://anu.zoom.us/j/8232211676

Join from a H.323/SIP room system:
Dial: +61 2 6222 7588
or SIP:7588@aarnet.edu.au
or H323:8232211676@182.255.112.21 (From Cisco)
or H323:182.255.112.21##8232211676 (From Huawei, LifeSize, Polycom)
or 162.255.36.11 or 162.255.37.11 (U.S.)

Meeting ID: 8232211676

ACCESS-OM2 at the CLEx Winter School

This week saw the ARC Centre of Excellence for Climate System Science hold its annual Winter School. This year’s edition is on modelling the climate system, hosted by the University of Melbourne. Yesterday was “Oceans Day” at the Winter School, and the highlight was an afternoon lab session based on ACCESS-OM2. Tasks included:

  • To run the ACCESS-OM2 model (about 65 of the 70 students managed to do this); and
  • To analyse some existing ACCESS-OM2 model output using the COSIMA cookbook (most students also completed this, with about a third of students producing great plots to show some new and intriguing results).

Progress was enabled by erstwhile helpers from CLEx’s CMS Team – Aidan, Claire Holger, Paola and Scott!

ACCESS-OM2 Workshop
Feverishly running ACCESS-OM2 at the 2019 CLEx Winter School.

ACCESS-OM2 Evaluation Paper

A community paper evaluating the performance of ACCESS-OM2 has been submitted to Geoscientific Model Development (GMD).  The paper has 30 authors from across the Australian community. GMD has an open review process, so you can track its progress.

This paper outlines the performance of ACCESS-OM2 at three different model resolutions, as indicated in the figure below.  It aims to spell out which versions of the model are suitable for different types of studies, and highlights the performance of the 0.1° resolution configuration (referred to as ACCESS-OM2-01). The paper shows that ACCESS-OM2 does a good job of representing many features of the ocean. Historical sea ice extent trends are well-represented, and the surface properties and transects in each ocean basin compare well with the observational record. The large scale overturning circulation, flow through the Indonesian archipelago and patterns of boundary currents are realistic, supporting the notion that this suite of models is competitive with similar models from other institutions. Areas for improvement include the relatively weak barotropic transport in the midlatitude gyres, particularly in the Pacific Ocean, the weaker than observed Drake Passage transport and the weak AMOC in the 1° configuration. For full details, feel free to browse the paper!

Kiss, A. E., Hogg, A. McC., Hannah, N., Boeira Dias, F., Brassington, G. B., Chamberlain, M. A., Chapman, C., Dobrohotoff, P., Domingues, C. M., Duran, E. R., England, M. H., Fiedler, R., Griffies, S. M., Heerdegen, A., Heil, P., Holmes, R. M., Klocker, A., Marsland, S. J., Morrison, A. K., Munroe, J., Oke, P. R., Nikurashin, M., Pilo, G. S., Richet, O., Savita, A., Spence, P., Stewart, K. D., Ward, M. L., Wu, F., and Zhang, X.: ACCESS-OM2: A Global Ocean-Sea Ice Model at Three Resolutions, Geosci. Model Dev. Discuss., https://doi.org/10.5194/gmd-2019-106, in review, 2019.