Update cmor.c #733

Merged
merged 2 commits into from
Apr 29, 2024

cofinoa
Contributor

@cofinoa cofinoa commented Apr 9, 2024

Use netCDF4 DEFAULT_CHUNK_SIZES for chunked vars and coordinates/axes.

This relates to issue #601, which explains that a chunk size of 1 for coordinates/axes such as time severely degrades read performance for those netCDF variables.
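To make the cost concrete, here is a rough back-of-the-envelope sketch in C. It assumes a 1-D time axis of 8-byte doubles and a 4 KiB chunk budget (the figure the netCDF User Guide gives as the default limit for variables with a single unlimited dimension). The function names are hypothetical and exist only for this illustration; they are not part of CMOR or netcdf-c.

```c
#include <stddef.h>

/* Hypothetical helper: number of elements that fit in one chunk of
 * chunk_bytes bytes. Assumes elem_bytes > 0; never returns less than 1. */
size_t chunk_len_for_budget(size_t chunk_bytes, size_t elem_bytes)
{
    size_t n = chunk_bytes / elem_bytes;
    return n > 0 ? n : 1;
}

/* Hypothetical helper: number of chunks touched when reading the first
 * n_steps values of a 1-D variable stored with chunk length chunk_len.
 * Assumes chunk_len > 0. */
size_t chunks_to_read(size_t n_steps, size_t chunk_len)
{
    return (n_steps + chunk_len - 1) / chunk_len; /* ceiling division */
}
```

Under these assumptions, `chunk_len_for_budget(4096, sizeof(double))` gives 512 elements per chunk, so reading 1000 time steps touches `chunks_to_read(1000, 512)` = 2 chunks, compared with `chunks_to_read(1000, 1)` = 1000 chunks when the chunk length is 1.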

The netcdf-c library applies its own default chunk sizes to netCDF4/HDF5 files when the chunksizes argument is NULL.

For the current netcdf-c release (i.e. version 4.9.2):

  • nc_def_var_chunking:

    [...] Chunk sizes may be specified with the chunksizes parameter or default sizes will be used if that parameter is NULL. [...]

  • See Default Chunking Scheme from netCDF User Guide (NUG):
    • [...] variables that only have a single unlimited dimension [...] the [default] chunk sizes for such variables are limited to 4KiB

    • [...] Currently the netCDF default chunk size is 4MiB, which is reasonable for filesystems on high-performance computing platforms [...]

    • [...] The current default chunking strategy of the netCDF library is to balance access time along any of a variable's dimensions, by using chunk shapes similar to the shape of the entire variable but small enough that the resulting chunk size is less than or equal to the default chunk size. This differs from an earlier default chunking strategy that always used one for the length of a chunk along any unlimited dimension, and otherwise divided up the number of chunks along fixed dimensions to keep chunk sizes less than or equal to the default chunk size. [...]

  • To change the default chunk cache size for all variables, call nc_set_chunk_cache() before opening the file; to change it for a single variable, use nc_set_var_chunk_cache().
  • Related HDF5 function: H5Pset_cache
  • This PR proposes DEFAULT chunking not only for the time coordinate/axis but also for the data variables themselves when they have unlimited dimensions.

Collaborator

@mauzey1 mauzey1 left a comment

The changes look good. Please merge the latest changes from main to this branch.

@mauzey1 mauzey1 merged commit 92e3efd into PCMDI:main Apr 29, 2024
@durack1
Contributor

durack1 commented Apr 29, 2024

@cofinoa it seems like you need to rebase your branch onto the latest main and then re-push. We may need to stand up a new test alongside your PR changes.

UPDATE: Ok ignore the above, it seems like this is now in - and we'll need to add a test to ensure coverage before 3.9 is finalized
