hard-coded tables in CMOR #536

alaniwi · 2019-08-19T10:06:36Z

Various tables are included in the software, which forces the user to upgrade CMOR if they are updated. In particular, the following:

$ cd cmor/
$ find . -name '*.json' | grep -v Test
./Lib/CV_experiments.json
./Lib/experiments_id.json
./LibCV/PrePARE/out_names_tests.json
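For anyone reproducing this check outside a Unix shell, the same search can be sketched in Python with pathlib. This is a minimal sketch that mirrors the `find . -name '*.json' | grep -v Test` pipeline above; `find_bundled_json` is a hypothetical helper name, not part of CMOR.

```python
from pathlib import Path


def find_bundled_json(root="."):
    """List *.json files under root whose relative path does not contain 'Test'.

    Mirrors: find . -name '*.json' | grep -v Test
    """
    root = Path(root)
    matches = []
    for path in root.rglob("*.json"):
        rel = path.relative_to(root).as_posix()
        # grep -v Test drops any path containing "Test" (case-sensitive)
        if "Test" not in rel:
            matches.append(rel)
    return sorted(matches)


if __name__ == "__main__":
    for name in find_bundled_json("."):
        print(name)
```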

Could I suggest moving them to a separate repo so that they can be updated independently? (Possibly added to https://github.com/PCMDI/cmip6-cmor-tables, although I don't know enough to be sure whether this is appropriate or not.)

Thanks.


durack1 · 2019-08-19T18:06:10Z

@alaniwi thanks for commenting. We are actually in the process of cleaning up repos (see PCMDI/xml-cmor3-database#52) and @mauzey1 has also flagged that CMOR should be carrying the primary Tables repo (PCMDI/cmip6-cmor-tables) along for the ride in conda installations, see #529.

Other than in tests, where are the files referenced in #536 (comment) actually used? To configure and use CMOR3, a user first needs to clone and configure a Table subdirectory (currently cmip6-cmor-tables, input4mips-cmor-tables or obs4mips-cmor-tables).

mauzey1 · 2019-08-19T18:35:27Z

@durack1 Issue #529 is about making a git submodule of cmip6-cmor-tables inside the CMOR repo, not including it in the conda installation.

@alaniwi The file out_names_tests.json is used by PrePARE to find the variable in a file from its "out name", which is a truncated version of the variable name used in the output file name. out_names_tests.json is not part of the CMIP6 tables. The other two files do not appear to be used by CMOR/PrePARE and will probably be removed. @durack1 and @taylor13, do you know the purpose of experiments_id.json and CV_experiments.json?

durack1 · 2019-08-19T18:48:59Z

@mauzey1 good question. Those two experiments files are not up to date and, as long as they're not being used by the code, should also be purged as part of a repo cleanup. This repo should be for the CMOR software (along with its internal tests), not for also hosting table files.

alaniwi · 2019-08-20T08:15:18Z

@durack1 The specific issue I encountered was related to PrePARE using the out_names_tests.json file.

mauzey1 · 2019-08-27T01:03:00Z

@durack1 Should out_names_tests.json be a part of the CMIP6 CMOR tables repo? If a change were to happen in the CMIP6 tables that would require changing out_names_tests.json, then we would only need to update the table repo instead of both the tables and CMOR.

durack1 · 2019-08-27T09:22:56Z

@mauzey1 it is not clear to me what the out_names_tests.json file actually contains. It appears to be a lookup table of sorts for the contents of cmip6-cmor-tables; if this is true, and it is not CMOR-specific, then moving it to the cmip6-cmor-tables repo makes complete sense to me.

mauzey1 · 2019-08-27T17:00:13Z

@durack1 It is CMOR-specific, since the keys in each table_out_name entry correspond to functions in PrePARE that test which variable name to use.

I am thinking we might not even need out_names_tests.json. We could look up the out_name values in the table identified by the file name, and then determine which test is needed to pick the variable name.
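A minimal sketch of that idea, assuming the table has already been loaded as a dict shaped like the CMIP6 table files (a "variable_entry" mapping of entry name to attributes). `candidates_for_out_name` is a hypothetical helper, not part of PrePARE, and the toy table below contains illustrative entries rather than a copy of the real one.

```python
def candidates_for_out_name(table, out_name):
    """Return the sorted entry names in a table whose out_name matches."""
    entries = table.get("variable_entry", {})
    # An entry without an explicit out_name is assumed to use its own name.
    return sorted(
        name for name, attrs in entries.items()
        if attrs.get("out_name", name) == out_name
    )


# Toy table: only the fields needed here are included.
table = {
    "variable_entry": {
        "ta27": {"out_name": "ta", "dimensions": "longitude latitude plev27 time1"},
        "ta7h": {"out_name": "ta", "dimensions": "longitude latitude plev7h time1"},
    }
}
print(candidates_for_out_name(table, "ta"))  # ['ta27', 'ta7h']
```

When more than one candidate comes back, a test (such as the pressure-level checks discussed below) is still needed to choose between them; when exactly one comes back, no test is needed at all.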

mauzey1 · 2022-09-03T02:03:25Z

@durack1 I wanted to get back to this issue of hard-coded tables in CMOR/PrePARE. out_names_tests.json is the last of these files.

As I stated previously, we shouldn't need this file, since the out_name attributes of the variable entries in the tables should handle it. The current version of PrePARE gets the table and the variable out_name from the dataset's file name, looks up which variable property to check in out_names_tests.json, and then uses that check to determine which variable name applies.

An example would be a dataset for ta27 in the 6hrPlevPt table. PrePARE would get the out_name ta and the table name 6hrPlevPt from the dataset's file name. It would then concatenate those names into 6hrPlevPt_ta and use that as the key to look up in out_names_tests.json. The entry for that key tells it to decide between ta27 and ta7h using the has_27_pressure_levels and has_7_pressure_levels functions, respectively.

cmor/LibCV/PrePARE/out_names_tests.json

Lines 27 to 30 in c805fe0

    "6hrPlevPt_ta": {
        "has_27_pressure_levels": "ta27",
        "has_7_pressure_levels": "ta7h"
    },
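The key construction described above can be sketched as follows. This assumes a CMIP6-style file name that begins with `<out_name>_<table_id>_`; `lookup_key` is a hypothetical helper, and the file name in the example is made up for illustration.

```python
import os

# The excerpt of out_names_tests.json above, as a Python dict.
OUT_NAMES_TESTS = {
    "6hrPlevPt_ta": {
        "has_27_pressure_levels": "ta27",
        "has_7_pressure_levels": "ta7h",
    },
}


def lookup_key(filename):
    """Build the '<table_id>_<out_name>' key from a CMIP6-style file name."""
    out_name, table_id = os.path.basename(filename).split("_")[:2]
    return "{}_{}".format(table_id, out_name)


example = "ta_6hrPlevPt_EXAMPLE-MODEL_historical_r1i1p1f1_gn_185001010300-185101010000.nc"
key = lookup_key(example)
print(key)  # 6hrPlevPt_ta
# Maps each test name to the CMOR entry it selects; empty if no tests apply.
print(OUT_NAMES_TESTS.get(key, {}))
```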

cmor/LibCV/PrePARE/PrePARE.py

Lines 340 to 348 in c805fe0

    @staticmethod
    def has_27_pressure_levels(infile, **kwargs):
        dim = [d for d in list(infile.dimensions.keys()) if 'plev' in d]
        return True if len(dim) == 1 and infile.dimensions[dim[0]].size == 27 else False

    @staticmethod
    def has_7_pressure_levels(infile, **kwargs):
        dim = [d for d in list(infile.dimensions.keys()) if 'plev' in d]
        return True if len(dim) == 1 and infile.dimensions[dim[0]].size == 7 else False
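To experiment with these checks without opening a netCDF file, the same logic can be written against a plain mapping of dimension name to size (standing in for `infile.dimensions`). `has_n_pressure_levels` is a hypothetical generalization of the two methods above, not a PrePARE function.

```python
def has_n_pressure_levels(dimensions, n):
    """True if exactly one dimension name contains 'plev' and it has size n.

    `dimensions` maps dimension name -> size, standing in for
    {name: dim.size for name, dim in infile.dimensions.items()}.
    """
    plev_dims = [name for name in dimensions if "plev" in name]
    return len(plev_dims) == 1 and dimensions[plev_dims[0]] == n


dims = {"time": 4, "plev": 27, "lat": 128, "lon": 256}
print(has_n_pressure_levels(dims, 27))  # True
print(has_n_pressure_levels(dims, 7))   # False
```

Returning the boolean expression directly is equivalent to the `True if ... else False` form in the original methods, and parameterizing `n` avoids one near-identical method per level count (27, 7, 4, ...).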

cmor/LibCV/PrePARE/PrePARE.py

Lines 408 to 434 in c805fe0

    # -------------------------------------------------------------------
    # Distinguish similar CMOR entries with the same out_name if they exist
    # -------------------------------------------------------------------
    # Apply a test on the variable only if particular treatment is required
    prepare_path = os.path.dirname(os.path.realpath(__file__))
    out_names_tests = json.loads(open(os.path.join(prepare_path, 'out_names_tests.json')).read())
    # -------------------------------------------------------------------
    # Open file in processing
    # The file needs to be open before calling the test.
    # -------------------------------------------------------------------
    infile = netCDF4.Dataset(ncfile, "r")
    key = '{}_{}'.format(table_id, variable_id)
    variable_cmor_entry = None
    if key in list(out_names_tests.keys()):
        for test, cmor_entry in list(out_names_tests[key].items()):
            if getattr(self, test)(**{'infile': infile,
                                      'variable': variable,
                                      'filename': filename}):
                # If the test succeeds, the CMOR entry to consider is given by the test
                variable_cmor_entry = cmor_entry
                break
            else:
                # If not, the CMOR entry to consider is the variable from the file name or command line
                variable_cmor_entry = variable
    else:
        # By default, the CMOR entry to consider is the variable from the file name or command line
        variable_cmor_entry = variable
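The selection step above can be sketched in isolation, with the netCDF file replaced by a dict of dimension sizes and the tests registered in a plain dict. Both are hypothetical stand-ins for PrePARE's `getattr(self, test)` lookup; `select_cmor_entry` is not a PrePARE function.

```python
def has_27_pressure_levels(dims):
    plev = [d for d in dims if "plev" in d]
    return len(plev) == 1 and dims[plev[0]] == 27


def has_7_pressure_levels(dims):
    plev = [d for d in dims if "plev" in d]
    return len(plev) == 1 and dims[plev[0]] == 7


# Registry of test name -> function, replacing getattr(self, test).
TESTS = {
    "has_27_pressure_levels": has_27_pressure_levels,
    "has_7_pressure_levels": has_7_pressure_levels,
}

OUT_NAMES_TESTS = {
    "6hrPlevPt_ta": {"has_27_pressure_levels": "ta27", "has_7_pressure_levels": "ta7h"},
}


def select_cmor_entry(table_id, variable, dims):
    """Return the CMOR entry name to validate against."""
    key = "{}_{}".format(table_id, variable)
    for test_name, cmor_entry in OUT_NAMES_TESTS.get(key, {}).items():
        if TESTS[test_name](dims):
            return cmor_entry
    # No test matched, or no tests exist for this key: use the name as given.
    return variable


print(select_cmor_entry("6hrPlevPt", "ta", {"time": 1, "plev": 7, "lat": 2, "lon": 2}))  # ta7h
```

Returning early on the first passing test gives the same result as the `break` in the excerpt: the file-name variable is used only when no test succeeds.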

PrePARE will determine that the dataset should have the variable name ta27 if it has 27 pressure levels.

I propose a different method of validating the file name and variable name. First, use the variable and table names from the file to find the variable entry in the table. For ta27 in 6hrPlevPt, that entry is:

        "ta27": {
            "frequency": "6hrPt", 
            "modeling_realm": "atmos", 
            "standard_name": "air_temperature", 
            "units": "K", 
            "cell_methods": "area: mean time: point", 
            "cell_measures": "area: areacella", 
            "long_name": "Air Temperature", 
            "comment": "Air Temperature", 
            "dimensions": "longitude latitude plev27 time1", 
            "out_name": "ta", 
            "type": "real", 
            "positive": "", 
            "valid_min": "", 
            "valid_max": "", 
            "ok_min_mean_abs": "", 
            "ok_max_mean_abs": ""
        }

From there we can get the out_name attribute to validate the name used in the file, and we can also perform the has_27_pressure_levels check, since plev27 is present in the dimensions list. We could do similar checks with plev4 and plev7h. The remaining tests could be handled the same way: is_climatology checks for -clim in the file name and Clim at the end of the variable name; has_land_in_cell_methods checks for land in the variable name and in the cell_methods attribute; and has_3_dimensions checks whether the variable has 3 dimensions. We might not even need these checks if the CMOR CV already handles them.
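A minimal sketch of the proposed check, assuming the table entry has been loaded as a dict (as in the ta27 example above) and the file's dimension sizes are available as a dict. `check_against_entry` is hypothetical; it validates the out_name and, when the entry lists a `plevN` dimension, the number of pressure levels. The regular expression `plev(\d+)` takes the leading digits, so `plev7h` yields 7 and `plev4` yields 4.

```python
import re


def check_against_entry(entry, file_var_name, file_dims):
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if file_var_name != entry["out_name"]:
        problems.append("variable name {!r} != out_name {!r}".format(
            file_var_name, entry["out_name"]))
    for dim in entry["dimensions"].split():
        # e.g. "plev27" -> 27, "plev7h" -> 7, "plev4" -> 4
        match = re.match(r"plev(\d+)", dim)
        if match:
            expected = int(match.group(1))
            plev_sizes = [size for name, size in file_dims.items() if "plev" in name]
            if plev_sizes != [expected]:
                problems.append("expected one plev dimension of size {}, found {}".format(
                    expected, plev_sizes))
    return problems


ta27 = {"out_name": "ta", "dimensions": "longitude latitude plev27 time1"}
print(check_against_entry(ta27, "ta", {"time": 4, "plev": 27, "lat": 128, "lon": 256}))  # []
```

Because the expected level count comes from the table entry itself, a change to the tables would not require a matching change to a file shipped with CMOR.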

One issue when it comes to validating this feature is the lack of files from ESGF that have an out name that differs from their true variable name. Going through the out names list and searching for variables on esgf-node.llnl.gov, I've only found one dataset: CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.piControl.r1i1p1f2.Omon.ficeberg2d.gn

One odd thing about this dataset is that the variable name ficeberg2d is the same as the name in the file name, rather than ficeberg as out_names_tests.json would suggest.

taylor13 · 2022-09-07T15:03:14Z

Soon we should be moving to a slightly different way of uniquely naming variables, so that there won't be multiple "in names" in a table with the same "out name". In fact the names will be unique across all tables (although the variable may still be divided up and hosted by different tables). I'm not sure we should try to clean up things until that new approach has been agreed.

mauzey1 added this to To do in 3.6.0 via automation Oct 8, 2019

mauzey1 removed this from To do in 3.6.0 Mar 4, 2020

mauzey1 added this to the 4.0/Future milestone Mar 4, 2020

mauzey1 removed this from the 4.0/Future milestone Aug 29, 2022

durack1 added this to the 4.0/Future milestone Feb 27, 2024
