hard-coded tables in CMOR #536

Open
alaniwi opened this issue Aug 19, 2019 · 9 comments

alaniwi commented Aug 19, 2019

Various tables are included in the software, which forces the user to upgrade CMOR if they are updated. In particular, the following:

$ cd cmor/
$ find . -name '*.json' | grep -v Test
./Lib/CV_experiments.json
./Lib/experiments_id.json
./LibCV/PrePARE/out_names_tests.json

Could I suggest moving them to a separate repo so that they can be updated independently? (Possibly added to https://github.com/PCMDI/cmip6-cmor-tables, although I don't know enough to be sure whether this is appropriate or not.)

Thanks.

Contributor

durack1 commented Aug 19, 2019

@alaniwi thanks for commenting. We are actually in the process of cleaning up repos (see PCMDI/xml-cmor3-database#52) and @mauzey1 has also flagged that CMOR should be carrying the primary Tables repo (PCMDI/cmip6-cmor-tables) along for the ride in conda installations, see #529.

Other than in tests, where are the files referenced in #536 (comment) actually used? To configure and use CMOR3, a user first needs to clone and configure a Table subdir for their use (currently either cmip6-cmor-tables, input4mips-cmor-tables or obs4mips-cmor-tables).

Collaborator

mauzey1 commented Aug 19, 2019

@durack1 Issue #529 is about making a git submodule of cmip6-cmor-tables inside the CMOR repo, not including it in the conda installation.

@alaniwi The file out_names_tests.json is used by PrePARE to find the variable of a file by looking at its "out name," which is a truncated version of the variable name used in the name of the output file. out_names_tests.json is not a part of the CMIP6 tables. The other two files appear not to be used by CMOR/PrePARE and will probably be removed. @durack1 and @taylor13, do you know the purpose of experiments_id.json and CV_experiments.json?

Contributor

durack1 commented Aug 19, 2019

@mauzey1 good question, those two experiments files are not up-to-date and, as long as they're not being used by the code, should also be purged as part of a repo cleanup. This repo should be for the CMOR software (along with its internal tests), not for also hosting table files.

Author

alaniwi commented Aug 20, 2019

@durack1 The specific issue I encountered was related to PrePARE using the out_names_tests.json file.

Collaborator

mauzey1 commented Aug 27, 2019

@durack1 Should out_names_tests.json be a part of the CMIP6 CMOR tables repo? If a change were to happen in the CMIP6 tables that would require changing out_names_tests.json, then we would only need to update the table repo instead of both the tables and CMOR.

Contributor

durack1 commented Aug 27, 2019

@mauzey1 it is not clear to me what the out_names_tests.json file actually contains. It would appear to be a lookup table of sorts for the contents of cmip6-cmor-tables; if this is true, and it is not CMOR-specific, then moving it to the cmip6-cmor-tables repo makes complete sense to me.

Collaborator

mauzey1 commented Aug 27, 2019

@durack1 It is CMOR-specific, since the index values in each table-out_name entry correspond to functions in PrePARE for testing which variable name to use.

I am thinking we might not even need out_names_tests.json. We could look up the out_names in the table we get from the file name, and then determine which test we need to pick a variable name.
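To illustrate the idea, here is a minimal sketch of finding which out names are shared by several entries in a table. It assumes the table JSON keeps its variables under a `variable_entry` key; the table excerpt and the helper names `entries_by_out_name` and `ambiguous_out_names` are hypothetical and not part of CMOR or PrePARE.

```python
from collections import defaultdict

# Hypothetical, trimmed excerpt of a table; only the fields used here are kept.
TABLE_6HRPLEVPT = {
    "variable_entry": {
        "ta27": {"out_name": "ta", "dimensions": "longitude latitude plev27 time1"},
        "ta7h": {"out_name": "ta", "dimensions": "longitude latitude plev7h time1"},
        "psl": {"out_name": "psl", "dimensions": "longitude latitude time1"},
    }
}


def entries_by_out_name(table):
    """Group variable entry names by their out_name attribute."""
    groups = defaultdict(list)
    for name, entry in table["variable_entry"].items():
        # Fall back to the entry name if no out_name is given (assumption).
        groups[entry.get("out_name", name)].append(name)
    return dict(groups)


def ambiguous_out_names(table):
    """Keep only the out_names that are shared by more than one entry."""
    return {
        out_name: names
        for out_name, names in entries_by_out_name(table).items()
        if len(names) > 1
    }


print(ambiguous_out_names(TABLE_6HRPLEVPT))
```

Only the out names returned by `ambiguous_out_names` would need an extra test to pick the variable name; every other out name maps to a single entry.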

@mauzey1 mauzey1 added this to To do in 3.6.0 via automation Oct 8, 2019
@mauzey1 mauzey1 removed this from To do in 3.6.0 Mar 4, 2020
@mauzey1 mauzey1 added this to the 4.0/Future milestone Mar 4, 2020
@mauzey1 mauzey1 removed this from the 4.0/Future milestone Aug 29, 2022
Collaborator

mauzey1 commented Sep 3, 2022

@durack1 I wanted to get back to this issue of hard-coded tables in CMOR/PrePARE. out_names_tests.json is the last of these files.

As I stated previously, we shouldn't need this file since the out_name attributes from the variable entries in the tables should handle it. The current version of PrePARE will get the table and variable out_name from the file name of the dataset, look up which variable property to check in out_names_tests.json, and then use that check to determine which variable name it should be.

An example would be a dataset for ta27 in the 6hrPlevPt table. PrePARE would get the out_name ta and the table name 6hrPlevPt from the dataset's file name. PrePARE would then concatenate those names into 6hrPlevPt_ta and then use that as a key to look up in out_names_tests.json. It will see that it will have to determine if the variable name is ta27 or ta7h using the has_27_pressure_levels and has_7_pressure_levels functions respectively.

"6hrPlevPt_ta": {
    "has_27_pressure_levels": "ta27",
    "has_7_pressure_levels": "ta7h"
},

@staticmethod
def has_27_pressure_levels(infile, **kwargs):
    dim = [d for d in list(infile.dimensions.keys()) if 'plev' in d]
    return True if len(dim) == 1 and infile.dimensions[dim[0]].size == 27 else False

@staticmethod
def has_7_pressure_levels(infile, **kwargs):
    dim = [d for d in list(infile.dimensions.keys()) if 'plev' in d]
    return True if len(dim) == 1 and infile.dimensions[dim[0]].size == 7 else False

# -------------------------------------------------------------------
#  Distinguish similar CMOR entries with the same out_name if exist
# -------------------------------------------------------------------
# Apply test on variable only if a particular treatment if required
prepare_path = os.path.dirname(os.path.realpath(__file__))
out_names_tests = json.loads(open(os.path.join(prepare_path, 'out_names_tests.json')).read())
# -------------------------------------------------------------------
#  Open file in processing
#  The file needs to be open before the calling the test.
# -------------------------------------------------------------------
infile = netCDF4.Dataset(ncfile, "r")
key = '{}_{}'.format(table_id, variable_id)
variable_cmor_entry = None
if key in list(out_names_tests.keys()):
    for test, cmor_entry in list(out_names_tests[key].items()):
        if getattr(self, test)(**{'infile': infile,
                                  'variable': variable,
                                  'filename': filename}):
            # If test successfull, the CMOR entry to consider is given by the test
            variable_cmor_entry = cmor_entry
            break
        else:
            # If not, CMOR entry to consider is the variable from filename or from input command-line
            variable_cmor_entry = variable
else:
    # By default, CMOR entry to consider is the variable from filename or from input command-line
    variable_cmor_entry = variable

PrePARE will determine that the dataset should have the variable name ta27 if it has 27 pressure levels.
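For reference, the first step of that lookup (building the key from the file name) can be sketched on its own. This assumes the CMIP6 convention that a file name starts with `<out_name>_<table_id>_...`; the function name `lookup_key` and the example file name are hypothetical, and this is not how PrePARE itself parses file names.

```python
import os


def lookup_key(ncfile):
    """Build the '<table_id>_<out_name>' key from a CMIP6-style file name."""
    stem = os.path.basename(ncfile)
    if stem.endswith(".nc"):
        stem = stem[:-3]
    parts = stem.split("_")
    if len(parts) < 2:
        raise ValueError("unexpected file name: {}".format(ncfile))
    # Assumption: the first two underscore-separated fields are the
    # out_name and the table name, in that order.
    out_name, table_id = parts[0], parts[1]
    return "{}_{}".format(table_id, out_name)


# Hypothetical file name, used only to exercise the function.
print(lookup_key("ta_6hrPlevPt_MODEL_historical_r1i1p1f1_gn_200001010600-200101010000.nc"))
```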

I propose a different method of validating the file name and variable name. First, find the variable and table name in the file to find the variable entry in the table.

        "ta27": {
            "frequency": "6hrPt", 
            "modeling_realm": "atmos", 
            "standard_name": "air_temperature", 
            "units": "K", 
            "cell_methods": "area: mean time: point", 
            "cell_measures": "area: areacella", 
            "long_name": "Air Temperature", 
            "comment": "Air Temperature", 
            "dimensions": "longitude latitude plev27 time1", 
            "out_name": "ta", 
            "type": "real", 
            "positive": "", 
            "valid_min": "", 
            "valid_max": "", 
            "ok_min_mean_abs": "", 
            "ok_max_mean_abs": ""
        }

From there we can get the out_name attribute to validate the name used in the file, and we can also perform the has_27_pressure_levels check due to plev27 being present in the dimensions list. We could do similar checks with plev4 and plev7h. is_climatology is about finding -clim in the file name and Clim at the end of the variable name, and has_land_in_cell_methods is about finding land in the variable name and in the cell_methods attribute. has_3_dimensions is about checking if there are 3 dimensions for the variable. We might not even need these checks if the CMOR CV already handles them.
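A rough sketch of how the pressure-level check could be driven by the table itself rather than by out_names_tests.json is shown below. It assumes the level count can be read from the digits in a `plevN` dimension name (so `plev7h` means 7 levels) and that the file's dimensions are passed in as a plain name-to-size mapping. The helper names and the trimmed table excerpt are hypothetical, and the other checks (is_climatology, has_land_in_cell_methods, has_3_dimensions) are not covered.

```python
import re

# Hypothetical, trimmed excerpt of the 6hrPlevPt table.
TABLE = {
    "variable_entry": {
        "ta27": {"out_name": "ta", "dimensions": "longitude latitude plev27 time1"},
        "ta7h": {"out_name": "ta", "dimensions": "longitude latitude plev7h time1"},
    }
}


def expected_plev_size(entry):
    """Return the level count implied by a plevN dimension, or None."""
    for dim in entry["dimensions"].split():
        match = re.fullmatch(r"plev(\d+)[a-z]*", dim)
        if match:
            return int(match.group(1))
    return None


def file_plev_size(file_dims):
    """Return the size of the file's single plev dimension, or None."""
    sizes = [size for name, size in file_dims.items() if "plev" in name]
    return sizes[0] if len(sizes) == 1 else None


def resolve_entry(table, out_name, file_dims):
    """Pick the variable entry matching out_name and the file's plev size."""
    entries = table["variable_entry"]
    candidates = [
        name for name, entry in entries.items()
        if entry.get("out_name", name) == out_name
    ]
    if len(candidates) <= 1:
        return candidates[0] if candidates else None
    actual = file_plev_size(file_dims)
    matches = [n for n in candidates if expected_plev_size(entries[n]) == actual]
    # Return None rather than guess when the match is not unique.
    return matches[0] if len(matches) == 1 else None


print(resolve_entry(TABLE, "ta", {"time": 4, "plev": 27, "lat": 128, "lon": 256}))
```

With a netCDF4 dataset, the mapping could be built as `{name: dim.size for name, dim in infile.dimensions.items()}`, mirroring how the existing has_27_pressure_levels check reads dimension sizes.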

One issue when it comes to validating this feature is the lack of files from ESGF that have an out name that differs from their true variable name. Going through the out names list and searching for variables on esgf-node.llnl.gov, I've only found one dataset: CMIP6.CMIP.CNRM-CERFACS.CNRM-CM6-1.piControl.r1i1p1f2.Omon.ficeberg2d.gn

One odd thing about this dataset is that the variable name ficeberg2d is the same as the name in the file name, rather than ficeberg as out_names_tests.json would suggest.

Collaborator

taylor13 commented Sep 7, 2022

Soon we should be moving to a slightly different way of uniquely naming variables, so that there won't be multiple "in names" in a table with the same "out name". In fact the names will be unique across all tables (although the variable may still be divided up and hosted by different tables). I'm not sure we should try to clean up things until that new approach has been agreed.

@durack1 durack1 added this to the 4.0/Future milestone Feb 27, 2024