Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Some coordinate system dimensions can be lost if unnamed #951

Open
desruisseaux opened this issue Dec 7, 2021 · 22 comments
Open

Some coordinate system dimensions can be lost if unnamed #951

desruisseaux opened this issue Dec 7, 2021 · 22 comments
Labels
bug Something isn't working

Comments

@desruisseaux
Copy link

By constructor contract, Dimension.shortName can be null if the dimension is not shared (i.e. when a dimension is private to a variable). When this is the case, then the only criterion left for allowing Dimension.equals(Object) to differentiate two dimensions is the dimension length. If by coincidence two unnamed dimensions have the same length, then they are considered equal.

In the CoordinateSystem class, dimensions are stored in a HashSet. Consequently if a coordinate system have two dimensions like below:

  • Latitude with null shortName and a length of 200.
  • Longitude with null shortName and a length of 200.

Then the two above dimensions are considered equal and the domain contains only one element. It causes user code to believe that the coordinate system is one-dimensional.

Side note: there is a potential NullPointerException in compareTo(Dimension).

asfgit pushed a commit to apache/sis that referenced this issue Dec 7, 2021
This workaround	does not completely solve the problem because a
fix relative to `ucar.nc2.dataset.CoordinateSystem` is also needed.

Unidata/netcdf-java#951
@haileyajohnson haileyajohnson added the bug Something isn't working label Dec 7, 2021
@JohnLCaron
Copy link
Collaborator

Hi Martin:

Thanks for the comments.

The only way that a coordinate can be associated with a data variable is with shared dimensions. I agree that allowing unnamed dimensions is questionable, but I dont see any practical effects on coordinate variables or systems.

Unshared dimensions have been allowed for NetcdfFile for legacy datasets; the idea is that this would be fixed with CoordinateBuilders so that NetcdfDataset and GridDataset can georeference.

Regards,
John Caron

@desruisseaux
Copy link
Author

desruisseaux commented Dec 8, 2021

Hello John. Thanks for the reply. I'm reading a GCOM-C file from JAXA in HDF5 format. My test file is 1.3 Mb, which is why I didn't attached it to this issue. I get the CoordinateSystem instances by a call to ucar.nc2.dataset.NetcdfDataset.getCoordinateSystems() — I presume that it uses a builder under the hood but didn't verified. This is this object which, when invoking CoordinateSystem.getDomain(), gives me a collection of only 1 element while 2 elements were expected.

Maybe an easy fix would be to change the type of the domain private field from HashSet to List?

@JohnLCaron
Copy link
Collaborator

HI Martin, can you send me the file in question? Thanks

@desruisseaux
Copy link
Author

Sure, I will send it by private email in a few minutes. A ncdump -h of that file shows the following header (the key point is that phony_dim_0 and phony_dim_1 have the same value):

group: Geometry_data {
  dimensions:
  	phony_dim_0 = 126 ;
  	phony_dim_1 = 126 ;
  variables:
  	float Latitude(phony_dim_0, phony_dim_1) ;
        (…snip…)
  	float Longitude(phony_dim_0, phony_dim_1) ;
        (…snip…)

The following Java code reproduces the problem:

import ucar.nc2.dataset.CoordinateSystem;
import ucar.nc2.dataset.NetcdfDataset;
import ucar.nc2.dataset.NetcdfDatasets;

final class Test {
    public static void main(String[] args) throws Exception {
        try (NetcdfDataset file = NetcdfDatasets.openDataset("GC1SG1_202105251107C25504_L2SG_SSTDK_2000.h5")) {
            for (CoordinateSystem cs : file.getCoordinateSystems()) {
                System.out.println(cs.getDomain().size());
            }
        }
    }
}

Above code prints 1 with above test file, while it correctly prints 2 for any other file where phony_dim_0 and phony_dim_1 have different values.

@desruisseaux
Copy link
Author

I thought that I had your email address, but couldn't find it back. Is there a place where I can upload a 1.3 MB file?

@lesserwhirls
Copy link
Collaborator

lesserwhirls commented Dec 9, 2021

You should be able to attach the file to this issue as long as it is compressed (.zip or .gz)

@JohnLCaron
Copy link
Collaborator

Hi Martin: I agree that something better could be done. What version of the library are you using?

@desruisseaux
Copy link
Author

I'm using 5.4.2 downloaded from UCAR Maven repository.

@JohnLCaron
Copy link
Collaborator

The problem is that we have 4 anonymous dimensions, and we have to know that the dim0 and dim1 are the same for the two variables Latitude and Longitude, but dim0 != dim1. That is plain to see by inspection of the Dim0 and Dim1 attributes:

variables:
  float Latitude(126, 126);
    :Dim0 = "Line grids";
    :Dim1 = "Pixel grids";
    :Error_value = -999.0f; // float
    :Maximum_valid_value = 90.0f; // float
    :Minimum_valid_value = -90.0f; // float
    :Resampling_interval = 10; // int
    :Resampling_interval_unit = "pixel";
    :Slope = 1.0f; // float
    :Offset = 0.0f; // float
    :Unit = "degree";
    :Data_description = "Latitude (degree)";
    :_ChunkSizes = 10U, 10U; // uint

  float Longitude(126, 126);
    :Dim0 = "Line grids";
    :Dim1 = "Pixel grids";
    :Error_value = -999.0f; // float
    :Maximum_valid_value = 180.0f; // float
    :Minimum_valid_value = -180.0f; // float
    :Resampling_interval = 10; // int
    :Resampling_interval_unit = "pixel";
    :Slope = 1.0f; // float
    :Offset = 0.0f; // float
    :Unit = "degree";
    :Data_description = "Longitude (degree)";
    :_ChunkSizes = 10U, 10U; // uint

By design, this kind of dataset specific knowledge is placed in a CoordSysBuilder. Is there a good reason to not do so here?

@desruisseaux
Copy link
Author

No reason to not use CoordSysBuilder, but I would have thought that this is what NetcdfDataset is already doing. As shown in the code snippet above, I just open the file with NetcdfDatasets.openDataset(…) and ask for the CoordinateSystem instances that it built. I did not verified how NetcdfDataset build them…

@JohnLCaron
Copy link
Collaborator

JohnLCaron commented Dec 15, 2021

Yes it does use CoordSystemBuilder, what I meant to say is that we want to make a JAXA specific subclass that can key in on the :DimX attributes and assign shared dimensions.

Since the data variables are also using the same names, there will be two identically named dimensions in each group (Geometry_data and Image_data) with different lengths:

   group: Image_data {
    variables:
      ushort Cloud_probability(1242, 1250);
        :Data_description = "Cloud Probability";
        :Dim0 = "Line grids";
        :Dim1 = "Pixel grids";
        :Error_DN = 255UB; // ubyte
        :Maximum_valid_DN = 100UB; // ubyte
        :Minimum_valid_DN = 0UB; // ubyte
        :Slope = 1.0f; // float
        :Offset = 0.0f; // float
        :Spatial_resolution = 1000.0f; // float
        :Spatial_resolution_unit = "meter";
        :Unit = "%";
        :_ChunkSizes = 116U, 157U; // uint

I guess your software knows all about that and is doing coordinate interpolation? One could place that into the CoordSystemBuilder also.

The original idea was that people could add their own CoordSystemBuilderFactory class and add it at runtime. But it never caught on, too complicated i think.

@JohnLCaron
Copy link
Collaborator

PS: thanks for the note on NPE in compare(), im changing so nulls arent used (in version 8, not sure about backporting).

@JohnLCaron
Copy link
Collaborator

Ive prototyped adding a JAXA CoordSystemBuilder. However, you dont get any CoordinateSystems because the Coordinate Variables (Latitude, Longiture, Time) dont match the data Variables (Cloud_probability, QA_flag, SST).

to do that we would have to include the interpolation code and generate Latitude(Line_grids=1242, Pixel_grids=1250), Longitude(Line_grids=1242, Pixel_grids=1250) etc.

@JohnLCaron
Copy link
Collaborator

The current ncdump with prototyped JAXA CoordSystemBuilder:

netcdf /home/snake/Downloads/GC1SG1_202105251107C25504_L2SG_SSTDK_2000.h5 {
  group: Geometry_data {
    dimensions:
      Line_grids = 126;
      Pixel_grids = 126;
    variables:
      float Latitude(Line_grids=126, Pixel_grids=126);
        :Dim0 = "Line grids";
        :Dim1 = "Pixel grids";
        :Error_value = -999.0f; // float
        :Maximum_valid_value = 90.0f; // float
        :Minimum_valid_value = -90.0f; // float
        :Resampling_interval = 10; // int
        :Resampling_interval_unit = "pixel";
        :Slope = 1.0f; // float
        :Offset = 0.0f; // float
        :Unit = "degree";
        :Data_description = "Latitude (degree)";
        :_ChunkSizes = 10U, 10U; // uint

      float Longitude(Line_grids=126, Pixel_grids=126);
        :Dim0 = "Line grids";
        :Dim1 = "Pixel grids";
        :Error_value = -999.0f; // float
        :Maximum_valid_value = 180.0f; // float
        :Minimum_valid_value = -180.0f; // float
        :Resampling_interval = 10; // int
        :Resampling_interval_unit = "pixel";
        :Slope = 1.0f; // float
        :Offset = 0.0f; // float
        :Unit = "degree";
        :Data_description = "Longitude (degree)";
        :_ChunkSizes = 10U, 10U; // uint

      short Obs_time(Line_grids=126, Pixel_grids=126);
        :Dim0 = "Line grids";
        :Dim1 = "Pixel grids";
        :Error_DN = -32768S; // short
        :Maximum_valid_DN = 32767S; // short
        :Minimum_valid_DN = -32767S; // short
        :Resampling_interval = 10; // int
        :Resampling_interval_unit = "pixel";
        :Slope = 0.001f; // float
        :Offset = 0.0f; // float
        :Unit = "hour";
        :Data_description = "Observation time (hour)";
        :_ChunkSizes = 126U, 126U; // uint

      short Sensor_zenith(Line_grids=126, Pixel_grids=126);
        :Data_description = "Sensor zenith angle (from the earth surface)";
        :Dim1 = "Pixel grids";
        :Error_DN = -32768S; // short
        :Maximum_valid_DN = 32767S; // short
        :Minimum_valid_DN = -32767S; // short
        :Resampling_interval = 10; // int
        :Resampling_interval_unit = "pixel";
        :Slope = 0.01f; // float
        :Offset = 0.0f; // float
        :Unit = "degree";
        :Dim0 = "Line grids";
        :_ChunkSizes = 126U, 126U; // uint

      short Solar_zenith(Line_grids=126, Pixel_grids=126);
        :Data_description = "Solar zenith angle (from the earth surface)";
        :Dim1 = "Pixel grids";
        :Error_DN = -32768S; // short
        :Maximum_valid_DN = 32767S; // short
        :Minimum_valid_DN = -32767S; // short
        :Resampling_interval = 10; // int
        :Resampling_interval_unit = "pixel";
        :Slope = 0.01f; // float
        :Offset = 0.0f; // float
        :Unit = "degree";
        :Dim0 = "Line grids";
        :_ChunkSizes = 126U, 126U; // uint

    // group attributes:
    :Grid_interval = 10000.0f; // float
    :Grid_interval_unit = "meter";
    :Image_projection = "L1B reference grid";
    :Latitude_unit = "degree North";
    :Longitude_unit = "degree East";
    :Lower_left_latitude = 61.536434f; // float
    :Lower_left_longitude = 166.4001f; // float
    :Lower_right_latitude = 57.32705f; // float
    :Lower_right_longitude = 138.88922f; // float
    :Upper_left_latitude = 50.806602f; // float
    :Upper_left_longitude = 168.65584f; // float
    :Upper_right_latitude = 47.533165f; // float
    :Upper_right_longitude = 147.03899f; // float
    :Number_of_lines = 126; // int
    :Number_of_pixels = 126; // int
  }

  group: Global_attributes {
    // group attributes:
    :Algorithm_developer = "Japan Aerospace Exploration Agency (JAXA)";
    :Algorithm_version = "2.00";
    :Dataset_description = "Sea surface temperatures determined by using the TIR 1 and 2 data of SGLI";
    :Parameter_version = "000.00";
    :Product_file_name = "GC1SG1_202105251107C25504_L2SG_SSTDK_2000.h5";
    :Product_level = "Level-2";
    :Product_version = "2000";
    :RSP_path_number = 255; // int
    :Scene_number = 4; // int
    :Total_orbit_number = 17823; // int
    :Satellite = "Global Change Observation Mission - Climate (GCOM-C)";
    :Scene_start_time = "20210525 11:06:50.172";
    :Scene_center_time = "20210525 11:08:21.493";
    :Scene_end_time = "20210525 11:09:52.814";
    :Product_name = "SGLI SST";
    :Sensor = "Second-generation Global Imager (SGLI)";
  }

  group: Image_data {
    dimensions:
      Line_grids = 1242;
      Pixel_grids = 1250;
      L1B-lines = 1242;
    variables:
      ushort Cloud_probability(Line_grids=1242, Pixel_grids=1250);
        :Data_description = "Cloud Probability";
        :Dim0 = "Line grids";
        :Dim1 = "Pixel grids";
        :Error_DN = 255UB; // ubyte
        :Maximum_valid_DN = 100UB; // ubyte
        :Minimum_valid_DN = 0UB; // ubyte
        :Slope = 1.0f; // float
        :Offset = 0.0f; // float
        :Spatial_resolution = 1000.0f; // float
        :Spatial_resolution_unit = "meter";
        :Unit = "%";
        :_ChunkSizes = 116U, 157U; // uint

      double Line_tai93(L1B-lines=1242);
        :Dim0 = "L1B-lines";
        :Error_value = -1.0; // double
        :Maximum_valid_value = 9.99999999E8; // double
        :Minimum_valid_value = 0.0; // double
        :Unit = "second";
        :Data_description = "TAI93 at each line";
        :_ChunkSizes = 1242U; // uint

      uint QA_flag(Line_grids=1242, Pixel_grids=1250);
        :Data_description = "Bit-0:invalid data,1:land,2:Rejected by QC,3:retrieval error,4:invalid data(TIR1),5:invalid data(TIR2),6-7:reserved,8:0:nighttime or no visible data,1:daytime,9:near land,10:cloudy,11:unknown clear/cloudy,12:possibly cloudy, 13:acceptable,14:good,15:reserved";
        :Dim1 = "Pixel grids";
        :Error_DN = 65535US; // ushort
        :Maximum_valid_DN = 65534US; // ushort
        :Minimum_valid_DN = 0US; // ushort
        :Slope = 1.0f; // float
        :Offset = 0.0f; // float
        :Spatial_resolution = 1000.0f; // float
        :Spatial_resolution_unit = "meter";
        :Unit = "NA";
        :Dim0 = "Line grids";
        :_ChunkSizes = 116U, 157U; // uint

      uint SST(Line_grids=1242, Pixel_grids=1250);
        :Maximum_valid_DN = 65531US; // ushort
        :Minimum_valid_DN = 0US; // ushort
        :Mask_for_statistics = 7743US; // ushort
        :Slope = 0.0012f; // float
        :Offset = -10.0f; // float
        :Spatial_resolution = 1000.0f; // float
        :Spatial_resolution_unit = "meter";
        :Unit = "degree";
        :Data_description = "Sea Surface Temperature[SST]: SST[degree]=DN*Slope+Offset";
        :Dim0 = "Line grids";
        :Dim1 = "Pixel grids";
        :Error_DN = 65535US; // ushort
        :Land_DN = 65534US; // ushort
        :Cloud_error_DN = 65533US; // ushort
        :Retrieval_error_DN = 65532US; // ushort
        :_ChunkSizes = 116U, 157U; // uint

    // group attributes:
    :Grid_interval = 1000.0f; // float
    :Grid_interval_unit = "meter";
    :Image_projection = "L1B reference grid";
    :Number_of_lines = 1242; // int
    :Number_of_pixels = 1250; // int
  }

  group: Level_1_attributes {
    // group attributes:
    :Operation_mode = "OBD";
    :Radiometric_calibration = "Original";
    :Geometric_calibration = "Original";
  }

  group: Processing_attributes {
    // group attributes:
    :Contact_point = "JAXA/Earth Observation Research Center (EORC)";
    :Processing_organization = "JAXA/GCOM-C science project";
    :Processing_UT = "20210526 07:22:39";
    :Ancillary_data_information = "Z__C_RJTD_20210525180000_OCN_GPV_Rgl_Gll0p25deg_Pss_ANAL_Tjaxa_grib2.bin ";
    :Input_files = "GC1SG1_202105251107C25504_1BSG_IRSDL_2002.h5 GC1SG1_202105251107C25504_1BSG_VNRDK_2002.h5";
    :Processing_result = "Good";
  }

  // global attributes:
  :_CoordSysBuilder = "ucar.nc2.internal.dataset.conv.JaxaGcomConvention";
}

@desruisseaux
Copy link
Author

A JAXA CoordinateSystem would be helpful. Interpolation are indeed needed, but on my side this is already handled. If we can get the CoordinateSystem to always have 2 axes instead of 1, this is already a good start (enough for my need, but I agree that it may not be complete).

@JohnLCaron
Copy link
Collaborator

The problem is that there is no CoordinateSystem assigned, because there are no Coordinate Axes that match the data values' dimensions (domain). We could assign it anyway, but standard tools would break on it.

  1. You could process the NetcdfDataset yourself.
  2. We could create the CoordinateSystem but not assign it to the data variables.
  3. The only way that standard tools would work is if we added the interpolation inside the CoordSysBuilder.

Perhaps I should ask what functionality do you get out of the CoordinateSystem being assigned to the data variable? Do you want standard tools to work (eg TDS' WMS and Coverage server)?

@desruisseaux
Copy link
Author

Actually I do not use the CoordinateSystem attached to the variable, but the CoordinateSystem attached to the NetcdfDataset. And the only functionality that I'm using from these CoordinateSystem are the getDomain() and getCoordinateAxes() methods. I do not use the WMS and Coverage server, and I do not need support for the interpolation because I do it myself (but it would of course be nice if the library supports it). In a nutshell, the only thing that I need is that this test case prints "2" instead of "1":

import ucar.nc2.dataset.CoordinateSystem;
import ucar.nc2.dataset.NetcdfDataset;
import ucar.nc2.dataset.NetcdfDatasets;

final class Test {
    public static void main(String[] args) throws Exception {
        try (NetcdfDataset file = NetcdfDatasets.openDataset("GC1SG1_202105251107C25504_L2SG_SSTDK_2000.h5")) {
            for (CoordinateSystem cs : file.getCoordinateSystems()) {
                System.out.println(cs.getDomain().size());
            }
        }
    }
}

It seems to me that the problem could be solved by updating this field, by replacing the HashSet by an ArrayList. It seems to be the only thing needed for solving my problem, but I do not know if it would have unexpected effects elsewhere in the library…

@JohnLCaron
Copy link
Collaborator

CoordinateSystem's depend on knowing the unique set of Dimensions, which is their "domain".

The problem is in Dimension.equals() for unshared Dimensions. Im experimenting with changing that so that unshared Dimensions dont equal anything but themselves.

Really the problem is that HDF5 doesnt have a coherent syntax for shared dimensions and/or the dataset writer doesnt use what HDF5 does provide (Dimension scales) and instead uses a bespoke Convention. Which is supposed to be fixed by a bespoke CoordSysBuilder. Which we could do except that we dont handle the interpolated satellite case.

I think Id like to add the interpolated satellite case to version 8 as a CoordSysBuilder, and do some minimal thing for ver5.

  1. would you be willing to let me incorporate your interpolation code into version 8 netcdf-java library version 8?
  2. would you check to see if I can include the JAXA data set you sent for our unit tests? Or find an example file we can use?

thanks

@desruisseaux
Copy link
Author

Sure you are welcome to incorporate the interpolation code :-). But it may be a significant work. The interpolation from grid to geographic coordinates is relatively easy, but the inverse interpolation (from geographic coordinates to grid) is much more difficult.

I will ask for including a JAXA file in unit test and will report back on this issue.

Note: about above-cited difficulty with interpolation, in the particular case of those JAXA data, interpolations become easier if we convert the geographic coordinates to a Universal Transverse Mercator (UTM) projection. The reason is that the grid is highly non-linear when coordinates are expressed in latitude/longitude, but become much more linear (i.e. it looks more like a regular grid) after coordinates are projected to UTM. I mention that as a point worth to keep in mind when designing the software (i.e. to have the capability to project the coordinates to another Coordinate Reference System before to use them for interpolations).

@JohnLCaron
Copy link
Collaborator

Seems reasonable to convert to UTM. Can you send me a pointer to your code? Any tests you have would be helpful.

@desruisseaux
Copy link
Author

Sure. The way interpolation is implemented, an initial step is to compute a linear approximation of the interpolation. This is the work of LinearTransformBuilder class, which takes in input the coordinates of the localization grid and gives in output a java.awt.geom.AffineTransform computed by the "less root mean square" method. This step is not really necessary for forward interpolation, but is more helpful for inverse interpolation. Note that this class also provides some methods for pre-processing the geographic coordinates before to compute the linear interpolation, such as resolveWraparoundAxis. This is necessary when data crosses the ±180° meridian.

A helper class used by LinearTransformBuilder is Plane, used for the "less root mean square" method. There is probably code that you can remove in both classes, especially if you are interested in the 2D case only.

After we got an AffineTransform approximation of the interpolation, I transform all coordinates by the inverse of that approximation. The new coordinates that we get after this transformation are close to the original grid coordinates. For example for grid coordinates (6, 8) we may have transformed coordinates (6.046, 7.996), for grid coordinates (20, 30) we may have transformed coordinates (20.005, 30.002), etc.

The code doing the interpolation itself is InterpolatedTransform. This code was intended primarily for handling datum shifts; the use of netCDF localization grid is then considered as a special kind of datum shift (from "grid datum" to geodesic datum). A significant part of this code is about parameters and Well Known Text supports, which can be ignored. The important method is transform(double[] srcPts, int srcOff, final double[] dstPts, int dstOff, int numPts) (half of the method complexity is about handling overlapping source and destination arrays, which could be ignored as well). That method delegates most of its work to DatumShiftGrid.

The inverse transform is handled by the transform(…) method in the Inverse inner class. This is the most difficult part. It requires that DatumShiftGrid not only performs interpolation, but is also capable to provide an estimation of the derivative (i.e. the Jacobian matrix) at a given point of the grid. The basic idea is to apply the iterative algorithm defined by NADCON for inverse transformation. But the NADCON algorithm assumes that the coordinates before and after transformations are very close, which is usually the case with geodesic datum shifts. In other words, the NADCON algorithm has an implicit assumption that the Jacobian matrix is very close to the identity matrix. This is not the case of JAXA data, so the "real" Jacobian matrix has to be taken in account.

I realize that the above is quite complex. The above-cited code is designed for integrating into a chain of ISO 19111 CoordinateOperation steps. It can surely be simplified by dropping functionality not needed by ucar.nc2.dataset.CoordinateSystem, but the result may still complex I think.

There is some JUnit tests for the above classes, but not for the specific case of JAXA file for data licensing reason.

@desruisseaux
Copy link
Author

About including the GCOM-C file in JUnit test suite, my contact said that JAXA is positive about that, but would like to know if the following conditions are okay for UCAR. The terms of use are given in GCOM-C Data Provision Policy PDF document on JAXA G-Portal web site, section 10 at page 5:

Any publication of outcomes in consequence of the use of GCOM-C data identified in Table 1 shalll be accompanied by any one of the following indications;

  • ©JAXA
  • ©Japan Aerospace Exploration Agency
  • Copyright: JAXA
  • Copyright: Japan Aerospace Exploration Agency
  • ©JAXA [Year] ALL RIGHTS RESERVED

Redistribution under above condition is allowed by:

"(d) Redistribution of the data
...
Redistribution of any Standard Product to a third party is allowed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

4 participants