Problems feeding data to operational model: Target variable geopotential_at_surface must be time-dependent #61

Open
EloyAnguiano opened this issue Feb 8, 2024 · 1 comment

@EloyAnguiano

Hi, I am trying to run the GraphCast operational model with my own data, and there seems to be a problem with the xarray object I build from operational data.

When I run a script that gets the input data from Google Cloud, it works just fine, and the data look like this:

 (Pdb) eval_inputs
<xarray.Dataset>
Dimensions:                       (batch: 1, time: 2, lat: 721, lon: 1440,
                                   level: 13)
Coordinates:
  * lon                           (lon) float32 0.0 0.25 0.5 ... 359.5 359.8
  * lat                           (lat) float32 -90.0 -89.75 ... 89.75 90.0
  * level                         (level) int32 50 100 150 200 ... 850 925 1000
  * time                          (time) timedelta64[ns] -1 days +18:00:00 00...
Dimensions without coordinates: batch
Data variables: (12/17)
    2m_temperature                (batch, time, lat, lon) float32 250.3 ... 2...
    mean_sea_level_pressure       (batch, time, lat, lon) float32 9.936e+04 ....
    10m_v_component_of_wind       (batch, time, lat, lon) float32 -0.4746 ......
    10m_u_component_of_wind       (batch, time, lat, lon) float32 -5.817 ... ...
    temperature                   (batch, time, level, lat, lon) float32 238....
    geopotential                  (batch, time, level, lat, lon) float32 1.98...
    ...                            ...
    year_progress_sin             (batch, time) float32 0.006986 0.01129
    year_progress_cos             (batch, time) float32 1.0 0.9999
    day_progress_sin              (batch, time, lon) float32 0.0 ... 1.0
    day_progress_cos              (batch, time, lon) float32 1.0 ... 0.004363
    geopotential_at_surface       (lat, lon) float32 2.735e+04 ... -0.07617
    land_sea_mask                 (lat, lon) float32 1.0 1.0 1.0 ... 0.0 0.0 0.0

The xarray object I build myself looks like this:

(Pdb) input_data
<xarray.Dataset>
Dimensions:                       (lat: 721, lon: 1440, time: 2, level: 13,
                                   batch: 1)
Coordinates:
  * lat                           (lat) float64 -90.0 -89.75 ... 89.75 90.0
  * lon                           (lon) float64 -180.0 -179.8 ... 179.5 179.8
  * time                          (time) timedelta64[ns] -1 days +18:00:00 00...
  * level                         (level) float64 50.0 100.0 ... 925.0 1e+03
  * batch                         (batch) int64 1
Data variables: (12/16)
    temperature                   (batch, time, lat, lon, level) float32 239....
    u_component_of_wind           (batch, time, lat, lon, level) float32 1.65...
    v_component_of_wind           (batch, time, lat, lon, level) float32 -14....
    geopotential                  (batch, time, lat, lon, level) float32 1.98...
    specific_humidity             (batch, time, lat, lon, level) float32 3.09...
    10m_v_component_of_wind       (batch, time, lat, lon) float32 -0.6771 ......
    ...                            ...
    mean_sea_level_pressure       (batch, time, lat, lon) float32 9.939e+04 ....
    toa_incident_solar_radiation  (batch, time, lat, lon) float64 554.8 ... 0.0
    year_progress_sin             (batch, time) float64 -0.008601 0.0
    year_progress_cos             (batch, time) float64 1.0 1.0
    day_progress_sin              (batch, time, lon) float64 -1.0 -1.0 ... 0.0
    day_progress_cos              (batch, time, lon) float64 -1.837e-16 ... 1.0

When I run the model through rollout.chunked_prediction with the eval_inputs data, it works just fine, but when I use my input_data I get the following error:

Traceback (most recent call last):
  File "/home/eloy.anguiano/repos/graphcast/1.get_data.py", line 342, in <module>
    predictions = rollout.chunked_prediction(
  File "/home/eloy.anguiano/repos/graphcast/graphcast/rollout.py", line 68, in chunked_prediction
    for prediction_chunk in chunked_prediction_generator(
  File "/home/eloy.anguiano/repos/graphcast/graphcast/rollout.py", line 164, in chunked_prediction_generator
    predictions = predictor_fn(
  File "/home/eloy.anguiano/repos/graphcast/1.get_data.py", line 199, in <lambda>
    return lambda **kw: fn(**kw)[0]
  File "/home/eloy.anguiano/miniconda3/envs/graphcast_iic/lib/python3.10/site-packages/haiku/_src/transform.py", line 456, in apply_fn
    out = f(*args, **kwargs)
  File "/home/eloy.anguiano/repos/graphcast/1.get_data.py", line 165, in run_forward
    return predictor(inputs, targets_template=targets_template, forcings=forcings)
  File "/home/eloy.anguiano/repos/graphcast/graphcast/autoregressive.py", line 163, in __call__
    self._validate_targets_and_forcings(targets_template, forcings)
  File "/home/eloy.anguiano/repos/graphcast/graphcast/autoregressive.py", line 103, in _validate_targets_and_forcings
    raise ValueError(f'Target variable {name} must be time-dependent.')
ValueError: Target variable geopotential_at_surface must be time-dependent.

This seems a bit strange, because the variable is not time-dependent in either dataset, so I would like to know whether something else in my data is triggering this error. Here is the problematic variable in both datasets:
Tutorial data

(Pdb) eval_inputs.geopotential_at_surface
<xarray.DataArray 'geopotential_at_surface' (lat: 721, lon: 1440)>
array([[ 2.7354750e+04,  2.7354750e+04,  2.7354750e+04, ...,
         2.7354750e+04,  2.7354750e+04,  2.7354750e+04],
       [ 2.7163490e+04,  2.7165285e+04,  2.7167082e+04, ...,
         2.7159000e+04,  2.7159898e+04,  2.7161693e+04],
       [ 2.6957861e+04,  2.6961453e+04,  2.6965045e+04, ...,
         2.6949779e+04,  2.6952475e+04,  2.6956066e+04],
       ...,
       [-1.8730469e+00, -1.8730469e+00, -1.8730469e+00, ...,
        -1.8730469e+00, -1.8730469e+00, -1.8730469e+00],
       [ 4.4121094e+00,  4.4121094e+00,  4.4121094e+00, ...,
         4.4121094e+00,  4.4121094e+00,  4.4121094e+00],
       [-7.6171875e-02, -7.6171875e-02, -7.6171875e-02, ...,
        -7.6171875e-02, -7.6171875e-02, -7.6171875e-02]], dtype=float32)
Coordinates:
  * lon      (lon) float32 0.0 0.25 0.5 0.75 1.0 ... 359.0 359.2 359.5 359.8
  * lat      (lat) float32 -90.0 -89.75 -89.5 -89.25 ... 89.25 89.5 89.75 90.0

My data

(Pdb) input_data.geopotential_at_surface
<xarray.DataArray 'geopotential_at_surface' (lat: 721, lon: 1440)>
array([[ 2.7109883e+04,  2.7109883e+04,  2.7109883e+04, ...,
         2.7109883e+04,  2.7109883e+04,  2.7109883e+04],
       [ 2.7554883e+04,  2.7553883e+04,  2.7551883e+04, ...,
         2.7561883e+04,  2.7559883e+04,  2.7556883e+04],
       [ 2.8437883e+04,  2.8431883e+04,  2.8425883e+04, ...,
         2.8454883e+04,  2.8448883e+04,  2.8442883e+04],
       ...,
       [-5.1181641e+00, -5.1181641e+00, -5.1181641e+00, ...,
        -4.1181641e+00, -4.1181641e+00, -4.1181641e+00],
       [ 1.0881836e+01,  9.8818359e+00,  9.8818359e+00, ...,
         1.0881836e+01,  1.0881836e+01,  1.0881836e+01],
       [ 1.8818359e+00,  1.8818359e+00,  1.8818359e+00, ...,
         1.8818359e+00,  1.8818359e+00,  1.8818359e+00]], dtype=float32)
Coordinates:
  * lat      (lat) float64 -90.0 -89.75 -89.5 -89.25 ... 89.25 89.5 89.75 90.0
  * lon      (lon) float64 -180.0 -179.8 -179.5 -179.2 ... 179.2 179.5 179.8

Could the longitude values be triggering an unhandled error? Does anyone have a tip on how to proceed?
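
One detail in the traceback may help narrow this down: the exception is raised by `_validate_targets_and_forcings(targets_template, forcings)`, and the message says "Target variable", so the check appears to concern whatever is passed as `targets_template` rather than the inputs. The sketch below is a hypothetical diagnostic, not part of graphcast. It assumes the check rejects any target variable that lacks a `time` dimension (inferred from the error message only), and it uses a stand-in `Var` type so it runs without xarray. With real data, passing `my_dataset.data_vars` should work the same way, since each variable exposes `.dims`.

```python
from collections import namedtuple


def variables_missing_dim(variables, dim="time"):
    """Return the names of variables whose dims do not include `dim`."""
    return [name for name, var in variables.items() if dim not in var.dims]


# Stand-in for xarray variables: only the `.dims` attribute matters here.
Var = namedtuple("Var", ["dims"])

# Hypothetical targets template that still contains a static field.
targets_template = {
    "2m_temperature": Var(dims=("batch", "time", "lat", "lon")),
    "geopotential": Var(dims=("batch", "time", "level", "lat", "lon")),
    "geopotential_at_surface": Var(dims=("lat", "lon")),
}

missing = variables_missing_dim(targets_template)
print(missing)
```

If a static variable such as `geopotential_at_surface` shows up in that list for the targets template, that alone might explain the error, independently of the longitude convention.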

@tewalds
Member

tewalds commented Feb 22, 2024

It looks like your lon values are (-180, 180) instead of (0, 360). I'm not sure if that matters, but it certainly looks suspicious.
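
For reference, a minimal sketch of relabelling longitudes from the (-180, 180) convention to (0, 360) with xarray, so that the grid matches the tutorial data. The helper name `to_0_360` and the tiny demo grid are hypothetical, and whether this change resolves the `ValueError` above is not established in this thread.

```python
import numpy as np
import xarray as xr


def to_0_360(ds: xr.Dataset, lon_name: str = "lon") -> xr.Dataset:
    """Relabel longitudes into [0, 360) and sort them in ascending order."""
    wrapped = ds.assign_coords({lon_name: ds[lon_name] % 360})
    return wrapped.sortby(lon_name)


# Tiny stand-in grid with 90-degree spacing (values are made up).
demo = xr.Dataset(
    {"geopotential_at_surface": (("lat", "lon"), np.arange(8.0).reshape(2, 4))},
    coords={"lat": [-90.0, 90.0], "lon": [-180.0, -90.0, 0.0, 90.0]},
)

converted = to_0_360(demo)
print(converted.lon.values)
```

Sorting after the relabelling keeps each data column attached to its longitude, so the values are reordered along `lon` rather than merely renamed.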
