Inference Single Item on model trained on Multiple Items #3128

Open
Alex-Wenner-FHR opened this issue Feb 16, 2024 · 5 comments

Comments

@Alex-Wenner-FHR

I am using:

  • gluonts: latest
  • python: 3.11.0

I have a TemporalFusionTransformer that was trained with a PandasDataset.from_long_dataframe(...). This PandasDataset contains multiple item_ids:

| item_id | ... |
|---------|-----|
| cat1    |     |
| cat2    |     |
| cat3    |     |
| ...     |     |

This dataset includes several past_feat_dynamic_reals and a few static_features.

I want to predict on just one category. However, when I do something like

df = df.loc[df['item_id'] == 'cat1']
sample_group = PandasDataset.from_long_dataframe(df, **same_dataset_spec_used_for_training)
forecasts = model.predict(dataset=sample_group)
next(iter(forecasts))

I get the following error:

IndexError                                Traceback (most recent call last)
Cell In[124], line 9
      7 model = Pred.deserialize(pathlib.Path("./model"))
      8 forecasts = model.predict(dataset = sample_group)
----> 9 next(iter(forecasts))

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/torch/model/predictor.py:90, in PyTorchPredictor.predict(self, dataset, num_samples)
     87 self.prediction_net.eval()
     89 with torch.no_grad():
---> 90     yield from self.forecast_generator(
     91         inference_data_loader=inference_data_loader,
     92         prediction_net=self.prediction_net,
     93         input_names=self.input_names,
     94         output_transform=self.output_transform,
     95         num_samples=num_samples,
     96     )

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/model/forecast_generator.py:117, in QuantileForecastGenerator.__call__(self, inference_data_loader, prediction_net, input_names, output_transform, num_samples, **kwargs)
    108 def __call__(
    109     self,
    110     inference_data_loader: DataLoader,
   (...)
    115     **kwargs
    116 ) -> Iterator[Forecast]:
--> 117     for batch in inference_data_loader:
    118         inputs = select(input_names, batch, ignore_missing=True)
    119         outputs = predict_to_numpy(prediction_net, inputs)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:111, in TransformedDataset.__iter__(self)
    110 def __iter__(self) -> Iterator[DataEntry]:
--> 111     yield from self.transformation(
    112         self.base_dataset, is_train=self.is_train
    113     )

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/loader.py:50, in Batch.__call__(self, data, is_train)
     49 def __call__(self, data, is_train):
---> 50     yield from batcher(data, self.batch_size)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/itertools.py:131, in batcher.<locals>.get_batch()
    130 def get_batch():
--> 131     return list(itertools.islice(it, batch_size))

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:186, in FlatMapTransformation.__call__(self, data_it, is_train)
    182 def __call__(
    183     self, data_it: Iterable[DataEntry], is_train: bool
    184 ) -> Iterator:
    185     num_idle_transforms = 0
--> 186     for data_entry in data_it:
    187         num_idle_transforms += 1
    188         for result in self.flatmap_transform(data_entry.copy(), is_train):

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

    [... skipping similar frames: MapTransformation.__call__ at line 132 (5 times)]

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/pandas.py:217, in PandasDataset.__iter__(self)
    216 def __iter__(self):
--> 217     yield from self._data_entries
    218     self.unchecked = True

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/pandas.py:188, in PandasDataset._pair_to_dataentry(self, item_id, df)
    179 if not self.unchecked:
    180     assert is_uniform(df.index), (
    181         "Dataframe index is not uniformly spaced. "
    182         "If your dataframe contains data from multiple series in the "
    183         'same column ("long" format), consider constructing the '
    184         "dataset with `PandasDataset.from_long_dataframe` instead."
    185     )
    187 entry = {
--> 188     "start": df.index[0],
    189 }
    191 target = df[self.target].values
    192 target = target[: len(target) - self.future_length]

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/indexes/base.py:5385, in Index.__getitem__(self, key)
   5382 if is_integer(key) or is_float(key):
   5383     # GH#44051 exclude bool, which would return a 2d ndarray
   5384     key = com.cast_scalar_indexer(key)
-> 5385     return getitem(key)
   5387 if isinstance(key, slice):
   5388     # This case is separated from the conditional above to avoid
   5389     # pessimization com.is_bool_indexer and ndim checks.
   5390     return self._getitem_slice(key)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py:379, in DatetimeLikeArrayMixin.__getitem__(self, key)
    372 """
    373 This getitem defers to the underlying array, which by-definition can
    374 only handle list-likes, slices, and integer scalars
    375 """
    376 # Use cast as we know we will get back a DatetimeLikeArray or DTScalar,
    377 # but skip evaluating the Union at runtime for performance
    378 # (see https://github.com/pandas-dev/pandas/pull/44624)
--> 379 result = cast("Union[Self, DTScalarOrNaT]", super().__getitem__(key))
    380 if lib.is_scalar(result):
    381     return result

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/arrays/_mixins.py:284, in NDArrayBackedExtensionArray.__getitem__(self, key)
    278 def __getitem__(
    279     self,
    280     key: PositionalIndexer2D,
    281 ) -> Self | Any:
    282     if lib.is_integer(key):
    283         # fast-path
--> 284         result = self._ndarray[key]
    285         if self.ndim == 1:
    286             return self._box_func(result)

IndexError: index 0 is out of bounds for axis 0 with size 0

Does anyone have any ideas on how inference can be run on one item at a time, instead of having to pass multiple items in a dataset at once? The shape and dtypes of this subset exactly match those of the training data.
Thanks!

Originally posted by @Alex-Wenner-FHR in #3126

@Alex-Wenner-FHR
Author

Alex-Wenner-FHR commented Feb 16, 2024

It appears that when using the same dataset spec with my subset, the other categories are still represented for some reason.

for entry in ds_val._data_entries.iterable.iterable:
    print(entry)
[0 rows x 24 columns])
('cat2', Empty DataFrame
Columns: [...]
Index: []

[0 rows x 24 columns])
('cat3', Empty DataFrame
Columns: [...]
Index: []

@Alex-Wenner-FHR
Author

This is less than ideal, but doing something like this allows inference on a single item_id:

# drop the empty per-item dataframes left behind for the other categories
iterable: tuple = ds_val._data_entries.iterable.iterable
iterable = [t for t in iterable if len(t[1]) > 1]
ds_val._data_entries.iterable.iterable = tuple(iterable)
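For context, one plausible cause of those empty per-item frames: if the item_id column is a pandas Categorical, filtering rows does not drop the unused categories, and the per-item grouping that PandasDataset.from_long_dataframe performs can then emit an empty frame for every stale category. A minimal pandas-only sketch of that behavior (column names and values are illustrative, not from the original data):

```python
import pandas as pd

# toy long-format frame whose item_id is a pandas Categorical
df = pd.DataFrame(
    {
        "item_id": pd.Categorical(["cat1"] * 3 + ["cat2"] * 3),
        "target": range(6),
    }
)

# filtering keeps cat2 as an unused category on the dtype
subset = df.loc[df["item_id"] == "cat1"]
before = subset.groupby("item_id", observed=False).size()  # cat2 appears with count 0

# dropping unused categories leaves only the groups that actually have rows
subset = subset.assign(item_id=subset["item_id"].cat.remove_unused_categories())
after = subset.groupby("item_id", observed=False).size()
```

If this matches your setup, calling `remove_unused_categories()` on the filtered frame before `from_long_dataframe` may avoid the empty entries without reaching into `_data_entries`.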

@Alex-Wenner-FHR
Author

@lostella - has anyone from the team been able to lend an eye to this?

@lostella
Contributor

@Alex-Wenner-FHR predict gets a dataset just like train: if you want to only predict a specific item id, you should be able to construct a PandasDataset with only a subset of the data, and pass that to predict. Does that work?

@Alex-Wenner-FHR
Author

It does not. If you check the comments a few up, I posted a workaround that I was able to implement to get it working, but natively it does not work.
