Inference Single Item on model trained on Multiple Items #3128

Open
Alex-Wenner-FHR opened this issue Feb 16, 2024 · 5 comments

Comments

@Alex-Wenner-FHR

I am using:

  • gluonts: latest
  • python: 3.11.0

I have a TemporalFusionTransformer that was trained with a PandasDataset.from_long_dataframe(...). This PandasDataset contains multiple item_ids:

| item_id | ... |
|---------|-----|
| cat1    |     |
| cat2    |     |
| cat3    |     |
| ...     |     |

This dataset includes several past_feat_dynamic_reals and a few static_features.

I want to predict on just one category. However, when I do something like

df = df.loc[df['item_id'] == 'cat1']
sample_group = PandasDataset.from_long_dataframe(df, **same_dataset_spec_used_for_training)
forecasts = model.predict(dataset=sample_group)
next(iter(forecasts))

I get the following error:

IndexError                                Traceback (most recent call last)
Cell In[124], line 9
      7 model = Pred.deserialize(pathlib.Path("./model"))
      8 forecasts = model.predict(dataset = sample_group)
----> 9 next(iter(forecasts))

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/torch/model/predictor.py:90, in PyTorchPredictor.predict(self, dataset, num_samples)
     87 self.prediction_net.eval()
     89 with torch.no_grad():
---> 90     yield from self.forecast_generator(
     91         inference_data_loader=inference_data_loader,
     92         prediction_net=self.prediction_net,
     93         input_names=self.input_names,
     94         output_transform=self.output_transform,
     95         num_samples=num_samples,
     96     )

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/model/forecast_generator.py:117, in QuantileForecastGenerator.__call__(self, inference_data_loader, prediction_net, input_names, output_transform, num_samples, **kwargs)
    108 def __call__(
    109     self,
    110     inference_data_loader: DataLoader,
   (...)
    115     **kwargs
    116 ) -> Iterator[Forecast]:
--> 117     for batch in inference_data_loader:
    118         inputs = select(input_names, batch, ignore_missing=True)
    119         outputs = predict_to_numpy(prediction_net, inputs)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:111, in TransformedDataset.__iter__(self)
    110 def __iter__(self) -> Iterator[DataEntry]:
--> 111     yield from self.transformation(
    112         self.base_dataset, is_train=self.is_train
    113     )

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/loader.py:50, in Batch.__call__(self, data, is_train)
     49 def __call__(self, data, is_train):
---> 50     yield from batcher(data, self.batch_size)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/itertools.py:131, in batcher.<locals>.get_batch()
    130 def get_batch():
--> 131     return list(itertools.islice(it, batch_size))

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:186, in FlatMapTransformation.__call__(self, data_it, is_train)
    182 def __call__(
    183     self, data_it: Iterable[DataEntry], is_train: bool
    184 ) -> Iterator:
    185     num_idle_transforms = 0
--> 186     for data_entry in data_it:
    187         num_idle_transforms += 1
    188         for result in self.flatmap_transform(data_entry.copy(), is_train):

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

    [... skipping similar frames: MapTransformation.__call__ at line 132 (5 times)]

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/transform/_base.py:132, in MapTransformation.__call__(self, data_it, is_train)
    129 def __call__(
    130     self, data_it: Iterable[DataEntry], is_train: bool
    131 ) -> Iterator:
--> 132     for data_entry in data_it:
    133         try:
    134             yield self.map_transform(data_entry.copy(), is_train)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/pandas.py:217, in PandasDataset.__iter__(self)
    216 def __iter__(self):
--> 217     yield from self._data_entries
    218     self.unchecked = True

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/gluonts/dataset/pandas.py:188, in PandasDataset._pair_to_dataentry(self, item_id, df)
    179 if not self.unchecked:
    180     assert is_uniform(df.index), (
    181         "Dataframe index is not uniformly spaced. "
    182         "If your dataframe contains data from multiple series in the "
    183         'same column ("long" format), consider constructing the '
    184         "dataset with `PandasDataset.from_long_dataframe` instead."
    185     )
    187 entry = {
--> 188     "start": df.index[0],
    189 }
    191 target = df[self.target].values
    192 target = target[: len(target) - self.future_length]

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/indexes/base.py:5385, in Index.__getitem__(self, key)
   5382 if is_integer(key) or is_float(key):
   5383     # GH#44051 exclude bool, which would return a 2d ndarray
   5384     key = com.cast_scalar_indexer(key)
-> 5385     return getitem(key)
   5387 if isinstance(key, slice):
   5388     # This case is separated from the conditional above to avoid
   5389     # pessimization com.is_bool_indexer and ndim checks.
   5390     return self._getitem_slice(key)

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/arrays/datetimelike.py:379, in DatetimeLikeArrayMixin.__getitem__(self, key)
    372 """
    373 This getitem defers to the underlying array, which by-definition can
    374 only handle list-likes, slices, and integer scalars
    375 """
    376 # Use cast as we know we will get back a DatetimeLikeArray or DTScalar,
    377 # but skip evaluating the Union at runtime for performance
    378 # (see https://github.com/pandas-dev/pandas/pull/44624)
--> 379 result = cast("Union[Self, DTScalarOrNaT]", super().__getitem__(key))
    380 if lib.is_scalar(result):
    381     return result

File ~/.pyenv/versions/3.11.0/lib/python3.11/site-packages/pandas/core/arrays/_mixins.py:284, in NDArrayBackedExtensionArray.__getitem__(self, key)
    278 def __getitem__(
    279     self,
    280     key: PositionalIndexer2D,
    281 ) -> Self | Any:
    282     if lib.is_integer(key):
    283         # fast-path
--> 284         result = self._ndarray[key]
    285         if self.ndim == 1:
    286             return self._box_func(result)

IndexError: index 0 is out of bounds for axis 0 with size 0

Does anyone have any ideas on how inference can be run on one item at a time, instead of having to pass multiple items in a dataset at once? The shape and dtypes of this subset exactly match those of the training data.
Thanks!

Originally posted by @Alex-Wenner-FHR in #3126

@Alex-Wenner-FHR
Author

Alex-Wenner-FHR commented Feb 16, 2024

It appears that when using the same dataset spec with my subset, the other categories are still represented for some reason.

for entry in ds_val._data_entries.iterable.iterable:
    print(entry)
[0 rows x 24 columns])
('cat2', Empty DataFrame
Columns: [...]
Index: []

[0 rows x 24 columns])
('cat3', Empty DataFrame
Columns: [...]
Index: []

@Alex-Wenner-FHR
Author

This is less than ideal, but doing something like this allows inference on a single item_id:

# drop the empty per-item dataframes left behind for the other categories
iterable: tuple = ds_val._data_entries.iterable.iterable
iterable = [t for t in iterable if len(t[1]) > 1]
ds_val._data_entries.iterable.iterable = tuple(iterable)
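For context, one plausible cause of those empty per-item frames: if the item_id column is a pandas Categorical, filtering rows does not drop the unused categories, and the per-item grouping that PandasDataset.from_long_dataframe performs can then emit an empty frame for every stale category. A minimal pandas-only sketch of that behavior (column names and values are illustrative, not from the original data):

```python
import pandas as pd

# toy long-format frame whose item_id is a pandas Categorical
df = pd.DataFrame(
    {
        "item_id": pd.Categorical(["cat1"] * 3 + ["cat2"] * 3),
        "target": range(6),
    }
)

# filtering keeps cat2 as an unused category on the dtype
subset = df.loc[df["item_id"] == "cat1"]
before = subset.groupby("item_id", observed=False).size()  # cat2 appears with count 0

# dropping unused categories leaves only the groups that actually have rows
subset = subset.assign(item_id=subset["item_id"].cat.remove_unused_categories())
after = subset.groupby("item_id", observed=False).size()
```

If this matches your setup, calling `remove_unused_categories()` on the filtered frame before `from_long_dataframe` may avoid the empty entries without reaching into `_data_entries`.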

@Alex-Wenner-FHR
Author

@lostella - has anyone from the team been able to lend an eye to this?

@lostella
Contributor

@Alex-Wenner-FHR predict gets a dataset just like train: if you want to only predict a specific item id, you should be able to construct a PandasDataset with only a subset of the data, and pass that to predict. Does that work?

@Alex-Wenner-FHR
Author

It does not. If you check the comments a few up, I posted a workaround that I was able to implement to get it working, but natively it does not work.
