
Performance regression in negative binomial from 0.12 to 0.13 and onwards (at least for DeepAR in PyTorch) #3129

Open
timoschowski opened this issue Feb 17, 2024 · 7 comments
Labels
bug Something isn't working

Comments

@timoschowski
Contributor

timoschowski commented Feb 17, 2024

Description

When loading the FOOD_3 subset of the M5 competition data (cut to 2016 only), I noticed that the performance of the negative binomial distribution changes, at least in DeepAR.

I suspect that something with the scaling is broken, but unfortunately I haven't been able to pin it down. When I look at the comparison between the two versions, I can't really tell whether the negative binomial distribution changed.

Any help here is appreciated. The problem continues up to the current version.

To Reproduce

The example isn't really minimal; it's based on a notebook for the M5 dataset, available in Colab (but it does not run in Colab because of a Lightning error there; it runs fine locally):

https://drive.google.com/file/d/1OOv_I7aAStgHW5iFuuKKB5r0qUW8BLxo/view?usp=sharing
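
In outline, the notebook does roughly this (a hedged sketch, not the actual notebook; the pickle filename, freq, prediction_length, and the NegativeBinomialOutput import path are placeholders/assumptions):

import pickle

from gluonts.torch.model.deepar import DeepAREstimator
from gluonts.torch.distributions import NegativeBinomialOutput  # 0.13+ import path (assumed)

# placeholder filename for the pickled FOOD_3 training data described below
with open("m5_food_3_2016.pkl", "rb") as f:
    train_ds = pickle.load(f)

estimator = DeepAREstimator(
    freq="D",                # assumed daily M5 frequency
    prediction_length=28,    # assumed M5 horizon
    distr_output=NegativeBinomialOutput(),
    trainer_kwargs=dict(max_epochs=15),
)
predictor = estimator.train(train_ds)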

Error message or code output


The output shown here is after 15 epochs of training DeepAR on bespoke data, where I've aggregated all forecasts and plotted them against aggregated actuals. In v0.12 this produced an OK result after 15 epochs (it improves considerably with more epochs):

[image: M5_FOOD_3_v0 12]

whereas in v0.13 this produces:
[image: M5_FOOD_3_v0 13]

Note however that there are small differences. The number of parameters is 76.3 K in v0.13 and in v0.12

The dataset I'm loading is a pickled dataset that includes dynamic features (but I think I'm ignoring them in both v0.12 and v0.13).

Environment

  • Operating system: macOS Monterey (ARM chip)
  • Python version: 3.9.9
  • GluonTS version: 0.12 vs 0.13 and upwards
  • MXNet version: NA (this is for PyTorch, v 1.13.1)
@timoschowski timoschowski added the bug Something isn't working label Feb 17, 2024
@timoschowski
Contributor Author

@kashif @lostella I mentioned this to you some time ago and @jgasthaus FYI

@lostella
Contributor

lostella commented Feb 18, 2024

@timoschowski inspecting the diff, one thing that changed is the PyTorch Lightning dependency, from pinned 1.5 to >= 1.5. It seems like 1.7 introduced the MPS backend (https://lightning.ai/pages/community/lightning-releases/pytorch-lightning-1-7-release/), which is one thing that might be causing trouble.

What version of lightning do you use?

Two options to check whether this MPS thing is to blame:

  1. pin lightning to 1.5 and see if it works better
  2. on whatever version of lightning you have, set trainer_kwargs = dict(accelerator="cpu") when constructing the estimator and see if it's better (a minimal sketch follows below)
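
Something like this for option 2 (a minimal sketch; freq and prediction_length are placeholders, not taken from your notebook):

from gluonts.torch.model.deepar import DeepAREstimator

estimator = DeepAREstimator(
    freq="D",              # placeholder
    prediction_length=28,  # placeholder
    trainer_kwargs=dict(accelerator="cpu"),  # keeps Lightning off MPS
)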

I don’t see other changes between the versions that could explain this.

@timoschowski
Contributor Author

timoschowski commented Feb 18, 2024

thanks @lostella, you're a wizard.

I have

import pytorch_lightning as pl
pl.__version__
'1.9.5'

when I do
"accelerator": "cpu"

the resulting output is still this:
[image: M5_FOOD_3_v0 12_cpu_accelerator]

however, when running the notebook with
!pip install -U "gluonts[torch]==0.13.0" matplotlib orjson tensorboard optuna datasets "pytorch-lightning==1.5"

results are like this for neg binomial, so indeed improved:
[image: M5_FOOD_3_v0 13_lighting1 5_15epochs]

and performance is in line also after more epochs (500 here, for v0.13 with lightning 1.5):
[image: M5_FOOD_3_v0 13_lighting1 5_500epochs]

compare with (500 here, for v0.12 with lightning 1.5):
[image: M5_FOOD_3_v0 12_lighting1 5_500epochs]

For the moment I have a workaround by pinning the lightning version, so that's great. Huge thanks.

A couple of interesting things remain:

  • for v0.14 of GluonTS, a lightning version greater than 1.5 is required, so I'm stuck on v0.13... any idea here?
  • One thing that stands out to me is that all the distribution code shifted around and the imports are different. Did we change anything in the neg binomial implementation? Performance with student_t is exactly the same between v0.12 and v0.13, independent of lightning, so I find that curious. It doesn't really show up in the diff, so I'm wondering if you had any intuition here (I remember discussions with @kashif about this in the past; a quick way to compare the installed code is sketched at the end of this comment).
  • why doesn't the notebook work on Colab? It seems like the model loading doesn't work.

Of course the overall performance isn't there yet (e.g. the peaks aren't aligned), but this is because I don't have any dynamic features included; I'll bring those back next.
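
On the import/implementation question above, a rough way to see which file the installed version actually uses (both import paths below are assumptions for the respective versions; run it in each environment and diff the printed source):

import inspect

try:
    # path in 0.13+ (assumed)
    from gluonts.torch.distributions import NegativeBinomialOutput
except ImportError:
    # older path (assumed)
    from gluonts.torch.modules.distribution_output import NegativeBinomialOutput

print(inspect.getsourcefile(NegativeBinomialOutput))
print(inspect.getsource(NegativeBinomialOutput))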

@timoschowski
Contributor Author

adding some thoughts here. After a suggestion by @kashif I also tried running the notebook with

!pip install -U --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cpu

which gives me torch version:
'2.3.0.dev20240219'

However, the results are the same.

@timoschowski
Contributor Author

One thing I noted is that changing context_length from the default (prediction_length) to 2 * prediction_length has a substantial benefit here.
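
Roughly what I changed (freq and prediction_length are placeholders for the M5 daily setup):

from gluonts.torch.model.deepar import DeepAREstimator

prediction_length = 28  # placeholder M5 horizon
estimator = DeepAREstimator(
    freq="D",  # placeholder
    prediction_length=prediction_length,
    context_length=2 * prediction_length,  # default would be context_length == prediction_length
)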

@lostella
Contributor

for v0.14 of GluonTS, a lightning version greater than 1.5 is required, so I'm stuck on v0.13... any idea here?

No, this is an issue; we'll have to figure out what's wrong with recent lightning versions and make sure that everything runs smoothly. Also, the fact that setting accelerator="cpu" did not work makes me think this may not be a problem on Apple silicon only. Running the same on Linux with recent versions of lightning would answer that.
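
A quick way to see whether MPS is even in play on that machine (a hedged check, independent of GluonTS):

import torch

# True/True means the MPS backend is built and available; Lightning >= 1.7
# may auto-select it when no accelerator is specified
print(torch.backends.mps.is_built(), torch.backends.mps.is_available())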

Did we change anything with the neg binomial implementation?

I don't think so: this is the history of changes, and @kashif's change is the only thing that happened there. It's #2749, which was part of 0.13.0 already. It really seems like something weird is going on with training.

@timoschowski
Contributor Author

OK, I'm having trouble running the notebook in Colab. @kashif, is this something you could take a look at? It's about loading the models; something seems to be broken there.
