Use stable sort when sorting values in forecaster #2568

Merged 1 commit into facebook:main on May 18, 2024

natl
Contributor

@natl natl commented Apr 4, 2024

When Prophet prepares a dataframe for fitting or predicting, it runs a sort on the ds column.

This changes the sort algorithm from the pandas default (quicksort) to mergesort, which is a stable sort.

This is important when regressors are present and the fitted dataframe is not unique by date, i.e. it has several rows per ds value.
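To check whether an input falls into this case, something like the following should work. It is only a sketch: the frame is a small hypothetical one with two rows per date, and the variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical input: two rows per date, distinguished by the regressor x.
df = pd.DataFrame({
    "ds": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02", "2020-01-02"]),
    "x": [0, 1, 0, 1],
    "y": [3.2, 4.1, 3.7, 4.3],
})

# True when at least one date appears on more than one row.
has_repeated_dates = bool(df["ds"].duplicated().any())

# Number of rows sharing each date.
rows_per_date = df.groupby("ds").size()

print(has_repeated_dates)
print(rows_per_date.max())
```

If `has_repeated_dates` is true, the order of rows within a date only survives a sort on ds when that sort is stable.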

Users may do something like the following:

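import pandas as pd
from prophet import Prophet

# Example df: two rows per date, distinguished by the regressor x
df = pd.DataFrame({
    'ds': pd.to_datetime(['2020-01-01', '2020-01-01', '2020-01-02', '2020-01-02']),
    'x': [0, 1, 0, 1],
    'y': [3.2, 4.1, 3.7, 4.3],
})

m = Prophet()
m.add_regressor('x')
m.fit(df)

predict_df = m.predict(df)
# Assigned by position, so this relies on predict_df keeping the row order of df
df['prediction'] = predict_df['yhat']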

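In m.predict(df), there is a call to df.sort_values('ds'). The pandas default (quicksort) is not a stable sort, so rows that share a ds value can come back in a different order, and the x values then no longer line up with the original rows, even if df was already sorted by ds. A stable sort addresses this.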
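A minimal sketch of the difference, using plain pandas rather than Prophet itself. The data is made up (50 dates with 40 rows each), and the frame is already sorted by ds, so a stable sort should leave it untouched. The default sort makes no such promise, so no result is asserted for it here.

```python
import numpy as np
import pandas as pd

# Hypothetical data: 50 dates, 40 rows per date, x numbers the rows within a date.
dates = pd.date_range("2020-01-01", periods=50, freq="D")
df = pd.DataFrame({
    "ds": dates.repeat(40),
    "x": np.tile(np.arange(40), 50),
})

# Stable sort: rows with equal ds keep their input order.
stable = df.sort_values("ds", kind="mergesort").reset_index(drop=True)

# Default sort: the order of rows with equal ds is not guaranteed.
default = df.sort_values("ds").reset_index(drop=True)

stable_preserved = np.array_equal(stable["x"].to_numpy(), df["x"].to_numpy())
default_preserved = np.array_equal(default["x"].to_numpy(), df["x"].to_numpy())

print("mergesort kept row order:", stable_preserved)
print("default sort kept row order:", default_preserved)  # may be True or False
```

With the stable sort, assigning a result column back to `df` by position stays aligned as long as `df` was sorted by ds to begin with.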

This partially addresses #2322 if you construct the input correctly.

Collaborator

@tcuongd tcuongd left a comment

Thanks! I'm not sure Prophet was intended to be used the way you described (i.e. as a regression engine). Typical guidance is to have one regressor value per date. But I guess your example does just work out of the box, so this change makes sense to help people using it in this way. Only other consideration I'd have is performance but given time series datasets are usually too small for this to matter + this is a training time operation, I don't think it's a big deal.
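For anyone who wants to check the performance point on their own data sizes, a rough timing sketch along these lines may help. The frame size (2000 dates, 5 rows each) and all names are assumptions for illustration, and the timings will vary by machine and pandas version.

```python
import timeit

import numpy as np
import pandas as pd

# Hypothetical frame: 2000 dates with 5 rows each, shuffled before sorting.
n_dates, rows_per_date = 2000, 5
dates = pd.date_range("2015-01-01", periods=n_dates, freq="D")
df = pd.DataFrame({
    "ds": dates.repeat(rows_per_date),
    "x": np.tile(np.arange(rows_per_date), n_dates),
})
shuffled = df.sample(frac=1.0, random_state=0).reset_index(drop=True)

# Time 20 runs of each sort kind on the same input.
t_default = timeit.timeit(lambda: shuffled.sort_values("ds"), number=20)
t_stable = timeit.timeit(
    lambda: shuffled.sort_values("ds", kind="mergesort"), number=20
)

print(f"default: {t_default:.4f}s, mergesort: {t_stable:.4f}s")
```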

@tcuongd tcuongd merged commit 50d0686 into facebook:main May 18, 2024
1 check passed