
[BUG] Zero Division in diagnostics.performance_metrics() causing failed assertion #2577

Closed
ThomasChia opened this issue May 1, 2024 · 0 comments

Comments

@ThomasChia
Contributor

Issue

When the SMAPE metric is calculated in diagnostics.performance_metrics(), a division by zero can occur if y and yhat are both zero.

def smape(df, w):
    """Symmetric mean absolute percentage error
    based on Chen and Yang (2004) formula

    Parameters
    ----------
    df: Cross-validation results dataframe.
    w: Aggregation window size.

    Returns
    -------
    Dataframe with columns horizon and smape.
    """
    # Possible zero division: the denominator is 0 when y and yhat are both 0.
    sape = np.abs(df['y'] - df['yhat']) / ((np.abs(df['y']) + np.abs(df['yhat'])) / 2)
    if w < 0:
        return pd.DataFrame({'horizon': df['horizon'], 'smape': sape})
    return rolling_mean_by_h(
        x=sape.values, h=df['horizon'].values, w=w, name='smape'
    )

This does not raise an error directly. Instead, it produces np.nan values wherever the zero division occurs. When rolling_mean_by_h() is then called, it performs a groupby() that removes any np.nan values. This becomes a problem in the main performance_metrics() function, which contains the following assert:

assert np.array_equal(res['horizon'].values, res_m['horizon'].values)

This assert is part of a loop over the metrics that checks that each one returns the same horizons. It fails in the scenario above because the np.nan values are removed, so SMAPE returns fewer values than the other metrics.
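
The mechanism can be checked in isolation. The following is a minimal sketch on made-up toy data (the values and the short column names x and h are assumptions for illustration, not Prophet output). It shows that the SMAPE expression yields np.nan for rows where y and yhat are both zero, and that a pandas groupby() aggregation then skips those rows when summing and counting.

```python
import numpy as np
import pandas as pd

# Toy data, made up for illustration: two rows per horizon,
# one of which has y == yhat == 0.
df = pd.DataFrame({
    'horizon': [1, 1, 2, 2],
    'y':       [0.0, 2.0, 0.0, 4.0],
    'yhat':    [0.0, 1.0, 0.0, 4.0],
})

# Same expression as in smape(): 0 / 0 gives NaN instead of raising.
sape = np.abs(df['y'] - df['yhat']) / ((np.abs(df['y']) + np.abs(df['yhat'])) / 2)

# groupby aggregations skip NaN, so each horizon reports 1 point, not 2.
agg = (
    pd.DataFrame({'x': sape.values, 'h': df['horizon'].values})
    .groupby('h')['x']
    .agg(['sum', 'count'])
)
print(sape.isna().sum())
print(agg)
```

Any later step that relies on these per-horizon counts sees fewer points for SMAPE than for metrics that never produce np.nan, which is consistent with the length mismatch described above.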

Replication

Here is how you can replicate this issue:

import pandas as pd
from prophet import Prophet
from prophet.diagnostics import cross_validation, performance_metrics

df = pd.read_csv('https://raw.githubusercontent.com/facebook/prophet/main/examples/example_wp_log_peyton_manning.csv')

df['ds'] = pd.to_datetime(df['ds'])
df.loc[df['ds'].dt.dayofweek == 6, 'y'] = 0

m = Prophet()
m.fit(df)

df_cv = cross_validation(m, '365 days', initial='1825 days', period='365 days')
df_cv['yhat'] = df_cv['yhat'].clip(lower=0)
metrics = performance_metrics(df_cv)

Setting y to zero for every Sunday in the training data and clipping negative yhat values to zero creates a scenario where y and yhat are both zero.
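
One possible guard, shown only as a sketch: treat the 0/0 case as an error of zero, since a forecast of 0 for an actual of 0 is exact. The function name smape_safe is hypothetical and is not part of Prophet. It covers only the per-row case (the w < 0 branch above), and whether defining 0/0 as 0 is the right convention is an assumption that the maintainers may or may not share.

```python
import numpy as np
import pandas as pd

def smape_safe(df):
    """Hypothetical per-row SAPE with 0/0 defined as 0 (not Prophet's code)."""
    num = np.abs(df['y'] - df['yhat'])
    denom = (np.abs(df['y']) + np.abs(df['yhat'])) / 2
    # denom is zero only when y and yhat are both zero, and num is zero there too,
    # so replace those rows with 0.0 instead of leaving NaN.
    sape = (num / denom).where(denom != 0, 0.0)
    return pd.DataFrame({'horizon': df['horizon'], 'smape': sape})

# Toy input, made up for illustration.
toy = pd.DataFrame({
    'horizon': [1, 2, 3],
    'y':       [0.0, 2.0, 3.0],
    'yhat':    [0.0, 1.0, 3.0],
})
out = smape_safe(toy)
print(out)
```

With this guard every row keeps a finite value, so no rows would be dropped by a later NaN-skipping aggregation.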
