Correlations stopped working #1527

Open
3 tasks done
hakan-77 opened this issue Jan 11, 2024 · 5 comments

Comments

@hakan-77

hakan-77 commented Jan 11, 2024

Current Behaviour

When creating a profile with default settings, correlations fail for some relatively simple datasets with the warning below:

I think this issue started with 4.6.3 and persists in 4.6.4.
EDIT: I can confirm that downgrading to 4.6.2 solves the issue.

/home/ubuntu/.local/lib/python3.10/site-packages/ydata_profiling/model/correlations.py:66: UserWarning: There was an attempt to calculate the auto correlation, but this failed.
To hide this warning, disable the calculation
(using `df.profile_report(correlations={"auto": {"calculate": False}})`
If this is problematic for your use case, please report this as an issue:
https://github.com/ydataai/ydata-profiling/issues
(include the error message: 'Function <code object pandas_auto_compute at 0x7f34bdca3260, file "/home/ubuntu/.local/lib/python3.10/site-packages/ydata_profiling/model/pandas/correlations_pandas.py", line 164>')

Expected Behaviour

Correlations are computed without errors.

Data Description

The standard Boston housing dataset.

Code that reproduces the bug

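import pandas as pd
from ydata_profiling import ProfileReport

# Load the Boston housing data from a local CSV file
df = pd.read_csv("boston.csv")

# Build a report with default settings; the "auto" correlation step emits the warning above
profile_report = ProfileReport(df, title="profile")
profile_report.to_file("test.html")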

pandas-profiling version

v4.6.4

Dependencies

pandas==2.1.4

OS

Ubuntu 22

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
@driscoll42

I looked into this a bit since I was running into the same issue. It comes down to the pandas upgrade from 2.0.3 to 2.1.x: ydata-profiling v4.6.4 works fine with pandas 2.0.3, but once pandas is upgraded to 2.1.x the auto correlation stops working. I won't claim to know what in pandas causes the break, but downgrading to pandas 2.0.3 makes it work again.

@hakan-77
Author

hakan-77 commented Jan 13, 2024

@driscoll42 Good catch. I can confirm that 4.6.2 works because it pins pandas < 2.1. The PR below relaxed the pandas pin and thus broke correlations.

#1512

@aquemy @ricardodcpereira any idea what could be wrong?
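The `pandas < 2.1` pin discussed above can also be checked at runtime, for example to warn before profiling. A minimal sketch: `is_affected_pandas` is a hypothetical helper, and treating every release from 2.1 onward as affected is an assumption that goes beyond the 2.1.x versions actually tested in this thread.

```python
import re


def is_affected_pandas(version: str) -> bool:
    """Hypothetical helper: True if `version` is pandas >= 2.1, the range
    reported in this thread to break the auto correlation."""
    # Only the leading "major.minor" matters; patch numbers and suffixes
    # such as "rc0" are ignored.
    match = re.match(r"(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized version string: {version!r}")
    major, minor = int(match.group(1)), int(match.group(2))
    return (major, minor) >= (2, 1)


if __name__ == "__main__":
    for v in ["1.5.3", "2.0.3", "2.1.0", "2.1.4"]:
        print(v, is_affected_pandas(v))
```

In practice you would pass it `pandas.__version__`.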

@hakan-77
Author

hakan-77 commented Feb 23, 2024

@aquemy @ricardodcpereira is there anything I can help with?

@SilasK

SilasK commented Mar 11, 2024

Could it be due to the newer pandas data types? There are now nullable data types for strings, floats, etc., which use pandas.NA for missing values.
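A minimal pandas sketch of the nullable data types mentioned above: the extension dtypes `"Float64"` and `"string"` store missing values as `pd.NA` rather than `numpy.nan`. The column names and values are made up, and whether these dtypes are what triggers the ydata-profiling failure is not established in this thread; the snippet only shows how they behave.

```python
import pandas as pd

# Nullable extension dtypes: capitalized "Float64" and "string" use pd.NA.
df = pd.DataFrame(
    {
        "score": pd.array([1.5, None, 3.0], dtype="Float64"),
        "label": pd.array(["positive", None, "negative"], dtype="string"),
    }
)

print(df.dtypes)

# Missing entries are pd.NA in both columns, not numpy.nan.
print(df["score"].iloc[1] is pd.NA, df["label"].iloc[1] is pd.NA)

# "Float64" still counts as numeric for dtype selection; "string" does not.
numeric_cols = list(df.select_dtypes(include="number").columns)
print(numeric_cols)
```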

I get many errors where the data is being converted from string to float:

include the error message: 'could not convert string to float: 'positive''
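That message is what NumPy raises when it tries to turn a non-numeric string into a float, which suggests a text column is reaching a numeric conversion. A minimal sketch with a small made-up DataFrame (the columns `age`, `dose` and `result` are hypothetical, not from the reporter's data) that lists the non-numeric columns and, on pandas 2.x, reproduces the same message with a plain `DataFrame.corr()` call.

```python
import pandas as pd

# Hypothetical data: two numeric columns plus a text column.
df = pd.DataFrame(
    {
        "age": [34, 51, 29, 62],
        "dose": [1.0, 2.5, 0.5, 3.0],
        "result": ["positive", "negative", "positive", "negative"],
    }
)

# Columns that a numeric correlation cannot use.
non_numeric = list(df.select_dtypes(exclude="number").columns)
print("non-numeric columns:", non_numeric)

try:
    # On pandas >= 2.0, numeric_only defaults to False, so "result" is
    # converted to float as well and this raises ValueError.
    df.corr()
except ValueError as exc:
    print("corr() failed:", exc)
```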

@jtsekine

include the error message: 'could not convert string to float: 'positive''

Maybe since pandas 2.0 we need to pass numeric_only=True to pandas.DataFrame.corr(), as the pandas documentation notes:
Changed in version 2.0.0: The default value of numeric_only is now False.
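Following that suggestion, a minimal sketch of passing `numeric_only=True` to `DataFrame.corr()` (the parameter is available since pandas 1.5), next to the equivalent explicit `select_dtypes` form. The helper name `numeric_corr` and the sample data are hypothetical; this illustrates pandas behavior only and is not the fix adopted in ydata-profiling.

```python
import pandas as pd


def numeric_corr(df: pd.DataFrame, method: str = "pearson") -> pd.DataFrame:
    """Hypothetical helper: correlate only the numeric columns of `df`."""
    # numeric_only=True skips non-numeric columns instead of trying to
    # convert them to float, mirroring the pre-2.0 default.
    return df.corr(method=method, numeric_only=True)


df = pd.DataFrame(
    {
        "age": [34, 51, 29, 62],
        "dose": [1.0, 2.5, 0.5, 3.0],
        "result": ["positive", "negative", "positive", "negative"],
    }
)

corr = numeric_corr(df)
print(corr)

# Equivalent explicit form: drop non-numeric columns before correlating.
explicit = df.select_dtypes(include="number").corr()
print(explicit)
```

The explicit `select_dtypes` form behaves the same on any pandas version, whereas relying on the default of `numeric_only` changed in 2.0.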
