Bugging creation of report #1537

Open
3 tasks done
machtom1 opened this issue Feb 8, 2024 · 0 comments

Current Behaviour

Hi,
Right now, when I create a report, some columns are labeled as numeric instead of categorical, which is what I would expect. When I create the report without casting the columns, it takes less than one minute. After casting the columns to string values, the report takes forever, and I had to stop the run after waiting for more than 30 minutes. It gets stuck at columns B and F (see the column description under 'Data Description'). I tried casting the datatype both in read_csv and with .astype(str), as shown in the code section.

Expected Behaviour

I expect the report to show the correct numerical and categorical types for each column.

Data Description

The data is a mix of date values and categorical values. Here is a table of the missing values and the datatypes, before and after casting them:

Column  Missing values  dtype
A       0               Date
B       0               Date
C       3               object
D       0               object
E       0               object
F       0               object
G       86317           object
H       39              date
I       6871            object
J       0               object
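
For anyone trying to reproduce this, a summary like the one above can be rebuilt with plain pandas. The following is only a minimal sketch: `summarize_columns` is a hypothetical helper name, the toy frame stands in for example.csv with made-up values, and the extra "Distinct values" column is an addition that may help spot high-cardinality string columns (whether those are the cause of the slowdown is an assumption, not something confirmed here).

```python
import pandas as pd


def summarize_columns(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: missing count, dtype and distinct count per column."""
    return pd.DataFrame(
        {
            "Missing values": df.isna().sum(),
            "dtype": df.dtypes.astype(str),
            "Distinct values": df.nunique(dropna=True),
        }
    )


# Toy data standing in for example.csv (values are invented for illustration).
toy = pd.DataFrame(
    {
        "A": pd.to_datetime(["2024-01-01", "2024-01-02", "2024-01-03"]),
        "C": ["x", None, "y"],
        "G": [None, None, "z"],
    }
)

summary = summarize_columns(toy)
print(summary)
```

Running the same helper before and after the cast shows whether the dtype and the missing-value counts changed as intended.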

Code that reproduces the bug

import pandas as pd
from ydata_profiling import ProfileReport

# Way one: cast the columns while reading the CSV
df_data = pd.read_csv(
    'example.csv',
    parse_dates=['A', 'B'],
    dtype={'C': 'string', 'D': 'string', 'E': 'string'},
)

# Way two: read the CSV first, then cast each column to str
df_data = pd.read_csv('example.csv')
df_data['C'] = df_data['C'].astype(str)
df_data['D'] = df_data['D'].astype(str)
df_data['E'] = df_data['E'].astype(str)

# And then:
profile = ProfileReport(df_data)
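
The two ways of casting are not necessarily equivalent, which may matter when comparing the resulting reports. The sketch below uses an invented in-memory CSV instead of example.csv and only checks the dtype and the missing-value count of one column under each approach. It makes no claim about what ydata-profiling does with the result, and the outcome of `.astype(str)` on missing values can depend on the pandas version, so the counts are printed rather than assumed.

```python
import io

import pandas as pd

# Invented three-row CSV; column C has one empty (missing) cell.
csv_text = "C,G\nx,\ny,1\n,2\n"

# Way one: ask read_csv for the nullable 'string' dtype.
df_one = pd.read_csv(io.StringIO(csv_text), dtype={"C": "string"})

# Way two: read with default inference, then cast with astype(str).
df_two = pd.read_csv(io.StringIO(csv_text))
df_two["C"] = df_two["C"].astype(str)

missing_one = int(df_one["C"].isna().sum())
missing_two = int(df_two["C"].isna().sum())

print("way one:", df_one["C"].dtype, "missing =", missing_one)
print("way two:", df_two["C"].dtype, "missing =", missing_two)
```

If the two printed lines differ, the report is being built from different inputs in each case, which is worth ruling out before comparing run times.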

pandas-profiling version

v. 4.6.4

Dependencies

absl-py==2.0.0
adagio==0.2.4
aiohttp==3.9.2
aiosignal==1.3.1
annotated-types==0.6.0
antlr4-python3-runtime==4.11.1
appdirs==1.4.4
asttokens==2.4.1
astunparse==1.6.3
async-timeout==4.0.3
attrs==23.2.0
cachetools==5.3.2
certifi==2023.11.17
charset-normalizer==3.3.2
cloudpickle==3.0.0
colorama==0.4.6
comm==0.2.1
contourpy==1.2.0
cycler==0.12.1
Cython==3.0.8
dacite==1.8.1
darts==0.27.2
debugpy==1.8.0
decorator==5.1.1
et-xmlfile==1.1.0
exceptiongroup==1.2.0
executing==2.0.1
fastjsonschema==2.19.1
filelock==3.13.1
flatbuffers==23.5.26
fonttools==4.47.0
frozenlist==1.4.1
fs==2.4.16
fsspec==2023.12.2
fugue==0.8.7
fugue-sql-antlr==0.2.0
gast==0.5.4
google-auth==2.26.2
google-auth-oauthlib==1.2.0
google-pasta==0.2.0
grpcio==1.60.0
h5py==3.10.0
holidays==0.41
htmlmin==0.1.12
idna==3.6
ImageHash==4.3.1
importlib-metadata==7.0.1
importlib-resources==6.1.1
ipykernel==6.28.0
ipython==8.18.1
ipywidgets==8.1.1
jedi==0.19.1
Jinja2==3.1.3
joblib==1.3.2
jsonschema==4.21.1
jsonschema-specifications==2023.12.1
jupyter_client==8.6.0
jupyter_core==5.7.1
jupyterlab-widgets==3.0.9
keras==2.15.0
kiwisolver==1.4.5
libclang==16.0.6
lightgbm==4.3.0
lightning-utilities==0.10.1
llvmlite==0.41.1
Markdown==3.5.2
MarkupSafe==2.1.3
matplotlib==3.8.2
matplotlib-inline==0.1.6
ml-dtypes==0.2.0
MouseInfo==0.1.3
mpmath==1.3.0
multidict==6.0.4
multimethod==1.11
nbformat==5.9.2
nest-asyncio==1.5.8
networkx==3.2.1
nfoursid==1.0.1
numba==0.58.1
numpy==1.25.2
oauthlib==3.2.2
openpyxl==3.1.2
opt-einsum==3.3.0
packaging==23.2
pandas==2.1.4
parso==0.8.3
patsy==0.5.6
phik==0.12.4
pillow==10.2.0
platformdirs==4.1.0
plotly==5.18.0
pmdarima==2.0.4
prompt-toolkit==3.0.43
protobuf==4.23.4
psutil==5.9.7
pure-eval==0.2.2
pyarrow==15.0.0
pyasn1==0.5.1
pyasn1-modules==0.3.0
PyAutoGUI==0.9.54
pydantic==2.6.1
pydantic_core==2.16.2
PyGetWindow==0.0.9
Pygments==2.17.2
PyMsgBox==1.0.9
pyod==1.1.2
pyparsing==3.1.1
pyperclip==1.8.2
PyRect==0.2.0
PyScreeze==0.1.30
python-dateutil==2.8.2
pytorch-lightning==2.1.2
pytweening==1.0.7
pytz==2023.3.post1
PyWavelets==1.5.0
pywin32==306
PyYAML==6.0.1
pyzmq==25.1.2
qpd==0.4.4
referencing==0.32.1
requests==2.31.0
requests-oauthlib==1.3.1
rpds-py==0.17.1
rsa==4.9
scikit-learn==1.3.2
scipy==1.11.4
seaborn==0.12.2
shap==0.44.1
six==1.16.0
slicer==0.0.7
sqlglot==20.10.0
stack-data==0.6.3
statsforecast==1.7.1
statsmodels==0.14.1
sweetviz==2.3.1
sympy==1.12
tangled-up-in-unicode==0.2.0
tbats==1.1.3
tenacity==8.2.3
tensorboard==2.15.1
tensorboard-data-server==0.7.2
tensorboardX==2.6.2.2
tensorflow==2.15.0
tensorflow-estimator==2.15.0
tensorflow-intel==2.15.0
tensorflow-io-gcs-filesystem==0.31.0
termcolor==2.4.0
threadpoolctl==3.2.0
torch==2.1.2
torchmetrics==1.3.0.post0
tornado==6.4
tqdm==4.66.1
traitlets==5.14.1
triad==0.9.5
typeguard==4.1.5
typing_extensions==4.9.0
tzdata==2023.4
urllib3==2.1.0
utilsforecast==0.0.26
visions==0.7.5
wcwidth==0.2.13
Werkzeug==3.0.1
widgetsnbextension==4.0.9
wordcloud==1.9.3
wrapt==1.14.1
xarray==2024.1.1
xgboost==2.0.3
yarl==1.9.4
ydata-profiling==4.6.4
zipp==3.17.0

OS

No response

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.