Bug Report: ValueError: NaTType does not support strftime #1565

Open
3 tasks done
LukeJakielaszek opened this issue Mar 21, 2024 · 1 comment
Current Behaviour

When I try to generate a report for time-series data, I sometimes get `ValueError: NaTType does not support strftime`. The dataset is the one used in the tutorial notebook https://github.com/ydataai/ydata-profiling/blob/develop/examples/usaairquality/usaairquality.ipynb. Time-series profiling fails both when the full dataset is passed to the report and when it is filtered to a specific Site Num, such as 2.
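For context, the message in the title is the one pandas raises when `strftime` is called on `pd.NaT`. The sketch below reproduces only that message, independently of ydata-profiling; where a NaT value arises inside the profiling code is not established in this report. The helper name `format_timestamp` is hypothetical.

```python
import pandas as pd


def format_timestamp(ts, fmt="%Y-%m-%d"):
    """Format a timestamp; return the error text if pandas refuses."""
    try:
        return ts.strftime(fmt)
    except ValueError as exc:
        # pd.NaT has no valid date fields, so pandas raises ValueError here
        return f"ValueError: {exc}"


# A valid timestamp formats normally
print(format_timestamp(pd.Timestamp("2016-01-01")))

# A missing timestamp (NaT) triggers the error seen in this report
print(format_timestamp(pd.NaT))
```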

Expected Behaviour

A time-series profile report should be generated.

Data Description

https://github.com/ydataai/ydata-profiling/blob/develop/examples/usaairquality/usaairquality.ipynb

Code that reproduces the bug

import pandas as pd

from ydata_profiling import ProfileReport
from ydata_profiling.utils.cache import cache_file

file_name = cache_file(
    "pollution_us_2000_2016.csv",
    "https://query.data.world/s/mz5ot3l4zrgvldncfgxu34nda45kvb",
)

df = pd.read_csv(file_name, index_col=[0])
df["Date Local"] = pd.to_datetime(df["Date Local"])

type_schema = {
    "NO2 Mean": "timeseries",
    "NO2 1st Max Value": "timeseries",
    "NO2 1st Max Hour": "timeseries",
    "NO2 AQI": "timeseries",
}

# Filter the time series to profile a single site
site = df[df["Site Num"] == 2]

site_2 = site[["NO2 Mean", "NO2 1st Max Value", "NO2 1st Max Hour", "NO2 AQI", "Date Local"]]

# Set tsmode=True to automatically identify time-series variables.
# sortby names the column that gives the chronological order of the time series.
profile = ProfileReport(
    site_2,
    tsmode=True,
    type_schema=type_schema,
    sortby="Date Local",
    title="Time-Series EDA for site",
)
profile.to_file("report_timeseries.html")
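As a diagnostic sketch only, it may be worth checking whether the `sortby` column holds any missing timestamps before building the report. This is an assumption, not a confirmed cause or fix for this bug. The helper name `drop_missing_dates` and the toy frame are hypothetical stand-ins for `site_2`.

```python
import pandas as pd


def drop_missing_dates(frame: pd.DataFrame, date_col: str) -> pd.DataFrame:
    """Report how many NaT values date_col holds, then return a new frame
    with those rows removed and the rest sorted chronologically."""
    n_missing = int(frame[date_col].isna().sum())
    print(f"{date_col}: {n_missing} missing timestamp(s)")
    return frame.dropna(subset=[date_col]).sort_values(date_col)


# Toy data standing in for site_2 (hypothetical values)
toy = pd.DataFrame(
    {
        "NO2 Mean": [19.0, 22.5, 17.3],
        "Date Local": pd.to_datetime(["2000-01-02", None, "2000-01-01"]),
    }
)

clean = drop_missing_dates(toy, "Date Local")
```

If the count is zero for the real data, missing dates in the input are not the source of the NaT, and the value is more likely produced during profiling itself.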

pandas-profiling version

4.6.0

Dependencies

pandas==2.0.3

OS

Windows 10

Checklist

  • There is not yet another bug report for this issue in the issue tracker
  • The problem is reproducible from this bug report. This guide can help to craft a minimal bug report.
  • The issue has not been resolved by the entries listed under Common Issues.
@fabclmnt fabclmnt added bug 🐛 Something isn't working and removed needs-triage labels Mar 28, 2024
@fabclmnt fabclmnt self-assigned this Mar 28, 2024

fabclmnt commented Mar 28, 2024

Hi @LukeJakielaszek ,

Thank you for the detailed walkthrough to reproduce the bug.
The team will look into it soon and include a fix in the next package release.

Projects
Status: Selected for next release