
FIX-#7248: Make sure _validate_dtypes_sum_prod_mean works correctly with datetime types #7237

Merged
merged 1 commit into modin-project:main from pandas-dtype-1 on May 13, 2024

Conversation

@anmyachev (Collaborator) commented May 6, 2024

What do these changes do?

  • first commit message and PR title follow format outlined here

    NOTE: If you edit the PR title to match this format, you need to add another commit (even if it's empty) or amend your last commit for the CI job that checks the PR title to pick up the new PR title.

  • passes flake8 modin/ asv_bench/benchmarks scripts/doc_checker.py
  • passes black --check modin/ asv_bench/benchmarks scripts/doc_checker.py
  • signed commit with git commit -s
  • Resolves: "Internal function _validate_dtypes_sum_prod_mean doesn't correctly process datetime64[ns] and timedelta64[ns] types" (#7248)
  • tests added and passing
  • module layout described at docs/development/architecture.rst is up-to-date

FIX-#7248: Make sure _validate_dtypes_sum_prod_mean works correctly with datetime types

Signed-off-by: Anatoly Myachev <[email protected]>
@anmyachev changed the title from FIX-#0000: Fix _validate_dtypes_sum_prod_mean to FIX-#7248: Make sure _validate_dtypes_sum_prod_mean works correctly with datetime types on May 8, 2024
@anmyachev marked this pull request as ready for review May 8, 2024 15:35
Comment on lines +2158 to +2169
# We cannot add datetime types, so if we are summing a column with
# dtype datetime64 and cannot ignore non-numeric types, we must throw a
# TypeError.
if numeric_only is False and any(
dtype == pandas.api.types.pandas_dtype("datetime64[ns]")
for dtype in self.dtypes
):
raise TypeError(
"'DatetimeArray' with dtype datetime64[ns] does not support reduction 'sum'"
)

data = self._get_numeric_data(axis) if numeric_only else self
@anmyachev (Collaborator, Author):
This piece of code has been moved out of the _validate_dtypes_sum_prod_mean function because the processing is different for the sum function.
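
For context, here is a minimal sketch of the pandas behavior this guard mirrors (assuming a recent pandas 2.x release; the exact error wording may vary by version, but it matches the message raised in the snippet above):

```python
import pandas as pd

# Summing a datetime64[ns] column is not a supported reduction in pandas.
df = pd.DataFrame({"ts": pd.to_datetime(["2024-01-01", "2024-01-02"])})
try:
    df.sum(numeric_only=False)
except TypeError as err:
    # 'DatetimeArray' with dtype datetime64[ns] does not support reduction 'sum'
    print(err)
```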

Comment on lines -3064 to -3065
dtype != pandas.api.types.pandas_dtype("datetime64[ns]")
and dtype != pandas.api.types.pandas_dtype("timedelta64[ns]")
@anmyachev (Collaborator, Author):
These types can no longer be compared with numbers.
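
As a minimal illustration of that change (assuming a recent pandas version; the error text matches the one quoted in the test comment further below):

```python
import pandas as pd

try:
    # Ordering comparisons between a Timestamp and a plain int raise
    # instead of silently coercing.
    pd.Timestamp("2024-01-01") <= 1
except TypeError as err:
    # '<=' not supported between instances of 'Timestamp' and 'int'
    print(err)
```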

# numeric_only is False because if it is None, it will default to True
# if the operation fails with mixed dtypes.
if (
axis
and numeric_only is False
and np.unique([is_numeric_dtype(dtype) for dtype in self.dtypes]).size == 2
and not all([is_numeric_dtype(dtype) for dtype in self.dtypes])
@anmyachev (Collaborator, Author):
The new way is more obvious.
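
For illustration only, one more direct way to state the same mixed-dtype condition; has_mixed_numeric_dtypes is a hypothetical helper written for this sketch, not necessarily the exact code the PR ends up with:

```python
from pandas.api.types import is_numeric_dtype

def has_mixed_numeric_dtypes(dtypes):
    # True when some, but not all, columns are numeric -- the same
    # condition the np.unique(...).size == 2 expression above encodes.
    flags = [is_numeric_dtype(dtype) for dtype in dtypes]
    return any(flags) and not all(flags)
```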

modin_df,
pandas_df,
lambda df: df.min(axis=1),
# pandas raises: `TypeError: '<=' not supported between instances of 'Timestamp' and 'int'`
Collaborator:
What prevents us from emitting the same error message?

@anmyachev (Collaborator, Author):
Most likely there are no fundamental restrictions, but we would need to better understand how pandas does this.
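
A minimal reproduction of the pandas error referenced in the test comment above (assuming a recent pandas version):

```python
import pandas as pd

df = pd.DataFrame({"a": [1], "b": [pd.Timestamp("2024-01-01")]})
try:
    # Reducing across a row that mixes an int and a Timestamp.
    df.min(axis=1)
except TypeError as err:
    # '<=' not supported between instances of 'Timestamp' and 'int'
    print(err)
```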

@YarShev merged commit 78dd171 into modin-project:main on May 13, 2024
38 checks passed
@anmyachev deleted the pandas-dtype-1 branch May 13, 2024 09:45