Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat(polars): add more accurate type mapping for timestamps #8954

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ncclementi
Copy link
Contributor

@ncclementi ncclementi commented Apr 12, 2024

polars datetime types have a fixed time unit (us, ns, or ms). Ibis timestamp types do too (but we also have seocnds as an option). In the case of polars we're currently mapping all timestamps to ns, this PR makes things a bit more specific and uses proper units.

I'm still not sure how to handle the case Ibis seconds to maybe ms in polars ? Need a bit help on how the conversion of units happens, more specifically where is the value part handled? we would need to do ibis_seconds*1000 = polars_ms at the moment I'm raising, similarly to what we do in durations, but I need to tackle this differently, see all the current failures in CI.

@@ -518,6 +518,11 @@ def test_timestamp_unit():
assert dt.Timestamp(scale=scale).unit == TimestampUnit.NANOSECOND


def test_timestamp_unit_raise():
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this because it seemed like this

else:
raise ValueError(f"Invalid scale {self.scale}")
wasn't covered.

But turns out this test is causing a raise in a different place, which is confusing.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The Timestamp op has a restriction that only the literal ints from 0 through 10 can be passed as scale -- so while it isn't technically unit-tested, it's not possible to set a value outside of that range without raising a SignatureValidationError

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That makes sense, but then why having a code path to "raise" that would never been triggered.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i would wager that the raise predates the enforcement of values at the op level

],
)
def test_from_ibis_type_seconds(ibis_dtype, polars_type):
# we accept seconds as an ibis type, default to ns in polars
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@jcrist in the original issue, you suggested maybe defaulting to "ms" for when we go from "s" in Ibis to polars. I though it made sense keeping the behavior we had before for that particular case which is "ns". Any objections?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It still seems a bit odd to me that asking for seconds I would get back milliseconds. I say we raise in the seconds case as an UnsupportedOperationError the same any other missing functionality.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will break the current behavior where we ask for seconds and return ns. Just making sure we are ok with breaking that.

@ncclementi
Copy link
Contributor Author

After looking more into this PR, I'm not sure if we are correctly casting the values when mapping the timestamps, base on this code:

if dtype.is_integer():
typ = pl.Datetime(time_unit="us", time_zone=time_zone)
arg = (arg * 1_000_000).cast(typ)
if time_unit != "us":
arg = arg.dt.truncate(f"1{time_unit}")
return arg.alias(op.name)
elif dtype.is_string():
typ = pl.Datetime(time_unit=time_unit, time_zone=time_zone)
arg = arg.str.strptime(typ)
if time_unit == "s":
return arg.dt.truncate("1s")
return arg
typ = PolarsType.from_ibis(to)
return arg.cast(typ, strict=strict)

That being said the casting "potential issue" is separate from the mapping of the types, so I proposed opening a separate issue to investigate that, and for now get this PR in.

@ncclementi
Copy link
Contributor Author

I did a little more digging and on main (as well as this branch). The casting is broken, see #9091

I'm not sure what are the next steps here, regarding this PR.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

refactor(polars): more accurate type mapping for timestamps
3 participants