[BUG] NVTabular Dataset constructor cannot process cudf.StructType values. #1808
Comments
Updated repro that illustrates workflow issues in addition to Dataset creation:

    import cudf
    import nvtabular as nvt
    from nvtabular.ops import LambdaOp
    from nvtabular.ops.operator import ColumnSelector


    def f_to_pandas(col, df):
        # Round-trip the column through pandas and back to cuDF.
        pd_series = col.to_pandas()
        return cudf.from_pandas(pd_series)


    def test_cudf_struct_type_conversion():
        input_df = cudf.read_json("example.json")  # different error if we use pd.read_json
        single_op = ColumnSelector("properties") >> LambdaOp(f=f_to_pandas)
        workflow = nvt.Workflow(single_op)
        ds = nvt.Dataset(input_df)
        result = workflow.fit_transform(ds).to_ddf().compute()
        print(result)
This is related to a lower-level issue that occurs when converting cuDF struct columns that contain both nulls and empty structs to pandas. It can be worked around by exploding the structs into separate columns.
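As a rough illustration of what "exploding structs into separate columns" produces, here is a plain-Python sketch rather than cuDF code. The `explode_structs` helper, the `properties.` column prefix, and the sample records are all hypothetical; on the GPU side, cuDF's `Series.struct.explode()` accessor performs the equivalent flattening, though whether that is the exact call the workaround had in mind is an assumption.

```python
# Plain-Python illustration only; the records below are made-up sample data.
from typing import Any, Optional


def explode_structs(
    rows: list[Optional[dict[str, Any]]], prefix: str
) -> dict[str, list[Any]]:
    """Turn a column of struct-like dicts into one flat column per field.

    Null rows (None) and empty structs ({}) both become rows of None,
    so no column in the result holds a nested value.
    """
    # Collect field names in first-seen order across all non-null rows.
    fields: list[str] = []
    for row in rows:
        for name in row or {}:
            if name not in fields:
                fields.append(name)

    # Build one flat column per field; missing fields are filled with None.
    return {
        f"{prefix}.{name}": [(row or {}).get(name) for row in rows]
        for name in fields
    }


# A column mixing populated structs, a null, and an empty struct.
properties = [{"color": "red", "size": 3}, None, {}, {"color": "blue"}]
flat = explode_structs(properties, prefix="properties")
print(flat)
```

After flattening, every column holds only scalars or nulls, so the null-versus-empty-struct distinction that triggers the conversion error no longer appears in any single column.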
This issue should be fully resolved when rapidsai/cudf#13315 goes in.
@drobison00 Hello! Is the issue solved on your end? It looks like rapidsai/cudf#13315 was merged.
@rnyak I'll double check today.
Describe the bug
Attempting to create an NVT Dataset from a cuDF DataFrame that contains a struct dtype column fails.
Steps/Code to reproduce bug
Create a test file:
reproducer.py
output
Expected behavior
Since it's a standard cuDF data type, I'd expect NVT to process it correctly, or to provide some kind of graceful fallback behavior.
Environment details (please complete the following information):