-
Notifications
You must be signed in to change notification settings - Fork 512
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[BUG] cuml cannot split dataframe with string column. 5676 #5834
Comments
Currently we don't active test for string columns in the method if I'm not mistaken, since back when we originally added this to cuML we didn't think/expect string columns. That said, we will check if this is expected or a bug and update the issue. Thanks @AndreasKarasenko ! |
I was doing some cudf issue triage and came across the linked issue rapidsai/cudf#12989. Please take a look at my last comment, it might provide some insight into what caused this incompatibility in cuml (it looks like this used to work in 23.02 so maybe cuml started converting to a cupy array internally during 23.04 development). |
Describe the bug
When using cuml's train_test_split on a cudf dataframe with a string column it fails with "TypeError: String Arrays is not yet implemented in cudf". cudf refers back to cuml and there seems to be no news (see here).
Using sklearn or dask-ml for splitting works as expected, so I'm not sure if it even is a cudf issue.
Steps/Code to reproduce bug
This code uses a mix of the HPO example and the Naive Bayes example.
Expected behavior
It should split the dataframe like sklearn or dask.
Environment details:
The text was updated successfully, but these errors were encountered: