In classification tasks, perform ordinal encoding on the target column of string types #1260
Comments
Can you add steps to reproduce the error? In ML it's typical for values, even integer categoricals, to eventually be encoded as f32 for hardware acceleration, depending on the underlying library implementation.
For example, I have a table named test whose label column is a string type with the values positive or negative. I preprocess the label column using the following statement:
Postgres does not have a "string" type. And
yes
This is related to #631, with mixed text/int datasets
got it. Thanks!
Strings cannot automatically be converted to meaningful representations (machine learning features) as ordinal ints. You must explicitly map them to floating point values that represent their position on a continuous number line that you explicitly define. You may want to represent "positive" as 100, "negative" as -100, and "neutral" as 0. Or maybe 1, 0, and 0.5.
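To illustrate the explicit mapping described above, here is a minimal Python sketch. It is not PostgresML code: the names `LABEL_VALUES` and `encode_labels` are hypothetical, and the values are just the ones suggested in the comment.

```python
# Hypothetical mapping: the position of each label on the number line is
# chosen explicitly by the user, not inferred from the strings.
LABEL_VALUES = {"negative": -100.0, "neutral": 0.0, "positive": 100.0}


def encode_labels(labels, mapping=LABEL_VALUES):
    """Map each string label to its explicitly assigned float.

    Raises ValueError if a label has no assigned value, instead of guessing.
    """
    unknown = sorted(set(labels) - set(mapping))
    if unknown:
        raise ValueError(f"no value defined for labels: {unknown}")
    return [mapping[label] for label in labels]


encoded = encode_labels(["positive", "negative", "neutral", "positive"])
print(encoded)
```

Failing on unknown labels is a design choice in this sketch: a label with no defined position has no meaningful numeric value.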
In a classification task, using ordinal encoding on a target column of a string type raises an error. After checking the source code, I found that line 1021 of 'pgml-extension/src/orm/snapshot.rs' converts the value to f32, a floating-point number, instead of "Encode each category as ascending integer values" as described on line 80. Changing the task to regression avoids the error, but that would be meaningless for a classification problem.
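For reference, the two descriptions are not necessarily in conflict: ascending integer codes can be stored as floats and still be integer-valued. The Python sketch below shows the general idea only. It is not the code in snapshot.rs, the name `ordinal_encode` is hypothetical, and sorting the categories alphabetically is an assumption made here for determinism.

```python
def ordinal_encode(labels):
    """Assign ascending integer codes to the distinct labels, stored as floats.

    Categories are sorted alphabetically (an assumption of this sketch), so
    the first category gets 0.0, the next 1.0, and so on.
    """
    categories = sorted(set(labels))
    codes = {category: float(index) for index, category in enumerate(categories)}
    return [codes[label] for label in labels], codes


encoded, codes = ordinal_encode(["positive", "negative", "positive"])
print(codes)
print(encoded)
```

Whether a classifier accepts such float-typed class labels depends on the underlying library, which matches the point made in the first comment.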