Polars deprecation inbound #894

Closed
koaning opened this issue Mar 18, 2024 · 4 comments
Labels
bug Something isn't working

koaning commented Mar 18, 2024

Describe the bug

I was working on a dataset with skrub together with polars and I noticed this warning pop up while running the TableVectorizer.

/Users/vincent/Development/probabl/venv/lib/python3.11/site-packages/skrub/_table_vectorizer.py:101: FutureWarning: Downcasting behavior in `replace` is deprecated and will be removed in a future version. To retain the old behavior, explicitly call `result.infer_objects(copy=False)`. To opt-in to the future behavior, set `pd.set_option('future.no_silent_downcasting', True)`
  column = column.replace(STR_NA_VALUES, np.nan).replace(r"^\s+$", np.nan, regex=True)

Figured I'd mention it here just as a ping. It seems like something that may break in the future. If folks appreciate it I can also have a deeper look.

Steps/Code to Reproduce

dicts = [{'id': 0,
  'Gender': 'Male',
  'Age': 24.443011,
  'Height': 1.699998,
  'Weight': 81.66995,
  'family_history_with_overweight': 'yes',
  'FAVC': 'yes',
  'FCVC': 2.0,
  'NCP': 2.983297,
  'CAEC': 'Sometimes',
  'SMOKE': 'no',
  'CH2O': 2.763573,
  'SCC': 'no',
  'FAF': 0.0,
  'TUE': 0.976473,
  'CALC': 'Sometimes',
  'MTRANS': 'Public_Transportation'},
 {'id': 1,
  'Gender': 'Female',
  'Age': 18.0,
  'Height': 1.56,
  'Weight': 57.0,
  'family_history_with_overweight': 'yes',
  'FAVC': 'yes',
  'FCVC': 2.0,
  'NCP': 3.0,
  'CAEC': 'Frequently',
  'SMOKE': 'no',
  'CH2O': 2.0,
  'SCC': 'no',
  'FAF': 1.0,
  'TUE': 1.0,
  'CALC': 'no',
  'MTRANS': 'Automobile'},
 {'id': 2,
  'Gender': 'Female',
  'Age': 18.0,
  'Height': 1.71146,
  'Weight': 50.165754,
  'family_history_with_overweight': 'yes',
  'FAVC': 'yes',
  'FCVC': 1.880534,
  'NCP': 1.411685,
  'CAEC': 'Sometimes',
  'SMOKE': 'no',
  'CH2O': 1.910378,
  'SCC': 'no',
  'FAF': 0.866045,
  'TUE': 1.673584,
  'CALC': 'no',
  'MTRANS': 'Public_Transportation'},
 {'id': 3,
  'Gender': 'Female',
  'Age': 20.952737,
  'Height': 1.71073,
  'Weight': 131.274851,
  'family_history_with_overweight': 'yes',
  'FAVC': 'yes',
  'FCVC': 3.0,
  'NCP': 3.0,
  'CAEC': 'Sometimes',
  'SMOKE': 'no',
  'CH2O': 1.674061,
  'SCC': 'no',
  'FAF': 1.467863,
  'TUE': 0.780199,
  'CALC': 'Sometimes',
  'MTRANS': 'Public_Transportation'},
 {'id': 4,
  'Gender': 'Male',
  'Age': 31.641081,
  'Height': 1.914186,
  'Weight': 93.798055,
  'family_history_with_overweight': 'yes',
  'FAVC': 'yes',
  'FCVC': 2.679664,
  'NCP': 1.971472,
  'CAEC': 'Sometimes',
  'SMOKE': 'no',
  'CH2O': 1.979848,
  'SCC': 'no',
  'FAF': 1.967973,
  'TUE': 0.931721,
  'CALC': 'Sometimes',
  'MTRANS': 'Public_Transportation'}]


import polars as pl
from skrub import TableVectorizer

TableVectorizer().fit_transform(pl.DataFrame(dicts))

Expected Results

No warning would be swell.

Actual Results

Mentioned above.

Versions

System:
    python: 3.11.6 (v3.11.6:8b6ee5ba3b, Oct  2 2023, 11:18:21) [Clang 13.0.0 (clang-1300.0.29.30)]
executable: /Users/vincent/Development/probabl/venv/bin/python
   machine: macOS-13.4.1-arm64-arm-64bit

Python dependencies:
      sklearn: 1.4.0
          pip: 23.2.1
   setuptools: 65.5.0
        numpy: 1.26.3
        scipy: 1.12.0
       Cython: None
       pandas: 2.2.0
   matplotlib: 3.8.2
       joblib: 1.3.2
threadpoolctl: 3.2.0

Built with OpenMP: True

threadpoolctl info:
       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/vincent/Development/probabl/venv/lib/python3.11/site-packages/numpy/.dylibs/libopenblas64_.0.dylib
        version: 0.3.23.dev
threading_layer: pthreads
   architecture: armv8

       user_api: openmp
   internal_api: openmp
    num_threads: 8
         prefix: libomp
       filepath: /Users/vincent/Development/probabl/venv/lib/python3.11/site-packages/sklearn/.dylibs/libomp.dylib
        version: None

       user_api: blas
   internal_api: openblas
    num_threads: 8
         prefix: libopenblas
       filepath: /Users/vincent/Development/probabl/venv/lib/python3.11/site-packages/scipy/.dylibs/libopenblas.0.dylib
        version: 0.3.21.dev
threading_layer: pthreads
   architecture: armv8
0.1.0
koaning added the bug label Mar 18, 2024

jeromedockes (Contributor) commented

Thanks for the report! The deprecation actually seems to come from pandas, because the current version of the TableVectorizer just converts polars dataframes to pandas. That will change after #877. After that PR, pd.NA should also be used instead of np.nan, so the warning might go away.

jeromedockes (Contributor) commented

here is a minimal reproducer, to add as a regression test:

import polars as pl
from skrub import TableVectorizer

df = pl.DataFrame(dict(a=[0, 1], b=['a', 'b']))
TableVectorizer().fit_transform(df)

jeromedockes (Contributor) commented

(the problem seems to be that check_X would do pd.DataFrame(polars_dataframe) and end up with all object columns)

jeromedockes (Contributor) commented

checked that this is fixed after #902
