Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Added integer_object_nulls parameter to table.to_pandas() call, since… #2953

Draft
wants to merge 8 commits into
base: main
Choose a base branch
from

Conversation

aleczoeller
Copy link

@aleczoeller aleczoeller commented Jul 7, 2023

Added integer_object_nulls parameter to table.to_pandas() call, since table is of type pyarrow.parquet and can process columns this way.

I added a test that recreates this entire possible situation - a null value Object type series concatenated with an integer, written to parquet and then checked to ensure the original int64 type doesn't default to Object. This is added as test_read_parquet_int() method to test_pandas_methods.py.

Change to accommodate issue 2951

… table is of type pyarrow.parquet and can process columns this way
@m-richards
Copy link
Member

Hi @aleczoeller, thanks for the pull request on this. Please have a look at the comment I've just left in #2951 - I don't think this approach is the best way forward to solve that issue. But please feel free to join the discussion on that issue if you see otherwise, or update this pull request to accomodate.

Generally, it is always good to add a test when changing behaviour - even if the code path is well covered by existing tests - clearly our existing tests did not catch the issue that was created. Explicit tests are also very useful for us in order to prevent regressions in the code.

@aleczoeller
Copy link
Author

aleczoeller commented Jul 8, 2023

Hi @m-richards , thanks for your feedback on the issue thread (I'll respond there as well). This makes perfect sense, so I added a test that recreates this entire possible situation - a null value Object type series concatenated with an integer, written to parquet and then checked to ensure the original int64 type doesn't default to Object. This is added as test_read_parquet_int() method to test_pandas_methods.py.

I left the to_pandas() kwarg use_nullable_dtypes in (for now at least) because based on my reading geopandas is using pyarrow.parquet.read_table() instead of pandas.read_parquet().

@m-richards m-richards marked this pull request as draft October 21, 2023 05:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants