Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix client.insert_dataframe() for tables with tuple columns with acceptance of numpy.ndarrays #426

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

stankudrow
Copy link

@stankudrow stankudrow commented Mar 12, 2024

Packages:

  • black : 24.2.0
  • clickhouse-cityhash : 1.0.2.4
  • clickhouse-driver : 0.2.7
  • Cython : 3.0.9
  • flake8 : 7.0.0
  • freezegun : 1.4.0
  • lz4 : 4.3.3
  • mypy : 1.9.0
  • numpy : 1.26.4
  • pandas : 2.2.1
  • parametrized : 66.0.2 (there is the latest 66.0.3 -> to be removed in favour of pytest.mark.parametrize)
  • pytest : 7.4.4
  • ruff : 0.3.2
  • zstd : 1.5.5.1

Checklist:

  • Add tests that demonstrate the correct behaviour of the change.
  • Add or update relevant docs, in the docs folder and in code -> no docs to change.
  • Ensure PR doesn't contain untouched code reformatting: spaces, etc.
  • Run flake8 and fix issues;
  • Run pytest no tests failed. See the dev docs.
  • Update CHANGELOG.md

@stankudrow
Copy link
Author

@xzkostyan , does it look better now?

stankudrow pushed a commit to stankudrow/clickhouse-driver that referenced this pull request Mar 14, 2024
@stankudrow stankudrow changed the title fix client.insert_dataframe() for tables with tuple columns - issue#417 fix client.insert_dataframe() for tables with tuple columns with acceptance of numpy.ndarrays Mar 14, 2024
@stankudrow
Copy link
Author

@xzkostyan , the CHANGELOG.md was updated and the PR seems to bear the solution for the #356 issue.

Copy link
Member

@xzkostyan xzkostyan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good job.

Let's keep non-numpy/numpy things separated.

tests/test_insert.py Outdated Show resolved Hide resolved
CHECK_NUMPY_TYPES = False


def _check_sequence_to_be_an_expected_iterable(seq):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Numpy/pandas helpers should be located in clickhouse_driver/numpy/helpers.py.

I'd keep two different versions of chunks: one for non-numpy and another one for numpy.

column_values = dataframe[column].values
for idx, col_vals in enumerate(column_values):
if isinstance(col_vals, dict):
column_values[idx] = tuple(col_vals.values())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about speed and memory consumption here? Did you test the new code with large dataframes?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This part is to be refined later.

@stankudrow stankudrow force-pushed the fix-client-insert-dataframe-with-dicts branch from 9bf3ad8 to 7f0cc87 Compare March 26, 2024 20:58
from tests.numpy.testcase import NumpyBaseTestCase


@skipIf(not PANDAS_IMPORTED, reason="pandas cannot be imported")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please see how simple numpy/pandas tests are implemented: tests/numpy/test_generic.py. Skipping logic (@skipIf) is hidden under the hood. Variable PANDAS_IMPORTED is redundant in this case.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants