fix client.insert_dataframe() for tables with tuple columns with acceptance of `numpy.ndarray`s #426

stankudrow · 2024-03-12T20:42:31Z

Fixes:
- TypeError: Unsupported column type: <class 'numpy.ndarray'>. list or tuple is expected. #356
- Problems while inserting into a table with a Tuple column #417

Reconsiders PR #425 (fix client.insert_dataframe() for tuple columns in tables).

Packages:

black : 24.2.0
clickhouse-cityhash : 1.0.2.4
clickhouse-driver : 0.2.7
Cython : 3.0.9
flake8 : 7.0.0
freezegun : 1.4.0
lz4 : 4.3.3
mypy : 1.9.0
numpy : 1.26.4
pandas : 2.2.1
parametrized : 66.0.2 (there is the latest 66.0.3 -> to be removed in favour of pytest.mark.parametrize)
pytest : 7.4.4
ruff : 0.3.2
zstd : 1.5.5.1

Checklist:

- Add tests that demonstrate the correct behaviour of the change.
- Add or update relevant docs, in the docs folder and in code -> no docs to change.
- Ensure the PR doesn't contain untouched code reformatting: spaces, etc.
- Run flake8 and fix issues.
- Run pytest; no tests failed. See the dev docs.
- Update CHANGELOG.md.

stankudrow · 2024-03-14T17:15:18Z

@xzkostyan , does it look better now?


stankudrow · 2024-03-21T09:47:37Z

@xzkostyan, CHANGELOG.md has been updated, and the PR appears to resolve issue #356.

xzkostyan

Good job.

Let's keep non-numpy/numpy things separated.

tests/test_insert.py

xzkostyan · 2024-03-26T19:11:08Z

clickhouse_driver/util/helpers.py

+ CHECK_NUMPY_TYPES = False
+
+
+def _check_sequence_to_be_an_expected_iterable(seq):


Numpy/pandas helpers should be located in clickhouse_driver/numpy/helpers.py.

I'd keep two different versions of chunks: one for non-numpy and another one for numpy.
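A minimal sketch of what keeping the two chunking paths separate could look like. The function names `list_column_chunks` and `ndarray_column_chunks` are hypothetical and are not taken from the driver's code; the point is only that each path validates its own container type instead of sharing a flag such as `CHECK_NUMPY_TYPES`.

```python
def list_column_chunks(columns, n):
    """Non-NumPy path (hypothetical name): accept only lists or tuples."""
    for column in columns:
        if not isinstance(column, (list, tuple)):
            raise TypeError(
                "Unsupported column type: {}. list or tuple is expected."
                .format(type(column))
            )
    # Slice every column into pieces of at most n items, then zip the
    # pieces so each yielded tuple holds one chunk per column.
    pieces = [
        [column[i:i + n] for i in range(0, len(column), n)]
        for column in columns
    ]
    for chunk in zip(*pieces):
        yield chunk


def ndarray_column_chunks(columns, n):
    """NumPy path (hypothetical name): accept only numpy.ndarray columns."""
    import numpy as np  # imported lazily so the list path has no NumPy dependency

    for column in columns:
        if not isinstance(column, np.ndarray):
            raise TypeError(
                "Unsupported column type: {}. ndarray is expected."
                .format(type(column))
            )
    # Basic slicing of an ndarray returns views, so no data is copied here.
    pieces = [
        [column[i:i + n] for i in range(0, len(column), n)]
        for column in columns
    ]
    for chunk in zip(*pieces):
        yield chunk
```

With this split, the NumPy variant could live next to the other numpy/pandas helpers, and the generic helper would not need to know about `numpy.ndarray` at all.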

xzkostyan · 2024-03-26T19:17:00Z

clickhouse_driver/client.py

+ column_values = dataframe[column].values
+ for idx, col_vals in enumerate(column_values):
+     if isinstance(col_vals, dict):
+         column_values[idx] = tuple(col_vals.values())


What about speed and memory consumption here? Did you test the new code with large dataframes?

This part is to be refined later.
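One way to get rough numbers for the speed and memory question is sketched below, using only the standard library. The helpers `dicts_to_tuples_inplace` and `measure` are hypothetical names written for this illustration, and the demo uses a plain list as a stand-in for `dataframe[column].values`, so the figures it prints are not representative of a real object-dtype column.

```python
import time
import tracemalloc


def dicts_to_tuples_inplace(values):
    """Replace each dict element with a tuple of its values, in place.

    Mirrors the per-element loop from the diff above; `values` can be any
    mutable, indexable sequence.
    """
    for idx, item in enumerate(values):
        if isinstance(item, dict):
            values[idx] = tuple(item.values())
    return values


def measure(func, *args):
    """Return (result, elapsed seconds, peak traced bytes) for one call."""
    tracemalloc.start()
    started = time.perf_counter()
    try:
        result = func(*args)
        elapsed = time.perf_counter() - started
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    rows = 100_000  # arbitrary size, chosen only for illustration
    column = [{"a": i, "b": str(i)} for i in range(rows)]
    _, elapsed, peak = measure(dicts_to_tuples_inplace, column)
    print(f"{rows} rows: {elapsed:.3f} s, peak {peak / 2**20:.1f} MiB")
```

To measure the actual code path, the same `measure` call could be pointed at the object array taken from a large dataframe, with the row count scaled up until the cost becomes visible.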

xzkostyan · 2024-04-03T19:23:53Z

tests/numpy/test_insert.py

+from tests.numpy.testcase import NumpyBaseTestCase
+
+
+@skipIf(not PANDAS_IMPORTED, reason="pandas cannot be imported")


Please see how simple numpy/pandas tests are implemented in tests/numpy/test_generic.py. The skipping logic (@skipIf) is hidden under the hood, so the PANDAS_IMPORTED variable is redundant in this case.
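For illustration, the general pattern of hiding the skip logic in a base class can be sketched with plain `unittest`. This is not the project's `NumpyBaseTestCase`; the class `OptionalDepsTestCase` and its `required_modules` attribute are assumptions made up for this example.

```python
import importlib.util
import unittest


class OptionalDepsTestCase(unittest.TestCase):
    """Hypothetical base class: skip tests when optional modules are missing."""

    required_modules = ("numpy", "pandas")

    def setUp(self):
        super().setUp()
        missing = [
            name for name in self.required_modules
            if importlib.util.find_spec(name) is None
        ]
        if missing:
            self.skipTest("cannot import: " + ", ".join(missing))


class TupleColumnTestCase(OptionalDepsTestCase):
    # No @skipIf decorator and no PANDAS_IMPORTED flag here:
    # the base class decides whether the test runs.
    def test_tuple_values_survive_in_dataframe(self):
        import pandas as pd

        df = pd.DataFrame({"a": [(1, "x"), (2, "y")]})
        self.assertEqual(df["a"].tolist(), [(1, "x"), (2, "y")])
```

Subclasses then contain only test bodies, and a missing dependency shows up as a skipped test rather than an import error.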

stankudrow mentioned this pull request Mar 12, 2024

fix client.insert_dataframe() for tuple columns in tables #425

Closed

5 tasks

stankudrow pushed a commit to stankudrow/clickhouse-driver that referenced this pull request Mar 14, 2024

update CHANGELOG.md with PR mymarilyn#426 to solve issues mymarilyn#356…

9bf3ad8

… and mymarilyn#417

stankudrow changed the title ~~fix client.insert_dataframe() for tables with tuple columns - issue#417~~ fix client.insert_dataframe() for tables with tuple columns with acceptance of numpy.ndarrays Mar 14, 2024

xzkostyan requested changes Mar 26, 2024


rebase

7f0cc87

stankudrow force-pushed the fix-client-insert-dataframe-with-dicts branch from 9bf3ad8 to 7f0cc87 Compare March 26, 2024 20:58

move insert_dataframe tests into the tests/numpy/ directory

2ae3485

stankudrow requested a review from xzkostyan March 31, 2024 14:19

xzkostyan requested changes Apr 3, 2024
