BrokenPipeError when inserting too large an array #405

Open
pulina opened this issue Dec 12, 2023 · 2 comments
@pulina
Contributor

pulina commented Dec 12, 2023

Describe the bug

For a column of type Array(FixedString(16)), inserting records whose cumulative binary size for a single column is greater than MAX_STRINGS_SIZE causes ClickHouse to raise a TOO_LARGE_STRING_SIZE error. When the connection is established without compression, instead of getting clickhouse_driver.errors.ServerException we get BrokenPipeError: [Errno 32] Broken pipe:

Traceback (most recent call last):
  File "/home/skulinski/piwik_src/clickhouse-driver/problem/./test_sessions_goal_uuids.py", line 45, in <module>
    client.execute(INSERT_QUERY, data)
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/client.py", line 367, in execute
    rv = self.process_insert_query(
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/client.py", line 597, in process_insert_query
    rv = self.send_data(sample_block, data,
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/client.py", line 650, in send_data
    self.connection.send_data(block)
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/connection.py", line 668, in send_data
    self.block_out.write(block)
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/streams/native.py", line 43, in write
    write_column(self.context, col_name, col_type, items,
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/columns/service.py", line 167, in write_column
    column.write_data(items, buf)
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/columns/arraycolumn.py", line 47, in write_data
    self._write(data, buf)
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/columns/arraycolumn.py", line 111, in _write
    self._write_data(value, buf)
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/columns/arraycolumn.py", line 94, in _write_data
    self.nested_column._write_data(value, buf)
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/columns/arraycolumn.py", line 94, in _write_data
    self.nested_column._write_data(value, buf)
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/columns/base.py", line 149, in _write_data
    self.write_items(prepared, buf)
  File "/home/skulinski/piwik_src/clickhouse-driver/venv/lib/python3.10/site-packages/clickhouse_driver-0.2.6-py3.10-linux-x86_64.egg/clickhouse_driver/columns/stringcolumn.py", line 49, in write_items
    buf.write_fixed_strings(items, self.length, encoding=self.encoding)
  File "clickhouse_driver/bufferedwriter.pyx", line 117, in clickhouse_driver.bufferedwriter.BufferedWriter.write_fixed_strings
  File "clickhouse_driver/bufferedwriter.pyx", line 40, in clickhouse_driver.bufferedwriter.BufferedWriter.write
  File "clickhouse_driver/bufferedwriter.pyx", line 130, in clickhouse_driver.bufferedwriter.BufferedSocketWriter.write_into_stream
BrokenPipeError: [Errno 32] Broken pipe

To Reproduce

#!/usr/bin/env python

from clickhouse_driver import Client
import tracemalloc

DATABASE = "***"
USER = "***"
PASSWORD = "***"
HOST = "***"
TABLE = "test_table"
DROP_TABLE_QUERY = f"""
DROP TABLE IF EXISTS `{DATABASE}`.{TABLE}
"""
CREATE_TABLE_QUERY = f"""
CREATE TABLE `{DATABASE}`.{TABLE}
                (
                    `col1` Array(FixedString(16))
                )
                ENGINE = MergeTree
                ORDER BY (col1)

"""
INSERT_QUERY = f"""
INSERT INTO `{DATABASE}`.{TABLE} (col1) VALUES
"""


tracemalloc.start()

data = [{"col1": [b'\r\xf7\x9c\xa1\xd7\xe4]\xee\x15\xeer["\xed^\xcc'] * 10000}] * 10000
try:
    with Client(
        host=HOST,
        port=39000,
        user=USER,
        password=PASSWORD,
        database=DATABASE,
        # compression="zstd",
    ) as client:
        client.execute(DROP_TABLE_QUERY)
        client.execute(CREATE_TABLE_QUERY)
        client.execute(INSERT_QUERY, data)
finally:
    # get_traced_memory() returns (current, peak) in bytes; report both in MiB.
    current, peak = tracemalloc.get_traced_memory()
    print(
        "Memory usage, peak memory usage (MiB):",
        current / 1024 / 1024,
        peak / 1024 / 1024,
    )
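
For scale, the raw payload that the script above tries to send for col1 can be estimated with a few lines of arithmetic. This is only a rough sketch: it counts the FixedString bytes alone and ignores array offsets and any protocol overhead, and the constant names are made up for illustration.

```python
# Rough size estimate for the col1 payload built by the reproduction script.
FIXED_STRING_LENGTH = 16   # FixedString(16)
ITEMS_PER_ROW = 10_000     # array elements in each row
ROWS = 10_000              # rows in the single INSERT

item = b'\r\xf7\x9c\xa1\xd7\xe4]\xee\x15\xeer["\xed^\xcc'
assert len(item) == FIXED_STRING_LENGTH

payload_bytes = ROWS * ITEMS_PER_ROW * FIXED_STRING_LENGTH
print(payload_bytes)                        # 1600000000
print(round(payload_bytes / 1024 ** 3, 2))  # 1.49 (GiB)
```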

As far as I can tell, without deep knowledge of Cython, the problem is located somewhere here:

def write_fixed_strings(self, items, Py_ssize_t length, encoding=None):

Expected behavior
The best solution, if it is possible, would be to slice this data into chunks that ClickHouse can process. I have no knowledge of the ClickHouse native protocol itself, so I am not sure this is possible at all. If it is not possible, the driver should at least raise clickhouse_driver.errors.ServerException and document this behavior here: https://clickhouse-driver.readthedocs.io/en/latest/types.html#array-t .
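
Until something like that exists in the driver, a possible caller-side workaround is to split the rows into several smaller inserts. The sketch below is not part of clickhouse-driver: `chunk_rows_by_size` and `max_bytes` are hypothetical names, and the right value for `max_bytes` depends on the server-side limit, which is an assumption here.

```python
from typing import Dict, Iterable, Iterator, List

Row = Dict[str, List[bytes]]


def chunk_rows_by_size(
    rows: Iterable[Row], column: str, max_bytes: int
) -> Iterator[List[Row]]:
    """Yield batches of rows whose summed byte size for `column` is <= max_bytes.

    A single row larger than max_bytes is still yielded on its own,
    because it cannot be split without changing the data.
    """
    batch: List[Row] = []
    batch_bytes = 0
    for row in rows:
        row_bytes = sum(len(item) for item in row[column])
        # Flush the current batch before it would exceed the budget.
        if batch and batch_bytes + row_bytes > max_bytes:
            yield batch
            batch, batch_bytes = [], 0
        batch.append(row)
        batch_bytes += row_bytes
    if batch:
        yield batch
```

With the reproduction script this could be used as `for batch in chunk_rows_by_size(data, "col1", max_bytes): client.execute(INSERT_QUERY, batch)`. Each batch is then a separate INSERT, so the load is no longer a single atomic operation.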

Versions

  • Verified for current master and 0.2.6
  • CH version: 23.3.2 (but this is probably irrelevant)
  • Python 3.10.12
@xzkostyan
Member

@pulina should we close this issue as completed?

@pulina
Contributor Author

pulina commented May 29, 2024

@xzkostyan ClickHouse still resets the connection after receiving too much data, and clickhouse-driver still has no docs about it and does not catch this error properly.
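
In the meantime, calling code can translate the low-level socket error into something more descriptive. This is a minimal sketch under the assumption that the dropped connection surfaces as BrokenPipeError (as in the traceback above) or ConnectionResetError. `InsertTooLargeError` and `execute_insert` are hypothetical names, not part of clickhouse-driver.

```python
class InsertTooLargeError(Exception):
    """Hypothetical error raised when the server drops the connection mid-INSERT."""


def execute_insert(execute, query, data):
    """Call execute(query, data) and re-raise connection drops with more context.

    `execute` is any callable with the same shape as Client.execute.
    """
    try:
        return execute(query, data)
    except (BrokenPipeError, ConnectionResetError) as exc:
        # The original exception stays attached as __cause__ for debugging.
        raise InsertTooLargeError(
            "Connection dropped during INSERT; the server may have rejected "
            "the block as too large. Consider sending smaller batches."
        ) from exc
```

It would be called as `execute_insert(client.execute, INSERT_QUERY, data)`. The wrapper only guesses at the cause, since a broken pipe can also come from unrelated network failures.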
