Add a bunch of important asserts #63723

Algunenano · 2024-05-13T18:33:52Z

Changelog category (leave one):

Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Introduce assertions to verify all functions are called with columns of the right size

Note that this PR reworks how short circuit optimization works internally:

Before we created partial columns by calling the function inside the branch (if-else) only when necessary and storing these results, then if managed the final result correctly. The problem is that this meant passing incomplete columns around, for example for if(cond, branchA, branchB) with 10 rows, branchA might have 6 rows and branchB would have 4. This made introducing validation of functions harder.

Now it's changed so in both cases we always pass full columns (10 rows) but we do it by inserting default values in those positions we are not going to use. While I expected this change to mean a slight performance degradation, it seems to be the opposite and queries are speed up: 3/4 4/4

Documentation entry for user-facing changes

Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

Modify your CI run

NOTE: If your merge the PR with modified CI you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step

Include tests (required builds will be added automatically):

Exclude tests:

Extra options:

do not test (only style check)
disable merge-commit (no merge from master before tests)
disable CI cache (job reuse)

Only specified batches in multi-batch jobs:

1
2
3
4

This is broken with the new analyzer and needs to be fixed in subsequent commits, but I want to see if the CI detects issues

robot-clickhouse-ci-2 · 2024-05-13T18:36:08Z

This is an automated comment for commit a2edbac with description of existing statuses. It's updated for the latest CI running

⏳ Click here to open a full report in a separate page

Check name	Description	Status
A Sync	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	⏳ pending
CI running	A meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR	⏳ pending
Mergeable Check	Checks if all other necessary checks are successful	⏳ pending

Successful checks

Check name	Description	Status
ClickHouse build check	Builds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process	✅ success
Docs check	Builds and tests the documentation	✅ success
Fast test	Normally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here	✅ success
Integration tests	The integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests	✅ success
PR Check	There's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS	✅ success
Stateful tests	Runs stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	✅ success
Stateless tests	Runs stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc	✅ success
Style check	Runs a set of checks to keep the code style clean. If some of tests failed, see the related log from the report	✅ success
Unit tests	Runs the unit tests for different release types	✅ success

divanik · 2024-05-14T13:37:36Z

src/Functions/FunctionHelpers.cpp

+ /// to decide which is the size of all the inputs
+ /// Hopefully this will be slowly improved in the future
+
+ if (!isColumnConst(*arguments[0].column))


Maybe is it better to check that all non const columns have the same size? In this case we need to find first non_const column instead of this if-statement

It makes sense. I'll need to check how we deal with const to decide to "materialize" it.

Algunenano · 2024-05-15T18:32:18Z

Some interesting failures that must be investigated:

2024.05.13 23:43:00.379650 [ 210159 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Debug> executeQuery: (from [::ffff:127.0.0.1]:48866) (comment: 02950_dictionary_ssd_cache_short_circuit.sh) SELECT dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', ('v1', 'v2'), 0, (intDiv(1, id), intDiv(1, id))) FROM 02950_database_for_ssd_cache_dictionary.source_table; (stage: Complete)
2024.05.13 23:43:00.380013 [ 210159 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> Planner: Query SELECT dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', _CAST(('v1', 'v2'), 'Tuple(String, String)'), 0, (intDiv(1, __table1.id), intDiv(1, __table1.id))) AS `dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', ('v1', 'v2'), 0, (intDiv(1, id), intDiv(1, id)))` FROM `02950_database_for_ssd_cache_dictionary`.source_table AS __table1 to stage Complete
2024.05.13 23:43:00.380118 [ 210159 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> Planner: Query SELECT dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', _CAST(('v1', 'v2'), 'Tuple(String, String)'), 0, (intDiv(1, __table1.id), intDiv(1, __table1.id))) AS `dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', ('v1', 'v2'), 0, (intDiv(1, id), intDiv(1, id)))` FROM `02950_database_for_ssd_cache_dictionary`.source_table AS __table1 from stage FetchColumns to stage Complete
2024.05.13 23:43:00.381201 [ 181524 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> ExternalDictionariesLoader: Will load the object 'f03b2b76-cffa-439b-b215-ec6ea8149d1a' in background, force = false, loading_id = 408
2024.05.13 23:43:00.381275 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> ExternalDictionariesLoader: Start loading object 'f03b2b76-cffa-439b-b215-ec6ea8149d1a'
2024.05.13 23:43:00.381429 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Debug> LOCAL-Session: a06d6d9d-e66f-49c5-802c-d465116e6d40 Authenticating user 'default' from 127.0.0.1:0
2024.05.13 23:43:00.381453 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Debug> LOCAL-Session: a06d6d9d-e66f-49c5-802c-d465116e6d40 Authenticated with global context as user 94309d50-4f52-5250-31bd-74fecac179db
2024.05.13 23:43:00.381613 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Debug> LOCAL-Session: a06d6d9d-e66f-49c5-802c-d465116e6d40 Logout, user_id: 94309d50-4f52-5250-31bd-74fecac179db
2024.05.13 23:43:00.381644 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> DictionaryFactory: Created dictionary source 'ClickHouse: 02950_database_for_ssd_cache_dictionary.source_table' for dictionary 'f03b2b76-cffa-439b-b215-ec6ea8149d1a'
2024.05.13 23:43:00.381776 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> ExternalDictionariesLoader: Supposed update time for 'f03b2b76-cffa-439b-b215-ec6ea8149d1a' is never (loaded, does not support updates)
2024.05.13 23:43:00.381794 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> ExternalDictionariesLoader: Next update time for 'f03b2b76-cffa-439b-b215-ec6ea8149d1a' was set to 294247-01-10 05:00:54
2024.05.13 23:43:00.381978 [ 289058 ] {} <Debug> executeQuery: (internal) SELECT `id`, `v1`, `v2`, `v3` FROM `02950_database_for_ssd_cache_dictionary`.`source_table` WHERE `id` IN (0); (stage: Complete)
2024.05.13 23:43:00.382181 [ 289058 ] {} <Trace> Planner: Query SELECT __table1.id AS id, __table1.v1 AS v1, __table1.v2 AS v2, __table1.v3 AS v3 FROM `02950_database_for_ssd_cache_dictionary`.source_table AS __table1 WHERE __table1.id IN (0) to stage Complete
2024.05.13 23:43:00.382326 [ 289058 ] {} <Trace> Planner: Query SELECT __table1.id AS id, __table1.v1 AS v1, __table1.v2 AS v2, __table1.v3 AS v3 FROM `02950_database_for_ssd_cache_dictionary`.source_table AS __table1 WHERE __table1.id IN (0) from stage FetchColumns to stage Complete
2024.05.13 23:43:00.384954 [ 210159 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Error> executeQuery: Code: 49. DB::Exception: Expected the #2 column ( of type String) to have 2 rows, but it has 0: while executing 'FUNCTION dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary'_String :: 1, _CAST(('v1', 'v2')_Tuple(String, String), 'Tuple(String, String)'_String) :: 2, 0_UInt8 :: 3, tuple(intDiv(1_UInt8, __table1.id), intDiv(1_UInt8, __table1.id)) :: 5) -> dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary'_String, _CAST(('v1', 'v2')_Tuple(String, String), 'Tuple(String, String)'_String), 0_UInt8, tuple(intDiv(1_UInt8, __table1.id), intDiv(1_UInt8, __table1.id))) Tuple(v1 String, v2 Nullable(String)) : 0'. (LOGICAL_ERROR) (version 24.5.1.1) (from [::ffff:127.0.0.1]:48866) (comment: 02950_dictionary_ssd_cache_short_circuit.sh) (in query: SELECT dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', ('v1', 'v2'), 0, (intDiv(1, id), intDiv(1, id))) FROM 02950_database_for_ssd_cache_dictionary.source_table;), Stack trace (when copying this message, always include the lines below):

Algunenano · 2024-05-21T14:28:53Z

To be able to introduce this assert, which is very necessary to detect runtime problems, I've had to rework how short circuit optimization works. Now it generates full columns, inserting default values if necessary on skipped indices, instead of filling only partial columns and trusting if will do the right thing.

This wasn't much of a pain in general except for dictionaries, where it's madness.

I'm still cleaning and reviewing other things, but cc'ing @jsc0218 @Avogar in case you want to discuss or review the changes to short circuit optimization. Surprisingly, at least to me, the new system [seems faster].(https://s3.amazonaws.com/clickhouse-test-reports/63723/8cd3b275ac05540090516997cf06f4f59c738315/performance_comparison_[3_4]/report.html)

Avogar · 2024-05-21T16:43:49Z

src/Functions/if.cpp

@@ -76,75 +76,17 @@ inline void fillVectorVector(const ArrayCond & cond, const ArrayA & a, const Arr
 {

 size_t size = cond.size();
- bool a_is_short = a.size() < size;


There is similar logic in multiIf, maybe it should be adjusted too.

Algunenano · 2024-05-23T18:11:48Z

😮 This is all green somehow

divanik · 2024-05-28T18:24:45Z

src/Dictionaries/FlatDictionary.cpp

@@ -91,25 +91,21 @@ ColumnPtr FlatDictionary::getColumn(

 if (is_short_circuit)
 {
- IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
- size_t keys_found = 0;
+ IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get(); /// <<<<<<<<<<


What does this comment mean?

It means: "Raúl, this is the the line you need to check with the debugger, but remember to remove the comment before pushing" 😄 I'll remove it

divanik

LGTM

robot-clickhouse-ci-2 added the pr-bugfix Pull request with bugfix, not backported by default label May 13, 2024

Algunenano force-pushed the ifunction_column_size branch from c72ef7e to 45b6842 Compare May 13, 2024 18:38

divanik self-assigned this May 14, 2024

divanik reviewed May 14, 2024

View reviewed changes

Algunenano marked this pull request as draft May 15, 2024 15:29

Algunenano added 5 commits May 16, 2024 14:05

Needed a comment

8051e1e

Add a bunch of important asserts

a4f90d5

Better

698c53b

Style

36f1025

Temporarily disable 02950_dictionary_short_circuit

4afb14e

Algunenano force-pushed the ifunction_column_size branch from 11753df to 4afb14e Compare May 16, 2024 12:06

Algunenano added 5 commits May 16, 2024 21:35

Change how short circuit optimization works

4680d09

Incomplete adaptation of dictionary short circuit

371091c

Change dictionary short circuit to always create values

8cd3b27

Merge remote-tracking branch 'blessed/master' into ifunction_column_size

2e1f256

Fix issues found by fuzzer

a53bd57

Algunenano added pr-improvement Pull request with some product improvements and removed pr-bugfix Pull request with bugfix, not backported by default labels May 21, 2024

Try using input_rows_count as validation

d4b723b

Avogar reviewed May 21, 2024

View reviewed changes

Algunenano added 3 commits May 21, 2024 19:37

Remove more code from old short circuit optimization

ba65bbe

Fix missing input_rows_count

3d717ac

Merge remote-tracking branch 'blessed/master' into ifunction_column_size

72604ab

Algunenano force-pushed the ifunction_column_size branch from b610351 to 72604ab Compare May 22, 2024 15:37

Tidy

a64ce7d

Algunenano marked this pull request as ready for review May 23, 2024 18:12

Algunenano requested a review from divanik May 28, 2024 15:15

divanik reviewed May 28, 2024

View reviewed changes

Remove comment

a2edbac

divanik approved these changes May 29, 2024

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add a bunch of important asserts #63723

Add a bunch of important asserts #63723

Algunenano commented May 13, 2024 •

edited

robot-clickhouse-ci-2 commented May 13, 2024 •

edited by robot-ch-test-poll4

divanik May 14, 2024 •

edited

Algunenano May 15, 2024

Algunenano commented May 15, 2024

Algunenano commented May 21, 2024

Avogar May 21, 2024

Algunenano May 21, 2024

Algunenano commented May 23, 2024

divanik May 28, 2024

Algunenano May 28, 2024

divanik left a comment

Add a bunch of important asserts #63723

Are you sure you want to change the base?

Add a bunch of important asserts #63723

Conversation

Algunenano commented May 13, 2024 • edited

Changelog category (leave one):

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Documentation entry for user-facing changes

Include tests (required builds will be added automatically):

Exclude tests:

Extra options:

Only specified batches in multi-batch jobs:

robot-clickhouse-ci-2 commented May 13, 2024 • edited by robot-ch-test-poll4

divanik May 14, 2024 • edited

Choose a reason for hiding this comment

Algunenano May 15, 2024

Choose a reason for hiding this comment

Algunenano commented May 15, 2024

Algunenano commented May 21, 2024

Avogar May 21, 2024

Choose a reason for hiding this comment

Algunenano May 21, 2024

Choose a reason for hiding this comment

Algunenano commented May 23, 2024

divanik May 28, 2024

Choose a reason for hiding this comment

Algunenano May 28, 2024

Choose a reason for hiding this comment

divanik left a comment

Choose a reason for hiding this comment

Algunenano commented May 13, 2024 •

edited

robot-clickhouse-ci-2 commented May 13, 2024 •

edited by robot-ch-test-poll4

divanik May 14, 2024 •

edited