Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a bunch of important asserts #63723

Open
wants to merge 16 commits into
base: master
Choose a base branch
from

Conversation

Algunenano
Copy link
Member

@Algunenano Algunenano commented May 13, 2024

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Introduce assertions to verify all functions are called with columns of the right size

Note that this PR reworks how short circuit optimization works internally:

Before we created partial columns by calling the function inside the branch (if-else) only when necessary and storing these results, then if managed the final result correctly. The problem is that this meant passing incomplete columns around, for example for if(cond, branchA, branchB) with 10 rows, branchA might have 6 rows and branchB would have 4. This made introducing validation of functions harder.

Now it's changed so in both cases we always pass full columns (10 rows) but we do it by inserting default values in those positions we are not going to use. While I expected this change to mean a slight performance degradation, it seems to be the opposite and queries are speed up: 3/4 4/4

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

Information about CI checks: https://clickhouse.com/docs/en/development/continuous-integration/

Modify your CI run

NOTE: If your merge the PR with modified CI you MUST KNOW what you are doing
NOTE: Checked options will be applied if set before CI RunConfig/PrepareRunConfig step

Include tests (required builds will be added automatically):

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Unit tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with Analyzer
  • All with Azure
  • Add your option here

Exclude tests:

  • Fast test
  • Integration Tests
  • Stateless tests
  • Stateful tests
  • Performance tests
  • All with ASAN
  • All with TSAN
  • All with MSAN
  • All with UBSAN
  • All with Coverage
  • All with Aarch64
  • Add your option here

Extra options:

  • do not test (only style check)
  • disable merge-commit (no merge from master before tests)
  • disable CI cache (job reuse)

Only specified batches in multi-batch jobs:

  • 1
  • 2
  • 3
  • 4

This is broken with the new analyzer and needs to be fixed in subsequent commits, but I want to see if the CI detects issues

@robot-clickhouse-ci-2 robot-clickhouse-ci-2 added the pr-bugfix Pull request with bugfix, not backported by default label May 13, 2024
@robot-clickhouse-ci-2
Copy link
Contributor

robot-clickhouse-ci-2 commented May 13, 2024

This is an automated comment for commit a2edbac with description of existing statuses. It's updated for the latest CI running

⏳ Click here to open a full report in a separate page

Check nameDescriptionStatus
A SyncThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS⏳ pending
CI runningA meta-check that indicates the running CI. Normally, it's in success or pending state. The failed status indicates some problems with the PR⏳ pending
Mergeable CheckChecks if all other necessary checks are successful⏳ pending
Successful checks
Check nameDescriptionStatus
ClickHouse build checkBuilds ClickHouse in various configurations for use in further steps. You have to fix the builds that fail. Build logs often has enough information to fix the error, but you might have to reproduce the failure locally. The cmake options can be found in the build log, grepping for cmake. Use these options and follow the general build process✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests✅ success
PR CheckThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success

@divanik divanik self-assigned this May 14, 2024
/// to decide which is the size of all the inputs
/// Hopefully this will be slowly improved in the future

if (!isColumnConst(*arguments[0].column))
Copy link
Member

@divanik divanik May 14, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe is it better to check that all non const columns have the same size? In this case we need to find first non_const column instead of this if-statement

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It makes sense. I'll need to check how we deal with const to decide to "materialize" it.

@Algunenano Algunenano marked this pull request as draft May 15, 2024 15:29
@Algunenano
Copy link
Member Author

Some interesting failures that must be investigated:

2024.05.13 23:43:00.379650 [ 210159 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Debug> executeQuery: (from [::ffff:127.0.0.1]:48866) (comment: 02950_dictionary_ssd_cache_short_circuit.sh) SELECT dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', ('v1', 'v2'), 0, (intDiv(1, id), intDiv(1, id))) FROM 02950_database_for_ssd_cache_dictionary.source_table; (stage: Complete)
2024.05.13 23:43:00.380013 [ 210159 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> Planner: Query SELECT dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', _CAST(('v1', 'v2'), 'Tuple(String, String)'), 0, (intDiv(1, __table1.id), intDiv(1, __table1.id))) AS `dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', ('v1', 'v2'), 0, (intDiv(1, id), intDiv(1, id)))` FROM `02950_database_for_ssd_cache_dictionary`.source_table AS __table1 to stage Complete
2024.05.13 23:43:00.380118 [ 210159 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> Planner: Query SELECT dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', _CAST(('v1', 'v2'), 'Tuple(String, String)'), 0, (intDiv(1, __table1.id), intDiv(1, __table1.id))) AS `dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', ('v1', 'v2'), 0, (intDiv(1, id), intDiv(1, id)))` FROM `02950_database_for_ssd_cache_dictionary`.source_table AS __table1 from stage FetchColumns to stage Complete
2024.05.13 23:43:00.381201 [ 181524 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> ExternalDictionariesLoader: Will load the object 'f03b2b76-cffa-439b-b215-ec6ea8149d1a' in background, force = false, loading_id = 408
2024.05.13 23:43:00.381275 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> ExternalDictionariesLoader: Start loading object 'f03b2b76-cffa-439b-b215-ec6ea8149d1a'
2024.05.13 23:43:00.381429 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Debug> LOCAL-Session: a06d6d9d-e66f-49c5-802c-d465116e6d40 Authenticating user 'default' from 127.0.0.1:0
2024.05.13 23:43:00.381453 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Debug> LOCAL-Session: a06d6d9d-e66f-49c5-802c-d465116e6d40 Authenticated with global context as user 94309d50-4f52-5250-31bd-74fecac179db
2024.05.13 23:43:00.381613 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Debug> LOCAL-Session: a06d6d9d-e66f-49c5-802c-d465116e6d40 Logout, user_id: 94309d50-4f52-5250-31bd-74fecac179db
2024.05.13 23:43:00.381644 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> DictionaryFactory: Created dictionary source 'ClickHouse: 02950_database_for_ssd_cache_dictionary.source_table' for dictionary 'f03b2b76-cffa-439b-b215-ec6ea8149d1a'
2024.05.13 23:43:00.381776 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> ExternalDictionariesLoader: Supposed update time for 'f03b2b76-cffa-439b-b215-ec6ea8149d1a' is never (loaded, does not support updates)
2024.05.13 23:43:00.381794 [ 288891 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Trace> ExternalDictionariesLoader: Next update time for 'f03b2b76-cffa-439b-b215-ec6ea8149d1a' was set to 294247-01-10 05:00:54
2024.05.13 23:43:00.381978 [ 289058 ] {} <Debug> executeQuery: (internal) SELECT `id`, `v1`, `v2`, `v3` FROM `02950_database_for_ssd_cache_dictionary`.`source_table` WHERE `id` IN (0); (stage: Complete)
2024.05.13 23:43:00.382181 [ 289058 ] {} <Trace> Planner: Query SELECT __table1.id AS id, __table1.v1 AS v1, __table1.v2 AS v2, __table1.v3 AS v3 FROM `02950_database_for_ssd_cache_dictionary`.source_table AS __table1 WHERE __table1.id IN (0) to stage Complete
2024.05.13 23:43:00.382326 [ 289058 ] {} <Trace> Planner: Query SELECT __table1.id AS id, __table1.v1 AS v1, __table1.v2 AS v2, __table1.v3 AS v3 FROM `02950_database_for_ssd_cache_dictionary`.source_table AS __table1 WHERE __table1.id IN (0) from stage FetchColumns to stage Complete
2024.05.13 23:43:00.384954 [ 210159 ] {1961f983-9326-4a53-909c-ec3769bc2815} <Error> executeQuery: Code: 49. DB::Exception: Expected the #2 column ( of type String) to have 2 rows, but it has 0: while executing 'FUNCTION dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary'_String :: 1, _CAST(('v1', 'v2')_Tuple(String, String), 'Tuple(String, String)'_String) :: 2, 0_UInt8 :: 3, tuple(intDiv(1_UInt8, __table1.id), intDiv(1_UInt8, __table1.id)) :: 5) -> dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary'_String, _CAST(('v1', 'v2')_Tuple(String, String), 'Tuple(String, String)'_String), 0_UInt8, tuple(intDiv(1_UInt8, __table1.id), intDiv(1_UInt8, __table1.id))) Tuple(v1 String, v2 Nullable(String)) : 0'. (LOGICAL_ERROR) (version 24.5.1.1) (from [::ffff:127.0.0.1]:48866) (comment: 02950_dictionary_ssd_cache_short_circuit.sh) (in query: SELECT dictGetOrDefault('02950_database_for_ssd_cache_dictionary.ssd_cache_dictionary', ('v1', 'v2'), 0, (intDiv(1, id), intDiv(1, id))) FROM 02950_database_for_ssd_cache_dictionary.source_table;), Stack trace (when copying this message, always include the lines below):

@Algunenano
Copy link
Member Author

To be able to introduce this assert, which is very necessary to detect runtime problems, I've had to rework how short circuit optimization works. Now it generates full columns, inserting default values if necessary on skipped indices, instead of filling only partial columns and trusting if will do the right thing.

This wasn't much of a pain in general except for dictionaries, where it's madness.

I'm still cleaning and reviewing other things, but cc'ing @jsc0218 @Avogar in case you want to discuss or review the changes to short circuit optimization. Surprisingly, at least to me, the new system [seems faster].(https://s3.amazonaws.com/clickhouse-test-reports/63723/8cd3b275ac05540090516997cf06f4f59c738315/performance_comparison_[3_4]/report.html)

@Algunenano Algunenano added pr-improvement Pull request with some product improvements and removed pr-bugfix Pull request with bugfix, not backported by default labels May 21, 2024
@@ -76,75 +76,17 @@ inline void fillVectorVector(const ArrayCond & cond, const ArrayA & a, const Arr
{

size_t size = cond.size();
bool a_is_short = a.size() < size;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is similar logic in multiIf, maybe it should be adjusted too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍

@Algunenano
Copy link
Member Author

😮 This is all green somehow

@Algunenano Algunenano marked this pull request as ready for review May 23, 2024 18:12
@Algunenano Algunenano requested a review from divanik May 28, 2024 15:15
@@ -91,25 +91,21 @@ ColumnPtr FlatDictionary::getColumn(

if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get(); /// <<<<<<<<<<
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What does this comment mean?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It means: "Raúl, this is the the line you need to check with the debugger, but remember to remove the comment before pushing" 😄 I'll remove it

Copy link
Member

@divanik divanik left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
pr-improvement Pull request with some product improvements
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants