Add a bunch of important asserts #63723
Force-pushed from c72ef7e to 45b6842
src/Functions/FunctionHelpers.cpp
Outdated
/// to decide which is the size of all the inputs
/// Hopefully this will be slowly improved in the future

if (!isColumnConst(*arguments[0].column))
Maybe it is better to check that all non-const columns have the same size? In that case we need to find the first non-const column instead of using this if-statement.
It makes sense. I'll need to check how we deal with const to decide to "materialize" it.
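The check suggested above could look roughly like the sketch below. It is not the code from this PR: `ArgColumn` and `commonNonConstSize` are hypothetical stand-ins for the real column types, and the only assumption is that a const column should be skipped when deciding the expected row count.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical stand-in for a function argument; this is NOT ClickHouse's IColumn.
struct ArgColumn
{
    size_t rows;    // number of rows the column reports
    bool is_const;  // true for a constant column, which is skipped by the check
};

// Returns the size shared by all non-const columns, or std::nullopt if every column is const.
// Throws std::logic_error as soon as two non-const columns disagree.
std::optional<size_t> commonNonConstSize(const std::vector<ArgColumn> & args)
{
    std::optional<size_t> expected;
    for (size_t i = 0; i < args.size(); ++i)
    {
        if (args[i].is_const)
            continue;

        if (!expected)
            expected = args[i].rows;  // the first non-const column sets the expectation
        else if (args[i].rows != *expected)
            throw std::logic_error(
                "Argument " + std::to_string(i) + " has " + std::to_string(args[i].rows)
                + " rows, expected " + std::to_string(*expected));
    }
    return expected;
}
```

Using the first non-const column as the reference avoids special-casing `arguments[0]`, which is what the if-statement under review does.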
Some interesting failures that must be investigated:
Force-pushed from 11753df to 4afb14e
To be able to introduce this assert, which is very necessary to detect runtime problems, I've had to rework how short circuit optimization works. Now it generates full columns, inserting default values if necessary on skipped indices, instead of filling only partial columns and trusting `if` to assemble the final result. This wasn't much of a pain in general except for dictionaries, where it's madness. I'm still cleaning and reviewing other things, but cc'ing @jsc0218 @Avogar in case you want to discuss or review the changes to short circuit optimization. Surprisingly, at least to me, the new system [seems faster](https://s3.amazonaws.com/clickhouse-test-reports/63723/8cd3b275ac05540090516997cf06f4f59c738315/performance_comparison_[3_4]/report.html).
@@ -76,75 +76,17 @@ inline void fillVectorVector(const ArrayCond & cond, const ArrayA & a, const Arr
{

size_t size = cond.size();
bool a_is_short = a.size() < size;
There is similar logic in multiIf, maybe it should be adjusted too.
👍
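For context on what the `a_is_short` logic is for, here is a simplified sketch of the two selection styles. It is an assumed reconstruction, not the actual `fillVectorVector` or `multiIf` code: the function names and the `std::vector<int>` types are illustrative only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Old style (assumed): a "short" branch holds values only for the rows that selected it,
// so it is read through its own cursor instead of the row index.
std::vector<int> selectWithShortBranches(
    const std::vector<uint8_t> & cond, const std::vector<int> & a, const std::vector<int> & b)
{
    const size_t size = cond.size();
    const bool a_is_short = a.size() < size;
    const bool b_is_short = b.size() < size;

    std::vector<int> res(size);
    size_t a_index = 0;
    size_t b_index = 0;
    for (size_t i = 0; i < size; ++i)
    {
        if (cond[i])
            res[i] = a[a_is_short ? a_index++ : i];
        else
            res[i] = b[b_is_short ? b_index++ : i];
    }
    return res;
}

// New style (assumed): both branches are full size, so every row is a plain indexed select
// and the sizes can be asserted up front.
std::vector<int> selectWithFullBranches(
    const std::vector<uint8_t> & cond, const std::vector<int> & a, const std::vector<int> & b)
{
    assert(a.size() == cond.size() && b.size() == cond.size());

    std::vector<int> res(cond.size());
    for (size_t i = 0; i < cond.size(); ++i)
        res[i] = cond[i] ? a[i] : b[i];
    return res;
}
```

With full-size branches the cursors and the `*_is_short` flags disappear, which is why any function that carries the same bookkeeping would need the same adjustment.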
Force-pushed from b610351 to 72604ab
😮 This is all green somehow
src/Dictionaries/FlatDictionary.cpp
Outdated
@@ -91,25 +91,21 @@ ColumnPtr FlatDictionary::getColumn(

if (is_short_circuit)
{
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get();
size_t keys_found = 0;
IColumn::Filter & default_mask = std::get<RefFilter>(default_or_filter).get(); /// <<<<<<<<<<
What does this comment mean?
It means: "Raúl, this is the line you need to check with the debugger, but remember to remove the comment before pushing" 😄 I'll remove it
LGTM
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):
Note that this PR reworks how short circuit optimization works internally:
Before, we created partial columns by calling the function inside each branch (if-else) only when necessary and storing those results; `if` then assembled the final result correctly. The problem is that this meant passing incomplete columns around. For example, for `if(cond, branchA, branchB)` with 10 rows, branchA might have 6 rows and branchB would have 4. This made introducing validation of functions harder.
Now both branches always pass full columns (10 rows), with default values inserted in the positions we are not going to use. While I expected this change to cause a slight performance degradation, the opposite seems to be true and queries are sped up: 3/4 4/4
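To make the 10-row example concrete, here is a minimal sketch of turning a partial branch result into a full column by inserting defaults. `expandWithDefaults` is a hypothetical helper written for illustration, operating on `std::vector` rather than on ClickHouse columns; it is not the code from this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Expands `partial` (one value per selected row) into a full column with one value per mask row.
// mask[i] != 0 means row i was evaluated and takes the next value from `partial`;
// every other row receives a default-constructed T.
template <typename T>
std::vector<T> expandWithDefaults(const std::vector<T> & partial, const std::vector<uint8_t> & mask)
{
    std::vector<T> full;
    full.reserve(mask.size());

    size_t next = 0;
    for (uint8_t selected : mask)
    {
        if (selected)
        {
            assert(next < partial.size());
            full.push_back(partial[next++]);
        }
        else
        {
            full.push_back(T{});  // placeholder for a row this branch never evaluates
        }
    }
    assert(next == partial.size());  // every partial value must have been placed
    return full;
}
```

With a mask that selects 6 of 10 rows, a 6-value branchA expands to 10 rows, and the inverted mask does the same for a 4-value branchB, so a size check on function arguments sees 10 rows in both cases.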
This is broken with the new analyzer and needs to be fixed in subsequent commits, but I want to see if the CI detects issues