Filter IN pushdown to different decoders #1525
base: master
8c26e69 to 939b84a
@@ -1015,6 +1077,8 @@ void ObWhiteFilterExecutor::check_null_params()
int ObWhiteFilterExecutor::init_obj_set()
{
int ret = OB_SUCCESS;
obj_array_sorted_ = false;
This member isn't needed; just make sure the array is sorted.
Got it, fixed in commit 63e156e.
@@ -843,14 +848,67 @@ int ObRawDecoder::in_operator(
|| NULL == row_index)) {
ret = OB_INVALID_ARGUMENT;
LOG_WARN("Pushdown in operator: Invalid arguments", K(ret), K(filter.get_objs()));
} else if (OB_LIKELY(can_vectorized()) && OB_LIKELY(is_inited())) {
This check probably isn't needed, is it?
Got it, fixed in commit a2f7326.
int64_t *row_ids_;
common::ObIAllocator *allocator_;
// for white filter IN batch_decode
common::ObDatum *white_batch_datums_;
This can be obtained from the expression; there is no need to alloc it yourself.
Got it, fixed in commit d3522a7.
int ObWhiteFilterExecutor::eval_right_val_to_objs()
{
int ret = OB_SUCCESS;
const ObExpr &expr = *(filter_.expr_);
Check for null first, defensively.
Got it, fixed in commit 2284e8c.
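The request to check for null before dereferencing `filter_.expr_` is the usual check-then-dereference pattern. Below is a minimal sketch of it; `Expr`, `WhiteFilterNode`, `kSuccess`, and `kErrUnexpected` are hypothetical stand-ins, not the real `ObExpr`, `ObPushdownWhiteFilterNode`, or OceanBase error codes, and the right-hand values are simplified to `int`.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-ins: not the real OceanBase error codes or types.
constexpr int kSuccess = 0;
constexpr int kErrUnexpected = -1;

struct Expr {
  std::vector<int> args;  // right-hand values of the filter, simplified to int
};

struct WhiteFilterNode {
  const Expr *expr_ = nullptr;
};

// Check the pointer before forming a reference to it, and turn a null
// expression into an error code instead of undefined behavior.
int eval_right_val_to_objs(const WhiteFilterNode &filter, std::vector<int> &params) {
  int ret = kSuccess;
  if (nullptr == filter.expr_) {
    ret = kErrUnexpected;  // the real code would also log a warning here
  } else {
    const Expr &expr = *filter.expr_;
    params.assign(expr.args.begin(), expr.args.end());
  }
  return ret;
}
```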
struct ObWhiteFilterParamsCmpFunc
{
OB_INLINE bool operator()(const common::ObObj &obj1, const common::ObObj &obj2) const {
The obj compare function may return an error.
Got it, fixed in commit ecd9691.
// 2. make params sorted
if (OB_FAIL(ret)) {
} else {
std::sort(params_.begin(), params_.end(), ObWhiteFilterParamsCmpFunc());
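Normally the cmp func needs to take ret as an argument.
If an error occurs during the sort, the error code must be propagated out afterwards.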
Got it, fixed in commit ecd9691.
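A minimal sketch of the review point above: the comparator holds a reference to the caller's `ret`, records the first failure from a fallible compare, and the sort hands that code back. `Obj`, `compare_obj`, `ParamsLessWithRet`, `sort_params`, and the `k*` error codes are hypothetical. The sketch swaps `std::sort` for a hand-rolled insertion sort, assuming IN lists are short, because `std::sort` requires a comparator that induces a strict weak ordering, which a comparator that starts answering `false` after an error does not guarantee.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical error codes and value type (stand-ins for OB_* codes and ObObj).
constexpr int kSuccess = 0;
constexpr int kErrIncomparable = -2;

struct Obj {
  bool is_int;
  long long i;
  std::string s;
};

// Fallible three-way compare: fails instead of guessing when types differ.
int compare_obj(const Obj &a, const Obj &b, int &cmp) {
  if (a.is_int != b.is_int) {
    return kErrIncomparable;
  }
  if (a.is_int) {
    cmp = (a.i < b.i) ? -1 : (a.i > b.i ? 1 : 0);
  } else {
    cmp = a.s.compare(b.s);
  }
  return kSuccess;
}

// Comparator that carries the caller's ret: on the first failure it stores
// the error code and from then on answers "not less".
struct ParamsLessWithRet {
  int &ret_;
  bool operator()(const Obj &a, const Obj &b) const {
    int cmp = 0;
    if (ret_ != kSuccess) {
      return false;
    }
    if ((ret_ = compare_obj(a, b, cmp)) != kSuccess) {
      return false;
    }
    return cmp < 0;
  }
};

// Insertion sort that stops at the first compare error and returns it.
int sort_params(std::vector<Obj> &params) {
  int ret = kSuccess;
  ParamsLessWithRet param_less{ret};
  for (std::size_t i = 1; i < params.size() && ret == kSuccess; ++i) {
    Obj cur = params[i];
    std::size_t j = i;
    while (j > 0 && param_less(cur, params[j - 1])) {
      params[j] = params[j - 1];
      --j;
    }
    params[j] = cur;
  }
  return ret;
}
```

If the sort stops early, `params` is still a permutation of the input, so the caller only needs to check the returned code.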
// obj > max_obj || obj < min_obj
is_exist = false;
} else {
is_exist = std::binary_search(params_.begin(), params_.end(), obj, ObWhiteFilterParamsCmpFunc());
Same as above; avoid the standard library where possible.
Got it. The issue of the error code not being propagated is fixed in commit ecd9691.
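If we avoid standard library functions here, does OB have a better implementation of its own, or should I hand-roll one? I mainly used these because I saw std::sort and std::binary_search being used in other places too 😂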
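One possible answer to the question above, sketched under assumptions: a hand-rolled binary search over the already-sorted params that first does the cheap min/max range check from the diff, then bisects, and returns the first compare error instead of swallowing it. `binary_search_with_ret`, `cmp3`, and `kSuccess` are hypothetical names, not an existing OceanBase utility.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr int kSuccess = 0;  // hypothetical stand-in for OB_SUCCESS

// Binary search over params that are already sorted ascending.
// cmp3(a, b, cmp) is a fallible three-way compare that returns an error code;
// any failure is returned to the caller rather than swallowed.
template <typename T, typename Cmp3>
int binary_search_with_ret(const std::vector<T> &sorted, const T &key,
                           Cmp3 cmp3, bool &is_exist) {
  int ret = kSuccess;
  is_exist = false;
  if (sorted.empty()) {
    return ret;
  }
  int cmp_min = 0;
  int cmp_max = 0;
  // Cheap range check first: key < min(params) or key > max(params).
  if ((ret = cmp3(key, sorted.front(), cmp_min)) != kSuccess ||
      (ret = cmp3(key, sorted.back(), cmp_max)) != kSuccess) {
    return ret;
  }
  if (cmp_min < 0 || cmp_max > 0) {
    return ret;
  }
  std::size_t lo = 0;
  std::size_t hi = sorted.size();  // search the half-open range [lo, hi)
  while (lo < hi && !is_exist) {
    std::size_t mid = lo + (hi - lo) / 2;
    int cmp = 0;
    if ((ret = cmp3(sorted[mid], key, cmp)) != kSuccess) {
      break;
    }
    if (cmp < 0) {
      lo = mid + 1;
    } else if (cmp > 0) {
      hi = mid;
    } else {
      is_exist = true;
    }
  }
  return ret;
}
```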
} else if (OB_FAIL(cur_arg->eval(ctx, right))) {
LOG_WARN("failed to eval right datum", K(ret));
} else if (!null_param_contained_ && right->is_null()) {
null_param_contained_ = true;
For in(xxx, null), can the null simply be ignored? Then there would be no need to flag a null param here, right?
Got it, fixed in commit 2284e8c.
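The reviewer's point rests on SQL three-valued logic: `x IN (a, NULL)` is TRUE when `x = a` and NULL otherwise, and a pushed-down filter drops rows whose result is NULL just as it drops FALSE ones, so a NULL entry in the IN list never changes which rows pass. The sketch below drops NULL params while collecting them; `Datum`, `collect_in_params`, and `row_passes_in` are hypothetical, and `std::set` is used only for brevity. The shortcut does not carry over to NOT IN, where a NULL in the list means no row can evaluate to TRUE.

```cpp
#include <cassert>
#include <optional>
#include <set>
#include <vector>

using Datum = std::optional<long long>;  // std::nullopt models SQL NULL

// Collect the IN-list params, skipping NULLs: for a filter, a NULL param can
// only turn a FALSE result into NULL, and both reject the row.
std::set<long long> collect_in_params(const std::vector<Datum> &right_args) {
  std::set<long long> params;
  for (const Datum &d : right_args) {
    if (d.has_value()) {
      params.insert(*d);
    }
  }
  return params;
}

// Filter decision for one row: a NULL column value never passes IN.
bool row_passes_in(const Datum &value, const std::set<long long> &params) {
  return value.has_value() && params.count(*value) > 0;
}
```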
check_null_params();
if (WHITE_OP_IN == filter_.get_op_type() && OB_FAIL(init_obj_set())) {
LOG_WARN("Failed to init Object hash set in filter node", K(ret));
int ObWhiteFilterExecutor::eval_in_right_val_to_objs()
This duplicates a lot of code from eval_right_val_to_objs.
Got it, fixed in commit 2284e8c.
…luated_datums`
ObPushdownWhiteFilterNode &filter_;
ObDatum* batch_decode_datums_;
There is no need to add this member; just call get_datums_from_column directly.
Got it, fixed in commit 94ac9b4.
Task Description
ref #1491
Solution Description
Add/Update white filter 'in' functions for RAW/DICT/RLE/CONST/INTEGER_BASE_DIFF decoders.
Basic process:
ref_bitset
result_bitmap by using ref_bitset
ref_bitset
result_bitmap by using ref_bitset
params (i.e. right arguments of expr 'in'), flip over result_bitmap
ref_bitset. It needs to be combined with whether the const value is in the params.
result_bitmap by using ref_bitset
max(params) < base_value, set result_bitmap all false
result_bitmap
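The fragments above mention a `ref_bitset` that drives the `result_bitmap`, a const-value check for CONST, and an early all-false exit when `max(params) < base_value`. Below is a minimal sketch of those three ideas on simplified in-memory columns, assuming sorted, NULL-free `int64` params and ignoring encoding details such as bit packing. `dict_in`, `const_in`, `base_diff_in`, and their parameters are hypothetical, and `std::binary_search` stands in for whatever lookup the real decoders use.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bitmap = std::vector<bool>;

// DICT-style: rows store refs into a dictionary. Evaluate IN once per
// distinct dictionary value (ref_bitset), then map every row through it.
Bitmap dict_in(const std::vector<std::int64_t> &dict, const std::vector<std::uint32_t> &row_refs,
               const std::vector<std::int64_t> &sorted_params) {
  Bitmap ref_bitset(dict.size(), false);
  for (std::size_t ref = 0; ref < dict.size(); ++ref) {
    ref_bitset[ref] = std::binary_search(sorted_params.begin(), sorted_params.end(), dict[ref]);
  }
  Bitmap result_bitmap(row_refs.size(), false);
  for (std::size_t row = 0; row < row_refs.size(); ++row) {
    result_bitmap[row] = ref_bitset[row_refs[row]];
  }
  return result_bitmap;
}

// CONST-style: most rows hold const_value and a few exception rows hold other
// values. Decide the common case once, then overwrite the exception rows.
Bitmap const_in(std::size_t row_count, std::int64_t const_value,
                const std::vector<std::size_t> &exception_rows,
                const std::vector<std::int64_t> &exception_values,
                const std::vector<std::int64_t> &sorted_params) {
  const bool const_hit = std::binary_search(sorted_params.begin(), sorted_params.end(), const_value);
  Bitmap result_bitmap(row_count, const_hit);
  for (std::size_t k = 0; k < exception_rows.size(); ++k) {
    result_bitmap[exception_rows[k]] =
        std::binary_search(sorted_params.begin(), sorted_params.end(), exception_values[k]);
  }
  return result_bitmap;
}

// INTEGER_BASE_DIFF-style: rows store (value - base_value). If every param is
// below base_value, no row can match and the whole bitmap stays false.
Bitmap base_diff_in(std::int64_t base_value, const std::vector<std::uint64_t> &deltas,
                    const std::vector<std::int64_t> &sorted_params) {
  Bitmap result_bitmap(deltas.size(), false);
  if (sorted_params.empty() || sorted_params.back() < base_value) {
    return result_bitmap;  // max(params) < base_value: all false
  }
  for (std::size_t row = 0; row < deltas.size(); ++row) {
    const std::int64_t value = base_value + static_cast<std::int64_t>(deltas[row]);
    result_bitmap[row] = std::binary_search(sorted_params.begin(), sorted_params.end(), value);
  }
  return result_bitmap;
}
```

In the DICT sketch the IN list is evaluated once per distinct dictionary value rather than once per row, which is where a ref_bitset saves work when the dictionary is much smaller than the row count.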
Passed Regressions
Unittest
Passed related unittests:
Mysql test
Part 1. Load data
Number of data:
1,000,000 rows.
Data types:
The data types included in each table are shown in the table below:
Data organization format:
To produce each target encoding, the data is arranged according to the following rules:
RAW: completely random; most values within each column are distinct.
DICT: values from the dictionary appear in random order, e.g., the dictionary contains the values {A, B, C}.
RLE: values from the dictionary appear in consecutive runs, in order, e.g., the dictionary contains the values {A, B, C}.
CONST: the default value C appears with 99.9% probability, and values from the dictionary {A, B} appear with 0.1% probability.
INTEGER_BASE_DIFF: values are random within a fixed range, and the minimum value in the data is taken as the base value B.
Part 2. Run
Pattern of SQL statements used for testing:
In the code above, col_name is the column name, table_name is the table name, and params is the parameter list for IN, containing params.size() parameters that need to be matched.
Variables in the test:
IN (F * n)
IN (T * 0.5n, F * 0.5n)
IN (F * n)
IN (D * k, F * (n-k)), where k = min(n/2, len(dict)/2)
IN (F * n)
IN (D * k, F * (n-k)), where k = min(n/2, len(dict)/2)
IN (C, F * (n-1))
4 / 40 / 400
Part 3. Result
Meaning of values:
The values in the tables below are the execution time of each SQL query on the master branch divided by the execution time of the same query on the issue branch, i.e., the speedup of the issue branch over the master branch.
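As a worked example of that ratio with hypothetical timings: a query taking 2.0 s on master and 0.5 s on the issue branch has a speedup of 4.0, and a value below 1.0 would mean the issue branch is slower. The helper name `speedup` is illustrative only.

```cpp
#include <cassert>

// Speedup as defined for the result tables: master time / issue-branch time.
// Values above 1.0 mean the issue branch is faster.
double speedup(double master_seconds, double issue_seconds) {
  return master_seconds / issue_seconds;
}
```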
Performance test results:
RAW:
DICT:
RLE:
CONST:
INTEGER_BASE_DIFF:
Other Information
Result csv files:
result-issue.csv
result-master.csv
mysql_test.zip