perf(treesitter): use child_containing_descendant() in has-ancestor? #28512
Obviously we need to bump the tree-sitter dependency in |
@@ -725,6 +725,7 @@ static struct luaL_Reg node_meta[] = {
  { "descendant_for_range", node_descendant_for_range },
  { "named_descendant_for_range", node_named_descendant_for_range },
  { "parent", node_parent },
  { "child_containing_descendant", node_child_containing_descendant },
Is `TSNode` intended to directly mirror the TS API, or can we choose better names?
The ancestor checking part of the predicate can be moved into C, which would reduce the time down to 1.5ms from 7.5ms (measured on a different computer).

    ['has-ancestor?'] = function(match, _, _, predicate)
      local nodes = match[predicate[2]]
      if not nodes or #nodes == 0 then
        return true
      end
      for _, node in ipairs(nodes) do
        if node:__has_ancestor(predicate) then
          return true
        end
      end
      return false
    end,

    static int __has_ancestor(lua_State *L)
    {
      TSNode descendant = node_check(L, 1);
      if (lua_type(L, 2) != LUA_TTABLE) {
        lua_pushboolean(L, false);
        return 1;
      }
      int const pred_len = lua_objlen(L, 2);
      // Walk top-down from the root along the path to `descendant`.
      TSNode node = ts_tree_root_node(descendant.tree);
      while (!ts_node_is_null(node)) {
        char const *node_type = ts_node_type(node);
        size_t node_type_len = strlen(node_type);
        // Entries 3..pred_len of the predicate table are the type names to match.
        for (int i = 3; i <= pred_len; i++) {
          lua_rawgeti(L, 2, i);
          if (lua_type(L, -1) == LUA_TSTRING) {
            size_t check_len;
            char const *check_str = lua_tolstring(L, -1, &check_len);
            if (node_type_len == check_len && memcmp(node_type, check_str, check_len) == 0) {
              lua_pushboolean(L, true);
              return 1;
            }
          }
          lua_pop(L, 1);
        }
        node = ts_node_child_containing_descendant(node, descendant);
      }
      lua_pushboolean(L, false);
      return 1;
    }

Is this an overkill? Or how should I name the function?
I wonder if that is not something upstream would be interested in as well? @amaanq
Upstream doesn't use the Lua API, so this would need to be significantly rewritten to use only TS structs/types. I think this is fine as it is. Whether it's done in C or Lua doesn't matter too much IMO.
The only worry here is that we're injecting our own API functions into the upstream tree-sitter API; that may lead to confusion. But maybe it's worth it? We're already not exposing the API exactly (e.g., I'd be fine with keeping it in Lua for now, but we could also keep it internal at first and discuss exposing it (as |
@vanaigr I've just bumped to tree-sitter 0.22.6 on |
Yeah has-ancestor is a pretty good candidate for a predicate for upstream to support - it'd be worth potentially opening a PR for Max's thoughts |
@vanaigr This PR needs one of two things:
As we need to bump anyway for wasm parsers, I would prefer 1. for simplicity. |
Bumping the min version sounds good to me.
required for `ts_node_child_containing_descendant()`
Force-pushed. @vanaigr is there any reason not to squash these commits before merging?
Either way is fine for me.
Squashed, then, with notes from the PR description added to the commit message. Thank you!
…eovim#28512) Problem: `has-ancestor?` is O(n²) for the depth of the tree since it iterates over each of the node's ancestors (bottom-up), and each ancestor takes O(n) time. This happens because tree-sitter's nodes don't store their parent nodes, and the tree is searched (top-down) each time a new parent is requested. Solution: Make use of new `ts_node_child_containing_descendant()` in tree-sitter v0.22.6 (which is now the minimum required version) to rewrite the `has-ancestor?` predicate in C to become O(n). For a sample file, decreases the time taken by `has-ancestor?` from 360ms to 6ms.
It appears the treesitter version has to be bumped :)
Closes #24965.
`has-ancestor?` is O(n²) in the depth of the tree, since it iterates over each of the node's ancestors (bottom-up) and each ancestor lookup takes O(n) time. This happens because tree-sitter's nodes don't store their parent nodes, so the tree is searched (top-down) each time a new parent is requested.

`ts_node_child_containing_descendant()` matches how tree-sitter searches for a node's parent internally and makes `has-ancestor?` O(n). The predicate is also rewritten in C to avoid allocations for each ancestor node and their type strings.

For the file in the issue, this decreases the time taken by `has-ancestor?` from 360ms to 6ms.
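
To make the complexity argument concrete, here is a minimal toy model in plain C. It is not tree-sitter or Neovim code: `ToyNode` and every `toy_*` function are hypothetical names invented for this sketch, and the toy "child containing descendant" helper is only assumed to behave like the real function for the purpose of counting steps. It compares the number of nodes visited by a bottom-up walk (one top-down parent search per ancestor) with a single top-down pass.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a syntax node (hypothetical, not a TSNode): it knows its
 * children and byte range but has no pointer to its parent. */
typedef struct ToyNode {
    const char *type;
    size_t start, end;                 /* half-open byte range [start, end) */
    const struct ToyNode *children[4];
    size_t child_count;
} ToyNode;

/* Child of `node` whose range contains `descendant`, or NULL once the next
 * node on the path would be `descendant` itself. */
static const ToyNode *toy_child_containing(const ToyNode *node,
                                           const ToyNode *descendant)
{
    for (size_t i = 0; i < node->child_count; i++) {
        const ToyNode *child = node->children[i];
        if (child == descendant) return NULL;
        if (child->start <= descendant->start && descendant->end <= child->end)
            return child;
    }
    return NULL;
}

/* Non-zero if the node's type equals any of the `n` requested type names. */
static int toy_type_matches(const ToyNode *node,
                            const char *const *types, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (strcmp(node->type, types[i]) == 0) return 1;
    return 0;
}

/* Parent lookup without parent pointers: search down from the root.
 * Every visited level adds one to *steps. Assumes `node` is in the tree. */
static const ToyNode *toy_parent(const ToyNode *root, const ToyNode *node,
                                 size_t *steps)
{
    const ToyNode *parent = NULL;
    for (const ToyNode *cur = root; cur != NULL && cur != node;
         cur = toy_child_containing(cur, node)) {
        (*steps)++;
        parent = cur;
    }
    return parent;
}

/* Old shape: bottom-up, paying for a top-down search at every ancestor. */
static int toy_has_ancestor_bottom_up(const ToyNode *root, const ToyNode *node,
                                      const char *const *types, size_t n,
                                      size_t *steps)
{
    assert(root != NULL && node != NULL && steps != NULL);
    for (const ToyNode *a = toy_parent(root, node, steps); a != NULL;
         a = toy_parent(root, a, steps))
        if (toy_type_matches(a, types, n)) return 1;
    return 0;
}

/* New shape: one top-down pass along the root-to-node path. */
static int toy_has_ancestor_top_down(const ToyNode *root, const ToyNode *node,
                                     const char *const *types, size_t n,
                                     size_t *steps)
{
    assert(root != NULL && node != NULL && steps != NULL);
    for (const ToyNode *cur = root; cur != NULL && cur != node;
         cur = toy_child_containing(cur, node)) {
        (*steps)++;
        if (toy_type_matches(cur, types, n)) return 1;
    }
    return 0;
}
```

For a node at depth d with no matching ancestor, the bottom-up version in this toy visits d + (d-1) + ... + 1 nodes, while the top-down version visits d. The real speedup reported above also includes avoiding Lua allocations, which this sketch does not model.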