Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

perf(treesitter): use child_containing_descendant() in has-ancestor? #28512

Merged
merged 5 commits into from May 16, 2024

Conversation

vanaigr
Copy link
Contributor

@vanaigr vanaigr commented Apr 26, 2024

Closes #24965.

has-ancestor? is O(n²) for the depth of the tree since it iterates over each of the node's ancestors (bottom-up), and each ancestor takes O(n) time.
This happens because tree-sitter's nodes don't store their parent nodes, and the tree is searched (top-down) each time a new parent is requested.

ts_node_child_containing_descendant() matches how trees-sitter searches for the node's parent internally and makes has-ancestor? is O(n).

The predicate is also rewritten in C to avoid allocations for each ancestor node and their type strings.

For the file in the issue, decreases the time taken by has-ancestor? from 360ms to 6ms.

@vanaigr
Copy link
Contributor Author

vanaigr commented Apr 26, 2024

ts_node_child_containing_descendant() is in tree-sitter's master, but not yet available in any release.

@zeertzjq zeertzjq added the performance issues reporting performance problems label Apr 26, 2024
@clason clason added this to the 0.11 milestone Apr 26, 2024
@clason
Copy link
Member

clason commented Apr 26, 2024

Obviously we need to bump the tree-sitter dependency in deps.txt and the minimal version. I don't think that will happen before the 0.10 release, but we can bump to a prerelease commit early in the 0.11 cycle. The new function(ality) will have to be hidden behind an #ifdef guard, similar to how we did it for the match limit, before we bump to a release.

@@ -725,6 +725,7 @@ static struct luaL_Reg node_meta[] = {
{ "descendant_for_range", node_descendant_for_range },
{ "named_descendant_for_range", node_named_descendant_for_range },
{ "parent", node_parent },
{ "child_containing_descendant", node_child_containing_descendant },
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is TSNode intended to directly mirror the TS api? or can we choose better names?

@vanaigr
Copy link
Contributor Author

vanaigr commented May 2, 2024

The ancestor checking part of the predicate can be moved into C, which would reduce the time down to 1.5ms from 7.5ms (measured on a different computer).

  ['has-ancestor?'] = function(match, _, _, predicate)
    local nodes = match[predicate[2]]
    if not nodes or #nodes == 0 then
      return true
    end

    for _, node in ipairs(nodes) do
      if node:__has_ancestor(predicate) then
        return true
      end
    end
    return false
  end,
static int __has_ancestor(lua_State *L)
{
  TSNode descendant = node_check(L, 1);
  if(lua_type(L, 2) != LUA_TTABLE) {
    lua_pushboolean(L, false);
    return 1;
  }
  int const pred_len = lua_objlen(L, 2);

  TSNode node = ts_tree_root_node(descendant.tree);
  while(!ts_node_is_null(node)) {
    char const *node_type = ts_node_type(node);
    size_t node_type_len = strlen(node_type);

    for (int i = 3; i <= pred_len; i++) {
      lua_rawgeti(L, 2, i);
      if (lua_type(L, -1) == LUA_TSTRING) {
        size_t check_len;
        char const *check_str = lua_tolstring(L, -1, &check_len);
        if(node_type_len == check_len && memcmp(node_type, check_str, check_len) == 0) {
          lua_pushboolean(L, true);
          return 1;
        }
      }
      lua_pop(L, 1);
    }

    node = ts_node_child_containing_descendant(node, descendant);
  }

  lua_pushboolean(L, false);
  return 1;
}

Is this an overkill? Or how should I name the function?

@clason
Copy link
Member

clason commented May 3, 2024

I wonder if that is not something upstream would be interested in as well? @amaanq

@lewis6991
Copy link
Member

I wonder if that is not something upstream would be interested in as well? @amaanq

Upstream doesn't use the Lua API, so this would need to be significantly rewritten to use only TS structs/types.

I think this is fine as it is. Whether it's done in C or Lua doesn't matter too much IMO.

@clason
Copy link
Member

clason commented May 3, 2024

The only worry here is that we're injecting our own API functions into the upstream tree-sitter API; that may lead to confusion. But maybe it's worth it? We're already not exposing the API exactly (e.g., named_descendant_for_range is not a tree-sitter API function).

I'd be fine with keeping it in Lua for now, but we could also keep it internal at first and discuss exposing it (as node_has_ancestor()) if some other plugin shows a separate use for it.

@clason
Copy link
Member

clason commented May 5, 2024

@vanaigr I've just bumped to tree-sitter 0.22.6 on master, so if you rebase, CI should pass.

@amaanq
Copy link
Contributor

amaanq commented May 5, 2024

I wonder if that is not something upstream would be interested in as well? @amaanq

Yeah has-ancestor is a pretty good candidate for a predicate for upstream to support - it'd be worth potentially opening a PR for Max's thoughts

@vanaigr vanaigr force-pushed the query-perf-has-ancestor branch 2 times, most recently from 11445a3 to 589d39a Compare May 6, 2024 00:16
@vanaigr vanaigr marked this pull request as ready for review May 6, 2024 00:16
@github-actions github-actions bot requested a review from wookayin May 16, 2024 10:42
@clason
Copy link
Member

clason commented May 16, 2024

@vanaigr This PR needs one of two things:

  1. bump the required tree-sitter version to 0.22.6, or
  2. put this change behind a feature guard as in https://github.com/neovim/neovim/pull/22710/files

As we need to bump anyway for wasm parsers, I would prefer 1. for simplicity.

@lewis6991 @jamessan @justinmk ?

@jamessan
Copy link
Member

Bumping the min version sounds good to me.

@clason
Copy link
Member

clason commented May 16, 2024

Force-pushed. @vanaigr is there any reason not to squash these commits before merging?

@vanaigr
Copy link
Contributor Author

vanaigr commented May 16, 2024

is there any reason not to squash these commits before merging?

Either way is fine for me.

@clason clason merged commit 4b02916 into neovim:master May 16, 2024
31 checks passed
@clason
Copy link
Member

clason commented May 16, 2024

Squashed, then, with notes from the PR desciption added to the commit message. Thank you!

altermo pushed a commit to altermo/neovim-fork that referenced this pull request May 16, 2024
…eovim#28512)

Problem: `has-ancestor?` is O(n²) for the depth of the tree since it iterates over each of the node's ancestors (bottom-up), and each ancestor takes O(n) time.
This happens because tree-sitter's nodes don't store their parent nodes, and the tree is searched (top-down) each time a new parent is requested.

Solution: Make use of new `ts_node_child_containing_descendant()` in tree-sitter v0.22.6 (which is now the minimum required version) to rewrite the `has-ancestor?` predicate in C to become O(n).

For a sample file, decreases the time taken by `has-ancestor?` from 360ms to 6ms.
altermo pushed a commit to altermo/neovim-fork that referenced this pull request May 16, 2024
…eovim#28512)

Problem: `has-ancestor?` is O(n²) for the depth of the tree since it iterates over each of the node's ancestors (bottom-up), and each ancestor takes O(n) time.
This happens because tree-sitter's nodes don't store their parent nodes, and the tree is searched (top-down) each time a new parent is requested.

Solution: Make use of new `ts_node_child_containing_descendant()` in tree-sitter v0.22.6 (which is now the minimum required version) to rewrite the `has-ancestor?` predicate in C to become O(n).

For a sample file, decreases the time taken by `has-ancestor?` from 360ms to 6ms.
@MangoIV

This comment was marked as off-topic.

@MangoIV
Copy link

MangoIV commented May 17, 2024

it appears the treesitter version has to be bumped :)

icholy pushed a commit to icholy/neovim that referenced this pull request May 17, 2024
…eovim#28512)

Problem: `has-ancestor?` is O(n²) for the depth of the tree since it iterates over each of the node's ancestors (bottom-up), and each ancestor takes O(n) time.
This happens because tree-sitter's nodes don't store their parent nodes, and the tree is searched (top-down) each time a new parent is requested.

Solution: Make use of new `ts_node_child_containing_descendant()` in tree-sitter v0.22.6 (which is now the minimum required version) to rewrite the `has-ancestor?` predicate in C to become O(n).

For a sample file, decreases the time taken by `has-ancestor?` from 360ms to 6ms.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci:skip-news performance issues reporting performance problems treesitter
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Poor performance with Treesitter highlighting on deep if-else in c
8 participants