Tree-sitter 1.0 Checklist #930

maxbrunsfeld · 2021-02-20T23:26:55Z

In the not-too-distant future, I'd like to bump Tree-sitter's version to 1.0, indicating a greater degree of stability and completeness. After that I'd like to regenerate all of the parsers in the tree-sitter GitHub org, and bump them to 1.0 as well. Before doing this, there are several important problems with the framework that I think should be fixed.

Tasks

Stretch Goals

I'm recording these here even though they are a bit less urgent.

- Incremental Parsing Perf - Enhance the external scanner API to allow for looser state comparisons, avoiding the catastrophic node-reuse failures seen in the HTML parser (see "Incremental parsing is ineffective when a new tag is opened", tree-sitter-html#23).
  - Figure out if the new scanner function can be made optional (with the parser generator inspecting scanner.c to decide whether to link against a _compare function).
  - Update tree-sitter-html to use this API, improving its incremental performance.
- Native Library, WASM parsers - Add a compile-time option to link the C library against a standard WASM engine (V8, wasmtime, or wasmer). When this feature is enabled, allow the native library to load WASM parsers, marshaling the parse table into native memory, and using WASM execution only for the lexing phase. This will make it more useful to distribute parsers as pre-compiled .wasm files, instead of as C code. The performance cost should be small, because all of the expensive parsing operations will still be native (see "Add optional WASM feature to the native library, allowing it to run wasm-compiled parsers via wasmtime", #1864).
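
To make the Incremental Parsing Perf item above more concrete, here is a rough sketch of how a strict and a looser state comparison could differ. It is a Python model, not the C scanner API: the open-tag stack, `strict_equal`, `loose_equal`, and the "innermost `depth` tags" rule are all hypothetical stand-ins chosen for illustration, and the real `_compare` hook may end up looking quite different.

```python
from typing import Sequence


def strict_equal(old: Sequence[str], new: Sequence[str]) -> bool:
    """Reuse only when the whole open-tag stack is identical."""
    return list(old) == list(new)


def loose_equal(old: Sequence[str], new: Sequence[str], depth: int) -> bool:
    """Hypothetical looser rule: compare only the innermost `depth` tags."""
    if depth <= 0:
        raise ValueError("depth must be positive")
    # With too few tags to compare, fall back to the strict rule.
    if len(old) < depth or len(new) < depth:
        return list(old) == list(new)
    return list(old[-depth:]) == list(new[-depth:])


# Scanner state recorded for a token inside <p><b>...</b></p> before the edit.
old_state = ["html", "body", "p", "b"]
# State at the same token after a new <div> is opened earlier in the document.
new_state = ["html", "body", "div", "p", "b"]

print(strict_equal(old_state, new_state))    # False: the node cannot be reused
print(loose_equal(old_state, new_state, 2))  # True: the innermost tags still match
```

Under the strict rule, one newly opened tag changes the state seen by every later token, so nothing after it can be reused. Whether a looser rule like this one is actually safe depends on what the scanner's decisions really depend on, which is why this is only a sketch.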


maxbrunsfeld · 2021-02-20T23:27:22Z

For anyone who is interested, please let me know if I've left important things off of this list ☝️ .

razzeee · 2021-02-21T02:16:05Z

Reads like tag queries are not going to be a 1.0 feature?

theHamsta · 2021-02-21T08:26:07Z

An alternative to removing the generated files would be to have a CI bot push them automatically to master. Users could then create mergeable PRs without needing to change any generated files. In this repo, https://github.com/neovim/nvim-lspconfig/blob/master/.github/workflows/docgen.yml, users commit changes to a configuration and a bot updates the documentation after each push to master.

razzeee · 2021-02-21T11:05:34Z

I think #516 should also be addressed, even if the function is marked experimental? At least document the behavior.

ahlinc · 2021-02-21T18:03:04Z

I would suggest reducing implicitness:

- Provide dsl.js as a regular file shipped with the tree-sitter-cli npm package and make it possible to require it as a regular JS library. This would make it easy to extend and would reduce confusion for IDEs and their auto-completion. It would also be good to keep the current behavior, where the dsl.js embedded in the tree-sitter binary is used, whenever the grammar file does not require dsl.js explicitly; that would let the tree-sitter CLI keep working as a fairly standalone tool. It would also make it possible to separate out grammar.json generation for an extended DSL, or to debug that generation as a regular node.js script.
- As for tree-sitter's independence, it would be good for tree-sitter to have an embedded JS runtime (#465), with a fallback to a system node.js if this is requested explicitly by some CLI parameter. IMO a deno library looks promising.

ahlinc · 2021-02-21T18:56:11Z

Also, I saw that *.so files always have zeros in the version spec, like libtree-sitter.so.0.0. It would be good if the minimal ABI-compatible version were reflected in the *.so.X.X suffix somehow.

dcreager · 2021-02-21T20:48:53Z

Note that the version number in those file names isn’t the same as the 1.0 semver release that @maxbrunsfeld is proposing. If there are any backwards incompatible changes as part of putting together this release, we’d bump the SOVERSION to 1; if not, we’d keep it at 0. More details can be found here.
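
As a small illustration of that rule, the shared-library version moves independently of the release number and is bumped only for a backwards-incompatible change. `next_soversion` is a hypothetical helper written for this example, not something in Tree-sitter's build.

```python
def next_soversion(current: int, has_breaking_changes: bool) -> int:
    """Return the SOVERSION to use for the next release."""
    if current < 0:
        raise ValueError("SOVERSION must be non-negative")
    # Only an incompatible change forces dependents to relink.
    return current + 1 if has_breaking_changes else current


# The 1.0 release keeps SOVERSION 0 unless it breaks compatibility.
print(next_soversion(0, has_breaking_changes=False))  # 0
print(next_soversion(0, has_breaking_changes=True))   # 1
```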

maxbrunsfeld · 2021-02-22T05:32:57Z

@razzeee Tag queries are already done, but you're right that we still need to document them. I envision those mostly being documented in a GitHub-specific context, since there isn't much generally-useful functionality specific to Tags; it's mostly just a convention for tree queries that GitHub is using for code navigation. All of the broadly-useful stuff has been generalized into the query system. I added that to the TODOs around documentation though.

> I think #516 should also be addressed, even if the function is marked experimental?

Yeah, you're right about that API being broken. I'm inclined to just address that for 1.0 by marking the function as half-baked. For our use cases, the API was only ever needed for the Haskell parser, and then we discontinued development of that parser because it was hard to find a good subset of the language that was amenable to parsing with a context-free grammar. It could definitely be made to work some day, but I think it's low-priority for us. There is still a bit of work to do to get it to play properly with incremental parsing.

Never mind, this got fixed.

razzeee · 2021-02-22T08:21:15Z

> @razzeee Tag queries are already done, but you're right that we still need to document them. I envision those mostly being documented in a GitHub-specific context, since there isn't much generally-useful functionality specific to Tags; it's mostly just a convention for tree queries that GitHub is using for code navigation. All of the broadly-useful stuff has been generalized into the query system. I added that to the TODOs around documentation though.

So you don't think tags make sense for others? I hoped that it would help move the queries towards the parser, so that multiple projects could consume and improve them.

> > I think #516 should also be addressed, even if the function is marked experimental?
>
> Yeah, you're right about that API being broken. I'm inclined to just address that for 1.0 by marking the function as half-baked. For our use cases, the API was only ever needed for the Haskell parser, and then we discontinued development of that parser because it was hard to find a good subset of the language that was amenable to parsing with a context-free grammar. It could definitely be made to work some day, but I think it's low-priority for us. There is still a bit of work to do to get it to play properly with incremental parsing.

Understandable. Do I need to be worried about the incremental parsing bit? We moved our parser to use this on a regular basis now and it seemed good, after figuring out why it always got stuck...

razzeee · 2021-02-25T22:36:09Z

Nice stretch goals would be:

ubolonton · 2021-02-26T10:18:05Z

> CLI commands - Add new pack and publish subcommands to the Tree-sitter CLI, for uploading tarballs and compiled .wasm files to the GitHub releases API.

This is awesome. Currently for Emacs, I have a custom package that compiles the grammar binaries for the 3 major platforms, and distributes them through GitHub Releases, in a single bundle. Having a standard tool for individual language package to do this on their own would be great.

Will the official language repositories start distributing these binaries through GitHub Releases as well? I think some GitHub actions on top of these subcommands would be very helpful for that.

maxbrunsfeld · 2021-02-26T21:45:37Z

@ubolonton I might not take on the automation of compilation and storage of binary files (except for wasm) right now. I was mostly planning to use GH releases to store tarballs of generated files like parser.c, to avoid having so many merge conflicts in development.

WhyNotHugo · 2021-03-07T19:31:05Z

> Add new pack and publish subcommands to the Tree-sitter CLI, for uploading tarballs and compiled .wasm files to the GitHub releases API.

~~I find this item problematic; what about tree-sitter implementations that are not hosted on GitHub? What's the plan on how those should be redistributed?~~

Never mind, I see now that this applies only to tree-sitters in this org.

dcreager · 2021-03-07T19:53:31Z

@WhyNotHugo Yes, to confirm, the plan is not to mandate any particular hosting platform. Those commands will be able to produce the generated artifacts without uploading them as a GitHub release.

maxbrunsfeld · 2021-03-11T23:00:56Z

@razzeee I think you're right that the get_column problem is important. It's especially relevant now that tree-sitter-haskell has been revived from the dead (thanks @tek). I believe I've addressed all of the problems with that API.

razzeee · 2021-03-12T00:14:42Z

While I agree, I feel it's disappointing that it took that to make it happen, as other grammars have been suffering from it. Still, thank you ❤️

ahlinc · 2021-03-18T00:04:09Z

It would be awesome to automate the release process for all official tree-sitter tools, especially tree-sitter-cli, for all official bindings (Wasm, Rust, Node.js, Python, Haskell, Ruby), and for the Playground with its separately maintained parsers, and to keep everything in sync with the core tree-sitter library releases. This would help reduce misunderstandings and situations where some things work in one place but not in another.

Versions

Bindings

Notes

- For now, installing tree-sitter-cli from the crate seems like a bad idea; the crate is stuck on a two-year-old version.
- "tree-sitter-highlight 0.19.2 does not compile with tree-sitter 0.19.5" (#1122) demonstrates that changes in tree-sitter's Rust binding require a version bump in all dependencies that use the changed parts. Otherwise there needs to be a CI check that tests that the latest version of each dependent can be built against all equal or higher versions of its dependency.
- I can't speak for all bindings, but the Node and Python bindings link statically to the tree-sitter core library, which means they lag behind the core library and don't receive core fixes and logic improvements at the same time. IMO that's an important reason why such updates need to be automated. This doesn't solve the problem of covering core library features, but at least bug fixes would be delivered in time.
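
The CI check suggested above could start from a list of versions to build against. Here is a minimal sketch, assuming plain `MAJOR.MINOR.PATCH` version strings; `parse_version`, `versions_to_test`, and the release list are hypothetical, and a real check would fetch the list from the registry and run the dependent's build once per entry.

```python
from typing import List, Tuple


def parse_version(text: str) -> Tuple[int, int, int]:
    """Parse a plain MAJOR.MINOR.PATCH string into a comparable tuple."""
    major, minor, patch = (int(part) for part in text.split("."))
    return (major, minor, patch)


def versions_to_test(minimum: str, released: List[str]) -> List[str]:
    """Return every released version equal to or higher than `minimum`."""
    floor = parse_version(minimum)
    candidates = [v for v in released if parse_version(v) >= floor]
    # Sort numerically so that 0.19.10 comes after 0.19.9.
    return sorted(candidates, key=parse_version)


# Hypothetical release list for the core crate.
released = ["0.19.5", "0.19.2", "0.19.3", "0.18.9"]
# A dependent declaring 0.19.2 as its minimum should build against each of these.
print(versions_to_test("0.19.2", released))  # ['0.19.2', '0.19.3', '0.19.5']
```

Comparing parsed tuples rather than raw strings matters here, since string comparison would order `0.19.10` before `0.19.9`.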

XVilka · 2021-04-07T11:44:27Z

I am not sure if this is actually possible - it would be also awesome if generated parser/runtime never segfaults. Showing errors, warnings, exiting - yes, but never segfaulting.

maxbrunsfeld · 2021-04-07T19:35:01Z

> I am not sure if this is actually possible - it would be also awesome if generated parser/runtime never segfaults.

Obviously the library should never segfault. AFAIK, that's already the case. I think you're referencing tree-sitter/tree-sitter-c#64, which I can't reproduce after stripping out third-party libraries.

If anyone is seeing Tree-sitter cause a segfault, and you can reproduce the problem, please report it.

likern · 2021-05-10T19:49:17Z

> For anyone who is interested, please let me know if I've left important things off of this list.

Add generation of bindings for the Zig programming language. It's a successor to the C language.

- It provides a lot of safety features, like Rust, and possibly more because of runtime checks.
- It's very low-level, like C, but with the syntax, safety, and tooling of a modern language.
- It's very fast (faster than C).

casouri · 2021-07-24T17:56:30Z

tree-sitter should provide means to replace memory allocation functions at runtime. This allows us to link to tree-sitter as a library instead of embedding it.

stevenbarragan · 2021-09-22T20:13:52Z

+1 for better error messages.
related comment

CreatCodeBuild · 2021-09-23T07:00:50Z

> Native Library, WASM parsers

I would love to use wasm in other runtimes. Currently I am only able to use wasm in JS, but I want to use it in wasmer, and I don't want to use the C version because the same parser is run in different runtimes.

oovm · 2021-10-05T08:12:23Z

For the wasm target, how about wasm-bindgen, which can generate Rust and TypeScript bindings at the same time?

TypeScript typings are really useful when working with the VSCode LSP (Language Server Protocol).

drwpow · 2022-09-07T15:12:13Z

Suggestion: ESM format

In the interest of an evergreen format for 1.0, I’d like to recommend ESM over CJS (e.g. basically just changing module.exports to export default). Now that it’s the official module system of JS in all forms and is supported on the web and in Node.js, it’s a breaking change that would be easier to make sooner rather than later.

Happy to help with this if this is a desirable change! But just a suggestion I’ll leave to the author/maintainers to decide 🙂

maxbrunsfeld · 2022-09-09T16:03:26Z

> Suggestion: ESM format

Yeah, I've been thinking about this too @drwpow. I added this to the list, as well as an item about reducing our coupling to npm in general.

> To clarify, do you want this to be WASM engine implementation agnostic, as per your link to wasm-c-api, or is it fine to just embed a specific WASM engine?

@lambdadog I started work on this issue in #1864. I ended up going with a solution that's specifically tied to wasmtime for now.

kevinbarabash · 2022-10-08T15:39:31Z

@maxbrunsfeld I worked around the issue of having to check-in build files by running yarn install and yarn generate as part of the build.rs file. Thankfully yarn generate doesn't clobber this file. One issue I ran into is that binding.gyp cannot be checked in otherwise yarn install fails. I got around this by renaming it to real-binding.gyp and then copying it to binding.gyp after running yarn install. This seems to work even if it is a bit janky. See escalier-lang/escalier#288 to see this approach in action.

xiaoma20082008 · 2023-11-21T15:49:03Z

Maybe the "Standardized node name" issue needs to be included in the release.

The ABI break seemed to be unintentional, but adding a subslot will be useful in the future as a break with version 1.0 of tree-sitter looks to be planned. Ref: tree-sitter/tree-sitter#930 (comment) Bug: https://bugs.gentoo.org/930039 Signed-off-by: Matthew Smith <[email protected]>

- maxbrunsfeld pinned this issue Feb 20, 2021
- silvanshade mentioned this issue Feb 26, 2021: Error reporting #946 (Closed)
- josteink mentioned this issue Mar 4, 2021: Code-docs are highlighted as normal comments (tree-sitter-mode) emacs-csharp/csharp-mode#217 (Closed)
- This was referenced Jun 5, 2021: Unable to setup on mac tree-sitter/tree-sitter-haskell#34 (Closed); Add a changelog Markdown file. #1164 (Closed)
- ahlinc mentioned this issue Jul 1, 2021: Reliable releases distribution for tree-sitter CLI and other components #1223 (Open, 23 tasks)
- sogaiu mentioned this issue Aug 2, 2022: tree-sitter web-ui doesn't show parse tree in right pane sogaiu/tree-sitter-clojure#17 (Closed)
- dcreager mentioned this issue Oct 19, 2022: Documentation on general issues around language grammars #1859 (Draft)
- ahlinc mentioned this issue Nov 10, 2022: Fix versioning in Makefile #1956 (Closed)
- sogaiu mentioned this issue Dec 28, 2022: Update tree-sitter-cli to 0.20.6 sogaiu/tree-sitter-clojure#26 (Closed)
- This was referenced Jan 27, 2023: Ignore generated files. sogaiu/tree-sitter-clojure#1 (Merged); Which files and directories are maintained and important is unclear sogaiu/tree-sitter-clojure#38 (Closed)
- ahlinc mentioned this issue Feb 10, 2023: predicate/directive evaluation in runtime library #2038 (Open)
- theHamsta mentioned this issue Feb 18, 2023: treesitter distribution strategy (tree-sitter) neovim/neovim#22313 (Open)
- Ben3eeE mentioned this issue Feb 21, 2023: Ruby parser.c size tree-sitter/tree-sitter-ruby#223 (Closed)
- sogaiu mentioned this issue Mar 17, 2023: Modifying the tree-sitter grammar clojure-emacs/clojure-ts-mode#4 (Open)
- ahlinc mentioned this issue Apr 10, 2023: TreeCursor.currentNode is a property in Node but a function in the browser #2195 (Closed)
- stefnotch mentioned this issue May 12, 2023: Use partial precedence system stefnotch/aftermath-editor#50 (Open)
- XVilka mentioned this issue Aug 14, 2023: Release process transparency #2501 (Closed)
- tree-sitter deleted a comment from Kennobi19 Aug 20, 2023
- amaanq unpinned this issue Aug 30, 2023
- alex-pinkus mentioned this issue Sep 10, 2023: Include generated source files in the git repo alex-pinkus/tree-sitter-swift#315 (Closed)
- amaanq pinned this issue Sep 10, 2023
- sogaiu mentioned this issue Sep 10, 2023: Docstring indentation is too aggressive when semantic indentation is enabled. clojure-emacs/clojure-ts-mode#18 (Open)
- rabbiveesh mentioned this issue Oct 26, 2023: Missing parser.c tree-sitter-perl/tree-sitter-perl#142 (Closed)
- ahlinc mentioned this issue Nov 29, 2023: . #2797 (Closed)
- alaviss mentioned this issue Dec 11, 2023: AST breaks with ( in declaration section alaviss/tree-sitter-nim#64 (Closed)
- dundargoc added this to the 1.0 milestone Feb 6, 2024
- alex-pinkus mentioned this issue Feb 26, 2024: Add build artifacts in repository alex-pinkus/tree-sitter-swift#362 (Closed)
- amaanq unpinned this issue May 7, 2024
- amaanq pinned this issue May 7, 2024
