Versioning and releases for tree-sitter grammar repos #1768
Replies: 9 comments 22 replies
-
Hi Will! Yes, the project has experienced some growing pains as the amount of open source contribution has increased. I'd really love to improve the situation, but I have limited time to work on these issues right now. Some possible interested parties:
In particular, @Luni-4 has suggested to me some ways of using GitHub Actions to improve the situation (see tree-sitter/tree-sitter-java#98). Hi Luni-4! I'm really sorry for not taking any action on your suggestion yet; I've just been very busy these last several months. I'm wondering if you folks using Tree-sitter at bigger companies can help set some direction on how we want to do releases and versioning and such. It seems like there are some other folks using Tree-sitter who are willing to help out if we can decide on steps to take.
-
Thanks for opening this discussion, @Will-Sommers, it's been a topic that's been front of mind recently.
For the individual language grammars, we've found that tagged version numbers have actually been less helpful than we assumed they would be. We're relying on pinned dependencies on specific commit SHAs for each language grammar repo that we're consuming, such as:
[dependencies]
tree-sitter-python = { git = "https://github.com/tree-sitter/tree-sitter-python", rev = "0d17ed665458c49eb9c6e95155e196a85bff67b7" }
Or maybe said a different way, it could still be useful to tag versions, but I don't think we should choose version numbers that try to align with the releases of
What we found (at least right now, while there's still a bit of churn in the grammars) was that basically every merged PR was changing a grammar, or its highlight or tagging rules, enough that it would deserve a semver major version bump. So for now we've just been using pinned SHAs.
-
Hello everyone, and thank you for opening this discussion, @Will-Sommers! @maxbrunsfeld yep, I supposed you were busy, but no problem at all; let's try to find a solution that is suitable for each of us!
Well, the main problem we are experiencing in
Why we think this approach could be helpful:
As a developer, I'm willing to add the action for each grammar; unfortunately, I can't set a secret environment variable because I don't have the privileges. Feedback and critiques about this approach are appreciated! Thanks!
-
Let me try to draw a conclusion from the previous threads (one and two):
Regardless of the consensus we reach, I think it would be useful to have a section on the Tree-sitter website documenting these practices for grammars, as a reference for maintainers and contributors.
-
Can we set a deadline for this one? For example, one month from now?
-
I've been working for a while on integrating tree-sitter into vcpkg:
This is mainly to solve the issue of distributing parsers, by utilizing CMake's capabilities for cross-platform support (based on #1822). Take a look at https://github.com/kylo252/tree-sitter-registry for a demo of how much simpler and more robust the workflow can be with the help of vcpkg.
Highlights
Extra goodies from using vcpkg
-
This has come up again in a couple of places, e.g.:
Since we're rehashing this discussion down in grammar repos, I thought it would be good to escalate here again to see if we can come to a consensus. Broadly speaking, there seem to be two self-consistent strategies for publishing and consuming grammars, and as tree-sitter maintainers we should choose which one to endorse/enforce:
From what I can tell, it seems that consensus among the maintainers is to prefer (1). @maxbrunsfeld @ahlinc @amaanq, do you agree? Any other maintainers I should mention to weigh in? [For the record, I prefer (2), but I would rather have a decision made and recorded somewhere, so if more maintainers prefer (1) I'll defer to that.]
Now, on the assumption that we decide to go with (1), I also want to bring up a corollary for discussion. tree-sitter/tree-sitter-javascript#294 came up as a specific issue because the version numbers we are publishing to crates.io do not line up with the "all bets are off" guarantees. When people see a patch release to a
So that means we either have to carefully document everywhere that we are not making semver guarantees with our version numbers (and somehow get all of our downstream consumers to notice and pay attention to that), or we should choose version numbers that do line up with our guarantees, for instance a date-based version like
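As a rough illustration of what a date-based scheme could look like: the sketch below is an assumption, not the format proposed in this comment (which is not reproduced here). The `YEAR.MONTH.DAY` layout and the function name `date_version` are hypothetical.

```python
from datetime import date


def date_version(release_day: date) -> str:
    """Format a date as YEAR.MONTH.DAY with no zero padding.

    The unpadded form matters if the string is also published to a
    registry that requires semver syntax, since semver forbids leading
    zeros in numeric components.
    """
    return f"{release_day.year}.{release_day.month}.{release_day.day}"


print(date_version(date(2023, 4, 7)))  # prints 2023.4.7
```

A version string like this makes no compatibility promise at all; it only tells a consumer when the release was cut, which matches the "all bets are off" guarantee.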
-
As nvim-treesitter maintainer (arguably the biggest consumer of tree-sitter grammars on the web with over 250 supported languages and counting), I am strongly in favor of 2 and have been pretty obnoxious about that in the past. In a nutshell, semver signals compatibility to downstream, which here is queries. So it's straightforward (if your repo contains queries, which there should be no reason not to) to verify whether a change
(Yes, that means you'll quickly end up with Chrome-level version numbers. There are worse things.) I'll add that this will be of limited use, though, as long as tree-sitter does not support parser introspection -- in particular, checking its version -- which would allow me to detect an incompatible query before YOLO parsing it and having to sweep up a heap of angry red errors.
I'll also add that semver has an explicit proviso for major version zero, which covers this behavior. It's cargo that is wrong here. (And in fact tree-sitter serves a much wider ecosystem than just Rust.) That doesn't mean that datever is not a reasonable versioning strategy, mind you; but I do think we can do better and convey meaningful semantic information.
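The mismatch described here can be made concrete with a small sketch. It is a simplification under stated assumptions: pre-release and build metadata are ignored, versions are plain `MAJOR.MINOR.PATCH` strings, and the function names and the version numbers `0.20.1` / `0.20.2` are hypothetical. The first function follows the semver rule that nothing is promised while the major version is zero; the second follows Cargo's default caret rule, under which `^0.y.z` accepts any later `0.y.*` release.

```python
def parse(version: str) -> tuple[int, int, int]:
    """Split 'MAJOR.MINOR.PATCH' into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def semver_promises_compatibility(old: str, new: str) -> bool:
    """Per the semver spec: no promise at all while the major version is 0."""
    o, n = parse(old), parse(new)
    return n >= o and o[0] == n[0] and o[0] != 0


def cargo_caret_accepts(requirement: str, candidate: str) -> bool:
    """Cargo's default (caret) rule: the leftmost non-zero component must match."""
    r, c = parse(requirement), parse(candidate)
    if c < r:
        return False
    if r[0] != 0:
        return c[0] == r[0]
    if r[1] != 0:
        return c[0] == 0 and c[1] == r[1]
    return c == r  # ^0.0.z only matches exactly 0.0.z


# Hypothetical patch release of a 0.x grammar crate.
print(semver_promises_compatibility("0.20.1", "0.20.2"))  # prints False
print(cargo_caret_accepts("0.20.1", "0.20.2"))            # prints True
```

So a downstream crate that depends on `0.20.1` would pick up `0.20.2` automatically, even though semver itself makes no promise that the two are compatible.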
-
It sounds like most downstream consumers prefer that we try to use semver in a more fine-grained way in grammar repos. I'm fine with adopting that policy. What code and documentation changes do we need to make in order to begin implementing this policy?
-
👋👋👋 Heyo! Thanks so much for the project, really really liking it and digging into it.
This isn't in a tree-sitter-{lang} repo as it appears to be a common issue across the org. I've noticed that nearly all of the tree-sitter-{lang} repos are currently on versions that were mostly cut in March of 2021. Making a wildly successful project probably comes with a lot of dev and maintenance.
Thanks again!