
Update tree-sitter-cli to 0.20.6 #26

Closed


NoahTheDuke

Closes #17.

@sogaiu
Owner

sogaiu commented Jul 14, 2022

The details are escaping me at the moment, but I think there was something on the tree-sitter end I'd been waiting for before going through with an upgrade to the 0.20.x series.

I'll see if I can dig up what that was.

@sogaiu
Owner

sogaiu commented Jul 15, 2022

I've not turned up what was holding things up yet.

Regarding the PR though, there are a few things I would do differently.

  1. Not change the lock file version (and format) for package-lock.json -- it has been 1 and I'm not prepared to change that at this time. I don't know the details of how this gets set / decided but possibly it has to do with the version of npm being used.
  2. Not use ^ for the version info of tree-sitter-cli in package.json. IIUC, that ^ can end up in package.json "sneakily" (i.e. without a user realizing it). I've modified the repository README to try to prevent this sort of thing from occurring in the future.
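To make the two points above concrete, here is a minimal sketch of a check that could be run over package.json and package-lock.json. The helper name `checkPins` is hypothetical. Treating any spec that starts with `^` or `~` as a range, and expecting lockfileVersion 1, are assumptions based on the preferences stated above. This is not an existing script in the repository.

```javascript
// Hypothetical helper: flag dependency specs that are not exact pins
// (e.g. "^0.20.6" or "~0.20.6") and report whether the lock file is
// still at lockfileVersion 1 (the format written by npm 6).
function checkPins(pkg, lock) {
  const problems = [];
  for (const section of ["dependencies", "devDependencies"]) {
    const deps = pkg[section] || {};
    for (const [name, spec] of Object.entries(deps)) {
      // An exact pin looks like "0.20.6"; a leading ^ or ~ means a range.
      if (/^[\^~]/.test(spec)) {
        problems.push(`${section}.${name} uses a range: ${spec}`);
      }
    }
  }
  if (lock && lock.lockfileVersion !== 1) {
    problems.push(`package-lock.json has lockfileVersion ${lock.lockfileVersion}, expected 1`);
  }
  return problems;
}

// Usage: in a real checkout these objects would come from JSON.parse of the files.
const pkg = { devDependencies: { "tree-sitter-cli": "^0.20.6" } };
const lock = { lockfileVersion: 2 };
console.log(checkPins(pkg, lock));
```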

I appreciate the effort, but if tree-sitter-cli's version is to be updated, it might be simpler for me to make those changes myself so that details like the above work out. I might also modify Cargo.toml in the process.

@NoahTheDuke
Author

Sure, do it however you want.

Moving to v2 of package-lock.json seems like a good thing though, seeing as npm v6 isn't the default in current or maintained Node releases and the v2 format is backwards compatible. What's the goal of keeping it at v1?

@sogaiu
Owner

sogaiu commented Jul 16, 2022

Re: version 1 - I think when I started working on tree-sitter-clojure and other things, the tooling that got set up was oriented around version 1 lock files -- node 12.x + npm 6.y. That's still what I'm currently using (and I have other projects that use the same tooling).

Perhaps I should switch to 14.x, but I haven't investigated potential consequences yet.

IIUC, according to https://nodejs.org/en/about/releases/, Node.js v14 is in maintenance (and will be until next year?). When I install it with nvm, it looks like the corresponding npm is v6. I don't know if the following is correct, but according to https://www.abrahamberg.com/blog/npm-package-json-lock-version-1-or-2/, npm v6 uses a version 1 lock file.

Maybe some of the above is wrong?

@NoahTheDuke
Author

Oops, maybe I'm mistaken then. My apologies. I thought node 14 came with npm v6, but it looks like you are right. Haha I gotta double check things before I say shit I don't actually know.

@sogaiu
Owner

sogaiu commented Jul 17, 2022

Thanks for bringing the point up in any case -- I wasn't aware of the finer details.

Just to be clear, I think node 12 comes with npm 6.x and node 14 comes with npm 6.y -- at least when I install with nvm that's what I see mentioned.

It looks like this file: https://nodejs.org/dist/index.json has these sorts of details. According to it:

{"version":"v12.22.12","date":"2022-04-05","files":["aix-ppc64","headers","linux-arm64","linux-armv7l","linux-ppc64le","linux-s390x","linux-x64","osx-x64-pkg","osx-x64-tar","src","sunos-x64","win-x64-7z","win-x64-exe","win-x64-msi","win-x64-zip","win-x86-7z","win-x86-exe","win-x86-msi","win-x86-zip"],"npm":"6.14.16","v8":"7.8.279.23","uv":"1.40.0","zlib":"1.2.11","openssl":"1.1.1n","modules":"72","lts":"Erbium","security":false},

and:

{"version":"v14.20.0","date":"2022-07-07","files":["aix-ppc64","headers","linux-arm64","linux-armv7l","linux-ppc64le","linux-s390x","linux-x64","osx-x64-pkg","osx-x64-tar","src","win-x64-7z","win-x64-exe","win-x64-msi","win-x64-zip","win-x86-7z","win-x86-exe","win-x86-msi","win-x86-zip"],"npm":"6.14.17","v8":"8.4.371.23","uv":"1.42.0","zlib":"1.2.11","openssl":"1.1.1q","modules":"83","lts":"Fermium","security":true},

which in short is:

  • node 12.22.12 - npm 6.14.16
  • node 14.20.0 - npm 6.14.17
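The same lookup can be done programmatically. This is a minimal sketch, assuming the JSON array from index.json has already been downloaded and parsed. The helper `npmVersionsByNode` is hypothetical. The sample data keeps only the `version`, `npm`, and `lts` fields from the two entries quoted above.

```javascript
// Build a "node version -> bundled npm version" lookup from the parsed
// contents of https://nodejs.org/dist/index.json (an array of release objects).
function npmVersionsByNode(releases, nodeVersions) {
  const byVersion = new Map(releases.map((r) => [r.version, r.npm]));
  // Versions not present in the index map to undefined.
  return nodeVersions.map((v) => ({ node: v, npm: byVersion.get(v) }));
}

// Trimmed-down copies of the two entries quoted above.
const releases = [
  { version: "v12.22.12", npm: "6.14.16", lts: "Erbium" },
  { version: "v14.20.0", npm: "6.14.17", lts: "Fermium" },
];

for (const { node, npm } of npmVersionsByNode(releases, ["v12.22.12", "v14.20.0"])) {
  console.log(`node ${node} - npm ${npm}`);
}
```

This prints the same two pairs as the summary above, with the leading "v" kept on the node versions.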

@SignSpice

SignSpice commented Jul 17, 2022

@NoahTheDuke
Author

Thanks for the links and references, @SignSpice! Good to see a potential reason. Bummer that the emacs folks are stuck with an old version (for the time being).

From a brief overview of some available tree-sitter implementations (here's a pretty big list), it seems that there's no consensus on specific versions, and that each implementation uses whichever version its maintainers feel is best.

@sogaiu
Owner

sogaiu commented Jul 18, 2022

@SignSpice Ah, I remember what you pointed out but I don't remember if that was a factor in holding back :)

Thanks for the links in any case.

@sogaiu
Owner

sogaiu commented Dec 26, 2022

I haven't seen any recent activity regarding the issue that is holding up elisp-tree-sitter from moving to the 0.20.x series.

IIUC, Emacs 29 may be built with tree-sitter support, so possibly before long, use of elisp-tree-sitter will be less prevalent. I've used some commits from the emacs-29 branch as well as the master branch recently and they've been working more or less ok for me.

I think @dannyfreeman mentioned the idea of having a branch of tree-sitter-clojure that still uses 0.19.x (for those who want to use elisp-tree-sitter) and switching the master branch over to the 0.20.y series.

@dannyfreeman
Collaborator

I think maintaining two branches for the different versions wouldn't be all that bad (I would be happy to do that work).

It might be worth upgrading after #31 is merged. I'd also bet that issue #32 will get fixed in tree-sitter itself, which will require the use of a newer version of tree-sitter.

If we maintain two branches, I think we would need to adopt a new practice regarding commits in this repository. Since the generated files would produce a lot of merge conflicts between the two branches that are difficult to resolve, it would probably be best going forward to commit changes to grammar.js in their own commit and changes to the generated files in a separate commit. This would allow cherry-picking grammar changes from master to the 0.19.x branch, then regenerating in a separate commit on the 0.19.x branch without worrying about big conflicts.
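The proposed split between grammar commits and generated-file commits could be checked mechanically. Below is a hedged sketch. The list of generated paths is an assumption (the files `tree-sitter generate` typically writes under src/). `classifyCommit` is a hypothetical helper, not existing tooling.

```javascript
// Assumed list of paths produced by `tree-sitter generate` in this repo.
const GENERATED = new Set([
  "src/parser.c",
  "src/grammar.json",
  "src/node-types.json",
  "src/tree_sitter/parser.h",
]);

// Split a commit's changed paths into generated files and everything else,
// and report whether the commit mixes the two kinds of change.
function classifyCommit(paths) {
  const generated = paths.filter((p) => GENERATED.has(p));
  const other = paths.filter((p) => !GENERATED.has(p));
  return { generated, other, mixed: generated.length > 0 && other.length > 0 };
}

console.log(classifyCommit(["grammar.js", "src/parser.c"]).mixed);
console.log(classifyCommit(["grammar.js"]).mixed);
```

The changed paths could come from `git diff --cached --name-only`. A commit that touches only grammar.js (so `mixed` is false) is the kind that could be cherry-picked onto the 0.19.x branch and regenerated there.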

@sogaiu
Owner

sogaiu commented Dec 28, 2022

Thanks for your thoughts (and offer!)

It might be worth upgrading after #31 is merged. I'd also bet that issue #32 will get fixed in tree-sitter itself, which will require the use of a newer version of tree-sitter.

Yes, ATM it does seem like a good idea to wait at least for #31 to be merged.

It would be nice if #32 got fixed, but it's hard to say if / when that will happen. I mentioned this in #32, but for Emacs' use case it seems possible to work around it for the time being, since Emacs' treesit.c already has code that looks like it could be used or adapted to do what's necessary. If that's really the case, maybe there wouldn't be much motivation to change things in tree-sitter itself. (I went looking elsewhere for users of the underlying ts_node__first_child_for_byte and didn't manage to turn up anything -- it doesn't appear to be used internally by tree-sitter either.)

If we do go with the 2-branch approach, the idea of separating commits for grammar.js vs the generated files sounds good.

On the note of generated files, there is the following bit at the main tree-sitter repository:

Mergeable Git Repos - Make it easier to collaborate on grammars by removing generated files from version control.

That's from the Tree-sitter 1.0 Checklist issue though so perhaps it won't be any time soon...

@dannyfreeman
Collaborator

Mergeable Git Repos - Make it easier to collaborate on grammars by removing generated files from version control.

That's from the tree-sitter/tree-sitter#930 though so perhaps it won't be any time soon...

When I first started working on this, I wondered why the generated files were committed and thought it would be better to remove them from source control. Now, after seeing this very large thread on the emacs-devel mailing list: https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg00692.html
I've come to think that keeping these files in source control is a little better because it makes distributing the grammars much easier. To compile them, users only need a C compiler. They can clone the repo, compile, and link; there is no need for Node.js or Rust. This simplifies the tooling end users need and makes the job easier for distro packagers who distribute grammars (as NixOS does right now).

@sogaiu
Copy link
Owner

sogaiu commented Dec 29, 2022

Thanks for your comments and that link.

I've come to think that keeping these files in source control is a little better because it makes distributing the grammars much easier.

I can see that.

There is also this though:

[ The paranoid among us might point out that there's no guarantee the
.c file actually matches the accompanying sources, and that maybe
we should generate the .c file and distribute them from our own
repository. ]

via: https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01180.html

I would add that there's also the issue of an unintentional mismatch.

FWIW, I think there has been some discussion at the tree-sitter repository regarding the use of node.js:


On a side note, there was this issue where there was some discussion of using something other than grammar.js.

@sogaiu
Owner

sogaiu commented Dec 30, 2022

Regarding:

It might be worth upgrading after #31 is merged.

There is at least one reason to consider upgrading to at least 0.19.5. I've documented that here.

IIUC, that upgrade should not adversely impact elisp-tree-sitter.

@sogaiu
Owner

sogaiu commented Dec 31, 2022

Related to:

FWIW, I think there has been some discussion at the tree-sitter repository regarding the use of node.js

I came across the following 2 posts at the emacs-devel mailing list archives:

The first has a nice overview diagram of a pipeline for generating the C / C++ starting with grammar.js (edited a bit for width considerations):

{grammar.js} -> [Node.js] -> 
{grammar.json} -> [tree-sitter] -> 
{parser.c , scanner.c / scanner.cc} -> [cc or c++] -> 
{library file}
---------------------------------------------------------^

where {} are files and [] are programs.

The second has a JavaScript program that can be used with quickjs to produce grammar.json from grammar.js (modified slightly here to give a more realistic path):

// Stand-ins for the Node.js globals that dsl.js and grammar.js expect
global = {}
module = {}
process = { env: { TREE_SITTER_GRAMMAR_PATH: 'grammar.js' } }
// Minimal require(): try a few parent directories and an optional .js
// suffix, chdir into the file's directory, then evaluate it in place
function require(file) {
  const pref = [ "", "./", "../", "../../", "../../../" ];
  const suff = [ "", ".js" ];
  for (const p of pref)
    for (const s of suff) {
      const f = p + file + s;
      const fh = std.open(f, "r");
      if (fh !== null) {
        fh.close();
        const dir = f.match(/.*\//);   // null when f has no directory part
        if (dir !== null) os.chdir(dir[0]);
        eval(std.loadFile(f.replace(/.*\//, '')));
        return module.exports;
      }
    }
  throw Error('File ' + file + ' not found');
}
// XXX: path below should be to a local copy of https://github.com/tree-sitter/tree-sitter/blob/c669e5ee159e0c59a3f094327a01dd688bc67c56/cli/src/generate/dsl.js
std.loadScript('/home/alice/src/tree-sitter/cli/src/generate/dsl.js')

To produce grammar.json from grammar.js, first:

  • Ensure quickjs is installed / available -- it builds with a straightforward Makefile (you might want to edit prefix if installing, but it should run in place)
  • Edit the script to contain an appropriate path for dsl.js
  • Put the script (call it, say, gen.js) in tree-sitter-clojure's top-level directory
  • Ensure current working directory is tree-sitter-clojure

Then invoke qjs --std gen.js > grammar.json.

The result can be compared with what is typically generated via the "ordinary" method:

diff grammar.json src/grammar.json

Not all that different.
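Because `diff` compares text, differences in indentation or a trailing newline show up even when the JSON content is identical. Here is a minimal sketch of a structural comparison (Node.js for the file-reading part). `firstDifference` is a hypothetical helper. It deliberately keeps object key order significant, since in grammar.json the first entry in `rules` is treated as the start rule.

```javascript
// Return null if two parsed JSON values are identical (including object key
// order), otherwise a path describing the first difference found.
function firstDifference(a, b, path = "$") {
  if (typeof a !== typeof b || Array.isArray(a) !== Array.isArray(b) || (a === null) !== (b === null)) {
    return path;
  }
  if (a === null || typeof a !== "object") {
    return a === b ? null : path;
  }
  const ka = Object.keys(a);
  const kb = Object.keys(b);
  if (ka.length !== kb.length) return path;
  for (let i = 0; i < ka.length; i++) {
    if (ka[i] !== kb[i]) return `${path} (key name/order at index ${i})`;
    const d = firstDifference(a[ka[i]], b[ka[i]], `${path}.${ka[i]}`);
    if (d !== null) return d;
  }
  return null;
}

// Usage (Node.js): node compare.js grammar.json src/grammar.json
// JSON.parse ignores whitespace and the trailing newline.
if (process.argv.length === 4) {
  const fs = require("fs");
  const [x, y] = process.argv.slice(2).map((f) => JSON.parse(fs.readFileSync(f, "utf8")));
  console.log(firstDifference(x, y) ?? "same JSON content");
}
```

The file name compare.js is arbitrary.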

@sogaiu
Owner

sogaiu commented Jan 2, 2023

Looks like quickjs might be usable at least on Linux, macOS, and Windows.

Binary releases appear to be available for Linux and Windows from the author of quickjs: https://bellard.org/quickjs/binary_releases/ (On a side note, I haven't had luck building it on Windows yet.)

There appears to be a Homebrew formula for macOS: https://formulae.brew.sh/formula/quickjs

Not using Node.js would be nice [1], but I'm not sure yet whether it's worth transitioning to some other approach.


[1] Getting away from the seemingly needless churn and vulnerabilities would be welcome (the docs might get simpler too...).

@sogaiu
Owner

sogaiu commented Jan 4, 2023

It looks like this line is currently where node is invoked by the Rust tree-sitter program.

I think dannyfreeman pointed out and maxbrunsfeld had earlier confirmed that while nodejs is (currently) necessary for producing grammar.json from grammar.js, npm is not.

On a related note, maxbrunsfeld also mentioned in another discussion that the version of nodejs does not matter. I don't know if that will continue to remain true as I'm not sure what nodejs' future will be like.

maxbrunsfeld also mentioned a few specific Node.js constructs that are relied on -- process.env and require. I think these may not be in quickjs, but it looked straightforward to "shim" them, as evidenced by the code in an earlier comment.

maxbrunsfeld also appeared quite open to the idea of having changes in tree-sitter to allow alternative javascript runtimes. AFAIK, it hasn't been done yet though.

There is something related(?) mentioned in the Tree-sitter 1.0 Checklist:

Reduce Coupling to Node - Introduce some Tree-sitter specific GRAMMAR_PATH setting where the CLI will search for grammar modules, instead of relying on node_modules and npm.

Though I'm not quite sure what that means.

One of the other tree-sitter maintainers, ahlinc, also mentioned:

If to talk about tree-sitter's independence it would be good that tree-sitter would have tree-sitter/tree-sitter#465 with a fallback to a system node.js if this is requested explicitly by some CLI parameter, IMO a deno library looks promising.

@sogaiu
Owner

sogaiu commented Jan 5, 2023

I tried replicating what the tree-sitter-cli does to generate grammar.json at the command line.

Assuming node is on one's PATH, do the following to prepare:

cd ~/src/
git clone https://github.com/tree-sitter/tree-sitter
git clone https://github.com/sogaiu/tree-sitter-clojure
cd tree-sitter-clojure

The following is roughly what these lines in tree-sitter-cli's source do:

export TREE_SITTER_GRAMMAR_PATH=./grammar.js
cat ../tree-sitter/cli/src/generate/dsl.js | node > grammar.json

Looks like an extra newline is thrown in at the end too.
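The two shell commands above can also be expressed with Node's `child_process.spawnSync`. This is a minimal sketch, assuming the directory layout from the preparation steps. `runDsl` is a hypothetical name, and `process.execPath` (the currently running node binary) stands in for whatever `node` is on PATH.

```javascript
const { spawnSync } = require("child_process");

// Feed dsl.js to a node process on stdin, with TREE_SITTER_GRAMMAR_PATH
// telling dsl.js which grammar to load; return whatever it prints.
function runDsl(dslSource, grammarPath) {
  const result = spawnSync(process.execPath, [], {
    input: dslSource,
    encoding: "utf8",
    env: { ...process.env, TREE_SITTER_GRAMMAR_PATH: grammarPath },
  });
  if (result.error) throw result.error;
  if (result.status !== 0) {
    throw new Error(`node exited with ${result.status}: ${result.stderr}`);
  }
  return result.stdout;
}

// Usage, from inside the tree-sitter-clojure checkout:
//   const fs = require("fs");
//   const dsl = fs.readFileSync("../tree-sitter/cli/src/generate/dsl.js", "utf8");
//   fs.writeFileSync("grammar.json", runDsl(dsl, "./grammar.js"));
```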

@sogaiu
Owner

sogaiu commented Feb 27, 2023

In 3daa97f (on the dev branch), the *ependencies properties were removed from package.json, and in 12fcfb9 (also on the dev branch), I used 0.20.7 to generate parser.c and friends.

As mentioned in #45, I intend to make it clear what version of the tree-sitter cli is used (as well as the precise invocation perhaps -- e.g. uses --abi 13) via some other means. Perhaps it can be encoded in a babashka task.
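As an illustration of what such a task might encode (shown in JavaScript here rather than as a babashka task), the sketch below pins the CLI version and the generate invocation. The constant and function names are hypothetical. The parser assumes `tree-sitter --version` prints something like `tree-sitter 0.20.7 (...)`.

```javascript
// Hypothetical pin; 0.20.7 and --abi 13 are the values mentioned in this thread.
const PINNED_CLI_VERSION = "0.20.7";
const GENERATE_ARGS = ["generate", "--abi", "13"];

// Extract "X.Y.Z" from the output of `tree-sitter --version`.
function parseCliVersion(output) {
  const m = /^tree-sitter (\d+\.\d+\.\d+)/.exec(output.trim());
  return m ? m[1] : null;
}

// Refuse to proceed unless the installed CLI matches the pin; otherwise
// return the exact command line that should be used to regenerate src/.
function checkCli(versionOutput) {
  const found = parseCliVersion(versionOutput);
  if (found !== PINNED_CLI_VERSION) {
    throw new Error(`expected tree-sitter ${PINNED_CLI_VERSION}, found ${found}`);
  }
  return ["tree-sitter", ...GENERATE_ARGS].join(" ");
}
```

In practice the version string could be obtained with `execFileSync("tree-sitter", ["--version"], { encoding: "utf8" })` from Node's child_process module.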

@sogaiu sogaiu added the candidate-on-dev label (The dev branch contains code to address) Mar 1, 2023
@sogaiu
Owner

sogaiu commented Mar 1, 2023

I've labeled this PR with the candidate-on-dev label.

Once we transfer the appropriate changes from the dev branch having to do with expressing our use of tree-sitter 0.20.7 (or possibly a specific later commit) for generating the content within src, we will have made changes that are similar in spirit to this PR.

Note though that addressing #17 (which I believe was the point of the original PR) may be an ongoing issue as one of the main causes is the evolution of Emscripten -- which is still continuing.

For getting the web-ui / playground subcommand to work properly (actually the build-wasm subcommand) with 0.20.7, I had success with Emscripten 2.0.24. If a somewhat later version (currently unreleased) of tree-sitter is used, it's possible to use Emscripten 3.1.29.

I've written up more on this topic here.

@sogaiu
Owner

sogaiu commented Mar 13, 2023

I intend to make it clear what version of the tree-sitter cli is used (as well as the precise invocation perhaps -- e.g. uses --abi 13) via some other means. Perhaps it can be encoded in a babashka task.

ATM, there is an invocation in another repository in these lines.

@sogaiu
Owner

sogaiu commented May 8, 2023

@NoahTheDuke Thanks for opening this PR and the subsequent discussion. I think it was a significant factor in us arriving at an overall improvement 👍

Now that v0.0.12 (which includes some changes [1] that used tree-sitter v0.20.7 to generate the parser source) has been released, maybe it's OK to close this PR?


[1] Among others, this one: 12fcfb9

@sogaiu sogaiu removed the candidate-on-dev label (The dev branch contains code to address) May 8, 2023
@sogaiu
Owner

sogaiu commented May 8, 2023

Ok, I'm going to take the reactions as an ok to close :)

Thanks again!

@sogaiu sogaiu closed this May 8, 2023
@NoahTheDuke NoahTheDuke deleted the nb/update-tree-sitter-cli branch May 8, 2023 12:48
Development

Successfully merging this pull request may close these issues.

tree-sitter web-ui doesn't show parse tree in right pane
4 participants