Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Which files and directories are maintained and important is unclear #38

Closed
sogaiu opened this issue Jan 28, 2023 · 12 comments
Closed

Which files and directories are maintained and important is unclear #38

sogaiu opened this issue Jan 28, 2023 · 12 comments

Comments

@sogaiu
Copy link
Owner

sogaiu commented Jan 28, 2023

Due to ignorance and history, the repository may contain files and/or directories that are not being maintained and/or tested.

Probably it's a good idea to work on spelling out more clearly which files and directories there are, whether they are relevant, and why.

When development started, there were no users of the grammar and lack of sufficient understanding on the part of yours truly resulted in various files / directories being committed to the repository.

Now though, I think there are some users and I know a bit better, so I'll work on spelling things out.

Perhaps I'll try to keep a summary in this first post.


What Is Likely To Be Kept

Currently, at least the following files / directories are important for some existing users and/or for proper functioning of the tree-sitter cli:

  • corpus
  • grammar.js
  • package.json
  • queries/highlights.scm
  • src/parser.c

corpus contains test files. Although I think these are not adequate for testing, they can be helpful for light consistency checks as well as for becoming familiar with parse results. I am thinking to move it though (test/corpus to be precise).

grammar.js and package.json are necessary for the proper functioning of some of the subcommands of the tree-sitter cli. package.json is consulted for at least the bit that looks like this:

  "tree-sitter": [
    {
      "scope": "source.clojure",
      "file-types": [
        "bb",
        "clj",
        "cljc",
        "cljs"
      ]
    }
  ]

IIUC vscode-parse-tree retrieves this repository using yarn and thus package.json must exist and have at least this bit:

  "name": "tree-sitter-clojure",
  "version": "0.0.11",

According to some docs:

If you plan to publish your package, the most important things in your package.json are the name and version fields as they will be required. The name and version together form an identifier that is assumed to be completely unique. Changes to the package should come along with changes to the version. If you don't plan to publish your package, the name and version fields are optional.

Not sure what "publish" means here, but when I tried to replicate vscode-parse-tree's build steps, leaving out the fields name and version resulted in failure.

src/parser.c can be generated using an appropriate version of the tree-sitter cli (possibly with options like --abi). However, Emacs 29+ and nvim-treesitter use src/parser.c for compiling a dynamic library for their own uses. ATM, many grammar repositories provide this file (though that might change in the future -- see "Mergeable Git Repos" at tree-sitter/tree-sitter#930).

queries/highlights.scm is used by the difftastic project.

Things I'm Thinking Of Removing

  • binding.gyp
  • bindings
  • Cargo.toml
  • package-lock.json
  • parts of package.json -- e.g. dependencies and devDependencies

binding.gyp is related to Node.js bindings which I do not use nor test. I have not heard of anyone using those bindings and this file is generated by the tree-sitter cli anyway, so if someone needs it, they can create it.

bindings contains Node.js and Rust bindings which I do not use nor test. I have not heard of anyone using these bindings and this directory and its content are generated by the tree-sitter cli anyway, so... :)

Cargo.toml is for the Rust bindings which...see above :)

package-lock.json is generated by package.json, but is optional and is used by npm when installing. As I'm leaning in the direction of not using npm for tree-sitter-clojure development, this file is a candidate for removal.

There are parts of package.json which do not need to exist if npm is not being used for development purposes. Some such parts include:

  "dependencies": {
    "nan": "2.14.2"
  },
  "devDependencies": {
    "tree-sitter-cli": "0.19.3"
  },

The following bits may also be removed:

  "description": "Clojure grammar for tree-sitter",
  "main": "bindings/node",
  "scripts": {
    "build": "npx tree-sitter generate && npx node-gyp build",
    "test": "npx tree-sitter test"
  },
  "author": "",
  "license": "",

Regarding main, I found this:

main

The main field is a module ID that is the primary entry point to your program. That is, if your package is named foo, and a user installs it, and then does require("foo"), then your main module's exports object will be returned.

This should be a module ID relative to the root of your package folder.

For most modules, it makes the most sense to have a main script and often not much else.

In our case it seems irrelevant and testing with it removed seemed fine (at least when locally trying for vscode-parse-tree's case).

@dannyfreeman
Copy link
Collaborator

If the bindings haven't caused any problems while maintaining this package, is it worth removing them and potentially breaking something? There could be someone out there using the rust or node bindings for this

@sogaiu
Copy link
Owner Author

sogaiu commented Jan 29, 2023

I think if we leave them in we should state that we don't test them and won't necessarily offer support for them. I think it's misleading to have them sit here with no statement.

I think the Rust bindings may be used by lapce, but I think they depend on a fork.

Hmm, may be they can look after the Rust bindings...

@sogaiu
Copy link
Owner Author

sogaiu commented Jan 29, 2023

Just a technical note: I think it's slightly messy to keep just the Rust bindings.

IIUC, it's easy to get both the Node.js and Rust bindings by just doing tree-sitter generate [1].

To look after the Rust bindings only, one would remove the Node.js bindings which I think involves getting rid of:

  • binding.gyp
  • bindings/node/*

tree-sitter generate also modifies package.json to add:

"main": "bindings/node"

if it's not there. So that would need to be removed if it got added.

To do things properly I think one would then look after the existing or generated Cargo.toml.

To not get either set of bindings, from tree-sitter 0.19.4, I believe one can do tree-sitter generate --no-bindings.

Arranging for removal of just the Node.js bindings could probably be automated and possibly the relevant files and directories could be added to .gitignore.

It might be possible to generate both sets of bindings once, remove the Node.js binding info and then just look after the Rust things, ever after using --no-bindings. Not sure how well that will work in practice.

I wonder whether binding removal is being considered along with generated source removal as mentioned here (search for "Mergeable Git Repos").


[1] There is also the option --abi for the generate subcommand. IIUC, nvim-treesitter appears to regenerate source files (such as src/parser.c) if necessary to get matching (or compatible) ABI numbers across many grammars -- though that would require a user of neovim have the tree-sitter cli installed.

In theory, other programs might do the same at some point...

@sogaiu
Copy link
Owner Author

sogaiu commented Feb 27, 2023

For the moment I've:

  • removed package-lock.json in a378c27
  • removed the author and license properties from package.json in 08b9e5f
  • removed the scripts property from package.json in 3f8dff3
  • removed the *ependencies properties from package.json in 3daa97f
  • moved corpus to test/corpus in df16f93

@NoahTheDuke
Copy link

So the only reason to keep package.json around is for the tree-sitter entry?

@sogaiu
Copy link
Owner Author

sogaiu commented Feb 27, 2023

I think there are a few more reasons:

  1. vscode-parse-tree (part of cursorless) retrieves this repository using yarn and thus package.json must exist and have at least this bit:

      "name": "tree-sitter-clojure",
      "version": "0.0.11",

    Removal of those parts resulted in failures, but no more than those two properties were necessary.

  2. I believe the tree-sitter cli is hard-wired to look for this file and will fail to execute certain subcommands if the file does not exist [1]. I don't have the details handy but I survived on just pretty much this bit (with a different version number):

    {
      "name": "tree-sitter-clojure",
      "version": "0.0.11",
      "tree-sitter": [
        {
          "scope": "source.clojure",
          "file-types": [
            "bb",
            "clj",
            "cljc",
            "cljs"
          ]
        }
      ]
    }
    

    for quite some time with no incident while experimenting with reorganization in this branch -- the package.json there has similar content to what I put above.


[1] I think I saw this happen at the command line while testing and one can also see references for checking from package.json in this file -- search for package.json if interested :)

@NoahTheDuke
Copy link

Cool, I'm glad they're not doing any additional introspection.

@sogaiu sogaiu added the candidate-on-dev The dev branch contains code to address label Mar 1, 2023
@sogaiu
Copy link
Owner Author

sogaiu commented Mar 1, 2023

As part of the work on #47, I put together some documentation that is supposed to make it clear(er) what files and directories are provided by the repository and why.

Also included is information about what I consider less stable and subject to change and removal.

@sogaiu
Copy link
Owner Author

sogaiu commented Mar 2, 2023

Regarding whether the tree-sitter cli requires the use of package.json to function, I've gone back to check the latest source and I also tested 0.20.7.

The tldr is that package.json may no longer be required for operation (at least not in the way we have used it so far).

I have seen some commits like CLI: Determine language symbol from grammar, not package.json (though that particular one is pretty old) or Allow dynamically loading grammars w/ no package.json file, so perhaps there's been an effort to reduce dependence on package.json over time.


At the end of this comment is a rough pseudo "call-graph"-ish collection of info.

The first bit covers subcommand entry points in cli/src/main.rs and which of them call find_all_languages, languages_at_path, and/or select_language. Essentially all of the subcommands we might care about end up calling one or multiple of these.

These three functions are defined in cli/loader/src/lib.rs and they all can eventually end up leading to execution of find_language_configurations_at_path (also defined in the same file). It's this last function that reads package.json.

Though it reads package.json, it looks like it won't fail if there's no package.json.

Interestingly, after that bit of code, there's a bit that does some processing if a grammar.json is found.

My impression from looking at this code (see below for a brief summary based on testing) is that package.json is not required for the tree-sitter cli subcommands we've been using (well, at least with the options / flags we've chosen).

Does it matter if package.json / grammar.json exist? Looking at this part of the code, I get the sense that some subcommands that might be affected are highlight and tags. Neither of these are things we've used, and for the moment, they don't seem relevant.


Note that the generate subcommand can lead to the creation of package.json. To avoid that, one can use --no-bindings, which seems to have been added here.

I moved package.json out of the way and tried various tree-sitter subcommand invocations (dump-languages, generate, parse, query, and test), even changing the extensions of files when trying parse. I didn't see any errors or broken functionality.


So it looks like having package.json allows cursorless to continue to use us, but we may not need it for our own use.


src/main.rs:

dump-languages
  find_all_languages

generate --build (generate on its own may create package.json)
  languages_at_path

highlight
  find_all_languages

parse
  select_language, find_all_languages

query
  select_language, find_all_languages

tags
  find_all_languages

test
  languages_at_path

---

loader/src/lib.rs:

select_language
  languages_at_path

languages_at_path, find_all_languages
  find_language_configurations_at_path <- interacts with package.json directly

@sogaiu
Copy link
Owner Author

sogaiu commented Mar 2, 2023

I've updated some of the docs on the dev branch to incorporate some of the information in the comment above in 701c948 and 7307282.

@sogaiu
Copy link
Owner Author

sogaiu commented Mar 12, 2023

Commits relevant to this issue can now be seen at:

  • d38aafd Remove package-lock.json
  • 566cdff Further package.json trimming
  • bf2b6b5 Remove scripts from package.json
  • 44ed28f Remove *ependencies from package.json
  • c3db36b Move corpus to test/corpus
  • 3ccbb02 Update README and adapt docs
  • 0cc8a6e Expand binding info and adjust package.json

@sogaiu
Copy link
Owner Author

sogaiu commented May 8, 2023

This document attempts to cover our thinking around the time of the v0.0.12 release.

@sogaiu sogaiu closed this as completed May 8, 2023
@sogaiu sogaiu removed the candidate-on-dev The dev branch contains code to address label May 8, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants