Feature request: add -sanitize flag to CLI #3186

Open
ahelwer opened this issue Mar 17, 2024 · 2 comments
Labels
cli enhancement Feature request

Comments

@ahelwer
Contributor

ahelwer commented Mar 17, 2024

Problem

Modern sanitizers have done a lot to make C/C++ development tolerable. Especially with the requirement to move all external scanners to C, I believe it would be incredibly useful to add a -sanitize flag to the tree-sitter CLI that compiles & links all parser/scanner code with every sanitizer under the sun, then uses that binary to run the subcommand (most usefully parse and test). This is certain to unmask various memory leaks and undefined behavior in existing external scanners, as I can tell you from personal experience.

Expected behavior

I use sanitizers to test the grammar I develop, which has a fairly complicated external scanner. In lieu of something like the proposed -sanitize flag, I instead created a small C++ shim (adapted from the fuzzer shim) that just loads a single file into the tree-sitter C library (itself also specially compiled with sanitizers) and then parses it. You can see the build script & shim here, which requires the tree-sitter repo to be present as a git submodule.

I believe the -sanitize flag would unlock these powerful C/C++ development tools for people who don't want to futz around with compiling & linking things themselves. In particular, it would let people run their own test corpus instead of being restricted to parsing a single file like I am with my simple C++ shim.
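
As a rough illustration of the compile step such a flag might perform, the sketch below assembles a compiler command line with sanitizers enabled. It is not how the tree-sitter CLI builds parsers; the function name `sanitizer_build_command`, its parameters, and the example file names are assumptions made for illustration. The flags themselves (`-fsanitize=`, `-fno-omit-frame-pointer`, `-g`, `-shared`, `-fPIC`) are standard Clang/GCC options.

```python
import shlex


def sanitizer_build_command(
    sources,
    output,
    sanitizers=("address", "undefined"),
    include_dir="src",
    compiler="clang",
):
    """Return an argv list that builds C sources into a sanitized shared library.

    Hypothetical helper: names and defaults are illustrative assumptions,
    not part of the tree-sitter CLI.
    """
    return [
        compiler,
        "-g",                        # debug info so sanitizer reports show file/line
        "-fno-omit-frame-pointer",   # more reliable stack traces in reports
        "-fsanitize=" + ",".join(sanitizers),
        "-shared",
        "-fPIC",
        "-I",
        str(include_dir),
        *[str(s) for s in sources],
        "-o",
        str(output),
    ]


if __name__ == "__main__":
    # Example file names follow the usual generated-grammar layout (assumed here).
    cmd = sanitizer_build_command(["src/parser.c", "src/scanner.c"], "parser.so")
    print(shlex.join(cmd))
```

The resulting list could be handed to `subprocess.run(cmd, check=True)`. Loading the resulting library from the CLI is a separate problem.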

There are a number of difficulties I foresee when implementing this feature, because I have experienced them myself - but hopefully someone here has a much higher frustration threshold for debugging cross-platform C/C++ linking issues than I do, and can overcome them:

  1. The CLI is a Rust program loading native binaries, and I had trouble successfully loading address sanitizers from it when I tried; see this Stack Overflow question I made
  2. There is a fair amount of cross-platform inconsistency with sanitizers. My setup only works on Linux, but for some reason it sometimes segfaults on the GitHub Linux CI machines (see the logs of the Corpus Tests (Linux) step here); it possibly(?) also works on macOS, although it spits out errors about being unable to find certain library symbols. Allegedly Windows has sanitizers in MSVC but I've never even tried to use them.
  3. Ideally people would be able to step through their scanner in a debugger, but I don't know how well gdb handles the transition from the CLI into the C/C++ code (never tried).
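
On the first point, one commonly reported cause of this kind of trouble on Linux is that AddressSanitizer expects its runtime to be loaded before other libraries, which is not the case when an uninstrumented executable loads an instrumented shared library later. A frequently suggested workaround is to preload the runtime via `LD_PRELOAD`. Whether this resolves the specific problem described above is an assumption; the sketch below only shows how the environment for a child process could be prepared. The helper name and the example runtime path are hypothetical (with GCC, `gcc -print-file-name=libasan.so` prints the actual location).

```python
import os


def env_with_preloaded_runtime(runtime_path, base_env=None):
    """Return a copy of the environment with runtime_path first in LD_PRELOAD.

    Hypothetical helper for illustration; runtime_path must point at the
    sanitizer runtime shipped with the compiler that built the parser.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # The sanitizer runtime goes first; any existing entries are kept after it.
    env["LD_PRELOAD"] = f"{runtime_path}:{existing}" if existing else runtime_path
    return env


if __name__ == "__main__":
    # Placeholder path, not a real installation location.
    env = env_with_preloaded_runtime("/path/to/libasan.so", {})
    print(env["LD_PRELOAD"])
```

The returned mapping can be passed as `env=` to `subprocess.run` when launching the CLI. This applies to Linux's dynamic loader only; macOS and Windows use different mechanisms.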

There is a fair amount of risk that this feature would generate a lot of "doesn't work on my machine" type bugs, but its value is undeniable, especially if everything is transitioning to C.

@ahelwer ahelwer added the enhancement Feature request label Mar 17, 2024
@amaanq
Member

amaanq commented Mar 18, 2024

Yeah, I was thinking about adding a --fuzz/--harden flag that encompasses this, along with fuzzing edits via random mutations. That can find hidden bugs (and has found some before!) when we have a properly structured tree to work with instead of random bytes fed in with libFuzzer.

@ObserverOfTime
Member

In the meantime this exists and can also be used locally.
