Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fuzzer improvements (JIT, option entropy) #317

Merged
merged 2 commits into from
Nov 9, 2023

Conversation

addisoncrump
Copy link
Contributor

This PR makes two changes to the PCRE2 fuzzer. Namely, it uses part of the input to determine the selection of the options, allowing for a greater selection of options (historically, it was restricted to 256 combinations of options -- significantly lower than potential) and adds an extra step in which JIT matches are compared to non-JIT matches as a simple differential oracle allowing for the detection of JIT miscompilation or incorrectness in either implementation.

Caveats:

  • These changes change the input shape of the fuzzer, now using the first 8 bytes of input to determine the options and the rest as arbitrary input. This means that the existing corpus will be less useful, but will likely be back at coverage saturation in very little time by Clusterfuzz.
  • The crashes which occur as a result of a differential crash (i.e. the aborts in comparisons between JIT and non-JIT) will always have the same crash site. As a result, they will be put under a single issue in Monorail. This is unfortunately unavoidable, but changes to these code regions per-commit are likely small enough to be deduplicated to specific changes quite easily by bisecting.

These strategies have been shown to be effective in rapidly finding issues in rust-lang/regex in the presence of various Regex optimisation strategies. If there are other scenarios in which we can compare the results of two different sets of options consistently (i.e. they should always be the same result, or one/both should error out and we skip the compare) then those should also be added.

I have run this fuzzer with the most up-to-date Clusterfuzz corpus and found no crashes at this time, but this may uncover future issues quite rapidly.

@PhilipHazel
Copy link
Collaborator

Because of your JIT comparisons, I don't think I can merge this just at the moment, because there have been some changes to the interpreter that have not yet been implemented in JIT. The expectation is that the updated handling of \w etc. will make it into JIT, but the new support for variable-length lookbehinds may lag for some time - this won't matter in a release because it involves an opcode that is not recognized by JIT, so JIT compilation will fail and matching should automatically fall back to the interpreter.
(At the moment, the tests for the new behaviour of \w etc. have the "no_jit" option.)

@addisoncrump
Copy link
Contributor Author

How's this? I skip the comparison if the compilation failed, and I wrapped everything in #ifdef SUPPORT_JIT. In cases where parity isn't present, this should just skip the comparison.

@PhilipHazel
Copy link
Collaborator

Yes, that's good for the new feature that JIT doesn't yet support. However, we still have to wait for JIT to catch up with the \w changes as otherwise the comparisons will fail. I am hoping these changes will happen fairly soon because we can't put out a new release until they are done.

@addisoncrump
Copy link
Contributor Author

Understood. I think I need to change CI + upstream OSS-Fuzz configuration (currently it doesn't --enable-jit) before this lands anyways.

@zherczeg
Copy link
Collaborator

zherczeg commented Nov 7, 2023

I think the jit simply falls back to interpreter for the new opcode. So it will be an interpreter-interpreter comparison.

@PhilipHazel PhilipHazel merged commit bec9bf3 into PCRE2Project:master Nov 9, 2023
8 checks passed
@addisoncrump
Copy link
Contributor Author

@PhilipHazel Thanks for merging. Can you approve the related PR in OSS-Fuzz?

@PhilipHazel
Copy link
Collaborator

As the JIT is now updated, I have merged this patch. The compiler now complains about two unused variables: ovector_count and errorcode_jit. I will take a look and remove them if it's obvious they are redundant. Then wait to see what ClusterFuzz does.....

@addisoncrump
Copy link
Contributor Author

The compiler now complains about two unused variables: ovector_count and errorcode_jit.

Ah, these variables are declared even when --enable-jit is not specified. That's a pretty quick fix (just wrapping them in #ifdef SUPPORT_JIT).

@PhilipHazel
Copy link
Collaborator

Will do. I don't think I have any status to approve anything in OSS-Fuzz do I?

@addisoncrump
Copy link
Contributor Author

In the OSS-Fuzz issue, it lists you as one of the potential reviewers. Not sure if that's accurate though.

@PhilipHazel
Copy link
Collaborator

I didn't know that; I have not had many dealings with OSS-Fuzz; I just support pcre2_fuzzsupport. I will take a look.

@PhilipHazel
Copy link
Collaborator

I seem to have managed to approve.

@PhilipHazel
Copy link
Collaborator

Now that I look at the code in pcre2_fuzzsupport.c, I see a problem. It doesn't #include config.h and so SUPPORT_JIT will never be set! I don't suppose there's any reason why it shouldn't #include config.h, though. However, I have to finish for the day now.

@addisoncrump
Copy link
Contributor Author

😬 Well, that's a dumb mistake on my part. I'll fix that and push changes, mark OSS-Fuzz PR as draft until then.

DavidKorczynski pushed a commit to google/oss-fuzz that referenced this pull request Feb 21, 2024
This corresponds to an issue on PCRE2 regarding fuzzer improvements
w.r.t. JIT: PCRE2Project/pcre2#317
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

3 participants