Feedback on fuzzer benchmarking setup #103

Open
wuestholz opened this issue Mar 23, 2023 · 2 comments

@wuestholz

I'm trying to compare Hybrid-Echidna with Echidna and the Forge fuzzer on several benchmark contracts.

To make the comparison as fair as possible, I've created a benchmark generator that automatically generates challenging contracts. The benchmarks intentionally use a limited subset of Solidity to avoid language features that could be handled differently by different tools. Each contract contains ~50 assertions (some can fail, but others cannot due to infeasible path conditions). (If you're curious, you can find one of the benchmarks here. The benchmark-generation approach is inspired by the Fuzzle benchmark generator for C-based fuzzers.) To find the assertions that can fail, a fuzzer needs to generate up to ~15 transactions and satisfy some input constraints for each transaction.

Since I'm not deeply familiar with Hybrid-Echidna, I'd like to check if there are any potential issues with my benchmark setup before sharing results.

Since Hybrid-Echidna does not support limiting the execution time (see issue at #101), I'm repeatedly running the fuzzer in shorter campaigns until reaching the time limit shared by all fuzzers (for instance, 1 hour for each contract). For each of these shorter fuzzing campaigns I'm using the following settings that deviate from the defaults (see the driver sketch after the list):

  • seq-len: 100 (instead of 10)
  • test-limit: 50000 for the first short campaign and 500 for all subsequent ones (instead of 50000)
  • max-iters: 3 for the first short campaign and 1 for all subsequent ones (instead of leaving the option unspecified)
  • no-incremental: false for the first short campaign and true for all subsequent ones (instead of false)
  • codeSize (Echidna): 0xc00000 (instead of 0x6000)
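
Concretely, the driver loop looks roughly like the sketch below. This is a simplified illustration rather than the exact harness: the contract path is a placeholder, and the flag spellings are assumed from the options above, so they should be checked against the installed hybrid-echidna version.

```python
import subprocess
import time

TIME_BUDGET_SECS = 3600  # shared time limit per contract (1 hour)

def run_short_campaign(contract: str, corpus_dir: str, first: bool) -> None:
    # Flag names mirror the settings listed above; verify them against
    # your hybrid-echidna version.
    cmd = [
        "hybrid-echidna", contract,
        "--corpus-dir", corpus_dir,
        "--seq-len", "100",
        "--test-limit", "50000" if first else "500",
        "--max-iters", "3" if first else "1",
    ]
    if not first:
        # The first campaign already performed incremental seeding once.
        cmd.append("--no-incremental")
    subprocess.run(cmd, check=False)

def fuzz_contract(contract: str, corpus_dir: str) -> None:
    deadline = time.monotonic() + TIME_BUDGET_SECS
    first = True
    while time.monotonic() < deadline:
        run_short_campaign(contract, corpus_dir, first)
        first = False
```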

I increased seq-len to 100 since some assertions may require up to ~15 transactions, and some generated transactions may fail. Echidna uses 100 by default.

I observed very high memory consumption when leaving max-iters unspecified or using larger values. For this reason, I bound the number of iterations.

I reduced test-limit to make sure that the short campaigns terminate reasonably fast. I also observed increased memory consumption for larger values.

I enable no-incremental for subsequent short campaigns since the first campaign will have already performed incremental seeding once.

I also increased the codeSize setting to handle larger contracts, if necessary. Currently, all benchmark contracts are below the EVM limit when using the solc optimizer (0.8.19).
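
For reference, the Echidna-side options can be collected in a YAML config file. Below is a sketch of how such a file could be generated; the key names follow Echidna's configuration format, but how the file gets passed through to hybrid-echidna may depend on the version.

```python
import yaml  # pip install pyyaml

def write_echidna_config(path: str, first: bool) -> None:
    # 0xc00000 raises the code-size check well above the 0x6000
    # (EIP-170) default; seqLen/testLimit mirror the settings above.
    cfg = {
        "seqLen": 100,
        "testLimit": 50000 if first else 500,
        "codeSize": 0xC00000,
    }
    with open(path, "w") as f:
        yaml.safe_dump(cfg, f)

write_echidna_config("echidna-config.yaml", first=True)
```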

Somewhat surprisingly, Echidna performs much better on these benchmarks than Hybrid-Echidna. It would be great to understand why. For instance, I tried setting a solver timeout. However, this did not have a noticeable effect on the fuzzing performance.

Please let me know if you see any potential issues with this setup.

@ggrieco-tob
Member

Hybrid-Echidna is a very experimental tool. I'm afraid we don't have enough information to fully guide your research. However, it is clear that since optik relies on SMT solvers (which can fail or time out on some queries), reducing the complexity of the queries is needed. In particular, the transaction sequence length should be reduced (100 is extremely large and can result in very high memory consumption).

We haven't experimented enough with optik, but I personally think that hybrid-echidna should be used first with a small number of consecutive transactions (e.g., seqLen equal to 5 or 10). This should help to explore "difficult" inputs. Once this scenario is fully explored, the corpus should be provided to Echidna for a larger campaign (e.g., using seqLen of 100), roughly as in the sketch below. Keep in mind that once you do this, you cannot use optik again. Sadly, we do not have anything "automatic" developed for this particular case. Happy to discuss your results using this tool.
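
The manual two-phase workflow would look something like this (indicative only: `Benchmark.sol` is a placeholder and the command/flag spellings should be checked against the installed tools):

```python
import subprocess

CORPUS = "corpus"

# Phase 1: hybrid fuzzing with short sequences to solve "difficult" inputs.
subprocess.run(["hybrid-echidna", "Benchmark.sol",
                "--corpus-dir", CORPUS, "--seq-len", "10"], check=True)

# Phase 2: hand the corpus to plain Echidna for a longer campaign.
# Note: after this point the corpus cannot be used with optik again.
subprocess.run(["echidna", "Benchmark.sol",
                "--corpus-dir", CORPUS, "--seq-len", "100"], check=True)
```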

@wuestholz
Author

@ggrieco-tob Thanks a lot!

I used a smaller sequence length (30) before, but the increase to 100 didn't negatively affect the performance. Is the size of the SMT constraints proportional to the sequence length? I somehow assumed that only a few transactions would be executed with symbolic inputs to prevent a blowup.

For instance, one way to interpret your suggestion is to only make the first K (e.g., 5 or 10) transactions symbolic. In that case, a longer sequence with lots of concrete transactions should not hurt. Have you considered this?
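
To make the idea concrete, here is a purely hypothetical sketch; `make_symbolic()` is invented for illustration and is not an existing optik API:

```python
K = 5  # e.g., 5 or 10 symbolic transactions at the start

def symbolize_prefix(sequence, k=K):
    # Only the first k transactions get symbolic inputs; the rest of the
    # (possibly long) sequence is replayed concretely.
    return [tx.make_symbolic() if i < k else tx  # hypothetical API
            for i, tx in enumerate(sequence)]
```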
