Description
Recently we've added the `wasmtime` project to OSS-Fuzz and we've gotten a lot of bugs (yay!). Some of the bugs are timeouts, and when investigating locally I was surprised to find that un-fuzzed code ran orders of magnitude faster than the fuzzed code. It seemed that simply building for fuzzing made the code slow enough to time out, while the same tests performed reasonably outside of fuzzing.
One big chunk of the slowdown is in #215, but it looks like the various sanitizer coverage options are also a source of quite a large slowdown.
I ran some tests locally and was executing a command like so:
$ cargo +nightly run --release --target x86_64-unknown-linux-gnu --bin differential ./the-test-file
Basically I was curious to just run the fuzzer binary on one particular test case, timing how long it took. I varied `RUSTFLAGS` as listed below and got the following timings. Note that each of these has `-Ccodegen-units=1` to account for #215.
RUSTFLAGS | test time |
---|---|
(empty) | 89ms |
sancov level 1 | 385ms |
sancov level 4 | 428ms |
sancov level 4 + `-sanitizer-coverage-pc-table` | 353ms |
sancov level 4 + `-sanitizer-coverage-prune-blocks=0` | 506ms |
sancov level 4 + `-sanitizer-coverage-trace-geps` | 1708ms |
sancov level 4 + `-sanitizer-coverage-trace-compares` | 2327ms |
sancov level 4 + all options | 4113ms |
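
To make the relative cost of each row easier to see, here is a small sketch that divides each timing by the `(empty)` baseline. The numbers are copied from the table; the `slowdown` helper and the shortened row labels are just illustrative names, not anything from `cargo fuzz`.

```rust
/// Timings copied from the table above, in milliseconds.
/// The labels are shortened versions of the table's RUSTFLAGS column.
const TIMINGS_MS: [(&str, u32); 8] = [
    ("(empty)", 89),
    ("sancov level 1", 385),
    ("sancov level 4", 428),
    ("level 4 + pc-table", 353),
    ("level 4 + prune-blocks=0", 506),
    ("level 4 + trace-geps", 1708),
    ("level 4 + trace-compares", 2327),
    ("level 4 + all options", 4113),
];

/// Slowdown factor of `time_ms` relative to `baseline_ms`.
fn slowdown(time_ms: u32, baseline_ms: u32) -> f64 {
    f64::from(time_ms) / f64::from(baseline_ms)
}

fn main() {
    // The first row (no extra RUSTFLAGS) is the baseline.
    let baseline = TIMINGS_MS[0].1;
    for &(name, ms) in TIMINGS_MS.iter() {
        println!("{:<26} {:>5}ms {:>6.1}x", name, ms, slowdown(ms, baseline));
    }
}
```

With these numbers, the "all options" row comes out at roughly 46x the baseline, `trace-compares` alone at roughly 26x, and `trace-geps` alone at roughly 19x.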
Note that in the table, "sancov level 1" corresponds to `-Cpasses=sancov -Cllvm-args=-sanitizer-coverage-level=1 -Cllvm-args=-sanitizer-coverage-inline-8bit-counters` and "sancov level 4" corresponds to `level=4` there. The default in `cargo fuzz` right now is sancov level 4. Also note that the last row above, "sancov level 4 + all options", is the default for `cargo fuzz` today.
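
For anyone wanting to reproduce the rows, the flag combinations can be assembled along these lines. This is only a sketch: `sancov_rustflags` is a hypothetical helper, and it assumes each extra sancov option is passed as its own `-Cllvm-args=...` entry, which is my reading of how the flags are spelled rather than something verified against `cargo fuzz` itself.

```rust
/// Build a RUSTFLAGS string for a given sancov level plus extra LLVM options.
/// Hypothetical helper; assumes every extra option goes through `-Cllvm-args=`.
fn sancov_rustflags(level: u8, extra_llvm_args: &[&str]) -> String {
    let mut flags = vec![
        // Always present in the measurements, to account for #215.
        "-Ccodegen-units=1".to_string(),
        "-Cpasses=sancov".to_string(),
        format!("-Cllvm-args=-sanitizer-coverage-level={}", level),
        "-Cllvm-args=-sanitizer-coverage-inline-8bit-counters".to_string(),
    ];
    for arg in extra_llvm_args {
        flags.push(format!("-Cllvm-args={}", arg));
    }
    flags.join(" ")
}

fn main() {
    // "sancov level 1" row.
    println!("{}", sancov_rustflags(1, &[]));
    // "sancov level 4 + -sanitizer-coverage-trace-compares" row.
    println!(
        "{}",
        sancov_rustflags(4, &["-sanitizer-coverage-trace-compares"])
    );
}
```

The resulting string would be exported as `RUSTFLAGS` before running the `cargo +nightly run ...` command shown earlier.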
What I largely want to highlight here is that cranking up the sancov level and options slows a binary down so much that a fuzzer timeout may not be very meaningful when the binary is executing up to 50x slower than it otherwise would (4113ms vs. 89ms above). This leads me to a few questions:
- What are these sancov options and why are they passed by default? Do they improve libfuzzer's discovery of test cases?
- Why are all these options passed? Or, in other words, how was this list of options selected? Are we matching `clang` or just trying to pass as many options as possible?
- Would it make sense to make these options configurable? If they were configurable, is there a reasonable subset that should be used all the time, or are there some projects that only want to use some?
- How beneficial is each option individually? It looks like options like `trace-compares` are extremely expensive, and similarly `trace-geps`. Should those be turned off by default for everything other than "let's create a big corpus" runs or something like that?
I saw that many of these options were tweaked in 7a31745, and I'd be curious to see how the options in `cargo fuzz` compare to whatever `clang` defaults to nowadays, for example.