| 1 | +# Hopper |
| 2 | + |
| 3 | +Hopper is a tool for automatically generating fuzzing test cases for libraries using **interpretative fuzzing**. It transforms the problem of library fuzzing into the problem of interpreter fuzzing, enabling exploration of a vast range of API usages out of the box. |
| 4 | +Some key features of Hopper include: |
| 5 | +- Interpretative API invoking without any fuzz driver. |
| 6 | +- Type-aware mutation for arguments. |
| 7 | +- Automatic intra- and inter-API constraint learning. |
| 8 | +- Binary instrumentation support. |
| 9 | + |
| 10 | +To learn more about Hopper, check out our [paper](https://arxiv.org/pdf/2309.03496) at CCS '23. |
| 11 | + |
| 12 | +## Build Hopper |
| 13 | +### Build Requirements |
| 14 | +- Linux-amd64 (Tested on Ubuntu 20.04 and Debian Buster) |
| 15 | +- Rust stable (>= 1.60), can be obtained using [rustup](https://rustup.rs/) |
| 16 | +- Clang (>= 5.0, [Install Clang](https://rust-lang.github.io/rust-bindgen/requirements.html)). [rust-bindgen](https://rust-lang.github.io/rust-bindgen/) uses libclang to preprocess, parse, and type check C and C++ header files. |
| 17 | + |
| 18 | +### Build Hopper itself |
| 19 | +```sh |
| 20 | +./build.sh |
| 21 | +``` |
| 22 | + |
| 23 | +The script creates an `install` directory in Hopper's root directory. After that, you can use `hopper`. |
| 24 | + |
| 25 | +### Using Docker |
| 26 | +Alternatively, you can use the Dockerfile, which builds the requirements and Hopper. |
| 27 | +``` |
| 28 | +docker build -t hopper ./ |
| 29 | +docker run --name hopper_dev --privileged -v /path-to-lib:/fuzz -it --rm hopper /bin/bash |
| 30 | +``` |
| 31 | + |
| 32 | +## Compile library with Hopper |
| 33 | +Take `cJSON` as an example ([more examples](./examples/)). |
| 34 | +```sh |
| 35 | +hopper compile --header ./cJSON.h --library ./libcjson.so --output output |
| 36 | +``` |
| 37 | + |
| 38 | +Use `hopper compile --help` to see detailed usage. If compilation reports errors about the header file, refer to the documentation of [rust-bindgen](https://rust-lang.github.io/rust-bindgen/), which Hopper uses to parse header files. |
| 39 | +You may need to wrap the header file in another header that provides the missing definitions. |
| 40 | +Hopper uses [E9Patch](https://github.com/GJDuck/e9patch) to instrument binaries by default. |
| 41 | + |
| 42 | +After running `compile`, you will find that it generates the following files in the output directory: |
| 43 | +- `bin/hopper-fuzzer`: generates inputs, maintains states, and uses `harness` to execute the inputs. |
| 44 | +- `bin/hopper-harness`: executes the inputs. |
| 45 | +- `bin/hopper-translate`: translates inputs to C source code. |
| 46 | +- `bin/hopper-generator`: replays the generation process. |
| 47 | +- `bin/hopper-sanitizer`: sanitizes and minimizes crashes. |
| 48 | + |
| 49 | +#### Header files |
| 50 | +- If there are multiple header files, you can create a new header file and *include* all of them. |
| 51 | +- If the header files are compiled depending on specific environment variables, you can set them via `BINDGEN_EXTRA_CLANG_ARGS`. |
| 52 | +- If the header file includes API functions that you do not want to test, use `--func-pattern` to filter them while running the fuzzer. |
| 53 | + |
| 54 | +#### Environment variables for compiling |
| 55 | +- `HOPPER_MAP_SIZE_POW2`: controls the size of coverage path. The default value is 16, and it should be in the range of [16, 20], e.g. `HOPPER_MAP_SIZE_POW2=18`. |
| 56 | +- `HOPPER_INST_RATIO`: controls how likely a block will be chosen for instrumentation. The default value is 100, and it should be in the range of (0, 100], e.g. `HOPPER_INST_RATIO=75`. |
| 57 | +- `HOPPER_INCLUDE_SEARCH_PATH`: adds a search path for files included by the header files, e.g. `HOPPER_INCLUDE_SEARCH_PATH=../`. |
| 58 | +- `HOPPER_FUNC_BLACKLIST`: lists functions that Hopper won't compile. `bindgen` will not generate code for these functions, e.g. `HOPPER_FUNC_BLACKLIST=f1,f2`. |
| 59 | +- `HOPPER_TYPE_BLACKLIST`: lists types that Hopper won't compile. `bindgen` will not generate code for these types, e.g. `HOPPER_TYPE_BLACKLIST=type1,type2`. |
| 60 | +- `HOPPER_ITEM_BLACKLIST`: lists items (constants/variables) that Hopper won't compile. `bindgen` will not generate code for these items, e.g. `HOPPER_ITEM_BLACKLIST=IPPORT_RESERVED`. |
| 61 | +- `HOPPER_CUSTOM_OPAQUE_LIST`: lists custom types that should be treated as opaque, e.g. `HOPPER_CUSTOM_OPAQUE_LIST=type1`. |
| 62 | + |
| 63 | +#### Tips |
| 64 | +- You can set the arguments and environment variables for compiling and running in a configuration file named `hopper.config`, see `examples/*` for details. |
| 65 | + |
| 66 | +- Reduce density: If the density is larger than 20%, edge IDs are likely to have hash collisions. You can either a) increase `HOPPER_MAP_SIZE_POW2` or b) reduce `HOPPER_INST_RATIO`. |
| 67 | + |
| 68 | +- Multiple libraries: either (1) merge the archives into one shared library, e.g. `gcc -shared -o c.so -Wl,--whole-archive a.a b.a -Wl,--no-whole-archive`, or (2) pass all of them to the Hopper compiler via `--library a.so b.so`. |
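
To get a feel for the density tip, the following Python sketch computes the map size implied by `HOPPER_MAP_SIZE_POW2` and a rough density figure. It assumes that the value is the base-2 exponent of the number of map entries and that density is the fraction of entries in use; both are assumptions based on the variable name and on AFL-style coverage maps, not on Hopper's source. The function names and the count of 20,000 entries are purely illustrative.

```python
def map_size(pow2: int) -> int:
    """Number of map entries, assuming size = 2 ** HOPPER_MAP_SIZE_POW2."""
    if not 16 <= pow2 <= 20:
        raise ValueError("HOPPER_MAP_SIZE_POW2 should be in [16, 20]")
    return 1 << pow2


def density(hit_entries: int, pow2: int) -> float:
    """Fraction of map entries in use (assumed definition of density)."""
    return hit_entries / map_size(pow2)


def suggest_pow2(hit_entries: int, limit: float = 0.20):
    """Smallest allowed exponent whose density stays at or below `limit`."""
    for pow2 in range(16, 21):
        if density(hit_entries, pow2) <= limit:
            return pow2
    return None  # still too dense at 2**20 entries: lower HOPPER_INST_RATIO


if __name__ == "__main__":
    hits = 20_000  # hypothetical number of entries in use
    for pow2 in (16, 17, 18):
        print(f"POW2={pow2}: {map_size(pow2)} entries, "
              f"density {density(hits, pow2):.1%}")
    print("suggested HOPPER_MAP_SIZE_POW2:", suggest_pow2(hits))
```

Under these assumptions, each increment of `HOPPER_MAP_SIZE_POW2` halves the density, which is why raising it is the first option to try before lowering `HOPPER_INST_RATIO`.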
| 69 | + |
| 70 | +## Fuzz Library with Hopper |
| 71 | + |
| 72 | +``` |
| 73 | +hopper fuzz output --func-pattern 'cJSON_*' |
| 74 | +``` |
| 75 | + |
| 76 | +Use `hopper fuzz output --help` to see detailed usage. |
| 77 | + |
| 78 | +After running `fuzz`, it generates the following directories: |
| 79 | +- `queue`: generated normal inputs. |
| 80 | +- `hangs`: generated timeout inputs. |
| 81 | +- `crashes`: generated crash inputs. |
| 82 | +- `misc`: stores temporary files and statistics. |
| 83 | + |
| 84 | +#### Environment variables for running |
| 85 | +- `DISABLE_CALL_DET`: disables deterministic mutation of calls. |
| 86 | +- `DISABLE_GEN_FAIL`: disables generating programs for functions that have failed to be invoked. |
| 87 | +- `HOPPER_SEED_DIR`: provides seeds for byte-like arguments (default: output/seeds if it exists). |
| 88 | +- `HOPPER_DICT`: provides a dictionary for byte-like arguments. The grammar is the same as AFL's. |
| 89 | +- `HOPPER_API_INSENSITIVE_COV`: disables API-sensitive branch counting. |
| 90 | +- `HOPPER_FAST_EXECUTE_LOOP`: number of programs executed (in a loop) for each fork; set it to 0 or 1 to disable the loop, e.g. `HOPPER_FAST_EXECUTE_LOOP=10`. |
| 91 | + |
| 92 | +#### System configuration |
| 93 | +Configure system core dumps as you would for AFL (on the host if you run Hopper in a Docker container). |
| 94 | +``` |
| 95 | +echo core | sudo tee /proc/sys/kernel/core_pattern |
| 96 | +``` |
| 97 | + |
| 98 | +### Function pattern |
| 99 | +Hopper generates inputs for all functions in the libraries by default. However, there are two ways to filter functions in Hopper: excluding functions or including functions. This way, it can focus on interesting functions. |
| 100 | + |
| 101 | +#### `--func-pattern` |
| 102 | +``` |
| 103 | +hopper fuzz output --func-pattern '@cJSON_parse,!cJSON_InitHook,cJSON_*' |
| 104 | +``` |
| 105 | + - The pattern can be a function name, e.g. `cJSON_parse`, or a simple pattern, e.g. `cJSON_*`. |
| 106 | + - If you have multiple patterns, use `,` to join them, e.g. `cJSON_*,HTTP_*`. |
| 107 | + - You can use the `@` prefix to limit the fuzzer to a specific function, while the other functions remain candidates that provide values for fields or arguments, e.g. `@cJSON_parse,cJSON_*`. |
| 108 | + - `!` is used as a prefix for excluding specific functions, e.g. `!cJSON_InitHook,cJSON_*`. |
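
The prefix rules above can be sketched in Python as follows. This is only an illustration of how one might reason about a pattern string: it assumes `*` behaves like a shell-style wildcard (via `fnmatch`) and that exclusions take precedence over everything else. Hopper's actual matching logic may differ, and `classify` is a hypothetical helper, not part of Hopper.

```python
from fnmatch import fnmatchcase


def classify(func: str, patterns: str) -> str:
    """Return 'excluded', 'target', 'candidate', or 'unmatched' for `func`."""
    items = [p.strip() for p in patterns.split(",") if p.strip()]
    excludes = [p[1:] for p in items if p.startswith("!")]
    targets = [p[1:] for p in items if p.startswith("@")]
    includes = [p for p in items if p[0] not in "!@"]

    # Assumption: an exclusion wins over any other pattern.
    if any(fnmatchcase(func, p) for p in excludes):
        return "excluded"
    if any(fnmatchcase(func, p) for p in targets):
        return "target"
    if any(fnmatchcase(func, p) for p in includes):
        # With an explicit '@' target, other included functions only
        # provide values; without one, they are fuzzed themselves.
        return "candidate" if targets else "target"
    return "unmatched"


if __name__ == "__main__":
    pattern = "@cJSON_parse,!cJSON_InitHook,cJSON_*"
    for name in ("cJSON_parse", "cJSON_InitHook", "cJSON_Print", "HTTP_get"):
        print(name, "->", classify(name, pattern))
```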
| 109 | + |
| 110 | +#### `--custom-rules` |
| 111 | +The patterns can be defined in the file passed by `--custom-rules`. |
| 112 | + |
| 113 | +```rust |
| 114 | +// hopper fuzz output --custom-rules path-to-file |
| 115 | +func_target cJSON_parse |
| 116 | +func_exclude cJSON_InitHook |
| 117 | +func_include cJSON_*,HTTP_* |
| 118 | +``` |
| 119 | + |
| 120 | +### Constraints |
| 121 | +Hopper infers both intra- and inter-API constraints in order to invoke the APIs correctly. |
| 122 | +The constraints are written to `output/misc/constraint.config`. You can remove the file to reset the constraints. |
| 123 | +Additionally, users can define a file that describes custom constraints for API invocations and pass it via `--custom-rules`. The custom constraints override the inferred ones. |
| 124 | +```java |
| 125 | +// hopper fuzz output --custom-rules path-to-file |
| 126 | +// Grammar: |
| 127 | +// func, type : prefix for adding a rule for function or type |
| 128 | +// $[0-9]+ : function's i-th argument, or index in array |
| 129 | +// [a-zA-Z_]+ : object field |
| 130 | +// 0, 128 .. : integer constants |
| 131 | +// "xxxx" : string constants |
| 132 | +// methods : $len, $range, $null, $non_null, $need_init, $read_file, $write_file, $ret_from, $cast_from, $use, $arr_len, $opaque, $len_factors |
| 133 | +// others : pointer(&) , option(?), e.g &.$0.len, `len` field in the pointer's first element |
| 134 | +// |
| 135 | +// Set one argument in a function to be specific constant |
| 136 | +func test_add[$0] = 128 |
| 137 | +// One argument must be the length of another one |
| 138 | +func test_arr[$1] = $len($0) |
| 139 | +// Or one field must be the length of another field |
| 140 | +func test_arr[$0][len] = $len([$0][name]) |
| 141 | +// One argument must be in a certain range |
| 142 | +func test_arr[$1] = $range(0, $len($0)) |
| 143 | +// Argument should be non-null |
| 144 | +func test_non_null[$0] = $non_null |
| 145 | +// Argument should be null |
| 146 | +func test_null[$0] = $null |
| 147 | +// Argument should be specific string |
| 148 | +func test_magic[$0] = "magic" |
| 149 | +// Argument should be a file and the file will be read |
| 150 | +func test_path[$0] = $read_file |
| 151 | +// Argument should use the return value of a specific function |
| 152 | +func test_use[$0] = $ret_from(test_create) |
| 153 | +// Argument should be cast from a specific type for a void pointer. The type should start with *mut or *const. |
| 154 | +func test_void[$0] = $cast_from(*mut u8) |
| 155 | +// The array is supposed to have a minimal array length |
| 156 | +func test_void[$0][&] = $arr_len(256) |
| 157 | +// The array's length is formed by the factors |
| 158 | +func fread[$0][&] = $len_factors(1, $2) |
| 159 | +// Or |
| 160 | +func gzfread[$0][&] = $len_factors($1, $2) |
| 161 | +// Field in argument should be specific constant |
| 162 | +func test_field[$0][len] = 128 |
| 163 | +// Deeper fields |
| 164 | +func test_field[$0][&.elements.$0] = 128 |
| 165 | + |
| 166 | +// One field `len` in a type must be the length of another field `p` |
| 167 | +type ArrayWrap[len] = $len(p) |
| 168 | +// One nested union `inner_union` in a type must be set to `member2` |
| 169 | +type ComplicatedStruct[inner_union] = $use(member2) |
| 170 | +// Type is opaque and is only used as an opaque pointer |
| 171 | +type Partial = $opaque |
| 172 | +// A type should be initialized with a specific function |
| 173 | +type Partial = $init_with(test_init, 0) |
| 174 | + |
| 175 | +// ctx: set context for specific function |
| 176 | +// Add a context for function |
| 177 | +ctx test_use[$0] <- test_init |
| 178 | +// Add implicit context |
| 179 | +ctx test_use[*] <- test_init |
| 180 | +// Add an optional context that is preferred |
| 181 | +ctx test_use[$0] <- test_init ? |
| 182 | +// Add forbidden context |
| 183 | +ctx test_use[$0] <- ! test_init |
| 184 | + |
| 185 | +// alias: alias types across different function |
| 186 | +alias handleA <- useA($0),createA($ret),freeA($0) |
| 187 | + |
| 188 | +// assert: adding specific assertions for calls |
| 189 | +assert test_one == 1 |
| 190 | +assert test_non_zero != 0 |
| 191 | + |
| 192 | +``` |
| 193 | + |
| 194 | +### Seeds for bytes arguments |
| 195 | +If there is a `seeds` directory (set by `HOPPER_SEED_DIR`), Hopper will try to read the files inside it and use them as seeds for bytes arguments (e.g. `char*`). You can also provide seeds for a specific argument via its parameter name, e.g. name the subdirectory `@buf` for a parameter whose name is `buf`. |
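
A seed directory following this layout could be prepared with a short script like the one below. The directory name `seeds`, the file names, and the seed contents are illustrative; only the `@buf` naming convention for a parameter called `buf` comes from the description above, and `make_seed_dir` is a hypothetical helper.

```python
from pathlib import Path


def make_seed_dir(root: Path, general: dict, per_param: dict) -> None:
    """Write general seeds into `root` and per-parameter seeds into `root/@<param>`."""
    root.mkdir(parents=True, exist_ok=True)
    for name, data in general.items():
        (root / name).write_bytes(data)
    for param, seeds in per_param.items():
        sub = root / f"@{param}"  # e.g. seeds/@buf for a parameter named `buf`
        sub.mkdir(exist_ok=True)
        for name, data in seeds.items():
            (sub / name).write_bytes(data)


if __name__ == "__main__":
    make_seed_dir(
        Path("seeds"),
        general={"object.json": b'{"key": [1, 2, 3]}'},
        per_param={"buf": {"small.json": b"[]"}},
    )
```

The resulting directory can then be passed to the fuzzer through `HOPPER_SEED_DIR`.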
| 196 | + |
| 197 | +### Logging |
| 198 | +Hopper uses Rust's log crate to print log information. The default log level is `INFO`. If you want to print all logging information (`DEBUG` and `TRACE`), you can set the environment variable `LOG_TYPE` when running Hopper, e.g. `LOG_TYPE=trace ./hopper`. |
| 199 | +The detailed logs are written to `output/fuzzer_r*.log` and `output/harness_r*.log`. |
| 200 | + |
| 201 | +### Reproduce execution |
| 202 | +Hopper can reproduce the execution of programs stored in the output directory. |
| 203 | + |
| 204 | +- `hopper-harness` parses and interprets the inputs with Hopper's runtime. It will print the internal states during execution in detail. |
| 205 | +``` |
| 206 | +./bin/hopper-harness ./queue/id_000000 |
| 207 | +``` |
| 208 | + |
| 209 | +- `hopper-translate` can translate the input to C source code. The C files can be a witness for reporting issues. |
| 210 | +``` |
| 211 | +./bin/hopper-translate --input ./queue/id_000000 --header path-to/xx.h --output test.c |
| 212 | +# then compile it against the library (libraries are listed after the source file) |
| 213 | +gcc -I/path-to-head test.c -L/path-to-lib -l:libcjson.so -o test |
| 214 | +``` |
| 215 | + |
| 216 | +- `hopper-generator` replays input generation without executing the input. You can use it to analyse how the input was generated or mutated. |
| 217 | +``` |
| 218 | +./bin/hopper-generator ./queue/id_000000 |
| 219 | +``` |
| 220 | + |
| 221 | +- `hopper-sanitizer` can minimize and verify the crashes generated by Hopper. It excludes crashes that violate constraints and de-duplicates crashes according to call stacks. |
| 222 | +``` |
| 223 | +./bin/hopper-sanitizer |
| 224 | +``` |
| 225 | + |
| 226 | +## Test |
| 227 | +### Test Rust code |
| 228 | +- Run all test cases |
| 229 | +``` |
| 230 | +RUST_BACKTRACE=1 cargo test -- --nocapture |
| 231 | +``` |
| 232 | + |
| 233 | +### Test suite (test libraries) |
| 234 | +- [How to run and write the test suite](./testsuite/README.md) |
| 235 | + |
| 236 | +## Evaluating results via source-based code coverage |
| 237 | +- Compile the libraries' source code with LLVM [source-based code coverage](https://clang.llvm.org/docs/SourceBasedCodeCoverage.html) instrumentation. You should set the compiler flags, e.g. |
| 238 | + |
| 239 | +``` |
| 240 | +export CFLAGS="${CFLAGS:-} -fprofile-instr-generate -fcoverage-mapping -gline-tables-only -g" |
| 241 | +make |
| 242 | +``` |
| 243 | + |
| 244 | +- Compile the libraries with Hopper in `cov` instrumentation mode, e.g. |
| 245 | +``` |
| 246 | +hopper compile --instrument cov --header ./cJSON.h --library ./libcjson_cov.so --output output_cov |
| 247 | +``` |
| 248 | + |
| 249 | +- Run the interpreter with all generated seed inputs (SEED_DIR). |
| 250 | +``` |
| 251 | +# run hopper and use llvm-cov to compute the coverage. |
| 252 | +SEED_DIR=./output/queue hopper cov output_cov |
| 253 | +``` |
| 254 | + |
| 255 | +## Contributing guidelines |
| 256 | +We have listed some tasks in the [Roadmap](https://github.com/FuzzAnything/hopper/discussions/2). |
| 257 | +If you are interested, please feel free to discuss with us and contribute your code. |
| 258 | + |
| 259 | +### Coding |
| 260 | +- *Zero* `cargo check` warnings |
| 261 | +- *Zero* `cargo clippy` warnings |
| 262 | +- *Zero* `FAILED` in `cargo test` |
| 263 | +- *Try* to write tests for your code |
| 264 | + |
| 265 | +### Profiling |
| 266 | +- [Profiling Rust Applications](https://gist.github.com/KodrAus/97c92c07a90b1fdd6853654357fd557a) |
| 267 | +- [Inferno](https://github.com/jonhoo/inferno) |
| 268 | + |
| 269 | +```bash |
| 270 | +perf record --call-graph=dwarf ./bin/hopper-fuzzer |
| 271 | +# use flamegraph directly |
| 272 | +perf script | stackcollapse-perf.pl | rust-unmangle | flamegraph.pl > flame.svg |
| 273 | +# use inferno |
| 274 | +perf script | inferno-collapse-perf | inferno-flamegraph > flamegraph.svg |
| 275 | +``` |
| 276 | + |
| 277 | +`perf` produces a large amount of intermediate data for analysis, so *do not* run the fuzzer for more than 2 minutes. |