Assignment 4

Using Fuzz Testing Tools

As we discussed in class, one critical way to perform ongoing, security-oriented testing is to use fuzz testing. From a security perspective, fuzz testing is sometimes even called automated penetration testing (a term that is now also used for breach and attack simulation). It is a critical part of developing proactively with an eye toward security.

For this project, you will gain experience using different fuzz testing tools on real-world software. You will see some of the different benefits and costs of fuzz testing. The tools should not take long to set up and do not require more than a sophomore level of competence to use. You will, however, need to allocate 12 hours of total running time for the tools. This can be done overnight. For each approach, it is recommended that you first try running it for a few minutes in order to look for progress and sanity check that you are using the tool correctly.

You will work in groups of two or three students for the project.

Peach Fuzzer: A task you do not need to complete

To accommodate the circumstances of the semester, I am not having you use Peach Fuzzer. Recall that this is a generative fuzzer driven by a specification or model, as we discussed in class. While gaining experience with Peach Fuzzer and writing a specification in the form of a Peach Pit is an excellent skill to practice, you will not need to do it this semester. Feel free to explore it on your own. The documentation pages are presently broken but are available through the Wayback Machine as in the Peach Pit link above.

libFuzzer

libFuzzer is a fuzzing engine produced at Google within the LLVM project and constantly running within their OSS-Fuzz initiative. As per that site, OSS-Fuzz had found more than 16,000 bugs as of January 2020. You can read more about fuzzing at Google here, or watch some of their videos [1] [2].

Of the tools you will use, libFuzzer requires more work to set up, but it isn't too bad. The burden comes because libFuzzer does not fuzz an entire application. Instead, it fuzzes a single function at a time. This gives developers more precise control over where time is spent when fuzzing, but it also requires a developer to specify how a random string of bits is translated into input for that function, e.g. by constructing the arguments out of a sequence of bits. Fuzzing at the function level is particularly useful because many bugs and security vulnerabilities are found in file parsing functions, and libFuzzer tends to be straightforward to use in these contexts. It also enables libFuzzer to perform in-process fuzzing, where the tested function is called repeatedly in a loop instead of in a fresh process. This tends to be an order of magnitude faster than out-of-process fuzzing, thus finding more bugs with fewer resources.
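As a rough illustration of what "constructing the arguments out of a sequence of bits" can look like, here is a minimal sketch. The function under test (render_label), the LabelArgs struct, and the byte layout are all hypothetical and chosen only for this example; LLVMFuzzerTestOneInput is the entry point that libFuzzer expects. Your own decoding will depend on the function you choose.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical function under test. It is not from any real project and
// exists only so that the example is self-contained.
int render_label(uint16_t width, bool bold, const std::string &text) {
  if (width == 0) return -1;
  return static_cast<int>(text.size()) + (bold ? 1 : 0);
}

// Hypothetical bundle of the arguments that render_label needs.
struct LabelArgs {
  uint16_t width;
  bool bold;
  std::string text;
};

// Decode the raw fuzzer bytes into typed arguments.
// Assumed layout: bytes 0-1 = width (little endian), byte 2 = flags,
// remaining bytes = text. Returns false when there are too few bytes.
bool decode_args(const uint8_t *data, size_t size, LabelArgs &out) {
  if (size < 3) return false;
  out.width = static_cast<uint16_t>(data[0] | (data[1] << 8));
  out.bold = (data[2] & 1) != 0;
  out.text.assign(reinterpret_cast<const char *>(data + 3), size - 3);
  return true;
}

// libFuzzer calls this function once per generated input.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  LabelArgs args;
  if (decode_args(data, size, args)) {
    render_label(args.width, args.bold, args.text);
  }
  return 0;
}
```

Inputs that are too short to decode are simply skipped, so the fuzzer spends its time on inputs that actually reach the function under test.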

You will use libFuzzer to test a single parsing function for some open source software. You will need to write a small test driver that calls a single function from the software and passes in a sequence of bytes. You can find a complete tutorial for libFuzzer here. Note that writing the test harness may require you to link against some of the libraries or object files of the software project in question. It may also require you to modify the build process of the project in question to include the -fsanitize=address,fuzzer command-line options. You should document the process that you used to get it working in your project write-up. You should run the fuzzer for at least 4 hours or until the first crash. Note this limitation of in-process fuzzing: after finding the first crash, it stops. Consider: why does this limitation exist?
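A test driver for a parsing function can be quite small. The sketch below assumes a hypothetical C-style parser, parse_key_value, that takes a NUL-terminated string; a stand-in definition is included only so the example compiles on its own. In your project, you would instead include the project's header and link against its library.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Stand-in for the project's parser (hypothetical). Returns the length of
// the key in "key=value", or -1 when the line contains no '='.
extern "C" int parse_key_value(const char *line) {
  const char *eq = std::strchr(line, '=');
  return eq ? static_cast<int>(eq - line) : -1;
}

// Entry point that libFuzzer calls repeatedly, in process, with new bytes.
extern "C" int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
  // The fuzzer's buffer is not NUL-terminated, so copy it into a
  // std::string before handing it to an API that expects a C string.
  std::string line(reinterpret_cast<const char *>(data), size);
  parse_key_value(line.c_str());
  return 0;
}
```

Assuming the driver is saved as harness.cc (a name chosen for this example), something like clang++ -g -fsanitize=address,fuzzer harness.cc should produce a fuzzing binary; a real project will also need its own sources or libraries on that command line.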

Ideally, you would run it for at least 24 hours to get a better picture of the behavior and find more interesting things. Feel free to do so if you are able to. You can use the screen or tmux command to log out of a machine while your tool continues to run and then log back in later to see the results. Google now also provides support for evaluating novel fuzz testing methodologies. Consider what their report presentations indicate about the behavior of the fuzzer and how well it performed.

libFuzzer & Clang in CSIL (or at home)

libFuzzer is distributed as a part of the LLVM project and Clang. In CSIL, I have made Clang 8.0 along with libFuzzer available via a shared directory. You can take advantage of these by modifying your path. Specifically, you can add the following to the end of your ~/.bash_profile:

export PATH=/usr/shared/CMPT/faculty/wsumner/base/bin/:/usr/shared/CMPT/faculty/wsumner/llvm/bin:$PATH

The next time you log in, these will be available to you. You can try the provided toy example to double check that it is working.

If you are working at home, you simply need to install LLVM version 6, 7, or 8 for FuzzFactory to run. You'll learn more about FuzzFactory in the next task.

FuzzFactory

FuzzFactory (based on american fuzzy lop) is recent work that demonstrates how a consistent fuzzing process can push through traditionally challenging code (as we saw in class) and automatically identify not only crashing bugs but also performance bugs, regressions, and more. If you are interested, read the paper or watch the talk.

Recall that fuzzing can be guided by a notion of coverage, and that coverage does not need to be statement coverage or branch coverage. Coverage in fuzzing is a general abstraction for observing interesting behavior. FuzzFactory supports several notions of coverage. It tracks coverage by changing the behavior of the compiler to add extra instructions into the program to measure coverage as it executes (it is a dynamic analysis). You must select the types of coverage that you are interested in at compile time and modify the build process as seen here.
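To make the idea of coverage as a general abstraction concrete, here is a toy sketch of a feedback loop in which an input is kept whenever it improves some observed (key, value) measurement. This is not FuzzFactory's implementation. The Feedback type, run_target, and Corpus are hypothetical names invented for this example, and the "matching leading bytes" measurement is only loosely in the spirit of comparison-based feedback.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Hypothetical feedback record: one value per key, e.g. key = id of a
// comparison site, value = how close the comparison came to succeeding.
using Feedback = std::map<uint32_t, uint32_t>;

// Toy "instrumented" target: reports how many leading bytes of the input
// match a magic tag. A real tool would collect this via the compiler.
Feedback run_target(const std::string &input) {
  static const std::string magic = "FUZZ";
  uint32_t matched = 0;
  while (matched < magic.size() && matched < input.size() &&
         input[matched] == magic[matched]) {
    ++matched;
  }
  Feedback fb;
  fb[0] = matched;  // Key 0 identifies this single comparison site.
  return fb;
}

// Tracks the best value seen per key and saves inputs that improve any key.
class Corpus {
 public:
  // Returns true when the input showed new behavior and was saved.
  bool consider(const std::string &input) {
    bool interesting = false;
    for (const auto &kv : run_target(input)) {
      auto it = best_.find(kv.first);
      if (it == best_.end() || kv.second > it->second) {
        best_[kv.first] = kv.second;
        interesting = true;
      }
    }
    if (interesting) saved_.push_back(input);
    return interesting;
  }

  size_t size() const { return saved_.size(); }

 private:
  Feedback best_;
  std::vector<std::string> saved_;
};
```

Swapping in a different measurement (for example, bytes allocated or loop iterations executed) changes what the fuzzer treats as progress without changing the loop itself.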

You will use FuzzFactory to test the behavior of a program or library when reading in some sort of input from the user (or a file). You will run the experiment twice with two different sets of coverage criteria. The first experiment should use only cmp coverage to help explore the control flow of the program. The second experiment should use cmp together with either mem or perf to explore the control flow graph while also looking for memory consumption or performance related problems. These two experiments may produce the same results, and they may not. You can examine the results inside the output directory in order to compare the two techniques.

You should again run the fuzzer for at least 4 hours. For fuzzers based on american fuzzy lop, curating the initial set of tests can be useful. Try to use the test inputs provided with the software you are testing. If the inputs are too varied (e.g. source code from many different programming languages), then perhaps limit the test suite to focus the process more.

Writing things up

As a group, you should reflect on the challenges faced, effort required, and either potential or received benefits of the tools you used for the projects you examined. What are the strengths and weaknesses of the different fuzz testing tools you used? Are these reflected in your results? Why or why not? How?

Your write-up for the assignment should again include the challenges you faced during this process, as well as your approaches for overcoming them. You should also report any errors indicated by the analyses. A group of k members should also explain whether k of the crashes/hangs found were real bugs or not. If fewer than k bugs were found, then all discovered errors should be explained. You should also explain why fewer than k bugs were found if possible. For instance, the fuzz tester may have gotten stuck performing the same tests on irrelevant code. This may depend on both the tool and the observed results. As a group, you should contrast your experiences with both FuzzFactory and libFuzzer and discuss their strengths and weaknesses with evidence based on your experiences.

In addition to this write-up, you should also submit any test harnesses, invocations, and the full set of constructed outputs that cause problems as well as the overall statistics. For FuzzFactory, this means that you should submit the crashes/ and hangs/ subdirectories of the output directory as well as the fuzzer_stats file for each experiment. For libFuzzer, you should include any test files corresponding to failures along with the output of the overall testing process.

Be careful. FuzzFactory may warn you that the program you are analyzing is not instrumented or that you are only identifying one path after many executions. These can be signs that you are not correctly running the fuzzer to fuzz the program of interest. Again, I expect you to not make these mistakes. We have discussed some of the potential causes in class.

Selecting Projects to Analyze

The software that you analyze should be an open source project of some sort. Any analyzed project should contain at least 4000 lines of code and must include an established test suite and automated test process. You are also free to analyze two different projects, one for each type of tool. Once again, you are free to consider different projects listed on www.github.com, www.sourceforge.net, www.openhub.net, www.gnu.org, or other major collections of open source software. If you have questions about the suitability of a particular project, please ask.

Once again, you should identify and consider:

Identification of the open-source project.
Size of the code base.
Build time to compile and link an executable from source code.
Execution time for the test suite.

Again, include this information in your report.

Other directions you may be interested in

Even taking advantage of more advanced fuzzing with FuzzFactory, we are only scratching the surface. Other fuzzing tools can do things like combine coverage-guided fuzzing with language-based models to make sure that generated inputs are syntactically valid [3]. They can help to automatically identify side-channel vulnerabilities [4]. Google has even worked to integrate libFuzzer with protobufs to generate valid inputs by piggybacking on common infrastructure in many projects [5]. With all of these advances also come challenges. Fuzzing can be nondeterministic, so knowing when one fuzzer is better than another requires careful methodology [6].