Assignment 4

Using Program Analysis Tools

For this project, you will gain experience using dynamic analysis tools on real-world software.

You will work in groups of two or three students, and the requirements will scale with the group size.

Overview

As a group, you will apply two types of dynamic analysis tools we examined in class to open source projects and report on the results. One set of tools will look for bugs in a preexisting set of executions, while the other will create new executions/tests in an effort to find bugs. Note that not all tools will work with all programs, so the type of program that you analyze must be compatible with the tools that you use. This follows from our in-class discussions.

Detecting bugs in an existing test suite

Possible dynamic analysis tools for this assignment include:

Recall that dynamic analysis tools analyze a single execution at a time. Thus, it is common practice to integrate the dynamic analysis tools with the test process for a project in order to detect potential bugs over a set of predetermined executions. To apply your selected tool to a given open source project, you shall integrate the tool into the project's existing test suite and automated testing infrastructure and run the analysis over every execution in the test suite. If you are interested in using an unlisted dynamic analysis tool, ask to make sure that it is appropriate.

Your write-up for the assignment should include the challenges you faced during this process, as well as your approaches for overcoming them. You should also report any errors indicated by the analysis. A group of k members should also explain, for k of the errors found, whether each was a real bug or not. If fewer than k were found, then all errors should be explained. Include full instructions for reproducing your results, including which files you had to modify and how.

Be careful. A very common mistake is to run valgrind on make instead of running it on the project you wish to analyze. We discussed issues like this in class, and I expect you to not make this mistake.

Fuzzing

For this part of the assignment, you will gain experience with two different fuzz testing infrastructures. One is american fuzzy lop (AFL), and the other is libFuzzer. Both are mutation-based fuzzers that focus on finding crashes or potential security vulnerabilities inside programs.

In contrast to the previous set of tools, fuzz testing tools attempt to generate new inputs (and thus executions) of a program and look for general correctness issues as those new tests run. They do not need to be integrated into a test suite, but you sometimes need to construct a test harness for particular functions of interest, as we saw in class. It can sometimes also be beneficial to carefully select the initial input test corpus to guide the process.

First, you will use american fuzzy lop to test the behavior of a program or library when reading in some sort of input from the user (or a file). You should run the fuzzer for at least 8 hours. Ideally, and if you want more interesting results, you would let it run for longer. You can use the screen or tmux command to log out of a machine while your tool continues to run and then log back in later to see the results. For american fuzzy lop, curating the initial set of tests can be useful. Try to use the test inputs provided with the software you are testing. If the inputs are too varied (e.g. source code from many different programming languages), then perhaps limit the test suite.

Second, you will use libFuzzer to test a single parsing function for the software. You will need to write a small test driver that calls a single function from the software and passes in a sequence of bytes. You can find a complete tutorial for libFuzzer here. Note that writing the test harness may require you to link against some of the libraries or object files of the software project in question. It will also require you to modify the build process of the project to include the -fsanitize=address,fuzzer command line options. You should document the process that you used to get it working in your project write-up. libFuzzer differs from AFL in that it performs in-process fuzzing. By performing both the fuzzing and the reasoning about fuzzing in the same process, it can run much faster and thus potentially test more code over the same amount of time. As with AFL, you should run the fuzzer for at least 8 hours or until the first crash. Note this limitation of in-process fuzzing: after finding the first crash, it stops. Why does this limitation exist?
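As a rough illustration of the small test driver described above, the following is a minimal sketch of a libFuzzer harness in C. The function parse_record is a hypothetical stand-in defined here only so that the example is self-contained. In a real harness you would instead include the project's headers and link against its libraries or object files. The invariant checked with assert is likewise an assumption about this toy parser, not something a real project necessarily guarantees.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for the project's parsing function. In a real
 * harness, the declaration would come from the project's headers and the
 * definition from its libraries or object files. */
static int parse_record(const char *text) {
    /* Toy logic: accept "key=value" where the key is non-empty. */
    const char *eq = strchr(text, '=');
    return (eq != NULL && eq != text) ? 0 : -1;
}

/* Entry point that libFuzzer calls once for each generated input. */
int LLVMFuzzerTestOneInput(const uint8_t *data, size_t size) {
    /* The fuzzer's bytes are not NUL-terminated, but this (assumed) API
     * expects a C string, so copy the input into a terminated buffer. */
    char *text = malloc(size + 1);
    if (text == NULL) {
        return 0;
    }
    if (size > 0) {
        memcpy(text, data, size);
    }
    text[size] = '\0';

    int result = parse_record(text);

    /* Optional invariant check: a failed assert aborts the process, which
     * the fuzzer reports as a crash. */
    assert(result == 0 || result == -1);

    free(text);
    return 0; /* libFuzzer expects 0 from the harness. */
}
```

Assuming the sketch is saved as harness.c (a hypothetical file name), it could be built with something like clang -g -fsanitize=address,fuzzer harness.c, with the project's own objects or libraries added to the link line once parse_record is replaced by a real function. Note that the harness defines no main function, since libFuzzer supplies its own.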

Your write-up for the assignment should again include the challenges you faced during this process, as well as your approaches for overcoming them. You should also report any errors indicated by the analyses. For groups of k members, the groups should also explain whether k of the crashes/hangs found were real bugs or not. If fewer than k bugs were found, then all discovered errors should be explained. You should also explain why fewer than k bugs were found if possible. This may depend on both the tool and the observed results. As a group, you should contrast your experiences with both AFL and libFuzzer and discuss their strengths and weaknesses with evidence based on your experiences.

In addition to this write-up, you should also submit any test harnesses, invocations, and the full set of constructed outputs that cause problems as well as the overall statistics. For AFL, this means that you should submit the crashes/ and hangs/ subdirectories of the output directory as well as the fuzzer_stats file. For libFuzzer, you should include any test files corresponding to failures along with the output of the overall testing process.

Be careful. afl may warn you that the program you are analyzing is not instrumented or that you are only identifying one path after many executions. These are signs that you are not correctly running afl to fuzz the program of interest. Again, I expect you to not make these mistakes.

libFuzzer & Clang in CSIL

libFuzzer is distributed as a part of the LLVM project and Clang. In CSIL, I have made Clang 6.0 along with libFuzzer available via a shared directory. You can take advantage of these by modifying your path. Specifically, you can add the following line to the end of your ~/.bash_profile:

export PATH=/usr/shared/CMPT/faculty/wsumner/base/bin/:/usr/shared/CMPT/faculty/wsumner/llvm/bin:$PATH

The next time you log in, these will be available to you.

Selecting Projects to Analyze

The software that you analyze should be an open source project of some sort. Any analyzed project should contain at least 4000 lines of code and must include an established test suite and automated test process. You are also free to analyze two different projects, one for each type of tool. Once again, you are free to consider different projects listed on www.github.com, www.sourceforge.net, www.openhub.net, www.gnu.org, or other major collections of open source software. If you have questions about the suitability of a particular project, please ask. Finally, do not select one of the programs that I already showed you in class.

Once again, you should identify and consider:

  1. Identification of the open-source project.
  2. Identification of the supporting organization.
  3. Size of the code base.
  4. Build time to compile and link an executable from source code.
  5. Execution time for the test suite.

Again, include this information in your report.

Submission

As a group, you should reflect on the challenges faced, effort required, and either potential or received benefits of the tools you used for the projects you examined. What are the strengths and weaknesses of the different types of dynamic analysis tools that you used? Are these reflected in your results? Why or why not? How? How might these compare to the strengths and weaknesses of static analysis tools? You should form and justify an opinion as to which was more useful for the project(s) you examined.

Final Notes

While libFuzzer and AFL focus on finding crashes and likely vulnerabilities, fuzzing in general just seeks to find inputs satisfying some interesting objectives. This can include things like finding inputs with worst-case complexity [1] [2], identifying side channels [3], and more [4]. It can also be challenging to tell whether one fuzz testing technique is actually better than another.