Exercise 6: Dataflow Analysis

This exercise will help you practice and demonstrate your ability to implement static analyses (specifically dataflow analysis) by using Soot for dataflow analysis of Java bytecode. You will implement a simple analysis that looks for bugs in the uses of a class that expects a specific protocol to be followed. This is similar to classic typestate analyses, but your task will be more straightforward. I highly recommend reading Quentin Stievenart's notes on implementing an intraprocedural dataflow analysis using Soot. Specifically, sections 3 and 4 explain the way that Soot organizes its IR and the basic classes you need in order to define a dataflow analysis. The Soot Survivor's Guide also provides a straightforward introduction. Your experiences interacting with LLVM should translate well. A template for the project is available here.

Kittens are devious creatures. When treated properly they can be a delight, however, interacting with kittens in the wrong way at the wrong time can lead to disastrous results and tiny bite marks. An API modeling kittens has been provided to you in the template code. Like both kittens and industry code, it is not very documented and is poorly understood. It is easy to make mistakes. Your job is to design and write a dataflow analysis that conservatively identifies all locations where mistakes may be occuring with respect to uses of this API.

Conveniently, kittens behave much like protocols. The methods or behaviors of a kitten are governed by simple state machine mechanics. Individual methods of the kitten API attempt to transition a kitten into a specific state, and it is invalid to transition from some states into other states. The valid and invalid semantics are captured by the table below. For instance, calling pet() on a Kitten will transition the Kitten into the sleeping state, but invoking pet() when a Kitten is running or playing is invalid behavior. A Kitten starts out in a sleeping state upon construction. By tracking the states that a Kitten can be in, you can determine whether any particular method invocation on a Kitten may be invalid.

Method Target State Invalid From
pet() sleeping running, playing
feed() eating
tease() playing sleeping, eating
ignore() plotting sleeping, eating, playing
scare() running

You will track the states of variables holding Kittens through individual methods and report errors when a method call may be invalid. A KittenErrorReporter class has been provided to you. You should use it to report the errors that you discover, providing the variable name, line number, target state, and invalid possible source state of the error. The states should be lower case strings as in the table above. Any changes to the reporter will be discarded during grading.

Note that solving this problem well can be challenging. We will limit the scope of what you need to consider in order to make the analysis more feasible in the time you have. Specifically

Building using Maven

The project uses a standard Maven build process. Maven manages dependencies and the build process by following specific conventions in project structure and specifying anything extra in a pom.xml file in the root of the project. For example, our project depends on Soot, so Maven will read this from the pom.xml and download the appropriate version automatically when compiling. To compile, test, and build a jar for the project you can simply run the following from the template directory:

mvn package

You can compile using:

mvn compile

You can clean your build using:

mvn clean

And you can run unit tests using:

mvn test

Maven may not be available by default on CSIL. Again, if you are working on CSIL, you can use the provided virtual environment with:

source  /usr/shared/CMPT/faculty/wsumner/base/env745/bin/activate

Running your analysis

The aforementioned unit tests can run and check your results for several sample of cases, but you may wish to run the analysis outside of a unit testing environment. You can directly run the resulting binary on java class files in order to analyze them by using:

mvn exec:exec -DSOOT_TARGET=<package qualified class name>

For example:

mvn exec:exec -DSOOT_TARGET=ca.sfu.cmpt745.ex06.examples.Test_01_Basic

Task 1

As we discussed in class, any dataflow analysis requires a clear design for the abstract domain, the meet or merge operator, and the transfer functions involved.

Define each of these (domain, meet, and transfers) and explain why you chose that design. This explanation is not expected to require much space.

Task 2

Actually implement your analysis using Soot. Your implementation should match your presented design. While some tests are provided to you, you should consider what additional tests you need to add. All of your implementation should reside in KittenChecker.java.

To go the extra mile, you may consider implementing the analysis using Soot's infrastructure for inter-procedural analyses or consider extending the analysis to handle branch sensitive information. This is not expected.

Submitting

From your implementation, submit only KittenChecker.java to CourSys.

You can submit your design explanation separately through CourSys.

Useful Hints and Soot Resources

See also:

Again, there is substantial work related to this problem. Protocols that govern valid component behavior are pervasive, and the techniques to check them are numerous. Approaches for improving general dataflow analysis have also been built within Soot. A limited sample includes: