Exercise 6: Dataflow Analysis

This exercise will help you practice and demonstrate your ability to implement static analyses (specifically dataflow analysis) by using Soot for dataflow analysis of Java bytecode. You will implement a simple analysis that looks for bugs in the uses of a class that expects a specific protocol to be followed. This is similar to classic typestate analyses, but your task will be more straightforward. I highly recommend reading Quentin Stievenart's notes on implementing an intraprocedural dataflow analysis using Soot. Specifically, sections 3 and 4 explain the way that Soot organizes its IR and the basic classes you need in order to define a dataflow analysis. The Soot Survivor's Guide also provides a straightforward introduction. Your experiences interacting with LLVM should translate well. A template for the project is available here.

Kittens are devious creatures. When treated properly they can be a delight, however, interacting with kittens in the wrong way at the wrong time can lead to disastrous results and tiny bite marks. An API modeling kittens has been provided to you in the template code. Like both kittens and industry code, it is not very documented and is poorly understood. It is easy to make mistakes. Your job is to design and write a dataflow analysis that conservatively identifies all locations where mistakes may be occuring with respect to uses of this API.

Conveniently, kittens behave much like protocols. The methods or behaviors of a kitten are governed by simple state machine mechanics. Individual methods of the kitten API attempt to transition a kitten into a specific state, and it is invalid to transition from some states into other states. The valid and invalid semantics are captured by the table below. For instance, calling pet() on a Kitten will transition the Kitten into the sleeping state, but invoking pet() when a Kitten is running or playing is invalid behavior. A Kitten starts out in a sleeping state upon construction. By tracking the states that a Kitten can be in, you can determine whether any particular method invocation on a Kitten may be invalid.

Method	Target State	Invalid From
`pet()`	sleeping	running, playing
`feed()`	eating	—
`tease()`	playing	sleeping, eating
`ignore()`	plotting	sleeping, eating, playing
`scare()`	running	—

You will track the states of variables holding Kittens through individual methods and report errors when a method call may be invalid. A KittenErrorReporter class has been provided to you. You should use it to report the errors that you discover, providing the variable name, line number, target state, and invalid possible source state of the error. The states should be lower case strings as in the table above. Any changes to the reporter will be discarded during grading.

Note that solving this problem well can be challenging. We will limit the scope of what you need to consider in order to make the analysis more feasible in the time you have. Specifically

You do not need to consider the nullness of the underlying references in Java. You may assume that all references to Kittens are non-null when methods are called upon them.
You only need to consider direct local variables and function arguments. You may disregard accesses of Kittens through fields. You may disregard the possibility of aliasing relationships.
You may disregard modeling any behaviors that are not method calls. While complex relationships over integers and other variables may be present and determine the flow of the program, you may ignore these when considering the conservative set of states a Kitten may be in at a particular point.
While you must model the impact of conditions through the branches they execute, you do not need to model the logic of any condition.
Your analysis may be intraprocedural.
You do not need to handle reflection.

Building using Maven

The project uses a standard Maven build process. Maven manages dependencies and the build process by following specific conventions in project structure and specifying anything extra in a pom.xml file in the root of the project. For example, our project depends on Soot, so Maven will read this from the pom.xml and download the appropriate version automatically when compiling. To compile, test, and build a jar for the project you can simply run the following from the template directory:

mvn package

You can compile using:

mvn compile

You can clean your build using:

mvn clean

And you can run unit tests using:

mvn test

Maven may not be available by default on CSIL. Again, if you are working on CSIL, you can use the provided virtual environment with:

source  /usr/shared/CMPT/faculty/wsumner/base/env745/bin/activate

Running your analysis

The aforementioned unit tests can run and check your results for several sample of cases, but you may wish to run the analysis outside of a unit testing environment. You can directly run the resulting binary on java class files in order to analyze them by using:

mvn exec:exec -DSOOT_TARGET=<package qualified class name>

For example:

mvn exec:exec -DSOOT_TARGET=ca.sfu.cmpt745.ex06.examples.Test_01_Basic

Task 1

As we discussed in class, any dataflow analysis requires a clear design for the abstract domain, the meet or merge operator, and the transfer functions involved.

Define each of these (domain, meet, and transfers) and explain why you chose that design. This explanation is not expected to require much space.

Task 2

Actually implement your analysis using Soot. Your implementation should match your presented design. While some tests are provided to you, you should consider what additional tests you need to add. All of your implementation should reside in KittenChecker.java.

To go the extra mile, you may consider implementing the analysis using Soot's infrastructure for inter-procedural analyses or consider extending the analysis to handle branch sensitive information. This is not expected.

Submitting

From your implementation, submit only KittenChecker.java to CourSys.

You can submit your design explanation separately through CourSys.

Useful Hints and Soot Resources

Every Unit in the jimple IR is a Stmt
Calls are represented via InvokeExprs, and specific subtypes deal with methods of objects, static methods, constructors, etc. From these, you can get the receiving objects and the method names, if applicable.
You can get far by thinking about assignment and method calls as the instructions you need to interpret.
Your abstract domain should be able to represent the composition of domains for each value or variable that you are interested in. Many of the examples just use a FlowSet. You do not need to. You might be inspired by the ParityAnalysis that Soot provides.