This exercise will help you practice and demonstrate your ability to implement static analyses (specifically dataflow analysis) by using Soot for dataflow analysis of Java bytecode. You will implement a simple analysis that looks for bugs in the uses of a class that expects a specific protocol to be followed. This is similar to classic typestate analyses, but your task will be more straightforward. I highly recommend reading Quentin Stievenart's notes on implementing an intraprocedural dataflow analysis using Soot. Specifically, sections 3 and 4 explain the way that Soot organizes its IR and the basic classes you need in order to define a dataflow analysis. The Soot Survivor's Guide also provides a straightforward introduction. Your experiences interacting with LLVM should translate well. A template for the project is available here.
Kittens are devious creatures. When treated properly they can be a delight, however, interacting with kittens in the wrong way at the wrong time can lead to disastrous results and tiny bite marks. An API modeling kittens has been provided to you in the template code. Like both kittens and industry code, it is not very documented and is poorly understood. It is easy to make mistakes. Your job is to design and write a dataflow analysis that conservatively identifies all locations where mistakes may be occuring with respect to uses of this API.
Conveniently, kittens behave much like protocols. The methods or behaviors of
a kitten are governed by simple state machine mechanics. Individual methods of
the kitten API attempt to transition a kitten into a specific state, and it is
invalid to transition from some states into other states. The valid and invalid
semantics are captured by the table below. For instance, calling pet()
on a
Kitten
will transition the Kitten
into the sleeping state, but invoking
pet()
when a Kitten
is running or playing is invalid behavior. A Kitten
starts out in a sleeping state upon construction. By tracking the states that
a Kitten
can be in, you can determine whether any particular method invocation
on a Kitten
may be invalid.
Method | Target State | Invalid From |
---|---|---|
pet() |
sleeping | running, playing |
feed() |
eating | — |
tease() |
playing | sleeping, eating |
ignore() |
plotting | sleeping, eating, playing |
scare() |
running | — |
You will track the states of variables holding Kitten
s through individual
methods and report errors when a method call may be invalid.
A KittenErrorReporter
class has been provided to you.
You should use it to report the errors that you discover,
providing the variable name, line number, target state, and invalid possible
source state of the error.
The states should be lower case strings as in the table above.
Any changes to the reporter will be discarded during grading.
Note that solving this problem well can be challenging. We will limit the scope of what you need to consider in order to make the analysis more feasible in the time you have. Specifically
Kittens
through fields.
You may disregard the possibility of aliasing relationships.Kitten
may be in at a particular point.The project uses a standard Maven build process. Maven manages dependencies and
the build process by following specific conventions in project structure and
specifying anything extra in a pom.xml
file in the root of the project. For
example, our project depends on Soot, so Maven will read this from the pom.xml
and download the appropriate version automatically when compiling.
To compile, test, and build a jar for the project you can simply run the
following from the template directory:
You can compile using:
You can clean your build using:
And you can run unit tests using:
Maven may not be available by default on CSIL. Again, if you are working on CSIL, you can use the provided virtual environment with:
The aforementioned unit tests can run and check your results for several sample of cases, but you may wish to run the analysis outside of a unit testing environment. You can directly run the resulting binary on java class files in order to analyze them by using:
For example:
As we discussed in class, any dataflow analysis requires a clear design for the abstract domain, the meet or merge operator, and the transfer functions involved.
Define each of these (domain, meet, and transfers) and explain why you chose that design. This explanation is not expected to require much space.
Actually implement your analysis using Soot. Your implementation should match
your presented design. While some tests are provided to you, you should
consider what additional tests you need to add. All of your implementation
should reside in KittenChecker.java
.
To go the extra mile, you may consider implementing the analysis using Soot's infrastructure for inter-procedural analyses or consider extending the analysis to handle branch sensitive information. This is not expected.
From your implementation, submit only KittenChecker.java
to
CourSys.
You can submit your design explanation separately through CourSys.
Unit
in the jimple IR is a
StmtInvokeExpr
s,
and specific subtypes deal with methods of objects, static methods,
constructors, etc. From these, you can get the receiving objects and the
method names, if applicable.FlowSet
. You do not need to. You might be inspired by the
ParityAnalysis
that Soot provides.See also:
Again, there is substantial work related to this problem. Protocols that govern valid component behavior are pervasive, and the techniques to check them are numerous. Approaches for improving general dataflow analysis have also been built within Soot. A limited sample includes: