Project 2: Interprocedural Dataflow

This project is intended to help you get acquainted with interprocedural dataflow analysis and the sorts of approximations that arise when balancing precision and scalability.

When a thread A attempts to acquire lock/mutex L1 held by thread B, it waits until thread B releases L1. If thread B is already waiting for thread A to release a lock L2, then the system enters deadlock. Neither thread can make progress because each is waiting for the other to release a lock before it will proceed.

One simple form of deadlock is also known as double locking or self-deadlock. In this scenario, a single thread of execution may attempt to acquire the same non-reentrant lock twice and enter deadlock while waiting for itself to release the lock. A similar pattern of double unlocking occurs when releasing a lock that was already released or possibly never acquired. This does not produce deadlock, but it is bad practice and often produces undefined behavior.

In this project, you will be implementing an interprocedural dataflow analysis to detect and report double (un)locking errors. The analysis should be context-sensitive and flow-sensitive, using the approach that we talked about in class. Recall that context sensitivity is approximated to a fixed depth. For this project, you should use a depth of 2.

A good strategy for this project would be to first implement an intra-procedural version of the analysis. Recall that we looked at an analysis detecting problems with operations on opened or closed files in class. You should find that the problem of double (un)locking is strikingly similar, although there are some twists. You will also be able to make use of your call graphs from project 1 to assist in the inter-procedural aspects.

Pthreads

There are many implementations of mutexes and locking policies. For this project, you will only need to consider a subset of the Pthreads API. To handle the full expressiveness and range of behaviors of mutexes in the Pthreads API would be too much to expect, so we shall make several simplifying assumptions.

Mutex Creation

A mutex variable is created by simply declaring a pthread_mutex_t as either a local or a global variable:

pthread_mutex_t m;

However a mutex variable cannot be used until it is initialized. There are two ways that mutexes can be initialized via the Pthreads API, but for this project, we shall assume that every mutex must be initialized via a call to pthread_mutex_init(). The first argument to this function is a pointer to the mutex being initialized.

pthread_mutex_init(&m, NULL);

Initializing a mutex is analogous to opening a file. It makes the mutex variable available for meaningful uses and puts it into the unlocked state. The mutex theoretically remains available until the mutex is destroyed via a call to pthread_mutex_destroy(). For this project, you are not required to consider errors resulting from destroying a mutex that was not initialized or that was locked when pthread_mutex_destroy() was called. You also do not need to guarantee that pthread_mutex_destroy() gets called for every mutex that is initialized.

In short, you can use pthread_mutex_init() to recognize mutexes within the dataflow analysis.

Locking and Unlocking

A mutex can be locked using pthread_mutex_lock(), which moves the mutex into the locked state, and a mutex can be unlocked using pthread_mutex_unlock(), which moves the mutex into the unlocked state:

pthread_mutex_lock(&m);
// code protected by m
pthread_mutex_unlock(&m);

You must identify double (un)locking errors where pthread_mutex_lock() is called on a mutex in the locked state or pthread_mutex_unlock() is called on a mutex in the unlocked state.

Pointers and Mutex Locations

Note that mutexes can be created as global variables, stack variables, or allocated on the heap. To simplify the analysis, you may assume that no heap allocations occur within the analyzed programs. However, if a pointer to a mutex is passed to a function, you should identify which mutex variables that argument might point to and proceed accordingly. You can also assume that every mutex is declared as an independent global or local variable, so there is no need to worry about delving into aggregates like arrays or struct types. Note: in context sensitive analyses, stack variables are usually identified by their context identifiers.

There are some subtleties in this part of the project for which I will not be checking. Once the class encounters them, we might have an eye opening discussion or two.

LLVM

A template for the project is available here. The compilation process for the project should be the same as before. To complete the project, you must implement the analysis in tools/deadlocker/DoubleLockingPass.cpp. You may add or change any other files as you see fit.

To use the tool, run ./deadlocker <input bitcode or llvm assembly>.

What you turn in

You must turn in two things via CourSys. Make sure that your name is clearly specified in both.

  1. The source files for your project. Compress your project directory into a .tar.gz file and submit. I will build and test all projects on a 64-bit machine. Make sure that I can test your project by simply running the tool that gets built by the provided template.

  2. A brief (1 page max) write up of your project that explains your basic design as well as what results you may have observed when running your tool on tests. Where does does it produce false positives? False negatives? How might you change things to make it more useful (feel free to brainstorm)?

Both of these should be submitted to CourSys by the deadline.

Deadline

Monday, February 16 at 11:59pm

Further reading

As longstanding a problem as deadlock may be, it still is not solved in general. A host of recent papers attack the problem in a variety of ways.

  1. Detecting Errors with Configurable Whole-Program Dataflow Analysis
  2. An effective dynamic analysis for detecting generalized deadlocks
  3. Multithreaded test synthesis for deadlock detection
  4. Sherlock: scalable deadlock detection for concurrent programs
  5. ConLock: a constraint-based approach to dynamic checking on deadlocks in multithreaded programs