Exercise 5: Dynamic Analysis

This exercise will help you practice and demonstrate your ability to implement dynamic analyses by using LLVM for static program instrumentation. You will implement several transformations on a program in order to make its behavior tolerate faults, albeit with uncertain or approximate quality of service.

Your analysis will roughly be a whole program analysis. That is, you may assume that you have access to (almost) the entire program. Any program state that needs to be inspected, tracked, or otherwise manipulated will be made available to you via a single compiled .ll file containing the IR of the program. All functions whose instructions you need to observe will also be present in this IR. External function calls to libraries like libc may occur. You should be able to safely interact with these libraries, but you need not modify them or consider their behavior in any way.

You will protect programs from different types of common bugs by checking a few simple policies at runtime. These are all standard bugs that will cause a program to crash, which can cause problems with availability.

  1. Heap, global, and stack pointer accesses. Any (1) load from or (2) store to a pointer that occurs in the given IR must access an address that is a valid global, local variable, or heap allocated memory that occurs within the provided IR. Both spatial and temporal safety should be enforced. Once memory is no longer temporally valid, accesses should be invalid. You may assume that all heap memory will be allocated via a call to malloc(). Allocated heap memory will only be freed via a call to free().
  2. Invalid frees. Any call to free() must pass a pointer that is either valid or NULL.
  3. Divide by zero bugs. Any integer division must have a nonzero denominator.

Your implementation does not need to be concerned about performance. No performance benchmarking of your solution will be done unless it is slow enough to impact the grading process. Penalties may apply in those extreme cases.

You will implement a few different policies for handling these types of faults, varying in difficulty. The different policies will allow you to control the quality of service (QOS) of the instrumented program in order to make a trade off between preservation of the original program semantics and availability.

A template for the project is available here.

To build this project, you should use LLVM 17.X, available on the LLVM download page. Downloading the pre-built binaries should provide you everything you need. This should already be available in the CSIL lab at /usr/shared/CMPT/faculty/wsumner/llvm/. The bin/ subdirectory also contains appropriate versions of clang and other LLVM related tools that you may use for the project.

Building using cmake

First create a build directory separate from your source directory in order to perform an out-of-source build. From within the build directory for the project, run:

cmake -DLLVM_DIR=<installation, build, or extraction path of LLVM>/lib/cmake/llvm <path to project source>

Then you can simply run make to build the project within the build directory. The tolerator program will reside at bin/tolerator in the build directory.

The CSIL virtual environment makes this process simpler. After activating it with:

source  /usr/shared/CMPT/faculty/wsumner/base/env745/bin/activate

You can configure with just:

cmake <path to project source>

Task 1 (Terminating with warnings)

The simplest way to handle incorrect behaviors is simply to terminate the running program and print (log) an error message describing what form of error has occurred. For this task, you will handle each of the three classes of bugs by simply printing out an error message to stderr and ending the program, making the program return -1 to the system to indicate an error.

The specific error messages should be:

Make sure your error messages are correct so that any automated grading does take points that you would otherwise earn. You have been provided with some test cases, but there are more that you can write on your own.

Your instrumentor should be able to run as:

tolerator <toinstrument.ll> -o instrumented

This should produce a program that can run as:

./instrumented <normal> <command> <line> <arguments>

Several tests are provided for you. They will automatically run your instrumentor and execute the instrumented code. You can execute them with:

make test

However, unlike previous tests that you have seen, the goal is NOT to make all tests pass! Instead, the given policies should make some tests pass, but they may even make other tests fail. I recommend discussing on the course forum to decide what the impact should be on individual test cases.

Task 2 (Task 1 + Ignoring invalid side effects)

Consider the impact of behaviors that don't produce a value but rather produce a side effect. In our case, this includes (1) stores to memory and (2) frees of memory. These instructions need not necessarily do anything in order for the program to be well formed. Storing to an invalid location will not even matter if we only read from valid locations. Similarly, no valid memory can be freed when we call free with an invalid address. Thus, we might simply ignore these instructions.

When your instrumentor is called with the -ignore argument, the resulting program shall log the bad behavior, as in part 1, but it will ignore the effects of invalid operations without values and continue running.

Task 3 (Task 2 + Default values)

We might take this a step further. Note that we have not been able to avoid reads from invalid memory or division by zero. In these case some value must be produced; we cannot simply ignore the operation. However, we might consider whether the produced value matters a great deal…. If the value does not matter, we could simply return a default value instead.

When your instrumentor is called with the -defaults argument, the resulting program shall log the bad behavior and discard valueless faults, as in part 2, but it will provide erroneous loads from memory and divisions by zero with the value 0 and keep running.

Note, now none of the original fault kinds will result in a program that terminates!

Task 4 (Task 1 + local bypass)

Even more aggressively, we might say that all computation within a function that performs an invalid memory operation or division is suspect, so we really ought to exit the function if we want to avoid the bad behavior while continuing to run. For any function returning void, this is easy, we can simply return directly from the function. For a function that returns a value, we can try returning a zero initialized version of that value type and keep executing.

When your instrumentor is called with the -returns argument, the resulting program shall log the bad behavior and return to the caller of the current function. Similar to task 3, any returned from function that produces a value should produce the zero initalized form of that value.

Submitting

Create an archive of your solution by changing into the directory containing your project template and running:

tar zcvf e5.tar.gz se-fault-tolerant-template/

You can then upload this archive to CourSys. Again, make sure to not include your build artifacts.

There is a substantial amount of work on fault tolerant and failure oblivious computing. More recently, there has been interest in work that can scale the quality of service with other requirements of the software (whether fault tolerance or performance).