This project is intended to help you get acquainted with creating dynamic analyses using LLVM.
Identifying which paths in a program execute most frequently can be an effective tool for identifying regions of a program to optimize and for double checking that a program is behaving as expected. Beyond these more expected uses, it can benefit and drive additional analyses ranging from dataflow analysis to statistical debugging For this project, you will implement a limited form of a technique called "Efficient Path Profiling". You have previously seen this technique presented in class, so the basics should already be familiar to you.
Recall that Efficient Path Profiling (EPP) compactly numbers all paths through a directed acyclic graph (DAG) and then annotates DAGs within a control flow graph (CFG) so that executing a path of a DAG within a CFG will increase a counter for that path. When this newly instrumented program is then compiled and executed, the counters for all paths are logged when the program terminates and can be decoded to identify the frequency of all paths.
Note that there are three key components to this system:
You will be implementing all three of these components, but you will only have to do so for the bodies of innermost loops within a function. These are loops that do not contain any loops within themselves. Thus, in the following function, you will increment a counter for the path taken by each iteration of loops A and C.
def foo():
while someCondition: # Loop A
# Identify the paths here and increment their counters
while anotherCondition: # Loop B
# Some code that is ignored
while yetAThirdCondition # Loop C
Identify the paths here and increment their counters
A function containing no loops is uninstrumented.
The body of a loop with multiple exits should be modeled as though all actual
exits are immediately followed by a unique pseudo-exit for the loop body.
Similarly, the latch, or last basic block before the backedge of the loop
that returns to the loop header, represents the last node of the DAG before
exiting the loop body.
Information about loops inside LLVM is available via the
LoopInfo
class, which is enabled in the template provided to you. You will be
particularly interested in the begin()
, end()
, and empty()
methods
method of LoopInfo
as well as the methods of the same name and
contains
, getLoopLatch
, and getExitingBlocks
or isExitingBlock
for Loop
.
When instrumenting code, you may also find the SplitEdge()
function in
BasicBlockUtils.h
to be useful.
The tool that you create, called pathprofiler
shall instrument and compile
a given program so that it identifies the acyclic paths within the innermost
loops of every function and counts how many times they execute. When the
instrumented program runs and terminates, it should write out a file called
path-profile-results
containing these path identifiers as well as how many
times each of the paths was executed in that one run. If your tool is instead
given the original bitcode module and the path-profile-results
then it
shall print out the top five most frequent paths and how many times they
executed, e.g.
Top 3 Paths
=========================================
Path, occurrences: 1
File c/02-simpleloop.c line 7
Path, occurrences: 3
File c/02-simpleloop.c line 7
File c/02-simpleloop.c line 8
File c/02-simpleloop.c line 11
File c/02-simpleloop.c line 13
File c/02-simpleloop.c line 7
Path, occurrences: 7
File c/02-simpleloop.c line 7
File c/02-simpleloop.c line 8
File c/02-simpleloop.c line 9
File c/02-simpleloop.c line 10
File c/02-simpleloop.c line 13
File c/02-simpleloop.c line 7
The exact nature of the input is discussed in the LLVM section. The exact algorithm for computing edge increments can be found in Figure 5 on page six of the Efficient Path Profiling paper. Do not forget that we had a lecture on how to write an analysis using LLVM. In particular remember that you must initialize your data when the program starts, update it as the program runs using your runtime code, and write out your data when the program ends.
You may assume:
##LLVM
A template for the project is available here. The compilation process for the project should be the same as before. To complete the project, you must implement the analyses in
lib/pathprofiler-inst/PathProfilingPass.cpp
lib/pathprofiler-inst/PathEncodingPass.cpp
lib/pathprofiler-inst/PathDecodingPass.cpp
.You may add or change any other files as you see fit.
To use the tool, run
./pathprofiler <input bitcode or llvm assembly> -o <compiled path>
to instrument and compile the module into a new and executable binary. Running
the compiled program then produces a file named path-profile-results
in
the current directory. Running
./pathprofiler <input bitcode or llvm assembly> -p <path-profile-results>
shall print out the appropriate list of most executed paths.
By default, the project template assumes that your static instrumentation
code can be found in lib/pathprofiler-inst/
and your runtime library that
will be linked with the instrumented program can be found in
lib/pathprofiler-rt/
.
##What you turn in
You must turn in one thing via CourSys. Make sure that your name is clearly specified.
This should be submitted to CourSys by the deadline.
##Deadline Tuesday, March 8 at 11:59pm
##References