Exercise 2: Performance

This exercise will help you practice and demonstrate your ability to investigate and improve the performance of small programs. You are provided a template containing (1) small functions and data structures, (2) correctness tests that already pass and must continue to pass for a valid submission, and (3) performance tests that let you measure the performance of the functions and data structures. Your objective is to improve the performance of the provided functions and data structures on the given performance tests. All tasks in this exercise use C++ or Java. The template can be found here. You can clone the repository with:

git clone https://github.com/nsumner/se-perf-template.git

The C++ tasks make use of the Google Benchmark library. If you are working on your own machine, you will want to build it from source and install it. If you are working on CSIL, you can again use the provided virtual environment with:

source /usr/shared/CMPT/faculty/wsumner/base/env745/bin/activate

and build as expected per the template instructions:

cmake ../se-perf-template/c++ \
    -DCMAKE_BUILD_TYPE=Release

When building the C++ tasks, make sure to build in Release mode. In the example above, -DCMAKE_BUILD_TYPE=Release enables this.

The Java benchmarks make use of JMH. If you have Maven installed, the dependencies should be resolved automatically, as is typical for Maven-based Java projects. Maven should already be available in the CSIL virtual environment. You can build the Java parts of the template with:

cd se-perf-template/java
mvn clean package

In performing these tasks, you will modify libraries in very specific directories of the exercise. Any modifications that you make to other directories will be overwritten during grading, so you should only modify the specified directories. All solutions must be able to run in CSIL. In practice, any up-to-date Linux distribution (or WSL) should suffice, but CSIL is the standard. Use of external libraries and code other than the C++ standard library or the classes already used in the Java code is explicitly prohibited unless otherwise noted. You should assume that I will control the build process and build optimized programs in release mode.

Measuring Speedup

Each of your tasks will involve improving the running time of a small benchmark. In practice, this kind of performance improvement is measured in part through a metric known as speedup. Given two solutions to a problem, the speedup captures how much improvement the second gives over the first (where the first is the baseline).

Often, we want to use an old solution as a baseline and compute the relative speedup of a new solution. When both solutions are running on the same workload, we can compute the speedup of running time as:

speedup_latency = time_old / time_new

Consider what this means. If the old solution took 3 seconds and the new solution took 2 seconds, then the speedup is 3/2 = 1.5. This would be referred to as a 1.5x speedup.

That may initially seem surprising, but it is more intuitive when you think about the amount of work done. If each "unit of work" takes the same amount of time, then the new solution could complete 50% more work in the same amount of time. Thus, a higher speedup is better.
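If it helps to see the arithmetic spelled out, here is a minimal sketch of the computation. The helper function is hypothetical; the 3-second and 2-second figures are just the example above:

#include <iostream>

// Speedup is the ratio of the old running time to the new running time.
double speedup(double oldSeconds, double newSeconds) {
  return oldSeconds / newSeconds;
}

int main() {
  std::cout << speedup(3.0, 2.0) << "x speedup\n";  // prints: 1.5x speedup
}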

In each of the tasks that you complete, you must achieve at least a given speedup. You are free to compete with each other to do better. In all cases, you may not use threads or SIMD operations.

When measuring performance, you'll also find that different machines can behave wildly differently. To provide consistency, grading will be done on desktop machines in the ASB 9840 lab of CSIL. You can log into specific desktops remotely if you wish, and the speedup targets are selected for that environment. When working on your own machine, prefer a desktop, or a laptop that is plugged in and has CPU frequency scaling disabled. Running on the CSIL servers, for instance, would produce misleading results.

C++ Tasks

Task 1) The Tracker

You have been provided a tracking system in c++/lib/tracker/. The tracking system maintains a list of suspicious entities who may be spies, monitoring their locations, names, and secret information about whatever devious plans they may be plotting. Of particular importance are the operations countDangerous and checkPlans. You should examine the performance bottlenecks of these functions and refactor the tracking system so that the performance of both functions improves. The EntityTracker API must remain the same, as must the semantics of both functions. As an extra constraint, you may not use explicit parallelism or precomputed indices. Note that the isDangerous and check arguments passed in may change during actual testing.
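To make the flavor of such a refactoring concrete, here is a hedged sketch of one common direction: converting an array-of-structs layout into a struct-of-arrays layout so that hot scans touch less memory. The types, fields, and function below are hypothetical stand-ins, not the actual EntityTracker code, and this is not necessarily the intended solution:

#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Array-of-structs: scanning one field drags every other field
// (names, plans, ...) through the cache as well.
struct EntityAoS {
  std::string name;
  std::string plans;
  double x, y;
  uint8_t danger;
};

// Struct-of-arrays: fields that are scanned together are stored
// together, so a scan of `dangers` reads only contiguous bytes.
struct EntitiesSoA {
  std::vector<std::string> names;
  std::vector<std::string> plans;
  std::vector<double> xs, ys;
  std::vector<uint8_t> dangers;
};

size_t countAbove(const EntitiesSoA& entities, uint8_t threshold) {
  size_t count = 0;
  for (uint8_t d : entities.dangers) {  // cache-friendly linear scan
    count += (d >= threshold);
  }
  return count;
}

Note that a layout change like this stores the same information, just arranged differently, so it is distinct from the precomputed indices that the constraints forbid.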

You can measure the performance of these particular functions in the API using the tests in c++/perf/tracker.cpp. From the build directory, you can run them with out/bin/perf/trackerPerf.

For full marks, aim to improve the countQuadrant performance test by a speedup of 1.3 or higher while not decreasing the performance of checkSecrets. (A decrease of less than 1% is allowed.) It is possible to do much better (still without parallelism). Your measured improvement on checkSecrets may be small. Why is that?

Consider: Are there issues with the performance test that make it harder to determine what improvements exist? What sorts of problems might be easier to fix?

Task 2) The Executor

You have been provided with a simple workload Executor in c++/lib/executor/. The Executor takes in different types of Work and executes them. There are three different kinds of work, which you can see in Task1, Task2, and Task3. The exact nature of the work in the tasks does not matter; your job is to improve the performance of the overall Executor. You may assume that all Tasks are final and semantically unordered.
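Without giving away the intended fix, one generic direction for executor-style code is to reduce per-item dynamic dispatch. The sketch below is a hypothetical illustration using std::variant over a closed set of task types; the names TaskA and TaskB and the work they do are made up and do not come from the template:

#include <type_traits>
#include <variant>
#include <vector>

// Hypothetical stand-ins for a closed, final set of task types.
struct TaskA { int n; };
struct TaskB { double d; };

using Work = std::variant<TaskA, TaskB>;

long runAll(const std::vector<Work>& work) {
  long total = 0;
  for (const Work& w : work) {
    // std::visit dispatches without a virtual call through a base class.
    total += std::visit([](const auto& task) -> long {
      using T = std::decay_t<decltype(task)>;
      if constexpr (std::is_same_v<T, TaskA>) {
        return 2L * task.n;
      } else {
        return static_cast<long>(task.d);
      }
    }, w);
  }
  return total;
}

A rewrite in this style leans on the set of task types being closed and final; keep that in mind when you reach the question below and the related Java task later.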

You can measure the performance of the executor using the tests in c++/perf/executor.cpp. From the build directory, you can run them with out/bin/perf/executorPerf. For full marks, aim to improve the executeTasks performance test by a speedup of 2.5 or higher.

Are there simple changes to your constraints that would make these performance gains impossible?

Task 3) The Frob

You have been provided with a Frob calculator in c++/lib/frob. A Frob object models a simple mechanical device and contains several numerical facts about Frobs. The frobnosticate method of a Frob models its mechanical behavior by taking in two numbers and computing a simple formula. For some reason this method seems slow. It is your job to improve the performance of frobnosticate.
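As a hedged example of the scale of change that can matter in hot numeric code (the actual formula in the template may be entirely different), consider strength-reducing a generic library call into plain arithmetic. Both function names here are made up:

#include <cmath>

// std::pow is a generic routine; for small constant integer exponents,
// plain multiplication is typically far cheaper.
double frobSlow(double a, double b) {
  return std::pow(a, 2.0) + std::pow(b, 3.0);
}

double frobFast(double a, double b) {
  return a * a + b * b * b;
}

Whether a compiler is allowed to perform this kind of rewriting for you depends on language semantics, such as floating-point rules, which is one way to approach the question below.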

You can measure the performance of frobnosticate using the tests in c++/perf/frob.cpp. From the build directory, you can run them with out/bin/perf/frobPerf. For full marks, aim to improve the frobnosticate performance test by a speedup of 1.5 or higher. Even a speedup of 3 is achievable in CSIL!

How do your modifications relate to different programming languages? What implications and consequences does this have?

Running correctness tests

Some correctness tests have been provided to you within the c++/test/ directory of the project. The tests make use of the APIs that were provided to you in order to make sure you are not changing the APIs in ways that are incompatible at the source level with existing usage. ABI compatibility is not a requirement.

You can run the tests in a Linux shell through the test target of the build system, e.g. make test. You can get more detailed results from the doctest framework by running the individual test programs inside the out/bin/test/ subdirectory of the build directory.

Note that these tests may not cover everything. You are responsible for guaranteeing that any changes you make are source compatible with the original and preserve the high-level semantics.

Any changes you make to these tests will be removed during grading.

Running performance tests

Additional performance tests have been provided to you within the c++/perf/ directory. These tests make use of the Google Benchmark framework in order to perform automated microbenchmarking of the underlying code.

Within your build directory, you can find the binaries for these tests within out/bin/perf/. You should use these binaries both for investigating the causes of poor performance and for checking whether or not you have improved performance. Graded performance will come from the average latency as reported by the tests, but adding new tests or running with additional command-line options may be useful when investigating performance.

Any changes you make to these tests will be removed during grading.

When collecting the Google Benchmark results, you will see columns showing (1) the time, (2) the number of iterations, and (3) the number of items per second. These reflect helpful statistics and some of the internal process used by the benchmarking framework. Google Benchmark determines how many iterations of a benchmark are required to collect meaningful (if not statistically rigorous) measurements; this is reported as the number of iterations. The reported time is the time per single iteration of the benchmark. The main number of interest is the average iteration time, indicated with the _mean suffix in the benchmark column.
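For reference, a minimal Google Benchmark microbenchmark looks like the following. The benchmark name and body are illustrative, not from the template; the Repetitions call is what produces the aggregate _mean, _median, and _stddev rows mentioned above:

#include <benchmark/benchmark.h>

#include <numeric>
#include <vector>

static void BM_SumVector(benchmark::State& state) {
  std::vector<int> values(1024, 1);
  for (auto _ : state) {
    long sum = std::accumulate(values.begin(), values.end(), 0L);
    benchmark::DoNotOptimize(sum);  // keep the work from being optimized away
  }
}
// Running repetitions yields the aggregate rows, including _mean.
BENCHMARK(BM_SumVector)->Repetitions(10);

BENCHMARK_MAIN();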

Java Tasks

For additional information on antipatterns that can occur when using JMH for benchmarking, consider looking at this empirical study. While simple on the surface, benchmarking can be quite error prone.

Task 4

As sometimes happens, you now have to deal with a familiar problem in a slightly different context. In Task 2, you worked to improve the performance of a basic task executor in C++. Now you have been provided a similar interface in Java, and you need to speed it up as well.

You have been provided a simple OperationRunner in java/src/main/java/ca/sfu/cmpt745/ex02/. The OperationRunner takes in different types of Operations through its constructor and allows the operations to be executed later via its run() method. There are again 3 different types of Operations in the provided code.

You may assume that the order of operations does not matter (they are commutative). However, any number of additional unknown operation kinds could be added, and any modifications you make must be able to tolerate that. You may make use of standard classes in java.lang.* and java.util.* but may not make the code parallel in any way. You may modify only OperationRunner.java. For full marks, aim to improve the performance test by a speedup of 2.2 or higher.

Running correctness tests

You can run the correctness tests for Java from the java/ directory using:

mvn clean test

Running performance tests

You can run the performance benchmarks from the java/ directory using:

mvn clean package
java -jar target/ex02-benchmarks.jar

The graded time is the average measured operation time (avgt) reported as the "Score" in JMH.

Submitting

First, clean your Java build artifacts by running the following within the java subdirectory of the project:

mvn clean

Also make sure that no build artifacts from your C++ work are in the project directory. Create an archive of your solution by changing into the directory containing your project template and running:

tar zcvf e2.tar.gz se-perf-template/

You can then upload this archive to CourSys.

If your archive is too large, then you have likely included build artifacts in your submission, and you should remove them before archiving.