This exercise will help you practice and demonstrate your ability to investigate and improve the performance of small programs. You are provided a template containing (1) small functions and data structures, (2) correctness tests that already pass and must continue to pass for a valid submission, (3) performance tests that allow you to measure the performance of the functions and data structures. Your objective will be to improve the performance of the provided functions and data structures on the given performance tests. All tasks in this exercise use C++ or Java. The template can be found here. You can clone the repository with:
The C++ tasks make use of the Google Benchmark library. If you are working on your own machine, you will want to build it from source and install it. If you are working on CSIL, you can again use the provided virtual environment with:
and build as described in the template instructions.
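A typical out-of-source CMake build in Release mode looks something like the following; the source and build directory names here are illustrative, so defer to the template's own instructions:

    cmake -S c++ -B build -DCMAKE_BUILD_TYPE=Release   # configure an optimized Release build
    cmake --build build                                # compile the libraries, tests, and benchmarks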
When building the C++ tasks, make sure to build in Release mode. In the above example, -DCMAKE_BUILD_TYPE=Release enables this.
The Java benchmarks make use of JMH. The dependencies should be resolved automatically if you have Maven installed, as is normal for Java programs. This should also already be installed in the CSIL virtual environment. You can build the Java parts of the template with:
In performing these tasks, you will modify libraries in specific directories of the exercise. Any modifications that you make to other directories will be overwritten during grading, so you should modify only the specified directories. All solutions should be able to run in CSIL. In practice, any up-to-date Linux distribution (or WSL) should suffice, but CSIL is the standard. Use of external libraries or code beyond the C++ standard library, or beyond the classes already used in the Java code, is explicitly prohibited unless otherwise mentioned. You should assume that I will control the build process and build optimized programs in release mode.
Each of your tasks will involve improving the running time of a small benchmark. In practice, this kind of performance improvement is measured in part through a metric known as speedup. Given two solutions to a problem, the speedup captures how much improvement the second gives over the first (where the first is the baseline).
Often, we want to use an old solution as a baseline and compute the relative speedup of a new solution. When both solutions are running on the same workload, we can compute the speedup of running time as:
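    speedup = (running time of old solution) / (running time of new solution)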
Consider what this means. If the old solution ran in 3 seconds and the new solution ran in 2 seconds, then the speedup is 3/2 = 1.5. This would then be referred to as a 1.5x speedup.
That may initially seem surprising, but it becomes more intuitive when you think about the amount of work done. If each "unit of work" takes the same amount of time, then the new solution could complete 50% more work in the same amount of time. Thus, a higher speedup is better.
In each of the tasks that you complete, you must achieve at least a given speedup. You are free to compete with each other to do better. In all cases, you may not use threads or SIMD operations.
When measuring performance, you'll also find that different machines can behave wildly differently. To provide consistency, grading will be done on desktop machines in the ASB 9840 lab of CSIL. You can remotely log into specific desktops if you so wish. Your speedup targets are also selected for such an environment. When working on your own machine, prefer a desktop, or a laptop that is plugged in and has CPU frequency scaling disabled. Running on the CSIL servers, for instance, would instead produce misleading results.
You have been provided a tracking system in c++/lib/tracker/. The tracking system maintains a list of suspicious entities who may be spies, monitoring their locations, names, and secret information about whatever devious plans they may be plotting. Of particular importance are the operations countDangerous and checkPlans. You should examine the performance bottlenecks of these functions and refactor the tracking system so that the performance of both functions improves. The EntityTracker API must remain the same, as must the semantics of both functions. As an extra constraint, you may not use explicit parallelism or precomputed indices. Note that the isDangerous and check arguments passed in may change during actual testing.
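The actual signatures are defined by the template in c++/lib/tracker/ and must not change; the toy sketch below is purely hypothetical and only illustrates why that last constraint matters: because the predicate is a parameter, a refactoring must remain correct for any predicate the tests might pass.

    #include <cstddef>
    #include <functional>
    #include <string>
    #include <vector>

    // Hypothetical stand-in, not the real EntityTracker: a query that takes a
    // caller-supplied predicate. An optimization of countIf cannot bake in any
    // particular predicate behaviour, since the caller chooses it.
    struct ToyTracker {
      std::vector<std::string> names;

      std::size_t countIf(const std::function<bool(const std::string&)>& pred) const {
        std::size_t count = 0;
        for (const auto& name : names) {
          if (pred(name)) {
            ++count;
          }
        }
        return count;
      }
    };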
You can measure the performance of these particular functions in the API using the tests in c++/perf/tracker.cpp. From the build directory, you can run them with out/bin/perf/trackerPerf.
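While iterating, you can focus the run on a single benchmark using Google Benchmark's standard --benchmark_filter option, which takes a regular expression over benchmark names (use the names printed by the binary itself), for example:

    out/bin/perf/trackerPerf --benchmark_filter=countQuadrant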
For full marks, aim to improve the countQuadrant performance test by a speedup of 1.3 or higher while not decreasing the performance of checkSecrets. (A decrease of less than 1% is allowed.)
It is possible to do much better (still without parallelism). Your improvements to checkSecrets may have been small. Why is that? Consider: are there issues with the performance test that make it harder to determine what improvements exist? What sorts of problems might be easier to fix?
You have been provided with a simple workload Executor in c++/lib/executor/. The Executor takes in different types of Work and executes them. There are 3 different kinds of work that you can see in Task1, Task2, and Task3. The exact nature of the work in the tasks does not matter, but your job is to improve the performance of the overall Executor. You may assume that all Tasks are final and semantically unordered.
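The snippet below is not the template's code; it is a generic illustration of what the final assumption means in C++: once a concrete task type is marked final, nothing can derive from it, so the set of concrete work kinds handled by the Executor is fixed.

    // Generic illustration only; the real Work/Task classes are in c++/lib/executor/.
    struct Work {
      virtual ~Work() = default;
      virtual void execute() const = 0;
    };

    // 'final' guarantees that no class derives from ExampleTask, so an object
    // created as ExampleTask always has exactly that dynamic type.
    struct ExampleTask final : Work {
      void execute() const override { /* some fixed unit of work */ }
    };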
You can measure the performance of the executor using the tests in c++/perf/executor.cpp. From the build directory, you can run them with out/bin/perf/executorPerf.
For full marks, aim to improve the executeTasks performance test by a speedup of 2.5 or higher.
Are there simple changes to your constraints that would make these performance gains impossible?
You have been provided with a Frob calculator in c++/lib/frob. A Frob object models a simple mechanical device and contains several numerical facts about Frobs. The frobnosticate method of a Frob models its mechanical behavior by taking in two numbers and computing a simple formula. For some reason this method seems slow. It is your job to improve the performance of frobnosticate.
You can measure the performance of the Frob calculator using the tests in c++/perf/frob.cpp. From the build directory, you can run them with out/bin/perf/frobPerf.
For full marks, aim to improve the frobnosticate performance test by a speedup of 1.5 or higher. Even a speedup of 3 is achievable in CSIL!
How do your modifications relate to different programming languages? What implications and consequences does this have?
Some correctness tests have been provided to you within the c++/test/ directory of the project. The tests make use of the APIs that were provided to you in order to make sure you are not changing the APIs in ways that are incompatible at the source level with existing usage. ABI compatibility is not a requirement. You can run the tests in a Linux shell through the test target of the build system, e.g. make test. You can get more detailed results from the doctest framework by running the individual test programs inside the out/bin/test/ subdirectory of the build directory.
Note that these tests may not cover everything. You are responsible for guaranteeing that any changes you make are source compatible with the original, preserving the high-level semantics.
Any changes you make to these tests will be removed during grading.
Additional performance tests have been provided to you within the c++/perf/ directory. These tests make use of the Google Benchmark framework in order to perform automated microbenchmarking of the underlying code. Within your build directory, you can find the binaries for these tests within out/bin/perf/. You should use these binaries both for investigating the causes of poor performance and for checking whether or not you have improved performance. Graded performance will come from the average latency as reported by the tests, but adding new tests or running with additional command-line options may be useful when investigating performance.
Any changes you make to these tests will be removed during grading.
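If you do add scratch benchmarks while investigating (keeping in mind that they will be removed during grading), the usual Google Benchmark pattern is a function taking benchmark::State, registered with the BENCHMARK macro; benchmark::DoNotOptimize keeps the compiler from discarding the measured work. A minimal sketch, where workUnderTest is a made-up placeholder for whatever you want to measure:

    #include <benchmark/benchmark.h>

    static int workUnderTest(int x) { return x * x + 1; }  // placeholder

    static void BM_WorkUnderTest(benchmark::State& state) {
      int x = 17;
      for (auto _ : state) {
        // DoNotOptimize prevents the result (and thus the work) from being
        // optimized away by the compiler.
        benchmark::DoNotOptimize(workUnderTest(x));
      }
    }
    BENCHMARK(BM_WorkUnderTest);

Standard flags such as --benchmark_repetitions=<n> can also help when a measurement looks noisy.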
When collecting the Google Benchmark results, you will see columns showing (1) the time, (2) the number of iterations, and (3) the number of items per second. These reflect helpful statistics and some of the internal process used by the benchmarking framework. Google Benchmark determines how many iterations of a benchmark are required in order to collect meaningful measurements (in a non-statistical sense); this is reported as the number of iterations. The reported time is the time per single iteration of the benchmark. The main number of interest is the average iteration time, indicated with the _mean suffix in the benchmark column.
For additional information on antipatterns that can occur when using JMH for benchmarking, consider looking at this empirical study. While simple on the surface, benchmarking can be quite error-prone.
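One of the most common antipatterns is dead-code elimination: if a benchmark computes a value and never uses it, the JIT may remove the computation entirely, so you end up timing nothing. The sketch below is not part of the template; it only shows the usual JMH remedy of sinking the result into a Blackhole (or returning it from the benchmark method):

    import java.util.concurrent.TimeUnit;
    import org.openjdk.jmh.annotations.Benchmark;
    import org.openjdk.jmh.annotations.BenchmarkMode;
    import org.openjdk.jmh.annotations.Mode;
    import org.openjdk.jmh.annotations.OutputTimeUnit;
    import org.openjdk.jmh.annotations.Scope;
    import org.openjdk.jmh.annotations.State;
    import org.openjdk.jmh.infra.Blackhole;

    @State(Scope.Benchmark)
    @BenchmarkMode(Mode.AverageTime)
    @OutputTimeUnit(TimeUnit.NANOSECONDS)
    public class DeadCodeExample {
        private double x = 42.0;

        // Broken: the result is unused, so the JIT may eliminate the work.
        @Benchmark
        public void measureWrong() {
            Math.log(x);
        }

        // Better: consuming the result in a Blackhole keeps the work alive.
        @Benchmark
        public void measureRight(Blackhole bh) {
            bh.consume(Math.log(x));
        }
    }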
As sometimes happens, you now have to deal with a familiar problem in a slightly different context. In Task 2, you worked to improve the performance of a basic task executor in C++. Now you have been provided a similar interface in Java, and you need to be able to speed it up as well.
You have been provided a simple OperationRunner in java/src/main/java/ca/sfu/cmpt745/ex02/. The OperationRunner takes in different types of Operations through its constructor and allows the operations to be executed later via its run() method. There are again 3 different types of Operations in the provided code.
You may assume that the order of operations does not matter (they are commutative). However, any number of additional unknown operation kinds could be added, and any modifications you make must be able to tolerate that.
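As a purely hypothetical illustration of that constraint (the real Operation and OperationRunner types are the ones in the provided package, and their APIs must be preserved), a runner stays tolerant of unknown kinds as long as it works through the operation abstraction rather than special-casing the three types it happens to know about:

    import java.util.ArrayList;
    import java.util.List;

    // Hypothetical shapes only, not the template's classes.
    interface ToyOperation {
        void perform();
    }

    final class ToyRunner {
        private final List<ToyOperation> operations = new ArrayList<>();

        ToyRunner(List<ToyOperation> ops) {
            operations.addAll(ops);
        }

        // Correct even for operation kinds that did not exist when this class
        // was written, because it never checks for specific concrete types.
        void run() {
            for (ToyOperation op : operations) {
                op.perform();
            }
        }
    }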
You may make use of standard classes in java.lang.* and java.util.* but may not make the code parallel in any way.
You may modify only OperationRunner.java.
For full marks, aim to improve the performance test by a speedup of 2.2 or higher.
You can run the correctness tests for Java from the java/ directory using:
You can run the performance benchmarks from the java/ directory using:
The graded time is the average measured operation time (avgt) reported as the "Score" in JMH.
First, clean your Java build artifacts by running the following within the java subdirectory of the project:
Also make sure that no build artifacts from your C++ work are in the project directory. Create an archive of your solution by changing into the directory containing your project template and running:
You can then upload this archive to CourSys.
If your archive is too large, then you have likely included build artifacts in your submission, and you should remove them before archiving.