Assignment 1: pthreads
Due Date : May 30th
The purpose of this assignment is to give you some general
experience in writing shared memory programs using the standard
pthreads library package. For more information on pthreads, you may
want to consult the pthreads man page and the tutorial from Lawrence
Livermore National Lab.
Part I
Create a program that launches t threads and have them share a counter. Have each thread increment the counter i times
- with no synchronization
- with pthread locks
- with TAS locks
- with TATAS locks
- with ticket locks
- with MCS locks
Try each option with varying numbers of threads, both greater and
fewer than the number of processors in the machine. Report final
counter values and execution times. Try any other tests that occur to
you. Explain your results (in writing).
To simplify testing of your code, please write your program to take
the number of threads t and the number of iterations i as command-line
arguments, specified with "-t t" and "-i i" (in either order). If the
arguments are not specified, use t = 4 and i = 10,000.
Be sure to include a README.pdf file that explains what you did and
what you learned. It should include your timing results and analysis
of the various locks. We will be grading the assignment on a roughly
equal mixture of completeness and correctness and quality of write-up.
Part II
Use the infrastructure you built for Part II to test concurrent queue
algorithms. Specifically, have each thread run a loop in which it
repeatedly enqueues data to, or dequeues from (with 50-50 probability),
a shared queue.
-
Implement the queue using a TAS lock with exponential backoff.
For backoff, simply have your thread spin some number of times around a
loop that reads a volatile variable (without the
volatile read, the compiler may "optimize" the
loop away: Wiki about volatile variables). Experiment with different values for the backoff base
(initial delay) and cap (maximum delay). Pick (and report in your
README file) values that seem to maximize throughput.
-
Implement the queue using a two lock solution from M&S two-lock
queue algorithm. [
Pseudo code ]
-
Implement the queue using an Michael and Scott non-blocking
algorithm [
Paper and pseudo code ]
As described in the linked sources above, be sure to use "counted
pointers" to avoid the ABA problem.
Also, to avoid "polluting" your results with the overhead of
memory management (the standard malloc and
free don't scale well), you should have each thread
pre-allocate its own supply of nodes, which it can keep on a private
list when they are not in the queue.
Finally, to avoid pollution from calls to the pseudo-random (the
standard rand isn't even thread-safe), you should
pre-generate enough random bits to drive the choice between enqueueing
and dequeueing for the entire test run.
Be sure to use a barrier (described in the notes below) to "synch
up" your threads after this setup is complete, and before starting
the timing run.
As in Part II, your README.pdf file should contain a detailed
description of the experiments you ran, graphs of the results, and
discussion that explains those results—why they look the
way they do.
Assignment Notes
- Pseudocode for locking algorithms can be found
here
- The Gnu C compiler provides a very flexible mechanism to insert
assembly language instructions (e.g. the various atomic primitives)
into your code. Unfortunately it’s a rather confusing mechanism, so
we’ve written the magic incantations for you: atomics_ops.h
- Pthreads are not part of the C standard library. To use them you must link in a separate library explicity. Add -lpthread to the end of your gcc command line. We strongly recommend that you create a Makefile for your assignment, even if you have only a few files. Something like the following is a good start:
CC = gcc # CFLAGS = -m32 -Wa -g
test: test.c $(CC) -o test $(CFLAGS) test.c -lpthread
test.s: test.c $(CC) -S -fverbose-asm $(CFLAGS) test.c
clean: rm test
This assumes that your test program is named test . The test.s
rule allows you to see the x86 assembly code you’re running, if
you’re curious, or if you need to debug any in-line assembler.
- 32 bit compilation. When encoding our atomic ops we assumed 32
bit mode and hence used instructions such as cmpxchg8b for
double-word swaps. Hence, the whole program needs to be compiled in
32 bit mode. This is achieved by passing
-m32 to the
CFLAGs . We also need to compile against the 32 bit
libstdc++ library. We have installed these libraries on our linux
machine. Use the following LDFLAGS
= -L/usr/lib32 .
-
To time programs, use the gethrtime() . Run your tests multiple times. See which results seem
repeatable, and which vary greatly from one run to the next. (And try
to explain why.) To gain the absolute maximum performance you may
have to resort to techniques such as binding a thread to
processor. Take a look at techniques such as
processor bind.
-
Submission Instruction
- Create separate folders for Part I and Part II
- We will be using portal.cs.sfu.ca for submission.
- Submit tar.gz of assignment. Name the gz. [Groupid].tar.gz
- Each group submits one copy of the assignment.
|
|