The Max Function¶
Introduction¶
Given \(v_0, v_1, \ldots, v_{n-1}\) with \(n > 0\)
Find \(i\) such that \(v_i \geq v_j\) for all \(j\) in \(0, 1, \ldots , n-1\)
in other words, find the largest element in \(v_0, v_1, \ldots, v_{n-1}\)
Basic Solution¶
\(k \leftarrow 0\) (\(\leftarrow\) is assignment)
if \(v_1 \geq v_k\) then \(k \leftarrow 1\)
if \(v_2 \geq v_k\) then \(k \leftarrow 2\)
if \(v_3 \geq v_k\) then \(k \leftarrow 3\)
…
if \(v_i \geq v_k\) then \(k \leftarrow i\)
…
if \(v_{n-1} \geq v_k\) then \(k \leftarrow n-1\)
return \(k\)
Basic Solution Performance¶
how many comparisons does it do?
exactly \(n-1\)
it does this in every case
so we say the number of comparisons it does is proportional to \(n\)
i.e. it’s linear
Can fewer comparisons be done?¶
no
any of \(v_0, \ldots, v_{n-1}\) could be the max
so we can’t skip any even one comparison
A C++ Implementation¶
// Pre-condition:
// v.size() > 0
// Post-condition:
// Returns an index value mi such that 0 <= mi < v.size()
// and v[mi] >= v[i] for 0 <= i < v.size().
int index_of_max(const vector<int>& v) {
if (v.empty()) cmpt::error("empty vector");
int mi = 0;
for(int i = 1; i < v.size(); i++) {
if (v[i] > v[mi]) {
mi = i;
}
}
return mi;
}
notice it’s easy to implement a max function like this:
int max(const vector<int>& v) {
return v[index_of_max(v)];
}
A Recursive Max¶
max has a natural recursive implementation:
- if v has 1 element, return the that 1 element (base case)
- otherwise, return the max of the first element and the rest of the vector
A Recursive Max¶
the begin
parameter tells us at what index v
starts at
it’s a trick for efficiently getting the “rest” of the array
int index_of_max_rec(const vector<int>& v, int begin) {
if (begin >= v.size()) cmpt::error("empty vector");
if (begin == v.size() - 1) { // only 1 element, so return it
return begin;
} else {
int mi = index_of_max_rec(v, begin + 1);
if (v[mi] > v[begin]) {
return mi;
} else {
return begin;
}
}
}
extra parameters are quite common in recursive functions
we add a single-parameter version for convenience
int index_of_max_rec(const vector<int>& v) {
return index_of_max_rec(v, 0);
}
A while-loop Implementation¶
this version uses a while-loop to make the individual statements more obvious
int index_of_max2(const vector<int>& v) {
if (v.empty()) cmpt::error("empty vector");
int mi = 0;
int i = 1;
while (i < v.size()) {
if (v[i] > v[mi]) {
mi = i;
}
i++;
}
return mi;
}
this version is useful when counting how many times each line of the function runs
Counting Lines¶
when index_of_max2(v)
is called, how many times is each line of
index_of_max2
called?
we assume that v
contains n
elements
the numbers in comments at the end of each line is the count of how many times that line executes
int index_of_max2(const vector<int>& v) {
if (v.empty()) cmpt::error("empty vector");
int mi = 0; // 1
int i = 1; // 1
while (i < v.size()) { // n
if (v[i] > v[mi]) { // n - 1
mi = i; // ???
}
i++; // n - 1
}
return mi; // 1
}
most of the line counts are not too hard to figure out
but the one with the ??? is not so easy!
Counting mi = i¶
how many times do we call mi = i
?
the least: 0 (when v[0]
is the max)
the most: n - 1 when v
is in descending sorted order
what about on average?
that’s tricky!
Counting mi = i¶
lets run some experiments and count how many times mi = i
is called
here’s some code that generates random vectors of length n
vector<int> rand_vec(int n) {
vector<int> result(n);
for(int i = 0; i < n; i++) {
result[i] = rand(); // rand() returns a random int
} // from 0 to RAND_MAX
return result;
}
the rand()
function is in <cmath>
Random Seeds¶
rand()
generates pseudo-random numbers using a formula and initial seed
to make it truly unpredictable we can set the initial seed based on the current time:
srand(time(NULL)); // srand sets the seed for rand()
// time comes from <ctime>
note that if you don’t set the seed, then the random numbers will be the same every time you run the program
having the same random numbers each time might be very useful for debugging!
Counting mi = i¶
int index_of_max_count(const vector<int>& v) {
if (v.empty()) cmpt::error("empty vector");
int mi = 0;
int i = 1;
int mi_assign_i_count = 0;
while (i < v.size()) {
if (v[i] > v[mi]) {
mi = i;
mi_assign_i_count++;
}
i++;
}
cout << "mi_assign_i_count = " << mi_assign_i_count << "\n";
return mi;
}
Counting mi = i¶
experimentation code
void test() {
const int n = 10;
cout << "n = " << n << "\n";
for(int i = 0; i < 10; i++) {
vector<int> v = rand_vec(n);
index_of_max_count(v);
}
}
Some results¶
n = 100: 4, 6, 3, 7, 5, 2 (mean = 4.5)
n = 1000: 7, 4, 8, 5, 3, 10 (mean = 6.17)
n = 10000: 8, 7, 5, 12, 12, 10 (mean = 9.0)
n = 100000: 8, 9, 9, 13, 14, 9 (mean = 10.33)
n = 1000000: 9, 11, 19, 14, 8, 12 (mean = 12.17)
note that the results of are based on random data, so if you were to repeat these trials you probably won’t get exactly the same values
but the mean should be around the same value
Some results¶
what’s surprising about these numbers is how small they are, even with very large vectors
they suggest that for large amounts of randomly-ordered data mi = i
is
executed very few times
this is not obvious from the algorithm!
An Estimate of the Correct Answer¶
here’s an informal argument about the average number of times mi = i
is
called on randomly ordered data
the probability that the 1st element is the largest of the first 1 is \(\frac{1}{1}\)
the probability that the 2nd element is the largest of the first 2 is \(\frac{1}{2}\)
the probability that the 3rd element is the largest of the first 3 is \(\frac{1}{3}\)
in general, the probability that the nth element is the largest of the first n is \(\frac{1}{n}\)
these probabilities are independent, and so the total expected number of times
mi = i
is called is
this the nth harmonic number, \(H_n\), which grows very slowly, e.g. the billionth harmonic number is only about 21 (!)
that means if you call the max function on a vector with a billion entries,
you would expect mi = i
to be called about 21 times
A Visualization¶
click to re-set the image!
This shows 490 lines of random length, corresponding to an array of 490 random numbers. The red dots mark the first lines that are taller than all the previous lines to their left, i.e. the line in the max function where mi is set. The surprising fact is that very few red dots occur!
note that this was created using JavaScript and the P5.js library — give it a try if you want an easy way to play around with graphics on a web page!