The Max Function

Introduction

Given \(v_0, v_1, \ldots, v_{n-1}\) with \(n > 0\)

Find \(i\) such that \(v_i \geq v_j\) for all \(j\) in \(0, 1, \ldots , n-1\)

in other words, find the largest element in \(v_0, v_1, \ldots, v_{n-1}\)

Basic Solution

\(k \leftarrow 0\) (\(\leftarrow\) is assignment)

if \(v_1 \geq v_k\) then \(k \leftarrow 1\)

if \(v_2 \geq v_k\) then \(k \leftarrow 2\)

if \(v_3 \geq v_k\) then \(k \leftarrow 3\)

if \(v_i \geq v_k\) then \(k \leftarrow i\)

if \(v_{n-1} \geq v_k\) then \(k \leftarrow n-1\)

return \(k\)

Basic Solution Performance

how many comparisons does it do?

exactly \(n-1\)

it does this in every case

so we say the number of comparisons it does is proportional to \(n\)

i.e. it’s linear

Can fewer comparisons be done?

no

any of \(v_0, \ldots, v_{n-1}\) could be the max

so we can’t skip any even one comparison

A C++ Implementation

// Pre-condition:
//    v.size() > 0
// Post-condition:
//    Returns an index value mi such that 0 <= mi < v.size()
//    and v[mi] >= v[i] for 0 <= i < v.size().
int index_of_max(const vector<int>& v) {
    if (v.empty()) cmpt::error("empty vector");

    int mi = 0;
    for(int i = 1; i < v.size(); i++) {
        if (v[i] > v[mi]) {
            mi = i;
        }
    }
    return mi;
}

notice it’s easy to implement a max function like this:

int max(const vector<int>& v) {
    return v[index_of_max(v)];
}

A Recursive Max

max has a natural recursive implementation:

  • if v has 1 element, return the that 1 element (base case)
  • otherwise, return the max of the first element and the rest of the vector

A Recursive Max

the begin parameter tells us at what index v starts at

it’s a trick for efficiently getting the “rest” of the array

int index_of_max_rec(const vector<int>& v, int begin) {
    if (begin >= v.size()) cmpt::error("empty vector");

    if (begin == v.size() - 1) {  // only 1 element, so return it
        return begin;
    } else {
        int mi = index_of_max_rec(v, begin + 1);
        if (v[mi] > v[begin]) {
            return mi;
        } else {
            return begin;
        }
    }
}

extra parameters are quite common in recursive functions

we add a single-parameter version for convenience

int index_of_max_rec(const vector<int>& v) {
    return index_of_max_rec(v, 0);
}

A while-loop Implementation

this version uses a while-loop to make the individual statements more obvious

int index_of_max2(const vector<int>& v) {
    if (v.empty()) cmpt::error("empty vector");

    int mi = 0;
    int i = 1;
    while (i < v.size()) {
        if (v[i] > v[mi]) {
            mi = i;
        }
        i++;
    }
    return mi;
}

this version is useful when counting how many times each line of the function runs

Counting Lines

when index_of_max2(v) is called, how many times is each line of index_of_max2 called?

we assume that v contains n elements

the numbers in comments at the end of each line is the count of how many times that line executes

int index_of_max2(const vector<int>& v) {
    if (v.empty()) cmpt::error("empty vector");

    int mi = 0;              // 1
    int i = 1;               // 1
    while (i < v.size()) {   // n
        if (v[i] > v[mi]) {  // n - 1
            mi = i;          // ???
        }
        i++;                 // n - 1
    }
    return mi;               // 1
}

most of the line counts are not too hard to figure out

but the one with the ??? is not so easy!

Counting mi = i

how many times do we call mi = i?

the least: 0 (when v[0] is the max)

the most: n - 1 when v is in descending sorted order

what about on average?

that’s tricky!

Counting mi = i

lets run some experiments and count how many times mi = i is called

here’s some code that generates random vectors of length n

vector<int> rand_vec(int n) {
    vector<int> result(n);
    for(int i = 0; i < n; i++) {
        result[i] = rand();  // rand() returns a random int
    }                        // from 0 to RAND_MAX
    return result;
}

the rand() function is in <cmath>

Random Seeds

rand() generates pseudo-random numbers using a formula and initial seed

to make it truly unpredictable we can set the initial seed based on the current time:

srand(time(NULL));  // srand sets the seed for rand()
                    // time comes from <ctime>

note that if you don’t set the seed, then the random numbers will be the same every time you run the program

having the same random numbers each time might be very useful for debugging!

Counting mi = i

int index_of_max_count(const vector<int>& v) {
    if (v.empty()) cmpt::error("empty vector");

    int mi = 0;
    int i = 1;
    int mi_assign_i_count = 0;
    while (i < v.size()) {
        if (v[i] > v[mi]) {
            mi = i;
            mi_assign_i_count++;
        }
        i++;
    }
    cout << "mi_assign_i_count = " << mi_assign_i_count << "\n";
    return mi;
}

Counting mi = i

experimentation code

void test() {
    const int n = 10;
    cout << "n = " << n << "\n";
    for(int i = 0; i < 10; i++) {
        vector<int> v = rand_vec(n);
        index_of_max_count(v);
    }
}

Some results

n = 100: 4, 6, 3, 7, 5, 2 (mean = 4.5)

n = 1000: 7, 4, 8, 5, 3, 10 (mean = 6.17)

n = 10000: 8, 7, 5, 12, 12, 10 (mean = 9.0)

n = 100000: 8, 9, 9, 13, 14, 9 (mean = 10.33)

n = 1000000: 9, 11, 19, 14, 8, 12 (mean = 12.17)

note that the results of are based on random data, so if you were to repeat these trials you probably won’t get exactly the same values

but the mean should be around the same value

Some results

what’s surprising about these numbers is how small they are, even with very large vectors

they suggest that for large amounts of randomly-ordered data mi = i is executed very few times

this is not obvious from the algorithm!

An Estimate of the Correct Answer

here’s an informal argument about the average number of times mi = i is called on randomly ordered data

the probability that the 1st element is the largest of the first 1 is \(\frac{1}{1}\)

the probability that the 2nd element is the largest of the first 2 is \(\frac{1}{2}\)

the probability that the 3rd element is the largest of the first 3 is \(\frac{1}{3}\)

in general, the probability that the nth element is the largest of the first n is \(\frac{1}{n}\)

these probabilities are independent, and so the total expected number of times mi = i is called is

\[1 + \frac{1}{2} + \frac{1}{3} + \ldots + \frac{1}{n}\]

this the nth harmonic number, \(H_n\), which grows very slowly, e.g. the billionth harmonic number is only about 21 (!)

that means if you call the max function on a vector with a billion entries, you would expect mi = i to be called about 21 times

A Visualization

click to re-set the image!

This shows 490 lines of random length, corresponding to an array of 490 random numbers. The red dots mark the first lines that are taller than all the previous lines to their left, i.e. the line in the max function where mi is set. The surprising fact is that very few red dots occur!

note that this was created using JavaScript and the P5.js library — give it a try if you want an easy way to play around with graphics on a web page!