Linear Search (in C)

Algorithms

intuitively, an algorithm is a precise step-by-step process

for example

  • the method you learned in elementary school for adding numbers is an algorithm
  • selection sort is a sorting algorithm that re-arranges the elements of an array to be in order from smallest to biggest
  • Bresenham’s line-drawing algorithm draws a good-looking straight line between two pixels on a computer monitor
  • many neural networks are trained using an algorithm called backpropagation

programs can often be thought of as implementations of algorithms

in computer science, it is often useful to first think about algorithms “on paper”, and then implement them afterwards

by studying algorithms we can often understand useful things about them, such as:

  • whether or not they are correct, or for what sorts of inputs it works on
  • how much work it does (i.e. how fast it is)
  • how much memory it uses
  • how complicated it is
  • how secure it is
  • how much power does it use (this could be important on low-powered devices)

The Problem Linear Search Solves

Given \(v_0, v_1, \ldots, v_{n-1}\) and a target \(x\)

Find \(i\) such that \(v_i = x\)

If there is no such \(v_i\), return -1 (or somehow signal “not found”)

you can think of \(v_0, v_1, \ldots, v_{n-1}\) as being numbers, but they could also be strings, or arrays, or structures, or any data type for which == is defined

Pseudocode Solution 1

Does \(v_0 = x\)? If so, stop and return 0.

Does \(v_1 = x\)? If so, stop and return 1.

Does \(v_{n-1} = x\)? If so, stop and return \(n-1\).

Otherwise, stop and return -1.

Pseudocode Solution 1

this works very well in practice, and is what many programmers call linear search

however, it is overly specific

testing elements in the order \(v_0\) to \(v_{n-1}\) is not required

nor is it always best

  • for example, suppose you need to find the '.' in a file name
  • the dot is usually near the right-end of the name
  • and so it is usually quicker to search for the dot by starting at the right end and moving to the left

Pseudocode Solution 2

call \(I = \{0, 1, \ldots, n-1\}\) the index set of \(v\)

while \(I\) is not empty, do the following

choose and remove an index \(i\) from \(I\)

Does \(v_i = x\)? If so, return \(i\)

return -1 if you get here

Pseudocode Solution 2

this is a little more complex

but it highlights some of the flexibility the problem statement gives us

this version lets us do linear search in reverse order

or start in the middle and “wrap-around” to the start

or start in the middle and “spread out” towards the beginning and end at the same time

                            Forward Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
start +------------------------------->                                    |
      |                                                                    |
      +--------------------------------------------------------------------+



                           Reverse Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
      |                                 <----------------------------------+ start
      |                                                                    |
      +--------------------------------------------------------------------+



                        Wrap-around Forward Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
      +------------>          +------------------------------------------->+
      |                       start                                        |
      +--------------------------------------------------------------------+



                        Expanding Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
      |      <------------------------+-------------------------->         |
      |                             start                                  |
      +--------------------------------------------------------------------+


                        Inwards Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
      +-------------------->                       <-----------------------+
      |start                                                          start|
      +--------------------------------------------------------------------+

A Basic Implementation

// Pre-condition:
//    none
// Post-condition:
//    Returns the smallest i >= 0 such that v[i] == x; or, if
//    x is not anywhere in v, then -1 is returned
int linear_search1(int* v, int n, int x) {
    for(int i = 0; i < n; ++i) {
        if (x == v[i]) {
          return i;
        }
    }
    return -1;
}

we’ve specified this function carefully using a pre-condition and a post-condition

  • a pre-condition specifies what must be true before the function is called
  • a post-condition specifies what will be true after that function is called, assuming the pre-condition was true when you called it

if you call a function when it’s pre-condition is false the result is likely an error — the function may, or may not, do what you expect

good programming practice: use assert (or an if-statement) as the first line of a function to check if a pre-condition holds

  • this can help catch errors

Making Linear Search More Flexible

often we only want to search a sub-range of an array instead of the entire array itself

// linear search on the range [begin, end)
int linear_search3(int* v, int begin, int end, int x) {
    for(int i = begin; i < end; ++i) {
        if (x == v[i]) return i;
    }
    return -1;
}


              begin                               end-1  end
  +----------+-----+-----------------------------+-----+-----+---------+
  |          |     |                             |     |     |         |
  +----------+-----+-----------------------------+-----+-----+---------+

a subtlety here is that the valid elements of the sub-array go from begin to end-1 (not end!)

[begin, end) is the set of values {v[begin], v[begin + 1], v[begin + 2], …, v[end - 1]}

begin is the index of the first element of v we want to search

end is the index of one past the last element we want to search

thus, this searches the sub-range v[begin], [begin + 1] …, v[end - 1]

an advantage of this asymmetric range is that the number of elements in the range is exactly end - begin

plus, the C++ standard template library (STL) uses these kinds of asymmetric ranges

Linear Search Performance

how fast is linear search?

it depends in part on the speed of your computer

and your compiler (and what options it uses)

and the version of your operating system

and what else your computer might be doing at the same time

we avoid these complications by instead asking how many key instructions does linear search execute

this is independent of the particular software/hardware involved

Counting Instructions

we usually only count one particular key instruction

usually, the key instruction is the one that is executed the most

for linear search the key instruction is usually chosen to be ==

“how fast is linear search?” becomes “how many times is == executed?”

Counting == in Linear Search

in the best case, when v is empty, linear search does 0 comparisons

empty arrays are probably not searched very often in practice, so this result is not very useful

a slightly more useful best case is 1 comparison, when x is the first element of v

assuming randomly ordered unique numbers in v, if x is in v then there is a \(\frac{1}{n}\) chance x is the first element

thus, the best case does not occur very often and you shouldn’t count on it

Counting == in Linear Search

in practice, the worst case is often more helpful than the best case

hope for the best, but plan for the worst!

in the worst case, linear search does n comparisons (where n is the size of the array)

this can happen in two different ways

  • x is the last element of v
  • or x is not in v at all

Average Case Performance

the best and worst cases are extremes

both may occur infrequently

so it is natural to ask what is the expected average number of comparisons done by linear search

an intuitive answer to this problem goes like this …

there are two cases to consider

case 1: x is in v

case 2: x is not in v

assuming each case occurs 50% of the time, the total number of comparisons will be \(\frac{1}{2}\) (# of comparisons in case 1) + \(\frac{1}{2}\) (# of comparisons in case 1)

so, how many comparisons are done in each case?

for case 1, it is reasonable to estimate that about \(\frac{n}{2}\) comparisons are done on average, e.g. sometimes x is near the beginning of the array, sometimes it is near the end

assuming v is randomly ordered, then over multiple searches these cases balance out doing an average of about \(\frac{n}{2}\) comparisons per search

for case 2, when x is not in v, then n comparisons are always done

so the total number of comparisons done by linear search is \(\frac{1}{2}\) (# of comparisons in case 1) + \(\frac{1}{2}\) (# of comparisons in case 2)

which is \(\frac{1}{2}\cdot\frac{n}{2} + \frac{1}{2}n = \frac{n}{4} + \frac{2n}{4} = \frac{3n}{4}\)

so, assuming random data, this expression says linear search does \(\frac{3}{4}n\) comparisons on average

  • experiments show that this is a pretty accurate count of the number of comparisons that linear search actually does

one final comment: many programmers estimate the average case for linear search to instead be the simpler expression, \(\frac{n}{2}\)

  • but we get the answer \(\frac{3}{4}n\) because the worst-case of \(n\) comparisons occurs in two different ways: when \(x\) is the last element of the array, or when its not in the array at all
  • so \(n\) comparisons occurs about twice as frequently as any other number of comparisons
  • in practice, especially when \(n\) is big, the difference between \(\frac{3}{4}n\) and \(\frac{n}{2}\) isn’t hugely significant
  • so \(\frac{n}{2}\) is a reasonable approximation

Average Case Performance

in general, the average-case performance of an algorithm can be quite tricky to determine accurately

usually, it’s necessary to make assumptions about the distribution of the data and probabilities of values

this topic is discussed in much more detail in some later CMPT courses


Extra: More Detailed Linear Search Average Case

here is another, more detailed, derivation of the number of comparisons done by linear search in the average case.

first, we need to make some assumptions

v (the array being searched) contains n unique items (no duplicates)

the probability that x is in v is \(\frac{1}{2}\) (not necessarily realistic!)

if x is in v, then it has equal chance of being at any location, i.e. the chance that v[i] == x is \(\frac{1}{n}\) (again, not necessarily realistic!)

Linear Search Average Case

now we can say that the average number of comparisons is this

\(\frac{1}{2}\) (# comparisons done when x is in v) + \(\frac{1}{2}\) (# comparisons done when x is not in v)

Linear Search Average Case

when x is not in v, exactly n comparisons are done

\(\frac{1}{2}\) (# comparisons done when x is in v) + \(\frac{1}{2}n\)

Linear Search Average Case

how many comparisons are done when x is in v?

remember, we assumed there is a \(\frac{1}{n}\) chance of x at being in any particular location

if x is at index 0, 1 comparison is done

if x is at index 1, 2 comparisons are done

if x is at index 2, 3 comparisons are done

if x is at index i, i + 1 comparisons are done

if x is at index n - 1, n comparisons are done

Linear Search Average Case

thus

\(\frac{1}{n}\) of the time 1 comparison is done

\(\frac{1}{n}\) of the time 2 comparisons are done

\(\frac{1}{n}\) of the time 3 comparisons are done

\(\frac{1}{n}\) of the time n comparisons are done

Linear Search Average Case

the total amount of work done is the sum of each case

\(\frac{1}{n}\cdot 1 + \frac{1}{n}\cdot 2 + \ldots + \frac{1}{n}\cdot n\)

which simplifies to

\(\frac{1}{n}(1 + 2 + \ldots + n)\)

Linear Search Average Case

you should remember this fact

\(1 + 2 + \ldots + n = \frac{n(n+1)}{2}\)

Linear Search Average Case

the total number of comparisons done in the case when x is in v is

\(\frac{1}{n}(1 + 2 + \ldots + n)\)

which becomes

\(\frac{1}{n}(\frac{n(n+1)}{2})\)

which simplifies to

\(\frac{n+1}{2}\)

or

\(\frac{n}{2} + \frac{1}{2}\)

Linear Search Average Case

going back to the sum of all comparisons

\(\frac{1}{2}\) (# comparisons done when x is in v) + \(\frac{1}{2}n\)

this now becomes

\(\frac{1}{2}\frac{n+1}{2} + \frac{1}{2}n\)

or

\(\frac{n+1}{4} + \frac{2n}{4}\)

which is

\(\frac{3n+1}{4}\)

or

\(\frac{3n}{4} + \frac{1}{4}\)

Linear Search Average Case

so, assuming there’s a 50-50 chance x is in v

and, assuming, if x is in v it has an equal chance of being at any index

on average we would expect linear search to do \(\frac{3n}{4} + \frac{1}{4}\) calls to ==

ignoring the \(\frac{1}{4}\) is fine for practical applications, so we can say that linear search does about \(\frac{3n}{4}\) comparisons on average

Linear Search Average Case

\(\frac{3n}{4}\) is closer to the worst case than the best case

this is because half the time x is not in v and must do n comparisons to prove that

Source Code

// linearSearch.c

#include <stdio.h>
#include <assert.h>

// Prints the given array C-style, e.g. { 3, 1, -4 }.
void print_array(int* v, int n) {
    assert(n >= 0);
    if (n == 0) {
        printf("{}");
    } else if (n == 1) {
        printf("{ %d }", v[0]);
    } else { // n >= 2
        printf("{ %d", v[0]);
        for(int i = 1; i < n; i++) {
            printf(", %d", v[i]);
        }
        printf(" }");
    }
}

// Same as print_array, but prints a newline at the end.
void println_array(int* arr, int n) {
    print_array(arr, n);
    printf("\n");
}

// Pre-condition:
//    none
// Post-condition:
//    Returns the smallest i >= 0 such that v[i] == x; or, if
//    x is not anywhere in v, then -1 is returned
int linear_search1(int* v, int n, int x) {
    for(int i = 0; i < n; ++i) {
        if (x == v[i]) {
            return i;
        }
    }
    return -1;
}

// Pre-condition:
//    x is in v (i.e. there exists some i such that v[i] == x)
// Post-condition:
//    returns the smallest i >= 0 such that v[i] == x
// Note:
//    No need to pass the length of array v since x is in the array
//    and the loop will stop when it finds it.
int location_of(int* v, int x) {
    int i = 0;
    while (v[i] != x) {
        ++i;
    }
    return i;
}

// v is *not* const so that the end value can be temporarily
// modified by the algorithm.
int linear_search2(int* v, int n, int x) {
  if (n == 0) return -1;
  if (n == 1) {
    if (v[0] == x) return 0;
    else return -1;
  }

  // n >= 2 at this point

  // first check if x is the last element
  if (v[n-1] == x) return n - 1;

  // at this point we know v[n-1] != x
  int last = v[n-1];          // save the last element
  v[n-1] = x;                 // make the last element equal to x
  int i = location_of(v, x);  // search for x
  v[n-1] = last;              // now put the true last element back
  if (i == n - 1) return -1;  // if the first x was the one at the end, x is not in v
  else return i;
}

// linear search on the range [begin, end)
int linear_search3(int* v, int begin, int end, int x) {
    for(int i = begin; i < end; ++i) {
        if (x == v[i]) return i;
    }
    return -1;
}

// recursive linear search on a range
int linear_search4_help(int* v, int begin, int end, int x) {
    // printf("linear_search4_help(v, %d, %d, %d) ...\n", begin, end, x);
    if (begin >= end) {          // base case: empty range
        return -1;
    } else if (v[begin] == x) {  // base case: x is at range's beginning
        return begin;
    } else {                     // recursive case
        return linear_search4_help(v, begin + 1, end, x);  // note the + 1
    }
}

int linear_search4(int* v, int n, int x) {
    return linear_search4_help(v, 0, n, x);
}

void test1() {
    const int n = 4;
    int a[] = {4, 1, 9, 5};
    println_array(a, n);
    assert(linear_search1(a, n, 4) == 0);
    assert(linear_search1(a, n, 1) == 1);
    assert(linear_search1(a, n, 9) == 2);
    assert(linear_search1(a, n, 5) == 3);

    assert(linear_search1(a, n, 2) == -1);
    assert(linear_search1(a, n, 10) == -1);
    printf("all linear_search1 tests passed\n");
}

void test2() {
    const int n = 4;
    int a[] = {4, 1, 9, 5};
    println_array(a, n);
    assert(linear_search2(a, n, 4) == 0);
    assert(linear_search2(a, n, 1) == 1);
    assert(linear_search2(a, n, 9) == 2);
    assert(linear_search2(a, n, 5) == 3);

    assert(linear_search2(a, n, 2) == -1);
    assert(linear_search2(a, n, 10) == -1);
    printf("all linear_search2 tests passed\n");
}

void test4() {
    const int n = 4;
    int a[] = {4, 1, 9, 5};
    println_array(a, n);
    assert(linear_search4(a, n, 4) == 0);
    assert(linear_search4(a, n, 1) == 1);
    assert(linear_search4(a, n, 9) == 2);
    assert(linear_search4(a, n, 5) == 3);

    assert(linear_search4(a, n, 2) == -1);
    assert(linear_search4(a, n, 10) == -1);
    printf("all linear_search4 tests passed\n");
}

int main() {
    test1();
    test2();
    test4();
}