Linear Search¶

Introduction¶

linear search is a fundamental algorithm

it has many applications

and appears in many different forms

here we will study it from a somewhat abstract point of view

The Problem Linear Search Solves¶

Given \(v_0, v_1, \ldots, v_{n-1}\) and a target \(x\)

Find \(i\) such that \(v_i = x\)

If there is no such \(v_i\), return -1 (or somehow signal “not found”)

Pseudocode Solution 1¶

Does \(v_0 = x\)? If so, return 0.

Does \(v_1 = x\)? If so, return 1.

…

Does \(v_{n-1} = x\)? If so, return \(n-1\).

Otherwise, return -1.

Pseudocode Solution 1¶

this works fine, and is what many programmers call linear search

however, it is overly specific

testing elements in the order \(v_0\) to \(v_{n-1}\) is not required

nor is it always best

Pseudocode Solution 2¶

call \(I = \{0, 1, \ldots, n-1\}\) the index set of \(v\)

while \(I\) is not empty, do the following

choose and remove an index \(i\) from \(I\)

Does \(v_i = x\)? If so, return \(i\)

return -1 if you get here

Pseudocode Solution 2¶

this is a little more complex

but it highlights the some of the flexibility the problem statement gives us

we could do linear search in reverse order

or start in the middle and “wrap-around” to the start

or start in the middle and “spread out” towards the beginning and end at the same time

A Basic Solution¶

// Pre-condition:
//    none
// Post-condition:
//    Returns the smallest i >= 0 such that v[i] == x; or, if
//    x is not anywhere in v, then -1 is returned
int linear_search1(const vector<int>& v, int x) {
    for(int i = 0; i < v.size(); ++i) {
        if (x == v[i]) return i;
    }
    return -1;
}

we’ve specified this function carefully using a pre-condition and a post-condition

a function’s pre-condition specifies what must be true before the function is called

it’s post-condition specifies what will be true after that function is called (assuming the pre-condition was true when you called it)

if you call a function when it’s pre-condition is false the result is almost always an error — the function may, or may not, do what you expect

many functions check that their pre-conditions are true before they run

Another Version of Linear Search¶

suppose you know for a fact that x is in v, and you want to find where in v it is

you can easily answer that with linear_search1, but there is a slightly faster version

// Pre-condition:
//    x is in v (i.e. there exists some i such that v[i] == x)
// Post-condition:
//    Returns the smallest i >= 0 such that v[i] == x
int location_of(const vector<int>& v, int x) {
    int i = 0;
    while (v[i] != x) ++i;
    return i;
}

the pre-condition is very important here!

if it’s not true, then the result could be an infinite loop

this version of linear search is a little faster because it does only one comparison, v[i] != x, each iteration

in contrast, linear_search1 does two comparisons per iteration: v[i] != x and i < v.size()

A Faster General Purpose Linear Search?¶

interestingly, we can use the essential algorithm inside location_of to speed-up the general purpose linear search (where x might not be in v)

the trick is to add x to the end of v, guaranteeing that x is in v

this x added to the end of v is called a sentinel value

then we can use location_of to find x; if it finds the x at the end, then we know x is not equal to any of \(v_0, v_1, \ldots, v_{n-1}\)

this works, and it is indeed faster in most cases than regular linear search!

but it requires having an extra location at the end your array/vector/list, which might not always be possible

Making Linear Search More Flexible¶

often we only want to search a sub-range of a vector instead of the entire vector itself

int linear_search4(const vector<int>& v, int x, int begin, int end) {
    for(int i = begin; i < end; ++i) {
        if (x == v[i]) return i;
    }
    return -1;
}

Making Linear Search More Flexible¶

begin is the index of the first element of v we want to search

end is the index of one past the last element we want to search

thus, this searches the sub-range v[begin], [begin + 1] …, v[end - 1]

Even More Flexibility¶

our linear search only works with vectors

we’d have to write another function if we want to perform linear search on an array or a string or in a file

C++’s STL library implements linear search like this

template<class InputIterator, class T>
InputIterator find (InputIterator first,
                    InputIterator last,
                    const T& val)
{
  while (first != last) {
    if (*first == val) return first;
    ++first;
  }
  return last;
}

Even More Flexibility¶

InputIterator and T are types that are instantiated at compile-time

InputIterator is an iterator type

an iterator is a generalized version of a pointer

thus pointers and iterators are often interchangeable (as in this code)

all STL container classes have iterators defined for them

Even More Flexibility¶

the result is that find works on vectors and C-style arrays

and also on almost any other container that it makes sense to do linear search on

STL algorithms don’t know what data structure they are working on

all they know about are pointers that traverse the data structure

Even More Flexibility¶

the STL is extremely flexible

relatively easy to use

very high-performance

(most other algorithm libraries don’t provide all three of these properties)

A Different Approach to Flexibility¶

another approach to generalizing linear search is passing a boolean test function

linear_search(v, bool_fn) returns the first index location i in v such that bool_fn(v[i]) is true

this is useful and powerful and supported by C++ (and functional programming languages)

but we’ll not take this approach here

Linear Search¶

in practice, the following two linear search functions are usefu

the first is very flexible

the second is easy to call for the common case of searching the entire vector

int linear_search(const vector<int>& v, int x, int begin, int end) {
    for(int i = begin; i < end; ++i) {
        if (x == v[i]) return i;
    }
    return -1;
}

int linear_search(const vector<int>& v, int x) {
    return linear_search(v, x, 0, v.size());
}

Recursive Linear Search¶

functions that call themselves are said to be recursive

many useful functions have natural recursive implementations

you can think of linear search recursively as follows …

if you searching for x in v, then either the first element of v is equal to x, or it’s not

if x is the first element of v, then the algorithm is done; if it’s not, (recursively) do linear search on the rest of v, i.e. on every element of v except the first element

int linear_search5(const vector<int>& v, int x, int begin, int end) {
    if (begin >= end) {          // base case: empty range
        return -1;
    } else if (v[begin] == x) {  // base case: x is at range's beginning
        return begin;
    } else {                     // recursive case
        return linear_search5(v, x, begin + 1, end);  // note the +1
    }
}

Recursive Linear Search¶

performance-wise recursive linear search is not good due to all the function calls it must do

some compilers might be able to optimize-away the recursion in this case (but not every case!) using a trick called tail-call elimination

Linear Search Performance¶

how fast is linear search?

it depends in part on the speed of your computer

and your compiler (and what options it uses)

and the version of your operating system

and what else your computer might be doing at the same time

…

Linear Search Performance¶

we avoid these complexities by asking how many statements does linear search execute

this is independent of the particular software/hardware involved

Counting Instructions¶

we usually only count one particular key instruction

usually, the key instruction is the one that is executed the most

Key Instructions¶

for linear search the key instruction is usually chosen to be ==

“how fast is linear search?” becomes “how many times is == executed?”

Counting == in Linear Search¶

the number of times == is executed depends upon the input

0 times if the vector is empty

1 time if the vector’s first element is x, or the size of the vector is 1 and x is not in it

2 times if the vector’s second element is x, or the size of the vector is 2 and x is not in it

…

n times if the vector’s last element is x, or the size of the vector is n and x is not in it

(we are assuming the items in the vector are unique, i.e. it has no duplicates)

Counting == in Linear Search¶

in the best case, when v is empty, linear search does 0 comparisons

empty vectors are probably not searched very often in practice, so this result is not very useful

Counting == in Linear Search¶

a more useful best case is 1 comparison, when x is the first element of v

assuming randomly ordered unique numbers in v, if x is in v then there is a \(\frac{1}{n}\) chance x is the first element

thus, this best case does not occur very often

Counting == in Linear Search¶

in practice, the worst case is usually more important than the best case

hope for the best, but plan for the worst

in the worst case, linear search does n comparisons

this can happen in two different ways

the last element of v is x

or x is not in v at all

this is often useful information

Average Case Performance¶

the best and worst cases are extremes

both typically occur infrequently

so it is natural to ask what is the average number comparisons done by linear search

in general, average-case performance of an algorithm can be quite tricky to determine accurately

usually, we need to make assumptions about the distribution of the data and probabilities of values

Linear Search Average Case¶

but for linear search it’s not too hard to calculate the average case performance accurately

first, we need to make some assumptions

v contains n unique items (no duplicates)

the probability that x is in v is \(\frac{1}{2}\) (not necessarily realistic!)

if x is in v, then has equal chance of being at any location, the chance that v[i] == x is \(\frac{1}{n}\) (again, not necessarily realistic!)

Linear Search Average Case¶

now we can say that the average number of comparisons is this

\(\frac{1}{2}\) (# comparisons done when x is in v) + \(\frac{1}{2}\) (# comparisons done when x is not in v)

Linear Search Average Case¶

when x is not in v, exactly n comparisons are done

\(\frac{1}{2}\) (# comparisons done when x is in v) + \(\frac{1}{2}n\)

Linear Search Average Case¶

how many comparisons are done when x is in v?

remember, we assumed there is a \(\frac{1}{n}\) chance of x at being in any particular location

if x is at index 0, 1 comparison is done

if x is at index 1, 2 comparisons are done

if x is at index 2, 3 comparisons are done

…

if x is at index i, i + 1 comparisons are done

…

if x is at index n - 1, n comparisons are done

Linear Search Average Case¶

thus

\(\frac{1}{n}\) of the time 1 comparison is done

\(\frac{1}{n}\) of the time 2 comparisons are done

\(\frac{1}{n}\) of the time 3 comparisons are done

…

\(\frac{1}{n}\) of the time n comparisons are done

Linear Search Average Case¶

the total amount of work done is the sum of each case

\(\frac{1}{n}\cdot 1 + \frac{1}{n}\cdot 2 + \ldots + \frac{1}{n}\cdot n\)

which simplifies to

\(\frac{1}{n}(1 + 2 + \ldots + n)\)

Linear Search Average Case¶

you should remember this fact

\(1 + 2 + \ldots + n = \frac{n(n+1)}{2}\)

Linear Search Average Case¶

the total number of comparisons done in the case when x is in v is

\(\frac{1}{n}(1 + 2 + \ldots + n)\)

which becomes

\(\frac{1}{n}(\frac{n(n+1)}{2})\)

which simplifies to

\(\frac{n+1}{2}\)

or

\(\frac{n}{2} + \frac{1}{2}\)

Linear Search Average Case¶

going back to the sum of all comparisons

\(\frac{1}{2}\) (# comparisons done when x is in v) + \(\frac{1}{2}n\)

this now becomes

\(\frac{1}{2}\frac{n+1}{2} + \frac{1}{2}n\)

or

\(\frac{n+1}{4} + \frac{2n}{4}\)

which is

\(\frac{3n+1}{4}\)

or

\(\frac{3n}{4} + \frac{1}{4}\)

Linear Search Average Case¶

so, assuming there’s a 50-50 chance x is in v

and, assuming, if x is in v it has an equal chance of being at any index

on average we would expect linear search to do \(\frac{3n}{4} + \frac{1}{4}\) calls to ==

ignoring the \(\frac{1}{4}\) is fine for practical applications, so we can say that linear search does about \(\frac{3n}{4}\) comparisons on average

Linear Search Average Case¶

\(\frac{3n}{4}\) is closer to the worst case than the best case

this is because half the time x is not in v and must do n comparisons to prove that