Linear Search¶

Algorithms¶

intuitively, an algorithm is a precise step-by-step process

for example

  • the method you learned in elementary school for adding numbers is an algorithm
  • selection sort is a sorting algorithm that re-arranges the elements of a vector to be in order from smallest to biggest
  • Bresenham’s line-drawing algorithm draws a good-looking straight line between two pixels on a computer monitor
  • many neural networks are trained using an algorithm called backpropagation

programs can often be thought of as implementations of algorithms

in computer science, it is often useful to first think about algorithms “on paper”, and then implement them afterwards

by studying algorithms we can often understand useful things about them, such as:

  • whether or not they are correct, or for what sorts of inputs they work on
  • how much work it does (i.e. how fast it is)
  • how much memory it uses
  • how complicated it is
  • how secure it is
  • how much power it uses (this could be important on low-powered devices)

Linear Search¶

linear search is a fundamental algorithm

it is relatively simple, but has many applications

and appears in many different forms

here we will study it from a fairly abstract point of view

The Problem Linear Search Solves¶

Given \(v_0, v_1, \ldots, v_{n-1}\) and a target \(x\)

Find \(i\) such that \(v_i = x\)

If there is no such \(v_i\), return -1 (or somehow signal “not found”)

you can think of \(v_0, v_1, \ldots, v_{n-1}\) as being numbers, but they could also be strings, or vectors, or structures, or any data type for which == is defined

Pseudocode Solution 1¶

Does \(v_0 = x\)? If so, stop and return 0.

Does \(v_1 = x\)? If so, stop and return 1.

…

Does \(v_{n-1} = x\)? If so, stop and return \(n-1\).

Otherwise, stop and return -1.

Pseudocode Solution 1¶

this works very well in practice, and is what many programmers call linear search

however, it is overly specific

testing elements in the order \(v_0\) to \(v_{n-1}\) is not required

nor is it always best

  • for example, suppose you need to find the '.' in a file name
  • the dot is usually near the right-end of the name
  • and so it is usually quicker to search for the dot by starting at the right end and moving to the left
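for instance, here is a minimal sketch of that right-to-left search; the function name last_dot_index is made up for illustration (it is not part of these notes), and it follows the same conventions as the other snippets here (string from the standard library, using namespace std assumed)

// Returns the largest i such that s[i] == '.', or -1 if there
// is no '.' in s. It searches from the right end, since the dot
// in a file name is usually near the end.
int last_dot_index(const string& s) {
    for (int i = (int)s.size() - 1; i >= 0; --i) {
        if (s[i] == '.') return i;
    }
    return -1;
}

for example, last_dot_index("report.txt") returns 6 after only a few comparisons, even though the string is 10 characters long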

Pseudocode Solution 2¶

call \(I = \{0, 1, \ldots, n-1\}\) the index set of \(v\)

while \(I\) is not empty, do the following

choose and remove an index \(i\) from \(I\)

Does \(v_i = x\)? If so, return \(i\)

return -1 if you get here

Pseudocode Solution 2¶

this is a little more complex

but it highlights some of the flexibility the problem statement gives us

this version lets us do linear search in reverse order

or start in the middle and “wrap-around” to the start

or start in the middle and “spread out” towards the beginning and end at the same time

                            Forward Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
start +------------------------------->                                    |
      |                                                                    |
      +--------------------------------------------------------------------+



                           Reverse Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
      |                                 <----------------------------------+ start
      |                                                                    |
      +--------------------------------------------------------------------+



                        Wrap-around Forward Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
      +------------>          +------------------------------------------->+
      |                       start                                        |
      +--------------------------------------------------------------------+



                        Expanding Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
      |      <------------------------+-------------------------->         |
      |                             start                                  |
      +--------------------------------------------------------------------+


                        Inwards Linear Search
      +--------------------------------------------------------------------+
      |                                                                    |
      +-------------------->                       <-----------------------+
      |start                                                          start|
      +--------------------------------------------------------------------+
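as one concrete example of this flexibility, here is a minimal sketch of the wrap-around forward search pictured above; the name linear_search_from is made up for illustration, not something defined elsewhere in these notes

// Searches all of v for x, starting at index start and wrapping
// around to the beginning after reaching the end.
// Pre-condition:
//    0 <= start < v.size() (or v is empty)
// Post-condition:
//    returns some i such that v[i] == x, or -1 if x is not in v
int linear_search_from(const vector<int>& v, int x, int start) {
    int n = v.size();
    for (int k = 0; k < n; ++k) {
        int i = (start + k) % n;   // wraps around to 0 after n-1
        if (v[i] == x) return i;
    }
    return -1;
}

every index is still checked at most once, so this is the same amount of work as a forward search; only the order of the checks changes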

A Basic Implementation¶

// Pre-condition:
//    none
// Post-condition:
//    Returns the smallest i >= 0 such that v[i] == x; or, if
//    x is not anywhere in v, then -1 is returned
int linear_search1(const vector<int>& v, int x) {
    for(int i = 0; i < v.size(); ++i) {
        if (x == v[i]) return i;
    }
    return -1;
}

we’ve specified this function carefully using a pre-condition and a post-condition

  • a pre-condition specifies what must be true before the function is called
  • a post-condition specifies what will be true after that function is called, assuming the pre-condition was true when you called it

if you call a function when its pre-condition is false, the result is likely an error: the function may, or may not, do what you expect

good programming practice: use assert (or an if-statement) as the first line of a function to check if a pre-condition holds

  • this can help catch errors
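for example, a hypothetical helper like this one (not from these notes) could check its pre-condition on its first line using assert from <cassert>

#include <cassert>

// Pre-condition:
//    v is not empty
// Post-condition:
//    returns the first element of v
int first_element(const vector<int>& v) {
    assert(v.size() > 0);   // stop the program if the pre-condition fails
    return v[0];
}

if the pre-condition is false, assert stops the program and reports the file and line of the failed check, which is usually much easier to debug than silently getting a wrong answer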

Locating an Item: Another Version of Linear Search¶

suppose you know for a fact that x is in v, and you want to find where in v it is

you can easily answer that with linear_search1 … but there is a slightly faster version

// Pre-condition:
//    x is in v (i.e. there exists some i such that v[i] == x)
// Post-condition:
//    returns the smallest i >= 0 such that v[i] == x
int location_of(const vector<int>& v, int x) {
    int i = 0;
    while (v[i] != x) ++i;
    return i;
}

the pre-condition is very important here: if it’s not true, then the result could be an infinite loop

this version of linear search is a little faster because it does only one comparison, v[i] != x, each iteration

in contrast, linear_search1 does two comparisons per iteration: x == v[i] and i < v.size()

A Faster General Purpose Linear Search?¶

we can use the essential algorithm inside location_of to create a sometimes faster version of general purpose linear search (where x might not be in v)

the trick is to add x to the end of v, guaranteeing that x is in v

this x added to the end of v is called a sentinel value

then we can use location_of to find x; if it finds the x at the end, then we know x is not equal to any of \(v_0, v_1, \ldots, v_{n-1}\)

this works, and it is indeed a little faster in some cases than regular linear search!

  • however, on many modern computers the improvement may be so slight that it’s often not worth the extra effort
  • but on some older computers, or sometimes low-powered computers, it might make a noticeable difference

but it requires modifying your array/vector/list (or having an extra location at the end), which might not always be possible

// v is *not* const so that the end value can be temporarily
// modified by the algorithm.
int linear_search2(vector<int>& v, int x) {
  const int n = v.size();
  if (n == 0) return -1;
  if (n == 1) {
    if (v[0] == x) return 0;
    else return -1;
  }

  // n >= 2 at this point

  // first check if x is the last element
  if (v[n-1] == x) return n - 1;

  // at this point we know v[n-1] != x
  int last = v[n-1];          // save the last element
  v[n-1] = x;                 // make the last element equal to x
  int i = location_of(v, x);  // search for x
  v[n-1] = last;              // now put the true last element back
  if (i == n - 1) return -1;  // if the first x was the one at the end, x is not in v
  else return i;
}

Making Linear Search More Flexible¶

often we only want to search a sub-range of a vector instead of the entire vector itself

// linear search on the range [begin, end)
int linear_search3(const vector<int>& v, int x, int begin, int end) {
    for(int i = begin; i < end; ++i) {
        if (x == v[i]) return i;
    }
    return -1;
}


              begin                               end-1  end
  +----------+-----+-----------------------------+-----+-----+---------+
  |          |     |                             |     |     |         |
  +----------+-----+-----------------------------+-----+-----+---------+

a subtlety here is that the valid elements of the sub-vector go from begin to end-1 (not end!)

[begin, end) is the set of values {v[begin], v[begin + 1], v[begin + 2], …, v[end - 1]}

begin is the index of the first element of v we want to search

end is the index of one past the last element we want to search

thus, this searches the sub-range v[begin], v[begin + 1], …, v[end - 1]

an advantage of this asymmetric range is that the number of elements in the range is exactly end - begin

plus, the C++ standard template library (STL) uses these kinds of asymmetric ranges
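for example, the STL function std::find (from <algorithm>) does a linear search over exactly this kind of [begin, end) range, using iterators instead of indexes; here is a sketch of how linear_search3 could be re-written with it (the name linear_search3_stl is made up for illustration)

#include <algorithm>   // for std::find

// Same behaviour as linear_search3, but implemented with std::find.
// Pre-condition:
//    0 <= begin <= end <= v.size()
// Post-condition:
//    returns the smallest i in [begin, end) such that v[i] == x,
//    or -1 if x is not in that range
int linear_search3_stl(const vector<int>& v, int x, int begin, int end) {
    auto it = find(v.begin() + begin, v.begin() + end, x);
    if (it == v.begin() + end) return -1;   // x not in [begin, end)
    return it - v.begin();                  // convert the iterator back to an index
}

std::find returns the "end" iterator it was given when the value is not found, which is the iterator version of returning -1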

Recursive Linear Search¶

functions that call themselves are said to be recursive

linear search has a natural recursive implementation …

if you are searching for x in v, then either the first element of v is equal to x, or it’s not

if x equals the first element of v, then the algorithm is done; if the first element is not equal to x, then (recursively) do linear search on the rest of v, i.e. on every element of v except the first element

// recursive linear search on a range
int linear_search4(const vector<int>& v, int x, int begin, int end) {
    if (begin >= end) {          // base case: empty range
        return -1;
    } else if (v[begin] == x) {  // base case: x is at range's beginning
        return begin;
    } else {                     // recursive case
        return linear_search4(v, x, begin + 1, end);  // note the + 1
    }
}

performance-wise, recursive linear search is likely not as efficient as linear search implemented with a loop due to all the function calls it must do, and all the parameters it must save on the call stack

  • some compilers might be able to optimize-away the recursion in this case (but not every case!) using a trick called tail-call elimination

Linear Search Performance¶

how fast is linear search?

it depends in part on the speed of your computer

and your compiler (and what options it uses)

and the version of your operating system

and what else your computer might be doing at the same time

…

we avoid these complications by instead asking how many key instructions linear search executes

this is independent of the particular software/hardware involved

Counting Instructions¶

we usually only count one particular key instruction

usually, the key instruction is the one that is executed the most

for linear search the key instruction is usually chosen to be ==

“how fast is linear search?” becomes “how many times is == executed?”
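one way to make this question concrete is to instrument linear search so that it counts its own == calls; here is a sketch (the comparison_count reference parameter is made up for illustration)

// Same as linear_search1, but also counts how many times == is
// executed on vector elements, storing the count in comparison_count.
int linear_search_counted(const vector<int>& v, int x, int& comparison_count) {
    comparison_count = 0;
    for (int i = 0; i < v.size(); ++i) {
        ++comparison_count;        // about to do one == comparison
        if (x == v[i]) return i;
    }
    return -1;
}

for example, searching the vector {4, 8, 2} for 2 sets comparison_count to 3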

Counting == in Linear Search¶

the number of times == is executed depends upon x (the value being searched for) and the input vector

  • 0 times if the vector is empty
  • 1 time if the vector’s first element is x, or the size of the vector is 1 and x is not in it
  • 2 times if the vector’s second element is x, or the size of the vector is 2 and x is not in it

…

  • n times if the vector’s last element is x, or the size of the vector is n and x is not in it

(we are assuming the items in the vector are unique, i.e. no duplicates)

Counting == in Linear Search¶

in the best case, when v is empty, linear search does 0 comparisons

empty vectors are probably not searched very often in practice, so this result is not very useful

a slightly more useful best case is 1 comparison, when x is the first element of v

assuming randomly ordered unique numbers in v, if x is in v then there is a \(\frac{1}{n}\) chance x is the first element

thus, the best case does not occur very often and you shouldn’t count on it

Counting == in Linear Search¶

in practice, the worst case is often more helpful than the best case

hope for the best, but plan for the worst!

in the worst case, linear search does n comparisons (where n is the size of the vector)

this can happen in two different ways

  • x is the last element of v
  • or x is not in v at all

Average Case Performance¶

the best and worst cases are extremes

both may occur infrequently

so it is natural to ask how many comparisons linear search does on average (its expected performance)

an intuitive answer to this problem goes like this …

there are two cases to consider

case 1: x is in v

case 2: x is not in v

assuming each case occurs 50% of the time, the average number of comparisons will be \(\frac{1}{2}\) (# of comparisons in case 1) + \(\frac{1}{2}\) (# of comparisons in case 2)

so, how many comparisons are done in each case?

for case 1, it is reasonable to estimate that about \(\frac{n}{2}\) comparisons are done on average, since sometimes x is near the beginning of the vector, and sometimes it is near the end

assuming v is randomly ordered, then over multiple searches these cases balance out doing an average of about \(\frac{n}{2}\) comparisons per search

for case 2, when x is not in v, then n comparisons are always done

so the average number of comparisons done by linear search is \(\frac{1}{2}\) (# of comparisons in case 1) + \(\frac{1}{2}\) (# of comparisons in case 2)

which is \(\frac{1}{2}\cdot\frac{n}{2} + \frac{1}{2}n = \frac{n}{4} + \frac{2n}{4} = \frac{3n}{4}\)

so, assuming random data, this expression says linear search does \(\frac{3}{4}n\) comparisons on average

  • experiments show that this is a pretty accurate count of the number of comparisons that linear search actually does
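an experiment along these lines can be run with a sketch like the one below; it assumes the hypothetical linear_search_counted function from the counting section (the function name average_case_experiment is also made up), searches for a value that is in v half the time and an absent value the other half, and prints an average that comes out close to \(\frac{3n}{4}\)

#include <cstdlib>    // for rand
#include <iostream>

// Estimates the average number of == comparisons linear search does on a
// vector of n unique values, assuming a 50-50 chance the target is present.
void average_case_experiment(int n, int num_trials) {
    vector<int> v(n);
    for (int i = 0; i < n; ++i) v[i] = i;    // unique values 0 .. n-1

    long long total = 0;
    for (int t = 0; t < num_trials; ++t) {
        int x = (t % 2 == 0) ? rand() % n    // present: a random element of v
                             : n;            // absent: n is not in v
        int count = 0;
        linear_search_counted(v, x, count);
        total += count;
    }
    cout << "average comparisons: " << double(total) / num_trials
         << ", 3n/4 = " << 0.75 * n << "\n";
}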

one final comment: many programmers estimate the average case for linear search to instead be the simpler expression, \(\frac{n}{2}\)

  • but we get the answer \(\frac{3}{4}n\) because the worst-case of \(n\) comparisons occurs in two different ways: when \(x\) is the last element of the vector, or when it's not in the vector at all
  • so \(n\) comparisons occurs about twice as frequently as any other number of comparisons
  • in practice, especially when \(n\) is big, the difference between \(\frac{3}{4}n\) and \(\frac{n}{2}\) isn’t hugely significant
  • so \(\frac{n}{2}\) is a reasonable approximation

Average Case Performance¶

in general, the average-case performance of an algorithm can be quite tricky to determine accurately

usually, it’s necessary to make assumptions about the distribution of the data and probabilities of values

this topic is discussed in much more detail in some later CMPT courses


Extra: More Detailed Linear Search Average Case¶

here is another, more detailed, derivation of the number of comparisons done by linear search in the average case.

first, we need to make some assumptions

v (the vector being searched) contains n unique items (no duplicates)

the probability that x is in v is \(\frac{1}{2}\) (not necessarily realistic!)

if x is in v, then it has equal chance of being at any location, i.e. the chance that v[i] == x is \(\frac{1}{n}\) (again, not necessarily realistic!)

Linear Search Average Case¶

now we can say that the average number of comparisons is this

\(\frac{1}{2}\) (# comparisons done when x is in v) + \(\frac{1}{2}\) (# comparisons done when x is not in v)

Linear Search Average Case¶

when x is not in v, exactly n comparisons are done

\(\frac{1}{2}\) (# comparisons done when x is in v) + \(\frac{1}{2}n\)

Linear Search Average Case¶

how many comparisons are done when x is in v?

remember, we assumed there is a \(\frac{1}{n}\) chance of x being at any particular location

if x is at index 0, 1 comparison is done

if x is at index 1, 2 comparisons are done

if x is at index 2, 3 comparisons are done

…

if x is at index i, i + 1 comparisons are done

…

if x is at index n - 1, n comparisons are done

Linear Search Average Case¶

thus

\(\frac{1}{n}\) of the time 1 comparison is done

\(\frac{1}{n}\) of the time 2 comparisons are done

\(\frac{1}{n}\) of the time 3 comparisons are done

…

\(\frac{1}{n}\) of the time n comparisons are done

Linear Search Average Case¶

the average number of comparisons in this case is the weighted sum over all of these possibilities

\(\frac{1}{n}\cdot 1 + \frac{1}{n}\cdot 2 + \ldots + \frac{1}{n}\cdot n\)

which simplifies to

\(\frac{1}{n}(1 + 2 + \ldots + n)\)

Linear Search Average Case¶

you should remember this fact

\(1 + 2 + \ldots + n = \frac{n(n+1)}{2}\)
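for example, with \(n = 4\): \(1 + 2 + 3 + 4 = 10 = \frac{4 \cdot 5}{2}\)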

Linear Search Average Case¶

so the average number of comparisons done in the case when x is in v is

\(\frac{1}{n}(1 + 2 + \ldots + n)\)

which becomes

\(\frac{1}{n}(\frac{n(n+1)}{2})\)

which simplifies to

\(\frac{n+1}{2}\)

or

\(\frac{n}{2} + \frac{1}{2}\)

Linear Search Average Case¶

going back to the overall average number of comparisons

\(\frac{1}{2}\) (# comparisons done when x is in v) + \(\frac{1}{2}n\)

this now becomes

\(\frac{1}{2}\frac{n+1}{2} + \frac{1}{2}n\)

or

\(\frac{n+1}{4} + \frac{2n}{4}\)

which is

\(\frac{3n+1}{4}\)

or

\(\frac{3n}{4} + \frac{1}{4}\)

Linear Search Average Case¶

so, assuming there’s a 50-50 chance x is in v

and, assuming, if x is in v it has an equal chance of being at any index

on average we would expect linear search to do \(\frac{3n}{4} + \frac{1}{4}\) calls to ==

ignoring the \(\frac{1}{4}\) is fine for practical applications, so we can say that linear search does about \(\frac{3n}{4}\) comparisons on average

Linear Search Average Case¶

\(\frac{3n}{4}\) is closer to the worst case than the best case

this is because half the time x is not in v, and linear search must do all n comparisons to prove that
