Linear Search¶
Introduction¶
linear search is a fundamental algorithm
it has many applications
and appears in many different forms
here we will study it from a somewhat abstract point of view
The Problem Linear Search Solves¶
Given \(v_0, v_1, \ldots, v_{n-1}\) and a target \(x\)
Find \(i\) such that \(v_i = x\)
If there is no such \(v_i\), return -1 (or somehow signal “not found”)
Pseudocode Solution 1¶
Does \(v_0 = x\)? If so, return 0.
Does \(v_1 = x\)? If so, return 1.
…
Does \(v_{n-1} = x\)? If so, return \(n-1\).
Otherwise, return -1.
Pseudocode Solution 1¶
this works fine, and is what many programmers call linear search
however, it is overly specific
testing elements in the order \(v_0\) to \(v_{n-1}\) is not required
nor is it always best
Pseudocode Solution 2¶
call \(I = \{0, 1, \ldots, n-1\}\) the index set of \(v\)
while \(I\) is not empty, do the following
choose and remove an index \(i\) from \(I\)
Does \(v_i = x\)? If so, return \(i\)
return -1 if you get here
Pseudocode Solution 2¶
this is a little more complex
but it highlights the some of the flexibility the problem statement gives us
we could do linear search in reverse order
or start in the middle and “wrap-around” to the start
or start in the middle and “spread out” towards the beginning and end at the same time
A Basic Solution¶
// Pre-condition:
// none
// Post-condition:
// Returns the smallest i >= 0 such that v[i] == x; or, if
// x is not anywhere in v, then -1 is returned
int linear_search1(const vector<int>& v, int x) {
for(int i = 0; i < v.size(); ++i) {
if (x == v[i]) return i;
}
return -1;
}
we’ve specified this function carefully using a pre-condition and a post-condition
a function’s pre-condition specifies what must be true before the function is called
it’s post-condition specifies what will be true after that function is called (assuming the pre-condition was true when you called it)
if you call a function when it’s pre-condition is false the result is almost always an error — the function may, or may not, do what you expect
many functions check that their pre-conditions are true before they run
Another Version of Linear Search¶
suppose you know for a fact that x
is in v
, and you want to find
where in v
it is
you can easily answer that with linear_search1
, but there is a slightly
faster version
// Pre-condition:
// x is in v (i.e. there exists some i such that v[i] == x)
// Post-condition:
// Returns the smallest i >= 0 such that v[i] == x
int location_of(const vector<int>& v, int x) {
int i = 0;
while (v[i] != x) ++i;
return i;
}
the pre-condition is very important here!
if it’s not true, then the result could be an infinite loop
this version of linear search is a little faster because it does only one
comparison, v[i] != x
, each iteration
in contrast, linear_search1
does two comparisons per iteration: v[i]
!= x
and i < v.size()
A Faster General Purpose Linear Search?¶
interestingly, we can use the essential algorithm inside location_of
to
speed-up the general purpose linear search (where x
might not be in v
)
the trick is to add x
to the end of v
, guaranteeing that x
is in
v
this x
added to the end of v
is called a sentinel value
then we can use location_of
to find x
; if it finds the x
at the
end, then we know x
is not equal to any of \(v_0, v_1, \ldots,
v_{n-1}\)
this works, and it is indeed faster in most cases than regular linear search!
but it requires having an extra location at the end your array/vector/list, which might not always be possible
Making Linear Search More Flexible¶
often we only want to search a sub-range of a vector instead of the entire vector itself
int linear_search4(const vector<int>& v, int x, int begin, int end) {
for(int i = begin; i < end; ++i) {
if (x == v[i]) return i;
}
return -1;
}
Making Linear Search More Flexible¶
begin
is the index of the first element of v
we want to search
end
is the index of one past the last element we want to search
thus, this searches the sub-range v[begin]
, [begin + 1]
…, v[end -
1]
Even More Flexibility¶
our linear search only works with vectors
we’d have to write another function if we want to perform linear search on an array or a string or in a file
C++’s STL library implements linear search like this
template<class InputIterator, class T>
InputIterator find (InputIterator first,
InputIterator last,
const T& val)
{
while (first != last) {
if (*first == val) return first;
++first;
}
return last;
}
Even More Flexibility¶
InputIterator
and T
are types that are instantiated at compile-time
InputIterator
is an iterator type
an iterator is a generalized version of a pointer
thus pointers and iterators are often interchangeable (as in this code)
all STL container classes have iterators defined for them
Even More Flexibility¶
the result is that find
works on vectors
and C-style arrays
and also on almost any other container that it makes sense to do linear search on
STL algorithms don’t know what data structure they are working on
all they know about are pointers that traverse the data structure
Even More Flexibility¶
the STL is extremely flexible
relatively easy to use
very high-performance
(most other algorithm libraries don’t provide all three of these properties)
A Different Approach to Flexibility¶
another approach to generalizing linear search is passing a boolean test function
linear_search(v, bool_fn)
returns the first index location i
in v
such that bool_fn(v[i])
is true
this is useful and powerful and supported by C++ (and functional programming languages)
but we’ll not take this approach here
Linear Search¶
in practice, the following two linear search functions are usefu
the first is very flexible
the second is easy to call for the common case of searching the entire vector
int linear_search(const vector<int>& v, int x, int begin, int end) {
for(int i = begin; i < end; ++i) {
if (x == v[i]) return i;
}
return -1;
}
int linear_search(const vector<int>& v, int x) {
return linear_search(v, x, 0, v.size());
}
Recursive Linear Search¶
functions that call themselves are said to be recursive
many useful functions have natural recursive implementations
you can think of linear search recursively as follows …
if you searching for x
in v
, then either the first element of
v
is equal to x
, or it’s not
if x
is the first element of v
, then the algorithm is done; if it’s
not, (recursively) do linear search on the rest of v
, i.e. on every
element of v
except the first element
int linear_search5(const vector<int>& v, int x, int begin, int end) {
if (begin >= end) { // base case: empty range
return -1;
} else if (v[begin] == x) { // base case: x is at range's beginning
return begin;
} else { // recursive case
return linear_search5(v, x, begin + 1, end); // note the +1
}
}
Recursive Linear Search¶
performance-wise recursive linear search is not good due to all the function calls it must do
some compilers might be able to optimize-away the recursion in this case (but not every case!) using a trick called tail-call elimination
Linear Search Performance¶
how fast is linear search?
it depends in part on the speed of your computer
and your compiler (and what options it uses)
and the version of your operating system
and what else your computer might be doing at the same time
…
Linear Search Performance¶
we avoid these complexities by asking how many statements does linear search execute
this is independent of the particular software/hardware involved
Counting Instructions¶
we usually only count one particular key instruction
usually, the key instruction is the one that is executed the most
Key Instructions¶
for linear search the key instruction is usually chosen to be ==
“how fast is linear search?” becomes “how many times is ==
executed?”
Counting == in Linear Search¶
the number of times ==
is executed depends upon the input
0 times if the vector is empty
1 time if the vector’s first element is x
, or the size of the vector is 1
and x
is not in it
2 times if the vector’s second element is x
, or the size of the vector is
2 and x
is not in it
…
n times if the vector’s last element is x
, or the size of the vector is n
and x
is not in it
(we are assuming the items in the vector are unique, i.e. it has no duplicates)
Counting == in Linear Search¶
in the best case, when v
is empty, linear search does 0 comparisons
empty vectors are probably not searched very often in practice, so this result is not very useful
Counting == in Linear Search¶
a more useful best case is 1 comparison, when x
is the first element of
v
assuming randomly ordered unique numbers in v
, if x
is in v
then
there is a \(\frac{1}{n}\) chance x
is the first element
thus, this best case does not occur very often
Counting == in Linear Search¶
in practice, the worst case is usually more important than the best case
hope for the best, but plan for the worst
in the worst case, linear search does n comparisons
this can happen in two different ways
the last element of v
is x
or x
is not in v
at all
this is often useful information
Average Case Performance¶
the best and worst cases are extremes
both typically occur infrequently
so it is natural to ask what is the average number comparisons done by linear search
in general, average-case performance of an algorithm can be quite tricky to determine accurately
usually, we need to make assumptions about the distribution of the data and probabilities of values
Linear Search Average Case¶
but for linear search it’s not too hard to calculate the average case performance accurately
first, we need to make some assumptions
v
contains n unique items (no duplicates)
the probability that x
is in v
is \(\frac{1}{2}\) (not necessarily
realistic!)
if x
is in v
, then has equal chance of being at any location, the
chance that v[i] == x
is \(\frac{1}{n}\) (again, not necessarily
realistic!)
Linear Search Average Case¶
now we can say that the average number of comparisons is this
\(\frac{1}{2}\) (# comparisons done when x
is in v)
+
\(\frac{1}{2}\) (# comparisons done when x
is not in v)
Linear Search Average Case¶
when x
is not in v
, exactly n comparisons are done
\(\frac{1}{2}\) (# comparisons done when x
is in v)
+
\(\frac{1}{2}n\)
Linear Search Average Case¶
how many comparisons are done when x
is in v
?
remember, we assumed there is a \(\frac{1}{n}\) chance of x
at being
in any particular location
if x
is at index 0, 1 comparison is done
if x
is at index 1, 2 comparisons are done
if x
is at index 2, 3 comparisons are done
…
if x
is at index i, i + 1 comparisons are done
…
if x
is at index n - 1, n comparisons are done
Linear Search Average Case¶
thus
\(\frac{1}{n}\) of the time 1 comparison is done
\(\frac{1}{n}\) of the time 2 comparisons are done
\(\frac{1}{n}\) of the time 3 comparisons are done
…
\(\frac{1}{n}\) of the time n comparisons are done
Linear Search Average Case¶
the total amount of work done is the sum of each case
\(\frac{1}{n}\cdot 1 + \frac{1}{n}\cdot 2 + \ldots + \frac{1}{n}\cdot n\)
which simplifies to
\(\frac{1}{n}(1 + 2 + \ldots + n)\)
Linear Search Average Case¶
the total number of comparisons done in the case when x
is in v
is
\(\frac{1}{n}(1 + 2 + \ldots + n)\)
which becomes
\(\frac{1}{n}(\frac{n(n+1)}{2})\)
which simplifies to
\(\frac{n+1}{2}\)
or
\(\frac{n}{2} + \frac{1}{2}\)
Linear Search Average Case¶
going back to the sum of all comparisons
\(\frac{1}{2}\) (# comparisons done when x
is in v)
+
\(\frac{1}{2}n\)
this now becomes
\(\frac{1}{2}\frac{n+1}{2} + \frac{1}{2}n\)
or
\(\frac{n+1}{4} + \frac{2n}{4}\)
which is
\(\frac{3n+1}{4}\)
or
\(\frac{3n}{4} + \frac{1}{4}\)
Linear Search Average Case¶
so, assuming there’s a 50-50 chance x
is in v
and, assuming, if x
is in v
it has an equal chance of being at any
index
on average we would expect linear search to do \(\frac{3n}{4} +
\frac{1}{4}\) calls to ==
ignoring the \(\frac{1}{4}\) is fine for practical applications, so we can say that linear search does about \(\frac{3n}{4}\) comparisons on average
Linear Search Average Case¶
\(\frac{3n}{4}\) is closer to the worst case than the best case
this is because half the time x
is not in v
and must do n
comparisons to prove that