Linear Search (in C)¶
Algorithms¶
intuitively, an algorithm is a precise step-by-step process
for example
- the method you learned in elementary school for adding numbers is an algorithm
- selection sort is a sorting algorithm that re-arranges the elements of an array to be in order from smallest to biggest
- Bresenham’s line-drawing algorithm draws a good-looking straight line between two pixels on a computer monitor
- many neural networks are trained using an algorithm called backpropagation
programs can often be thought of as implementations of algorithms
in computer science, it is often useful to first think about algorithms “on paper”, and then implement them afterwards
by studying algorithms we can often understand useful things about them, such as:
- whether or not they are correct, or for what sorts of inputs it works on
- how much work it does (i.e. how fast it is)
- how much memory it uses
- how complicated it is
- how secure it is
- how much power does it use (this could be important on low-powered devices)
Linear Search¶
linear search is a fundamental algorithm
it is relatively simple, but has many applications
and appears in many different forms
here we will study it from a fairly abstract point of view
The Problem Linear Search Solves¶
Given \(v_0, v_1, \ldots, v_{n-1}\) and a target \(x\)
Find \(i\) such that \(v_i = x\)
If there is no such \(v_i\), return -1 (or somehow signal “not found”)
you can think of \(v_0, v_1, \ldots, v_{n-1}\) as being numbers, but they
could also be strings, or arrays, or structures, or any data type for which
==
is defined
Pseudocode Solution 1¶
Does \(v_0 = x\)? If so, stop and return 0.
Does \(v_1 = x\)? If so, stop and return 1.
…
Does \(v_{n-1} = x\)? If so, stop and return \(n-1\).
Otherwise, stop and return -1.
Pseudocode Solution 1¶
this works very well in practice, and is what many programmers call linear search
however, it is overly specific
testing elements in the order \(v_0\) to \(v_{n-1}\) is not required
nor is it always best
- for example, suppose you need to find the
'.'
in a file name - the dot is usually near the right-end of the name
- and so it is usually quicker to search for the dot by starting at the right end and moving to the left
Pseudocode Solution 2¶
call \(I = \{0, 1, \ldots, n-1\}\) the index set of \(v\)
while \(I\) is not empty, do the following
choose and remove an index \(i\) from \(I\)
Does \(v_i = x\)? If so, return \(i\)
return -1 if you get here
Pseudocode Solution 2¶
this is a little more complex
but it highlights some of the flexibility the problem statement gives us
this version lets us do linear search in reverse order
or start in the middle and “wrap-around” to the start
or start in the middle and “spread out” towards the beginning and end at the same time
Forward Linear Search
+--------------------------------------------------------------------+
| |
start +-------------------------------> |
| |
+--------------------------------------------------------------------+
Reverse Linear Search
+--------------------------------------------------------------------+
| |
| <----------------------------------+ start
| |
+--------------------------------------------------------------------+
Wrap-around Forward Linear Search
+--------------------------------------------------------------------+
| |
+------------> +------------------------------------------->+
| start |
+--------------------------------------------------------------------+
Expanding Linear Search
+--------------------------------------------------------------------+
| |
| <------------------------+--------------------------> |
| start |
+--------------------------------------------------------------------+
Inwards Linear Search
+--------------------------------------------------------------------+
| |
+--------------------> <-----------------------+
|start start|
+--------------------------------------------------------------------+
A Basic Implementation¶
// Pre-condition:
// none
// Post-condition:
// Returns the smallest i >= 0 such that v[i] == x; or, if
// x is not anywhere in v, then -1 is returned
int linear_search1(int* v, int n, int x) {
for(int i = 0; i < n; ++i) {
if (x == v[i]) {
return i;
}
}
return -1;
}
we’ve specified this function carefully using a pre-condition and a post-condition
- a pre-condition specifies what must be true before the function is called
- a post-condition specifies what will be true after that function is called, assuming the pre-condition was true when you called it
if you call a function when it’s pre-condition is false the result is likely an error — the function may, or may not, do what you expect
good programming practice: use assert
(or an if-statement) as the
first line of a function to check if a pre-condition holds
- this can help catch errors
Locating and Item: Another Version of Linear Search¶
suppose you know for a fact that x
is in v
, and you want to find
where in v
it is
you can easily answer that with linear_search1
… but there is a slightly
faster version
// Pre-condition:
// x is in v (i.e. there exists some i such that v[i] == x)
// Post-condition:
// returns the smallest i >= 0 such that v[i] == x
// Note:
// No need to pass the length of array v since x is in the array
// and the loop will stop when it finds it.
int location_of(int* v, int x) {
int i = 0;
while (v[i] != x) {
++i;
}
return i;
}
the pre-condition is very important here: if it’s not true, then the result could be an infinite loop
this version of linear search is a little faster because it does only one
comparison, v[i] != x
, each iteration
in contrast, linear_search1
does two comparisons per iteration: v[i]
!= x
and i < v.size()
A Faster General Purpose Linear Search?¶
we can use the essential algorithm inside location_of
to create a
sometimes faster version of general purpose linear search (where x
might
not be in v
)
the trick is to add x
to the end of v
, guaranteeing that x
is in
v
this x
added to the end of v
is called a sentinel value
then we can use location_of
to find x
; if it finds the x
at the
end, then we know x
is not equal to any of \(v_0, v_1, \ldots,
v_{n-1}\)
this works, and it is indeed a little faster in some cases than regular linear search!
- however, on many modern computers the improvement may be so slight that it’s often not worth the extra effort
- but on some older computers, or sometimes low-powered computers, it might make a noticeable difference
but it requires modifying your array/vector/list (or having an extra location at the end), which might not always be possible
int linear_search2(int* v, int n, int x) {
if (n == 0) return -1;
if (n == 1) {
if (v[0] == x) return 0;
else return -1;
}
// n >= 2 at this point
// first check if x is the last element
if (v[n-1] == x) return n - 1;
// at this point we know v[n-1] != x
int last = v[n-1]; // save the last element
v[n-1] = x; // make the last element equal to x
int i = location_of(v, x); // search for x
v[n-1] = last; // now put the true last element back
if (i == n - 1) return -1; // if the first x was the one at the end, x is not in v
else return i;
}
Making Linear Search More Flexible¶
often we only want to search a sub-range of an array instead of the entire array itself
// linear search on the range [begin, end)
int linear_search3(int* v, int begin, int end, int x) {
for(int i = begin; i < end; ++i) {
if (x == v[i]) return i;
}
return -1;
}
begin end-1 end
+----------+-----+-----------------------------+-----+-----+---------+
| | | | | | |
+----------+-----+-----------------------------+-----+-----+---------+
a subtlety here is that the valid elements of the sub-array go from begin
to end-1
(not end
!)
[begin, end)
is the set of values {v[begin]
, v[begin + 1]
,
v[begin + 2]
, …, v[end - 1]
}
begin
is the index of the first element of v
we want to search
end
is the index of one past the last element we want to search
thus, this searches the sub-range v[begin]
, [begin + 1]
…, v[end -
1]
an advantage of this asymmetric range is that the number of elements in the
range is exactly end
- begin
plus, the C++ standard template library (STL) uses these kinds of asymmetric ranges
Recursive Linear Search¶
functions that call themselves are said to be recursive
linear search has a natural recursive implementation …
if you are searching for x
in v
, then either the first element of
v
is equal to x
, or it’s not
if x
equals the first element of v
, then the algorithm is done; if the
first element is not equal to v
, then (recursively) do linear search on
the rest of v
, i.e. on every element of v
except the first element
// recursive linear search on a range
int linear_search4_help(int* v, int begin, int end, int x) {
// printf("linear_search4_help(v, %d, %d, %d) ...\n", begin, end, x);
if (begin >= end) { // base case: empty range
return -1;
} else if (v[begin] == x) { // base case: x is at range's beginning
return begin;
} else { // recursive case
return linear_search4_help(v, begin + 1, end, x); // note the + 1
}
}
int linear_search4(int* v, int n, int x) {
return linear_search4_help(v, 0, n, x);
}
performance-wise, recursive linear search is likely not as efficient as linear search implemented with a loop due to all the function calls it must do, and all the parameters it must save on the call stack
- some compilers might be able to optimize-away the recursion in this case (but not every case!) using a trick called tail-call elimination
Linear Search Performance¶
how fast is linear search?
it depends in part on the speed of your computer
and your compiler (and what options it uses)
and the version of your operating system
and what else your computer might be doing at the same time
…
we avoid these complications by instead asking how many key instructions does linear search execute
this is independent of the particular software/hardware involved
Counting Instructions¶
we usually only count one particular key instruction
usually, the key instruction is the one that is executed the most
for linear search the key instruction is usually chosen to be ==
“how fast is linear search?” becomes “how many times is ==
executed?”
Counting == in Linear Search¶
the number of times ==
is executed depends upon x
(the value being
searched for) and the input array
- 0 times if the array is empty
- 1 time if the array’s first element is
x
, or the size of the array is 1 andx
is not in it - 2 times if the array’s second element is
x
, or the size of the array is 2 andx
is not in it
…
- n times if the array’s last element is
x
, or the size of the array is n andx
is not in it
(we are assuming the items in the array are unique, i.e. no duplicates)
Counting == in Linear Search¶
in the best case, when v
is empty, linear search does 0 comparisons
empty arrays are probably not searched very often in practice, so this result is not very useful
a slightly more useful best case is 1 comparison, when x
is the first
element of v
assuming randomly ordered unique numbers in v
, if x
is in v
then
there is a \(\frac{1}{n}\) chance x
is the first element
thus, the best case does not occur very often and you shouldn’t count on it
Counting == in Linear Search¶
in practice, the worst case is often more helpful than the best case
hope for the best, but plan for the worst!
in the worst case, linear search does n comparisons (where n is the size of the array)
this can happen in two different ways
x
is the last element ofv
- or
x
is not inv
at all
Average Case Performance¶
the best and worst cases are extremes
both may occur infrequently
so it is natural to ask what is the expected average number of comparisons done by linear search
an intuitive answer to this problem goes like this …
there are two cases to consider
case 1: x
is in v
case 2: x
is not in v
assuming each case occurs 50% of the time, the total number of comparisons will be \(\frac{1}{2}\) (# of comparisons in case 1) + \(\frac{1}{2}\) (# of comparisons in case 1)
so, how many comparisons are done in each case?
for case 1, it is reasonable to estimate that about \(\frac{n}{2}\)
comparisons are done on average, e.g. sometimes x
is near the beginning of
the array, sometimes it is near the end
assuming v
is randomly ordered, then over multiple searches these cases
balance out doing an average of about \(\frac{n}{2}\) comparisons per
search
for case 2, when x
is not in v
, then n
comparisons are always done
so the total number of comparisons done by linear search is \(\frac{1}{2}\) (# of comparisons in case 1) + \(\frac{1}{2}\) (# of comparisons in case 2)
which is \(\frac{1}{2}\cdot\frac{n}{2} + \frac{1}{2}n = \frac{n}{4} + \frac{2n}{4} = \frac{3n}{4}\)
so, assuming random data, this expression says linear search does \(\frac{3}{4}n\) comparisons on average
- experiments show that this is a pretty accurate count of the number of comparisons that linear search actually does
one final comment: many programmers estimate the average case for linear search to instead be the simpler expression, \(\frac{n}{2}\)
- but we get the answer \(\frac{3}{4}n\) because the worst-case of \(n\) comparisons occurs in two different ways: when \(x\) is the last element of the array, or when its not in the array at all
- so \(n\) comparisons occurs about twice as frequently as any other number of comparisons
- in practice, especially when \(n\) is big, the difference between \(\frac{3}{4}n\) and \(\frac{n}{2}\) isn’t hugely significant
- so \(\frac{n}{2}\) is a reasonable approximation
Average Case Performance¶
in general, the average-case performance of an algorithm can be quite tricky to determine accurately
usually, it’s necessary to make assumptions about the distribution of the data and probabilities of values
this topic is discussed in much more detail in some later CMPT courses
Extra: More Detailed Linear Search Average Case¶
here is another, more detailed, derivation of the number of comparisons done by linear search in the average case.
first, we need to make some assumptions
v
(the array being searched) contains n unique items (no duplicates)
the probability that x
is in v
is \(\frac{1}{2}\) (not necessarily
realistic!)
if x
is in v
, then it has equal chance of being at any location, i.e.
the chance that v[i] == x
is \(\frac{1}{n}\) (again, not necessarily
realistic!)
Linear Search Average Case¶
now we can say that the average number of comparisons is this
\(\frac{1}{2}\) (# comparisons done when x
is in v)
+
\(\frac{1}{2}\) (# comparisons done when x
is not in v)
Linear Search Average Case¶
when x
is not in v
, exactly n comparisons are done
\(\frac{1}{2}\) (# comparisons done when x
is in v)
+
\(\frac{1}{2}n\)
Linear Search Average Case¶
how many comparisons are done when x
is in v
?
remember, we assumed there is a \(\frac{1}{n}\) chance of x
at being
in any particular location
if x
is at index 0, 1 comparison is done
if x
is at index 1, 2 comparisons are done
if x
is at index 2, 3 comparisons are done
…
if x
is at index i, i + 1 comparisons are done
…
if x
is at index n - 1, n comparisons are done
Linear Search Average Case¶
thus
\(\frac{1}{n}\) of the time 1 comparison is done
\(\frac{1}{n}\) of the time 2 comparisons are done
\(\frac{1}{n}\) of the time 3 comparisons are done
…
\(\frac{1}{n}\) of the time n comparisons are done
Linear Search Average Case¶
the total amount of work done is the sum of each case
\(\frac{1}{n}\cdot 1 + \frac{1}{n}\cdot 2 + \ldots + \frac{1}{n}\cdot n\)
which simplifies to
\(\frac{1}{n}(1 + 2 + \ldots + n)\)
Linear Search Average Case¶
the total number of comparisons done in the case when x
is in v
is
\(\frac{1}{n}(1 + 2 + \ldots + n)\)
which becomes
\(\frac{1}{n}(\frac{n(n+1)}{2})\)
which simplifies to
\(\frac{n+1}{2}\)
or
\(\frac{n}{2} + \frac{1}{2}\)
Linear Search Average Case¶
going back to the sum of all comparisons
\(\frac{1}{2}\) (# comparisons done when x
is in v)
+
\(\frac{1}{2}n\)
this now becomes
\(\frac{1}{2}\frac{n+1}{2} + \frac{1}{2}n\)
or
\(\frac{n+1}{4} + \frac{2n}{4}\)
which is
\(\frac{3n+1}{4}\)
or
\(\frac{3n}{4} + \frac{1}{4}\)
Linear Search Average Case¶
so, assuming there’s a 50-50 chance x
is in v
and, assuming, if x
is in v
it has an equal chance of being at any
index
on average we would expect linear search to do \(\frac{3n}{4} +
\frac{1}{4}\) calls to ==
ignoring the \(\frac{1}{4}\) is fine for practical applications, so we can say that linear search does about \(\frac{3n}{4}\) comparisons on average
Linear Search Average Case¶
\(\frac{3n}{4}\) is closer to the worst case than the best case
this is because half the time x
is not in v
and must do n
comparisons to prove that
Source Code¶
// linearSearch.c
#include <stdio.h>
#include <assert.h>
// Prints the given array C-style, e.g. { 3, 1, -4 }.
void print_array(int* v, int n) {
assert(n >= 0);
if (n == 0) {
printf("{}");
} else if (n == 1) {
printf("{ %d }", v[0]);
} else { // n >= 2
printf("{ %d", v[0]);
for(int i = 1; i < n; i++) {
printf(", %d", v[i]);
}
printf(" }");
}
}
// Same as print_array, but prints a newline at the end.
void println_array(int* arr, int n) {
print_array(arr, n);
printf("\n");
}
// Pre-condition:
// none
// Post-condition:
// Returns the smallest i >= 0 such that v[i] == x; or, if
// x is not anywhere in v, then -1 is returned
int linear_search1(int* v, int n, int x) {
for(int i = 0; i < n; ++i) {
if (x == v[i]) {
return i;
}
}
return -1;
}
// Pre-condition:
// x is in v (i.e. there exists some i such that v[i] == x)
// Post-condition:
// returns the smallest i >= 0 such that v[i] == x
// Note:
// No need to pass the length of array v since x is in the array
// and the loop will stop when it finds it.
int location_of(int* v, int x) {
int i = 0;
while (v[i] != x) {
++i;
}
return i;
}
// v is *not* const so that the end value can be temporarily
// modified by the algorithm.
int linear_search2(int* v, int n, int x) {
if (n == 0) return -1;
if (n == 1) {
if (v[0] == x) return 0;
else return -1;
}
// n >= 2 at this point
// first check if x is the last element
if (v[n-1] == x) return n - 1;
// at this point we know v[n-1] != x
int last = v[n-1]; // save the last element
v[n-1] = x; // make the last element equal to x
int i = location_of(v, x); // search for x
v[n-1] = last; // now put the true last element back
if (i == n - 1) return -1; // if the first x was the one at the end, x is not in v
else return i;
}
// linear search on the range [begin, end)
int linear_search3(int* v, int begin, int end, int x) {
for(int i = begin; i < end; ++i) {
if (x == v[i]) return i;
}
return -1;
}
// recursive linear search on a range
int linear_search4_help(int* v, int begin, int end, int x) {
// printf("linear_search4_help(v, %d, %d, %d) ...\n", begin, end, x);
if (begin >= end) { // base case: empty range
return -1;
} else if (v[begin] == x) { // base case: x is at range's beginning
return begin;
} else { // recursive case
return linear_search4_help(v, begin + 1, end, x); // note the + 1
}
}
int linear_search4(int* v, int n, int x) {
return linear_search4_help(v, 0, n, x);
}
void test1() {
const int n = 4;
int a[] = {4, 1, 9, 5};
println_array(a, n);
assert(linear_search1(a, n, 4) == 0);
assert(linear_search1(a, n, 1) == 1);
assert(linear_search1(a, n, 9) == 2);
assert(linear_search1(a, n, 5) == 3);
assert(linear_search1(a, n, 2) == -1);
assert(linear_search1(a, n, 10) == -1);
printf("all linear_search1 tests passed\n");
}
void test2() {
const int n = 4;
int a[] = {4, 1, 9, 5};
println_array(a, n);
assert(linear_search2(a, n, 4) == 0);
assert(linear_search2(a, n, 1) == 1);
assert(linear_search2(a, n, 9) == 2);
assert(linear_search2(a, n, 5) == 3);
assert(linear_search2(a, n, 2) == -1);
assert(linear_search2(a, n, 10) == -1);
printf("all linear_search2 tests passed\n");
}
void test4() {
const int n = 4;
int a[] = {4, 1, 9, 5};
println_array(a, n);
assert(linear_search4(a, n, 4) == 0);
assert(linear_search4(a, n, 1) == 1);
assert(linear_search4(a, n, 9) == 2);
assert(linear_search4(a, n, 5) == 3);
assert(linear_search4(a, n, 2) == -1);
assert(linear_search4(a, n, 10) == -1);
printf("all linear_search4 tests passed\n");
}
int main() {
test1();
test2();
test4();
}