Notes on the Lower Bound for Sorting with Comparisons

Let \(A\) be a deterministic sorting algorithm that takes a permutation of distinct values \(x_1, x_2, \ldots, x_n\) as input and returns them in ascending sorted order.

The only way to get information about these \(x_i\) values is by comparing them with \(<\), e.g. determining if \(x_i < x_j\) is true or false.

When we say that \(A\) is deterministic, we mean it doesn’t do any random actions, and its only inputs are the \(x_1, \ldots, x_n\) values. Furthermore, \(A\) cannot play any tricks involving, say, global variables, reading a file, asking the user questions, opening network connections, looking at the time, etc. We also assume that \(A\) has no bugs and always correctly sorts the data.

Now imagine running \(A\) on a particular permutation of \(x_1, \ldots, x_n\). So \(A\) does a bit of work, then does a comparison, say \(x_a < x_b\). If the comparison is true, it does some more work, and does another comparison. If the comparison was instead false, it does some work that may, or may not, be the same as the work it would have done if the comparison were true. It then does another comparison, and so on and so on. Eventually, \(A\) stops when it has correctly sorted its input, and so it has done a finite sequence of comparisons (interspersed with other work, e.g. swapping values, keeping track of other variables, etc.). If \(A\) printed a message to the screen every time it did a comparison, then the resulting list of comparisons would be a trace through the program.

Now imagine combining all possible traces of comparisons into a binary tree. Each internal node in the tree corresponds to one comparison \(x_i < x_j\), and has two children: one child tree is is for when the comparison is true, and the other child subtree is for when the comparison is false. The leaves of the tree indicate that \(A\) is finished and has put the data in sorted order. Such a tree is called a binary decision tree, or just a decision tree.

Each root-to-leaf path of this decision tree is one possible ordering of \(x_1, \ldots, x_n\), and so each leaf of the tree corresponds to one of the possible permutations of the input.

Every permutation of \(x_1, \ldots, x_n\) takes a different path through the decision tree. To see why, suppose two different permutations of the input, \(P_1\) and \(P_2\), took the same path through the tree. Since \(P_1\) and \(P_2\) are different, there must be at least one pair of different values, \(x_i\) and \(x_j\), such that \(x_i\) comes before \(x_j\) in \(P_1\), but \(x_i\) comes after \(x_j\) in \(P_2\). When \(A\) is done, either \(x_i\) and \(x_j\) will be in the same position as they were in the initial input, or they won’t — it cannot be both. Thus one of \(P_1\) and \(P_2\) cannot be properly sorted if \(P_1\) and \(P_2\) both follow the same path through the tree. Thus, the permutations \(P_1\) and \(P_2\) must follow different paths through the tree.

Finally, we note that there are \(n!\) different permutations of \(x_1, \ldots, x_n\) (remember we assumed that the inputs are all distinct). Since every leaf is associated with exactly one of these permutations, the decision tree has \(n!\) leaves. We know from our earlier study of binary trees that the shortest possible binary tree with \(L\) leaves is \(\log_2 L\). Since \(L=n!\) here, the decision tree must have height \(\log_2 n!\). By using Stirling’s approximation, it turns out that \(\log_2 n!\) is approximately \(n \log_2 n\), and so the height of the decision tree is at least \(O(n \log n)\). That means for any comparison-based sorting algorithm, at least one permutation requires at least \(O(n \log n)\) comparisons to be sorted. Thus, no deterministic comparison-based sorting algorithm can do better than \(O(n \log n)\) comparisons in the worst case.