O(n log n)

Hypothetically, let us say that you are a Famous Computer Scientist, having graduated from the Famous Computer Scientist School in Smellslikefish, Massachusetts. As a Famous Computer Scientist, people are throwing money at you hand over fist. So, you have a big pile of checks. (You remember checks, little slips of paper with signatures? They represent money.)

You also have a problem: A shiny new Boeing 747 has caught your eye and you need to know whether your pile of checks represents enough money to pay for it. That problem is easy to solve, since all you need to do is to add up the amounts of the checks. But you also have a more pressing problem: Do you have enough time to sum up your checks before the 747 dealer closes for the day?

Being a Famous Computer Scientist, you know something that few others know: that you have to sort the checks, smallest to largest, before you can add them up.

A digression: That is the lowest form of computer science humor, a somewhat obscure reference to a painful fact about computing hardware. In this case the problem is that numbers such as $3141.59, $100000.00, and $2.57 are represented by the computer as floating point numbers. Floating point numbers are effectively represented in scientific or exponential notation, such as 3.14159 * 10^3, 1.0000000 * 10^5, or 2.57 * 10^0. The computer, however, only allocates a certain number of digits for each number, as if instead of using one digit to the left of the decimal point in the exponential notation followed by further digits to the right, it only had 3 (or 6 or 8) to the left of the decimal and no digits to the right. That would mean that 3141.59 would be represented in the computer as 314*10^1, 100000.00 as 100*10^3, and 2.57 as 257*10-2.

Adding floating point numbers is a multi-step process: the computer first adjusts the numbers so the exponents are the same, then adds the values. But 257^10-2 added to 100*10^3 becomes 0*10^3 or 0.0 when adjusted, so 2.57 + 100000.00 equals 100000.00.

To help mitigate that loss of precision, you need to sort your numbers from smaller to larger before you add them, in hopes that the small numbers will add up to something large enough to avoid completely disappearing when you add the sum to the larger numbers.

There is a discipline dedicated those kind of painful problems, called numerical analysis. Unfortunately, I am supremely unqualified to talk about that discipline, so I shall just quote another Famous Computer Scientist, Edsger W. Dijkstra: End of digression.

The bottom line here is that you need to sort your pile of checks before you can add them together, and you need to know how long it will take to sort them to decide whether you can buy your new bird today or not.

At this point, as a Famous Computer Scientist myself, I am going to use a very snazzy computer science trick to avoid having to continually talk about checks and 747s: abstraction. [1] I will throw away all the irrelevant details, including the 747 dealer (who wants to go home now anyway) and in fact both the numbers and the addition operation, and state the problem this way: you have a list of things that you need to sort, and you want to know how long it will take. The only operation you have on the things is to compare two and ask, "Is this one smaller than that one?"

Algorithm behavior

Sorting is a fundamental computer science problem and is well studied. There are many, many algorithms known for sorting. (The algorithm is fundamental to computer science. If you have a program, it is made up of code written by a programmer. The code implements one or, more likely, many algorithms. An algorithm is a high-level, general, abstract description of a process, while the code is the details; the low-level, concrete, specific description of the process plus a huge bundle of other trivia.)

Each of the algorithms has its own behavior, and its own speed. But what does it mean to talk about the speed of an algorithm, which is a thing of the mind and does not do anything? That is where the "Big-Oh"[2] notation comes in.

For every algorithm, any operation used in that algorithm, and any input given to the algorithm, there is a mathematical expression that describes how many times the operation is used on that input by that algorithm. Consider a simple algorithm, going through a list of 12 numbers counting how many there are in the list, one by one. If the operation is looking at a number to determine whether it is the last one in the list, then the operation is used 12 times by that algorithm on that input.

The expression normally depends on the size of the input: if that simple algorithm is given a list of length n (the variable traditionally used for such things), then it will use that operation n times.

Unfortunately, very often the mathematical expression is complex, not very helpful, and very unenlightening. The big-oh notation is designed to highlight the most important part of the expression while hiding irrelevant details. It describes the asymptotic behavior of the expression as the inputs of the expression get larger. In this case, that means that it describes the behavior of the algorithm as the size of the inputs to it get bigger.

Take, for example, the expression 37n^3 + 12n^2 + 19. If n is 1, that equals 37+12+19 = 68. If n is 10, it equals 37*1000 + 12*100 + 19 = 37000 + 1200 + 19. Clearly, as n gets bigger, the final 19 becomes irrelevant. Less clearly, 12n^2 component also becomes irrelevant, because it will be dwarfed by the first. Finally, the 37 is also irrelevant since 37 times a huge number is just another huge number. The asymptotically important part of the expression is the n^3 element. So, 37n^3+12n^2+19 is O(n^3), pronounced "order of n-cubed".

The "order of" an expression describes a kind of upper bound for the expression. It is defined as an expression which, when multiplied by some constant, is greater than or equal to the original expression for all input values larger than some other constant. Because we can choose the new expression, we can pick one that is simpler than the original in the same way that n^3 is simpler than 37n^3+12n^2+19.

(By the way, in case you were wondering, when n = 13, 37n^3+12n^2+19 equals 83336, while 38*n^3 equals 83486. So, n^3 multiplied by 38 is greater than the original expression for all input values greater than n=12. Hence, 37n^3+12n^2+19 is indeed O(n^3).)

The behavior of the counting algorithm is O(n), "order of n". How does that describe the speed of the algorithm? If each operation takes a finite (and by assumption, constant) time, then the time taken by any implementation of the algorithm will depend on the size of the input primarily by n times the time taken by the operation (plus some constant factors here and there). The Big-Oh notation provides a way of comparing two algorithms; for a sufficiently large value of n, n^2 will be greater than 10000n, and thus an O(n^2) algorithm will be slower than an O(n) algorithm for any sufficiently large input. There is a traditional hierarchy of algorithms:

  • O(1) is constant-time; such an algorithm does not depend on the size of its inputs.
  • O(n) is linear-time; such an algorithm looks at each input element once and is generally pretty good.
  • O(n log n) is also pretty decent (that is n times the logarithm base 2 of n).
  • O(n^2), O(n^3), etc. These are polynomial-time, and generally starting to look pretty slow, although they are still useful.
  • O(2^n) is exponential-time, which is common for artificial intelligence tasks and is really quite bad. Exponential-time algorithms begin to run the risk of having a decent-sized input not finish before the person wanting the result retires.

There are worse; like O(2^2^...(n times)...^2).

Sorting

How does that apply to sorting? Sorting, as a problem, is clearly at least as bad as O(n), since it has to look at each item, but that is clearly not a good estimate. Can a better bound be found? What do the well-known sorting algorithms do?

The best sorting algorithms, going by names like "heapsort" and "quicksort", are O(n log n). But that does not necessarily mean those are the absolute best algorithms. There might well be a better one yet to be discovered.

So, is there a way to find the performance of any sorting algorithm? In this case, the answer is yes.

Going back to our operation, comparison, it takes two items and says either "yes, this one is smaller than that", or "no, this one is not smaller than that". It is a binary comparison; it produces one of two possible answers.

A sorted list is a permutation of the original list; it is the same list with the elements rearranged. For a list of n elements, there are n! possible permutations; that is n * (n-1) * (n-2) * ... * 1. The question becomes, how many comparisons do you need to pick out one specific permutation out of the n! possibilities?

Each comparison, because it is binary, reduces the possibilities by half. So, the answer to the question is a number of comparisons, c, such that 2^c >= n!. Solving for c, that is

c >= log n! =
     log (n * (n-1) * ... * 1) =
     log n + log (n-1) + ... + log 1

The last expression is of the order of

log n + log (n-1) + ... + log (n/2)

which is of the order of

n/2 * log (n/2)

which is of the order of n log n. [3]

Ultimately, using only comparisons, sorting is O(n log n); any fewer comparisons would not be able to pick out a single permutation from the possibilities.

Heapsort is therefore about as good as sorting algorithms are going to get; anything better will be only an incremental improvement. (Quicksort has bad behavior for certain inputs; for example, using quicksort on an already sorted input is O(n^2).)

A creative part of computer science is often that changing the problem, or adding information, allows dramatic improvements. For example, if I can change the problem to adding a single new element to an already sorted list while keeping it sorted, I can easily find an O(n) algorithm; simply taking the new item on the end and re-sorting would be foolish when I could just go down the list and identify the spot where the new item goes.

(That might give you an idea for beating the problem. Taking one item as a list, it as already sorted. Adding another item to the list is O(n). Adding a third is also O(n). But the resulting algorithm, called "insertion sort", for creating a sorted list from a list of n items is quite bad: Using an O(n) algorithm n times is O(n^2).)

Physics often says fundamental things about the universe. Mathematics is "the queen of the sciences" even though it is not a science, because it frequently makes final, dramatic statements. But here is one from computer science: sorting using only basic comparisons is a fundamental problem. (Have you ever tried to find a word in an unsorted dictionary?) The shortest possible time it takes to sort n items is of the order of n log n. And, the heapsort algorithm, if it is not itself the best possible algorithm, is the neighbor of the best.

[1]To save time, I will skip the joke (physics, I think) and just give you the punchline: "Assume that each cow is a perfect sphere of uniform density...."
[2]No, I am not referring to the manga or anime series on Cartoon Network. And, no, I do not understand it either.
[3]I have taken this argument from Bruce Mills, Theoretical Introduction to Programming.

Return to Top | About this site...
Last edited Thu Jun 7 18:31:31 2007.
Copyright © 2005-2008 Tommy M. McGuire