# O(n log n)

Hypothetically, let us say that you are a Famous Computer Scientist, having graduated from the Famous Computer Scientist School in Smellslikefish, Massachusetts. As a Famous Computer Scientist, people are throwing money at you hand over fist. So, you have a big pile of checks. (You remember checks, little slips of paper with signatures? They represent money.)

You also have a problem: A shiny new Boeing 747 has caught your eye and you need to know whether your pile of checks represents enough money to pay for it. That problem is easy to solve, since all you need to do is to add up the amounts of the checks. But you also have a more pressing problem: Do you have enough time to sum up your checks before the 747 dealer closes for the day?

Being a Famous Computer Scientist, you know something that few others know: that you have to sort the checks, smallest to largest, before you can add them up.

A digression: That is the lowest form of computer science humor, a
somewhat obscure reference to a painful fact about computing hardware.
In this case the problem is that numbers such as $3141.59, $100000.00,
and $2.57 are represented by the computer as *floating point* numbers.
Floating point numbers are effectively represented in scientific or
exponential notation, such as 3.14159 * 10^3, 1.0000000 * 10^5, or
2.57 * 10^0. The computer, however, only allocates a certain number
of digits for each number, as if instead of using one digit to the
left of the decimal point in the exponential notation followed by
further digits to the right, it only had 3 (or 6 or 8) to the left of
the decimal and *no* digits to the right. That would mean that
3141.59 would be represented in the computer as 314*10^1, 100000.00 as
100*10^3, and 2.57 as 257*10^-2.

Adding floating point numbers is a multi-step process: the computer first adjusts the numbers so the exponents are the same, then adds the values. But 257*10^-2, adjusted to match 100*10^3, becomes 0*10^3, so 2.57 + 100000.00 equals 100000.00.

To help mitigate that loss of precision, you need to sort your numbers from smaller to larger before you add them, in hopes that the small numbers will add up to something large enough to avoid completely disappearing when you add the sum to the larger numbers.
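You can watch this effect for yourself. Here is a quick sketch using Python's double-precision floats rather than the cartoonish 3-digit format above; the principle is the same, only the threshold moves:

```python
big = 1e16           # one check big enough to buy part of a 747
smalls = [1.0] * 10  # ten one-dollar checks

# Largest first: each 1.0 is too small to survive the addition.
total_big_first = big
for s in smalls:
    total_big_first += s

# Smallest first: the ones accumulate to 10.0 before meeting 1e16.
total_small_first = sum(smalls) + big

print(total_big_first)    # 1e+16 -- the ten dollars vanished
print(total_small_first)  # 1.000000000000001e+16
```

Ten dollars is ten dollars; added in the wrong order, the computer loses it entirely.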

There is a discipline dedicated to those kinds of painful problems, called numerical analysis. Unfortunately, I am supremely unqualified to talk about that discipline, so I shall simply defer to another Famous Computer Scientist, Edsger W. Dijkstra. End of digression.

The bottom line here is that you need to sort your pile of checks before you can add them together, and you need to know how long it will take to sort them to decide whether you can buy your new bird today or not.

At this point, as a Famous Computer Scientist myself, I am going to
use a very snazzy computer science trick to avoid having to
continually talk about checks and 747s: abstraction. [1] I will throw
away all the irrelevant details, including the 747 dealer (who wants
to go home now anyway) and in fact both the numbers and the addition
operation, and state the problem this way: you have a list of *things*
that you need to sort, and you want to know how long it will take.
The only operation you have on the *things* is to compare two and ask,
"Is this one smaller than that one?"

# Algorithm behavior

Sorting is a fundamental computer science problem and is well studied.
There are many, many algorithms known for sorting. (The *algorithm*
is fundamental to computer science. If you have a program, it is made
up of code written by a programmer. The code implements one or, more
likely, many algorithms. An algorithm is a high-level, general,
abstract description of a process, while the code is the details; the
low-level, concrete, specific description of the process plus a huge
bundle of other trivia.)

Each of the algorithms has its own behavior, and its own speed. But
what does it mean to talk about the *speed* of an algorithm, which is
a thing of the mind and does not *do* anything? That is where the
"Big-Oh"[2] notation comes in.

For every algorithm, any operation used in that algorithm, and any input given to the algorithm, there is a mathematical expression that describes how many times the operation is used on that input by that algorithm. Consider a simple algorithm, going through a list of 12 numbers counting how many there are in the list, one by one. If the operation is looking at a number to determine whether it is the last one in the list, then the operation is used 12 times by that algorithm on that input.

The expression normally depends on the size of the input: if that
simple algorithm is given a list of length *n* (the variable
traditionally used for such things), then it will use that operation
*n* times.
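A sketch of that counting algorithm, with a tally of the one operation bolted on (the function name is my own):

```python
def count(items):
    """Count a list, tallying uses of our one operation:
    looking at an element to see whether it is the last one."""
    n = 0
    looks = 0
    for _ in items:
        looks += 1  # one look per element
        n += 1
    return n, looks

print(count(list(range(12))))  # (12, 12): a 12-element list takes 12 looks
```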

Unfortunately, very often the mathematical expression is complex, not
very helpful, and very unenlightening. The big-oh notation is
designed to highlight the most important part of the expression while
hiding irrelevant details. It describes the *asymptotic* behavior
of the expression as its inputs get larger. In this case, that
means that it describes the behavior of the algorithm as the size
of its input gets bigger.

Take, for example, the expression 37n^3 + 12n^2 + 19. If n is 1, that equals 37 + 12 + 19 = 68. If n is 10, it equals 37*1000 + 12*100 + 19 = 37000 + 1200 + 19 = 38219. Clearly, as n gets bigger, the final 19 becomes irrelevant. Less clearly, the 12n^2 component also becomes irrelevant, because it will be dwarfed by the 37n^3 term. Finally, the 37 is also irrelevant, since 37 times a huge number is just another huge number. The asymptotically important part of the expression is the n^3 element. So, 37n^3 + 12n^2 + 19 is O(n^3), pronounced "order of n-cubed".

The "order of" an expression describes a kind of upper bound for the expression. It is defined as an expression which, when multiplied by some constant, is greater than or equal to the original expression for all input values larger than some other constant. Because we can choose the new expression, we can pick one that is simpler than the original in the same way that n^3 is simpler than 37n^3+12n^2+19.

(By the way, in case you were wondering, when n = 13, 37n^3+12n^2+19 equals 83336, while 38*n^3 equals 83486. So, n^3 multiplied by 38 is greater than the original expression for all input values greater than n=12. Hence, 37n^3+12n^2+19 is indeed O(n^3).)
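Those witness constants are easy to check mechanically; a small Python sanity check:

```python
def f(n):
    return 37 * n**3 + 12 * n**2 + 19

# 38*n^3 first overtakes f(n) at n = 13, exactly as claimed...
assert 38 * 12**3 < f(12)
assert 38 * 13**3 > f(13)
# ...and stays above it from there on (spot-checked up to 10000).
assert all(38 * n**3 >= f(n) for n in range(13, 10_001))
print("37n^3 + 12n^2 + 19 is O(n^3), witnessed by c = 38, n > 12")
```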

The behavior of the counting algorithm is O(n), "order of n". How does that describe the speed of the algorithm? If each operation takes a finite (and by assumption, constant) time, then the time taken by any implementation of the algorithm will depend on the size of the input primarily by n times the time taken by the operation (plus some constant factors here and there). The Big-Oh notation provides a way of comparing two algorithms; for a sufficiently large value of n, n^2 will be greater than 10000n, and thus an O(n^2) algorithm will be slower than an O(n) algorithm for any sufficiently large input. There is a traditional hierarchy of algorithms:

- O(1) is constant-time; such an algorithm does not depend on the size of its inputs.
- O(n) is linear-time; such an algorithm looks at each input element once and is generally pretty good.
- O(n log n) is also pretty decent (that is n times the logarithm base 2 of n).
- O(n^2), O(n^3), etc. These are polynomial-time, and generally starting to look pretty slow, although they are still useful.
- O(2^n) is exponential-time, which is common for artificial intelligence tasks and is really quite bad. Exponential-time algorithms begin to run the risk of having a decent-sized input not finish before the person wanting the result retires.

There are worse, like O(2^2^...(n times)...^2).
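To get a feel for the hierarchy, it helps to tabulate a few of these growth rates side by side; a quick Python sketch:

```python
import math

# How fast do the rungs of the hierarchy grow?
print(f"{'n':>4} {'n log n':>8} {'n^2':>6} {'2^n':>26}")
for n in (10, 20, 40, 80):
    print(f"{n:>4} {round(n * math.log2(n)):>8} {n * n:>6} {2**n:>26}")

# The crossover mentioned above: n^2 overtakes 10000n once n passes 10000.
assert 10_001**2 > 10_000 * 10_001
```

By n = 80 the exponential column has already outrun any retirement plan.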

# Sorting

How does that apply to sorting? Sorting, as a problem, is clearly at least as bad as O(n), since it has to look at each item, but that is just as clearly not a good estimate. Can a better bound be found? What do the well-known sorting algorithms do?

The best sorting algorithms, going by names like "heapsort" and "quicksort", are O(n log n). But that does not necessarily mean those are the absolute best algorithms. There might well be a better one yet to be discovered. [5]
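Heapsort in particular is easy to sketch in Python on top of the standard library's heapq module:

```python
import heapq

def heapsort(items):
    """O(n log n): building the heap is O(n), and each of the
    n pops costs O(log n)."""
    heap = list(items)
    heapq.heapify(heap)
    return [heapq.heappop(heap) for _ in range(len(heap))]

print(heapsort([31, 41, 59, 26, 53, 58, 97]))  # [26, 31, 41, 53, 58, 59, 97]
```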

So, is there a way to find the performance of *any* sorting algorithm?
In this case, the answer is yes.

Going back to our operation, comparison, it takes two items and says
either "yes, this one is smaller than that", or "no, this one is not
smaller than that". It is a *binary* comparison; it produces one of
two possible answers.

A sorted list is a permutation of the original list; it is the same list with the elements rearranged. For a list of n elements, there are n! possible permutations; that is n * (n-1) * (n-2) * ... * 1. The question becomes, how many comparisons do you need to pick out one specific permutation out of the n! possibilities?

Each comparison, because it is binary, reduces the possibilities by half. So, the answer to the question is a number of comparisons, c, such that 2^c >= n!. Solving for c, that is

c >= log n! = log (n * (n-1) * ... * 1) = log n + log (n-1) + ... + log 1

The last expression is of the order of

log n + log (n-1) + ... + log (n/2)

which is of the order of

n/2 * log (n/2)

which is of the order of n log n. [3]
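You can watch the bound 2^c >= n! track n log n numerically; a small Python sketch:

```python
import math

# Minimum number of binary comparisons c, with 2^c >= n!,
# next to n * log2(n) -- they grow in lockstep.
for n in (10, 100, 1000):
    c = math.ceil(math.log2(math.factorial(n)))
    print(n, c, round(n * math.log2(n)))
```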

Ultimately, using only comparisons, sorting takes at least on the order of n log n comparisons; any fewer would not be able to pick out a single permutation from the possibilities. [4]

Heapsort is therefore about as good as sorting algorithms are going to get; anything better will be only an incremental improvement. (Quicksort has bad behavior for certain inputs; for example, a naive quicksort given an already sorted input is O(n^2).)

A creative part of computer science is that changing the problem, or adding information, often allows dramatic improvements. For example, if I can change the problem to adding a single new element to an already sorted list while keeping it sorted, I can easily find an O(n) algorithm: rather than tacking the new item on the end and re-sorting, I just walk down the list and splice the new item in at the spot where it belongs.
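That changed problem is a single walk down the list; a minimal sketch (the function name is my own):

```python
def insert_sorted(sorted_list, item):
    """O(n): walk until we find the first element bigger than item,
    then splice item in at that spot."""
    for i, x in enumerate(sorted_list):
        if item < x:
            return sorted_list[:i] + [item] + sorted_list[i:]
    return sorted_list + [item]  # item is the largest; it goes at the end

print(insert_sorted([2, 3, 5, 8], 4))  # [2, 3, 4, 5, 8]
```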

(That might give you an idea for beating the problem. Take one item as a list; it is already sorted. Adding another item to the list is O(n). Adding a third is also O(n). But the resulting algorithm for creating a sorted list from a list of n items, called "insertion sort", is quite bad: using an O(n) algorithm n times is O(n^2).)
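Spelled out, that trap looks like this; a Python sketch of insertion sort:

```python
def insertion_sort(items):
    """n insertions into a growing sorted list, each costing O(n),
    so O(n^2) overall."""
    result = []
    for item in items:
        i = 0
        while i < len(result) and result[i] < item:
            i += 1          # O(n) walk to find the spot
        result.insert(i, item)
    return result

print(insertion_sort([59, 26, 53, 31]))  # [26, 31, 53, 59]
```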

Physics often says fundamental things about the universe. Mathematics is "the queen of the sciences" (even though it is not a science) because it frequently makes final, absolute, dramatic statements. But here is one from computer science: sorting using only basic comparisons is a fundamental problem. (Have you ever tried to find a word in an unsorted dictionary?) The shortest possible time it takes to sort n items is of the order of n log n. And, the heapsort algorithm, if it is not itself the best possible algorithm, is the neighbor of the best.

# Appendix

I am seeing a lot of traffic to this page, presumably from people
interested in the big-O notation. If that describes you, here are
some good links to learn more, in a slightly more serious fashion:

- Big O notation from Wikipedia is reasonably complete, reasonably correct (as far as I can see), and reasonably complex. Probably the exact opposite of the "Wikipedia is a wasteland of pop culture" stereotype.
- Plain English Explanation of Big O Notation is more readable and reasonably complete.
- Big O notation from CS Animated. What can I say, it's *animated*. Actually, it does have a very good illustration of the process of reducing the total number of operations identified in a block of code to a single, simplified Big O (or in that case, the related Big Theta) expression, as well as a thorough discussion of the ideas, aimed at budding computer scientists.
- Check out P and NP for similar entertainment.

cdsmith wrote a blog post on the Fibonacci numbers, including analyzing the complexity of several functions to compute them. It turns out that if you use the simple, basic definition of Fibonacci:

    fib 0 = 1
    fib 1 = 1
    fib n = fib (n-1) + fib (n-2)

the time complexity of computing *fib n* is *O(fib n)*.
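You can count the calls yourself; here is a Python rendering of that same definition with a call counter bolted on:

```python
calls = 0

def fib(n):
    """Naive Fibonacci, matching the definition above."""
    global calls
    calls += 1
    return 1 if n < 2 else fib(n - 1) + fib(n - 2)

result = fib(20)
print(result, calls)  # 10946 21891 -- the call count is 2*fib(n) - 1
```

Every unit in the answer costs roughly two function calls to produce, which is exactly what O(fib n) predicts.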

I originally wrote this article not to discuss the Big O, but to
highlight the properties of the sort algorithm. Algorithm analysis is
complicated and interesting on its own. The most entertaining example
that I know about is from an article by Jon Bentley, collected in one
of the Programming Pearls books if I recall correctly. He showed two
algorithms for reversing an array of things with the same algorithmic
complexity, and then showed that one was much, *much*, faster than the
other because the slower was largely pessimal in regard to page-based
virtual memory.

If you are interested in floating point numbers, check out Anatomy of a floating point number or the canonical What Every Computer Scientist Should Know About Floating-Point Arithmetic.

# Footnotes

[1] | To save time, I will skip the joke (physics, I think) and just give you the punchline: "Assume that each cow is a perfect sphere of uniform density...." |

[2] | No, I am not referring to the manga or anime series on Cartoon Network. And, no, I do not understand it either. |

[3] | I have taken this argument from Bruce Mills, Theoretical
Introduction to Programming. |

[4] | Is it possible to do better? Not with comparisons alone; even a single deck of 52 cards can be arranged in 52! = 80658175170943878571660636856403766975289505440883277824000000000000 different ways. For a better example, suppose that you have many, many decks of cards mixed together in a big heap, and that you want to sort them all by suit and value. There are only 52 possible combinations of suit and value: Ace of Hearts, four of Spades, and so forth; every Ace of Hearts is equivalent to every other Ace of Hearts. You arrange 52 bins, examine each card once, and toss it into the appropriate bin. This algorithm, bin sort, sorts the heap in O(n), and can do so because it involves no comparisons between any two cards. |

[5] | For an interesting discussion (with graphs) of the difference between n log n and n algorithms, see O(N log N) Complexity - Similar to linear? on stackoverflow. (And yes, I did give up on keeping the notes in order in the text.) |