I used this technique to show that libc++'s implementation of std::sort has an O(n^2) worst case. There's still an open bug report for this in libc++: https://llvm.org/bugs/show_bug.cgi?id=20837.
On a related note, I'm talking to the libc++ and libstdc++ developers about replacing introsort with pdqsort (https://github.com/orlp/pdqsort) as the implementation for std::sort. It solves the worst case in a subtly different (and potentially more efficient) way than introsort. Although that's not the real selling point.
EDIT: I forgot I still use my old username on hacker news. To prevent confusion, I'm also the author of pdqsort.
I don't get how you could create an adversarial input in advance for an implementation that uses randomized partitioning (the index of the pivot drawn from a uniform distribution).
Can anyone fill me in?
What this article describes (for that case, as far as I can tell) is an adversary that modifies the input while quicksort is working. How realistic is that?
That algorithm is only used to create an input pattern. When that pattern is fed back to the same sorting function, it will cause it to go quadratic. If your target is using the same sorting function, it will do the same on their side.
This attack is realistic. I have tried it on the glibc qsort, a generic quicksort, and the Visual Studio qsort, and they were all vulnerable.
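For anyone curious how such an input can be generated without ever seeing the library's source, here is a minimal sketch in the spirit of M. D. McIlroy's "A Killer Adversary for Quicksort" (the article's exact program may differ; the array size and names here are illustrative). The comparator decides the values lazily: whenever the sort commits to a pivot candidate, that candidate is frozen to the next smallest value, so every pivot the sort picks ends up nearly minimal.

    // Sketch of a comparison adversary (illustrative, not the article's code).
    #include <cstdio>
    #include <cstdlib>
    #include <vector>

    static std::vector<int> val;  // val[i]: frozen value of element i, or 'gas'
    static int gas;               // sentinel: value not decided yet
    static int nsolid;            // next small value to hand out
    static int candidate;         // index currently being treated like a pivot

    static int adversary_cmp(const void* px, const void* py) {
        const int x = *static_cast<const int*>(px);
        const int y = *static_cast<const int*>(py);
        if (val[x] == gas && val[y] == gas)        // both undecided: freeze one
            val[x == candidate ? x : y] = nsolid++;
        if (val[x] == gas) candidate = x;           // remember the surviving unknown
        else if (val[y] == gas) candidate = y;
        return val[x] - val[y];
    }

    int main() {
        const int n = 100000;
        gas = n;                  // larger than any frozen value
        nsolid = 0;
        candidate = 0;
        val.assign(n, gas);

        std::vector<int> idx(n);
        for (int i = 0; i < n; ++i) idx[i] = i;

        // Run the target sort against the adversarial comparator; the frozen
        // values it leaves behind are the "input pattern" to feed back in.
        std::qsort(idx.data(), idx.size(), sizeof(int), adversary_cmp);

        std::printf("generated a killer input of %d values\n", n);
        return 0;
    }

Sorting the indices once against this comparator assigns a value to nearly every position; writing those values out and sorting them again with a plain comparator is the "feed it back" step described above. Note that a comparator with side effects is outside the formal contract of qsort/std::sort, so treat this as a testing tool, not production code.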
I'm interested in the way you use the word "attack."
I was looking at it from the perspective that you could never run into an input that would consistently sort in O(n^2) using randomized quicksort.
But your perspective (maybe this article's perspective) is that some malicious adversary could force the sorter to do extra work by knowing in advance the steps that it will take. Am I getting that right? Like from a flops/compute-cost security standpoint?
If so, why care about this situation? It only seems like a realistic scenario in very weird domains, like Amazon Web Services messing with everyone's calls to some standard sorting code to charge them a few more cents of compute time or something...
If I'm not mistaken, the attack is aimed at a quicksort implementation where the pivot is chosen as the median of the first, middle, and last values. It would not work on randomized quicksort, of course.
What's the problem? From D. Knuth's The Art of Computer Programming: Sorting and Searching: (1) when sorting n records by comparing pairs of keys, the Gleason bound shows that on average you can't sort faster than O( n ln(n) ); (2) heap sort sorts n records in both the worst case and the average case in time O( n ln(n) ). Net, when sorting by comparing pairs of keys, in main memory, on a simple machine architecture, with a single processor, ..., you can't improve on heap sort.
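(For reference, the bound being invoked here is the information-theoretic one: a comparison sort must distinguish all n! input orderings, and each comparison yields at most one bit, so by Stirling's approximation

    \log_2(n!) \;=\; \sum_{k=1}^{n} \log_2 k \;\ge\; n\log_2 n - n\log_2 e \;=\; \Omega(n \log n),

and heapsort's worst case of roughly 2 n \log_2 n comparisons matches this bound up to a constant factor.)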
Big O hides all constants. A hybrid algorithm using quicksort for the heavy lifting is O(n log n) in the worst case, but is significantly faster than heapsort - mainly due to cache effects.
Now, and for a long time, the usual criterion of performance has been big-O notation, and that is what I considered, with the Gleason bound and heap sort, and thus showed that, with various common assumptions, you can't do better than heap sort. As far as I know, I was correct.
So, to claim to beat heap sort, you have to set aside some of the usual assumptions and the big-O criterion.
Quicksort is great with a cache since it has great locality of reference when working with the partitions that fit into the cache. But the worst case of quicksort is still O( n^2 ), so what happens if quicksort encounters such a partition?
Just intuitively, it appears that heap sort has a tough time exploiting usual cache designs. We know that. There is a modification of heap sort that helps it work better when it has to do virtual memory paging.
I used to get torqued at considering big-O, and did this just because it ignores constants; as a programmer, I worked hard to do well with such constants. But in the end I gave in and gave up and just accepted big-O notation as the most appropriate, simple, single criterion. And, for judging algorithms by how well they exploit the cache, I gave up on that, too: broadly, it's the responsibility of such speedup techniques to work on ordinary code written for a simple processor architecture, not the responsibility of such code to do tricky things to exploit tricky architectures.
In the land of big-O, a factor of only 2 is considered not very exciting.
If you relax some of the assumptions, then you can beat heap sort and O( n ln(n) ): use radix sort. That was the sorting algorithm of the old punch card sorting machines. E.g., if you have 100 billion records to sort where each key is an alphanumeric string 10 characters long, then radix sort does all the work in just 10 passes over the 100 billion records. Or, for keys m characters long, you can sort n records in O( nm ). So that's faster when m < ln(n) or, to make the arithmetic easier, m < log(n). But log(100 billion) = 11, so radix sort should be faster with m = 10 and n = 100 billion.
So, for a hybrid sort routine, consider also radix sort.
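For concreteness, here is a minimal LSD radix sort sketch for fixed-width string keys along the lines of the punch-card example (the function name and key length are illustrative, and every key is assumed to be exactly KeyLen bytes long):

    #include <array>
    #include <cstddef>
    #include <string>
    #include <vector>

    template <std::size_t KeyLen>
    void lsd_radix_sort(std::vector<std::string>& keys) {
        const std::size_t n = keys.size();
        std::vector<std::string> out(n);
        // One stable counting-sort pass per character position, least
        // significant position first: KeyLen passes over n records.
        for (std::size_t pos = KeyLen; pos-- > 0;) {
            std::array<std::size_t, 256> count{};              // byte histogram
            for (const auto& k : keys)
                ++count[static_cast<unsigned char>(k[pos])];
            std::size_t sum = 0;                               // offsets via prefix sums
            for (auto& c : count) { const std::size_t t = c; c = sum; sum += t; }
            for (const auto& k : keys)                         // stable scatter
                out[count[static_cast<unsigned char>(k[pos])]++] = k;
            keys.swap(out);
        }
    }

Each pass is a stable counting sort on one character position, so the total work is O( n * KeyLen ) plus a small fixed amount of bookkeeping per pass, which is exactly the m-versus-log(n) trade-off described above.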
> But the worst case of quicksort is still O( n^2 ), so what happens if quicksort encounters such a partition?
I'd strongly suggest you read the readme I wrote for https://github.com/orlp/pdqsort, which explains in detail how I beat heapsort (and other sorting algorithms). It also shows how I prevent the O(n^2) worst case.
> I used to get torqued at considering big-O, and did this just because it ignores constants; as a programmer, I worked hard to do well with such constants. But in the end I gave in and gave up and just accepted big-O notation as the most appropriate, simple, single criterion.
We have already established that O(n log n) is the lower bound, and we have reached this lower bound. This is where big O stops being useful. A metric that compares equal for every relevant algorithm (mergesort, introsort, pdqsort, timsort, smoothsort, heapsort) is not a useful metric.
> In the land of big-O, a factor of only 2 is considered not very exciting.
I live in the land of real-world performance, powered by actual benchmarks. In this land a factor of 2 is very exciting.
> So, for a hybrid sort routine, consider also radix sort.
I worked in the scope of std::sort, which is strictly a comparison sort.
> We have already established that O(n log n) is the lower bound, and we have reached this lower bound. This is where big O stops being useful. A metric that compares equal for every relevant algorithm (mergesort, introsort, pdqsort, timsort, smoothsort, heapsort) is not a useful metric.
With the explosion of cores and special purpose hardware, it is interesting to think of how long before sorting network routines are more practical. Dropping the runtime to O(lg n) would be enticing. :)
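For a sense of what that hardware-friendly option looks like, here is a sketch of a bitonic sorting network for power-of-two sizes (illustrative code, not from the thread). Each (k, j) pair below is one layer of independent compare-exchanges, and there are O(log^2 n) layers in total, so with enough comparators the layer count, not the n log n comparisons, bounds the elapsed time; the O(log n)-depth AKS network exists but its constants make it impractical.

    #include <cstddef>
    #include <utility>
    #include <vector>

    void bitonic_sort(std::vector<int>& a) {
        const std::size_t n = a.size();                  // assumed: power of two
        for (std::size_t k = 2; k <= n; k <<= 1)          // size of bitonic runs
            for (std::size_t j = k >> 1; j > 0; j >>= 1)  // one layer of the network
                for (std::size_t i = 0; i < n; ++i) {     // independent compare-exchanges
                    const std::size_t l = i ^ j;          // partner of position i
                    if (l > i &&
                        (((i & k) == 0 && a[i] > a[l]) || // ascending block
                         ((i & k) != 0 && a[i] < a[l])))  // descending block
                        std::swap(a[i], a[l]);
                }
    }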
I assumed that we were talking about in place sorting.
"I feel your pain." As I tried
to explain, the first time I saw big-O notation taken seriously as the main metric for evaluating the actual running time of actual code on actual data on an actual computer, I had your reaction: the constants are important, and the big-O criterion is not good.
There's another point: the silent assumption of the big-O criterion is that we are interested in (A) something fundamental about algorithms, just the algorithms, and not (B) actual running time of actual code on actual data on actual computers. And big progress in algorithms is also important and, as we know, in some cases can give much better performance gains in real computing than anything a programmer can do without the big progress.
And we have a great example: there were days before heap sort, shell sort, and quicksort when the in-place sorting algorithm people coded was bubble sort or something closely related and also O(n^2). Then, in practice, the difference was just huge.
Early in my career, I ran into that at Georgetown University: a prof had coded up some software for teaching statistics and had used the IBM Scientific Subroutine Package (SSP) for most of the actual calculations. One of the SSP routines was to find ranks, and it had two loops, each essentially bubble sort. In testing, the thing ran all lunch time.
Well, the data to be ranked was 32-bit integers, and an array index was a 16-bit integer. So, I was able to do a tricky overlay of two 16-bit integers on the 32-bit data, use heap sort, and get O( n ln(n) ). In actual running time on nearly any possible real data, my code was so much faster than IBM's SSP that I blew the doors off the SSP.
Lesson 1: O( n ln(n) ) instead of O( n^2 ) can be much better in practice.
Lesson 2: progress in big-O in algorithms can be terrific stuff.
Back to practice: early in writing the software for my startup, I tried to write a polymorphic version of heap sort. I wrote in Microsoft's Visual Basic .NET. For the data type to be sorted, I passed just the type object. The software ran correctly right away. Then I did some careful timings -- the software commonly ran ballpark 10 times longer than I would have guessed. What the heck? Well, the run time was smart enough to figure out the actual data type and do the comparison, but the overhead was enormous. Then I learned that for polymorphic code I was supposed to write some interfaces, one for each data type to be sorted, and for sorting a particular data type pass the corresponding interface.
Okay. My confusion was that that use of an interface didn't sound polymorphic to me and sounded no different than what I'd long done passing entry variables in PL/I and Fortran. Okay, for polymorphism, use interfaces. And in the code of the interface, you have to work essentially without strong typing, essentially with untyped pointers. Okay -- been there, done that, in both PL/I and Fortran; I was surprised that a serious object-oriented language with polymorphism would have essentially just syntactic sugar over what I'd been doing in PL/I and Fortran. Okay.
Lesson learned: there are cases of hype in parts of programming languages.
Heck, in PL/I my approach to sorting an array of PL/I structures would be to pass a PL/I structure that had an element variable that was an entry variable to a procedure that would do the comparisons. That is, you could have some conventions that would let PL/I structures be self-describing. Then you could write code that was more general. Maybe under the covers that's just what's going on in the Microsoft common language runtime (CLR) that Visual Basic .NET used for me without letting me know -- more syntactic sugar?
So, when I rewrote my polymorphic heap sort to do the comparisons with a passed interface, the performance got okay. Still, I have non-polymorphic versions for the more important element data types, and they are faster in just the way you emphasized. Yup, been there, done that.
So, I, too, was concerned about actual running time on actual data. But for the algorithm itself, I was still using just heap sort and didn't try to write a hybrid routine that sometimes might exploit, say, radix sort.
On the pros and cons of using big-O as the only criterion, there's a big example in linear programming: in practice, the simplex algorithm is fantastically fast. Intuitively, in practice, the early iterations in effect focus on what to do to get to the optimal solution (assuming feasibility, no unboundedness, etc.). But due to some work of Klee and Minty, the worst case of simplex is exponential and just awful. Intuitively, the Klee and Minty examples trick simplex into making really bad choices!
So, there was research to find a polynomial algorithm for linear programming. Eventually there was an ellipsoid algorithm that was polynomial, but the constant was so high that whenever in practice ellipsoid beat simplex, both ran too long to be practical. So, really, the concern about simplex and the effort for ellipsoid were progress just in algorithms and in the big-O criterion. And in practice, right away, the constant was so large that ellipsoid was a flop.
Still, in algorithms, people are straining to do better in the sense of the big-O criterion. Of course the biggest effort here is to find an algorithm that shows that P = NP, maybe one of the most important problems in both math and computer science, based on big-O for some polynomial.
With all of that, I decided in my own work, for something as simple as sorting, just to go with heap sort, maybe even a polymorphic heap sort. Indeed, with all the emphasis on polymorphism, people can't be very concerned about constants! :-)!
I'm impressed you can write such a long story yet address none of my points. You should consider becoming a politician.
The only relevant piece of information I could find in your anecdote is this:
> I assumed that we were talking about in place sorting.
If we limit ourselves to in-place sorting algorithms, then we are still left with block sort, pdqsort, introsort, and heap sort, all of which run in O(n log n). My argument still holds.
I wrote that long post trying to explain my attitude on sorting software from 50,000 or so feet up and, in particular, why I have been happy just to use heap sort and forget about looking for anything faster on average.
I was reluctant to read the documentation in your GitHub submission if only because in my software development I have never needed or used GitHub.
Also, I was reluctant to believe that there could be any version of quicksort that would have worst case running time of O( n ln(n) ).
Why reluctant to believe?
First, sure, the usual version of quicksort has us select a pivot value for a partition from a "median of three" keys from that partition.
Second, sure: for a good pivot value, we want essentially the median of the keys in the partition we are trying to split into two partitions. And, sure, the median-of-three approach to selecting a pivot value is an approximation to the median we really want and, thus, a relatively good pivot value. So this approach to selecting a pivot value promises, on average, faster running times. Fine. Faster on average is good. Terrific.
But this pivot selection rule says next to nothing about worst case running time, and, for claiming worst case running time of O( n ln(n) ), that is a big issue.
Third, with this median-of-three approach to selecting pivot values, we can think of bad arrays of input keys that will cause quicksort to run in time no faster than O( n^2 ). Then maybe we can think of other ways to select pivot values that result in a new version of quicksort, a version that runs in O( n ln(n) ) again on those bad arrays. But with this change, here's the problem: we are lacking a proof that the new version of quicksort has worst case running time of O( n ln(n) ).
That is, our new version of quicksort is good on some old cases of bad arrays but may have created for itself some new cases of bad arrays. So, with no proof, the new version of quicksort may also have worst case running time of O( n^2 ), like the old version did. The main difference might be that the bad arrays for the new version are more complicated to construct than the bad arrays for the old version.
But I just looked at the file README.MD you suggested. I see two issues:
First Issue:
> When a new pivot is chosen it's compared to the greatest element in the partition before it. If they compare equal we can derive that there are no elements smaller than the chosen pivot. When this happens we switch strategy for this partition, and filter out all elements equal to the pivot.
If I understand this statement correctly, I suspect that for some arrays of input keys, the resulting algorithm won't actually sort the keys. That is, we won't have a sorting algorithm.
Second Issue:
> While this technically still allows for a quadratic worst case, the chances of it happening are astronomically small.
For proving worst case running time of O( n ln(n) ), "astronomically small" is irrelevant. Maybe your remark "astronomically small" is just a side comment and not relevant to PDQSORT having worst case running time of O( n ln(n) ), but from your documentation I can't easily tell.
> A bad partition occurs when the position of the pivot after partitioning is under the 12.5% (1/8th) percentile or over the 87.5% percentile - the partition is highly unbalanced. When this happens we will shuffle four elements at fixed locations for both partitions. This effectively breaks up many patterns. If we encounter more than log(n) bad partitions we will switch to heap sort.
For this and the associated documentation, I don't see a solid argument that PDQSORT has worst case running time of O( n ln(n) ).
For some work, a big-O guarantee on worst case performance is a biggie, as in crucial to human life. Such a guarantee is essentially a theorem in the applied math called computer science. A good example of how to prove such theorems is, of course, in D. Knuth, The Art of Computer Programming, Volume 3, Sorting and Searching.
Again, I have been happy just to use heap sort and forget about looking for anything faster on average.
> If I understand this statement correctly, I suspect that for some arrays of input keys, the resulting algorithm won't actually sort the keys. That is, we won't have a sorting algorithm.
You don't understand the statement correctly. However, this is not a trivial realization.
Because pdqsort always recurses on the left partition first, every element to the left of the current recursive call is less than or equal to every element within the current recursive call. So if we find that the selected pivot is equal to an element before the current recursive call, we can conclude that there can only be elements equal to the pivot, and not smaller, in the current recursive call.
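A tiny self-contained illustration of the "filter out all elements equal to the pivot" step that the quoted readme text refers to (a hypothetical helper, not pdqsort's actual partition_left), under the precondition just derived that nothing in the range is smaller than the pivot:

    #include <algorithm>

    template <typename Iter, typename T>
    Iter skip_elements_equal_to_pivot(Iter first, Iter last, const T& pivot) {
        // Precondition: no element in [first, last) is smaller than 'pivot',
        // so !(pivot < x) means x == pivot. Equal elements are moved to the
        // front and are already in their final place; only the returned range
        // [result, last) needs further sorting.
        return std::partition(first, last, [&](const T& x) { return !(pivot < x); });
    }

This is why long runs of equal keys drop out of the recursion instead of producing badly unbalanced partitions.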
> Maybe your remark "astronomically small" is just a side comment and not relevant to PDQSORT having worst case running time of O( n ln(n) ).
Correct.
> For this and the associated documentation, I don't see a solid argument that PDQSORT has worst case running time of O( n ln(n) ).
There is a very simple argument. The 'bad partition counter' is initially log(n). If we encounter a bad partition, we can do up to O(n) work. This can happen log(n) times. So we can do up to O(n log n) work on bad partitions. If we then encounter another bad partition, we switch to heapsort, which is O(n log n).
If we encounter fewer than log(n) bad partitions, quicksort performed well enough on its own to be O(n log n).
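For readers who want the shape of that argument in code, here is a structural sketch (illustrative, not pdqsort's source; the pattern-breaking shuffle of four elements is omitted): a budget of about log2(n) bad partitions, after which heapsort finishes the range.

    #include <algorithm>
    #include <vector>

    template <typename Iter>
    void guarded_sort_impl(Iter first, Iter last, int bad_allowed) {
        const auto n = last - first;
        if (n <= 24) {                          // tiny range: any simple sort is fine
            std::sort(first, last);             // stand-in for insertion sort
            return;
        }
        if (bad_allowed <= 0) {                 // budget spent: heapsort is O(n log n)
            std::make_heap(first, last);
            std::sort_heap(first, last);
            return;
        }
        // Median-of-three pivot.
        Iter mid = first + n / 2;
        if (*mid < *first) std::iter_swap(mid, first);
        if (*(last - 1) < *first) std::iter_swap(last - 1, first);
        if (*(last - 1) < *mid) std::iter_swap(last - 1, mid);
        const auto pivot = *mid;
        // Three-way split: [first, lt) < pivot, [lt, ge) == pivot, [ge, last) > pivot.
        Iter lt = std::partition(first, last, [&](const auto& x) { return x < pivot; });
        Iter ge = std::partition(lt, last, [&](const auto& x) { return !(pivot < x); });
        // "Bad" partition: the split lands in the outer eighths of the range.
        if (lt - first < n / 8 || last - ge < n / 8)
            --bad_allowed;
        guarded_sort_impl(first, lt, bad_allowed);
        guarded_sort_impl(ge, last, bad_allowed);
    }

    template <typename Iter>
    void guarded_sort(Iter first, Iter last) {
        int log_n = 0;
        for (auto n = last - first; n > 1; n >>= 1) ++log_n;   // about floor(log2(n))
        guarded_sort_impl(first, last, log_n);
    }

Every decrement of bad_allowed corresponds to at most O(n) partitioning work, it can happen at most about log2(n) times along any path before the heapsort branch takes over, and the heapsort and small-range fallbacks are themselves within O(n log n), which is the bound argued above.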
> In the land of big-O, a factor of only 2 is considered not very exciting
In the land of solving real-world problems a factor of 2 is extremely significant, especially when you're trying to get the most performance out of a power-constrained system. In fact, even a 1% increase in performance is significant.
> Now, and for a long time, the usual criterion of performance has been big-O notation, and that is what I considered, with the Gleason bound and heap sort, and thus showed that, with various common assumptions, you can't do better than heap sort. As far as I know, I was correct.
That's a little short-sighted for real-life performance. Memory access patterns, cache performance, and the hidden constant factor are all relevant for real performance. For example, there's a ~200x difference between main memory and cache access times. log2(1000000000000) is ~40, meaning poor cache performance can cost more than the factor of log(n) in the big-O bound.
There are also many cases where algorithms with better Big-O performance have constant factors so large that they perform worse for common problem sizes. Matrix multiplication is a good example. Strassen's algorithm has a better Big-O than the straightforward algorithm, but the constant factor is large enough that it only becomes faster on matrices larger than maybe 100x100. And there are algorithms even better than Strassen's, but their constant factors are so big they're effectively never better than Strassen's on real hardware.
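(For reference, the asymptotics there come straight from the recurrence: Strassen trades 8 half-size multiplications for 7 plus extra additions, so

    T(n) = 7\,T(n/2) + \Theta(n^2) \;\Rightarrow\; T(n) = \Theta(n^{\log_2 7}) \approx \Theta(n^{2.81}),

versus \Theta(n^3) for the schoolbook method; the extra additions and worse memory behaviour per level are exactly the constant factor that pushes the crossover point up.)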
In theory, there's no significant difference between theory and practice. In practice, things are never so simple.
When you use big-O notation, O(f(n)), you claim that there exists a function F(n) that accurately characterizes the running time of an algorithm based on the size of the input, and that f(n) is the highest-order component of that function. It looks something like this:
F(n) = A*f(n) + B*g(n) + C*h(n) + ... + K
where A,B,C,..K are constants and f,g,h,... are functions.
You approximate F(n) ~= A*f(n) because, for n big enough, this component will dominate the growth of the function. But A is different for different algorithms... so, in the case of sorting, you have several options - quicksort, mergesort, heapsort - that are on paper all "the same", but each has a different constant, and there are real advantages to using the one with the smallest constant.
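As a purely hypothetical illustration of how those constants play out: suppose one routine costs F_1(n) = 2 n \log_2 n operations and a "worse" one costs F_2(n) = 0.01 n^2. Then

    0.01\,n^2 < 2\,n\log_2 n \iff n < 200\,\log_2 n,

which holds up to roughly n \approx 2200, so below that size the O(n^2) routine with the tiny constant wins, and only above it does the O(n log n) routine take over.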
Furthermore, if you are not trying to sort infinitely big data (most real-world cases), you must consider that other components of the actual F(n) function, such as g and h, may dominate the performance of the algorithm in the range of data sizes you are most likely to encounter. There are other measures of algorithmic complexity that deal with this (I recall there is a small-oh notation, among others), but they don't teach you that at school unless you are a postgraduate student doing highly relevant work in complexity theory. I suspect most guys, like myself, find out by not taking Knuth as holy gospel and reading further material.
And then there are the degenerate cases, where your data can get sorted in quasi-linear time if you apply one of the slow dumb O(n^2) dinosaurs.