Hacker News

Also, I think he's picturing two almost-equal files. In that case the average running time should be way lower, no? (I believe the quadratic time is worst case)


> two almost-equal files. In that case the average running time should be way lower, no?

Yes, indeed! What you're thinking of is "output-sensitive" algorithms. There are some output-sensitive algorithms for edit distance. The fastest one I found is in "Improved Algorithms for Approximate String Matching" by Dimitris and Georgios Papamichail. They note:

"We designed an output sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s-|n-m|)min(m,n,s)+m+n) and linear space, where s is the edit distance between the two strings."

http://arxiv.org/abs/0807.4368
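The paper's algorithm is more involved, but the basic output-sensitive idea is easy to sketch: restrict the dynamic program to a diagonal band of half-width k and double k until the answer fits inside it. This is Ukkonen's classic O(s·min(n,m)) trick, not the Papamichail algorithm; the function names below are mine:

```python
def banded_distance(a, b, k):
    """Edit distance if it is <= k, else None. Only fills a band of ~2k+1 diagonals."""
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # the distance is at least |n - m|
    INF = k + 1  # any cell outside the band has true distance > k
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        cur[0] = i if i <= k else INF
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # delete a[i-1]
                         cur[j - 1] + 1,     # insert b[j-1]
                         prev[j - 1] + cost) # match / substitute
        prev = cur
    return prev[m] if prev[m] <= k else None

def edit_distance(a, b):
    """Output-sensitive: roughly O(s * min(n, m)), where s is the edit distance."""
    k = 1
    while True:
        d = banded_distance(a, b, k)
        if d is not None:
            return d
        k *= 2  # distance exceeds k; widen the band and retry
```

On two almost-equal strings this touches O(s·n) cells rather than the full n² matrix, which is exactly why the average case for near-identical files is so much cheaper.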


Cool. The naive approach yields O(exp(s)*n); this looks like a nice improvement.



Yes, but complexity theorists are mostly interested in worst-case analysis, I believe (which I think is rather unfortunate) -- so the quoted numbers are probably what you get from plugging in 100,000^2 * dt or so.


It also has a quadratic lower bound: the simple implementation has to fill in the whole matrix of m*n cells. There may well be some optimisation for almost-identical strings.
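A minimal sketch of that simple implementation (standard Wagner-Fischer dynamic programming; the function name is mine). Both loops always run to completion, so it is Theta(n*m) no matter how similar the strings are:

```python
def levenshtein(a, b):
    """Edit distance via the full (n+1) x (m+1) DP table."""
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[n][m]
```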


1,000 years? Really?

Isn't it more likely that somebody misquoted "slow, like a half an hour" as "slow, like a THOUSAND YEARS"?


  >>> 1000 / ((1024. ** 6) / 3e9 / 86400 / 365)
  82.0593593076069
The algorithm is quadratic in the input size. For a gigabyte of data, that's (2^30)^2 = 1024^6 operations. Dividing that by 3 * 10^9 operations/second (assuming a 3GHz CPU), then by 86400 (the number of seconds in a day) and 365 (the number of days in a year), we obtain the runtime in years -- about 12 -- assuming that comparing a single byte takes exactly one operation. Dividing 1000 by that number, we get ~82 operations to compare a single byte, and that doesn't look unreasonable.


They're quoting exponential (2^N), not quadratic (N^2) time.

If on some machine a quadratic-time algorithm took, say, a hundredth of a second to process 100 elements, an exponential-time algorithm would take about 40 quadrillion (4 * 10^16) years.
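Checking that arithmetic, assuming the same per-operation cost for both algorithms:

```python
# Quadratic algorithm: n^2 operations in 0.01 s at n = 100.
n = 100
quad_time = 0.01                      # seconds
ops_per_sec = n ** 2 / quad_time      # implied machine speed: 1e6 ops/s
# Exponential algorithm: 2^n operations on the same machine.
exp_seconds = 2 ** n / ops_per_sec
exp_years = exp_seconds / (86400 * 365)
print(f"{exp_years:.2e} years")       # on the order of 10^16 years
```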


That is a different section of the article; the thousand-years figure clearly refers to the edit distance, which is quadratic.



