Also, I think he's picturing two almost-equal files. In that case the average running time should be way lower, no? (I believe the quadratic time is worst case)
> two almost-equal files. In that case the average running time should be way lower, no?
Yes, indeed! What you're thinking of is "output-sensitive" algorithms. There are some output-sensitive algorithms for edit distance. The fastest one I found is in "Improved Algorithms for Approximate String Matching" by Dimitris and Georgios Papamichail. They note:
"We designed an output sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s-|n-m|)min(m,n,s)+m+n) and linear space, where s is the edit distance between the two strings."
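For intuition, output sensitivity in edit distance usually comes from restricting the DP to a diagonal band and doubling the threshold until the answer fits inside it. The sketch below is Ukkonen's classic O(s·min(n,m)) banded technique, not the Papamichail algorithm itself, just an illustration of the idea; all names are mine:

```python
def dist_within(a, b, t):
    """Edit distance between a and b if it is <= t, else None.

    Only cells of the DP matrix within |i - j| <= t are computed,
    so each row costs O(t) work instead of O(m)."""
    n, m = len(a), len(b)
    if abs(n - m) > t:          # bands can't reach the corner cell
        return None
    INF = t + 1                 # safe floor: out-of-band cells are > t anyway
    prev = [j if j <= t else INF for j in range(m + 1)]   # row 0
    for i in range(1, n + 1):
        lo, hi = max(1, i - t), min(m, i + t)
        cur = [INF] * (m + 1)
        if i <= t:
            cur[0] = i
        for j in range(lo, hi + 1):
            cur[j] = min(
                prev[j] + 1,                           # delete a[i-1]
                cur[j - 1] + 1,                        # insert b[j-1]
                prev[j - 1] + (a[i - 1] != b[j - 1]),  # match / substitute
            )
        prev = cur
    return prev[m] if prev[m] <= t else None

def edit_distance(a, b):
    # Double the threshold until the true distance s fits in the band;
    # the total work is dominated by the last round: O(s * min(n, m)).
    t = max(1, abs(len(a) - len(b)))
    while True:
        d = dist_within(a, b, t)
        if d is not None:
            return d
        t *= 2
```

For two almost-equal files s is tiny, so this runs in near-linear time even though the worst case is still quadratic.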
Yes, but complexity theorists are mostly interested in worst-case analysis, I believe (which I think is rather unfortunate) -- so the quoted numbers are probably what you get from plugging in 100,000^2 * dt or so.
It also has a quadratic lower bound: the simple implementation has to fill in the whole matrix of m*n cells. There may well be some optimisation for almost-identical strings.
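The simple implementation in question is the textbook Levenshtein DP; note that it computes every one of the (n+1)×(m+1) cells even when the strings are identical:

```python
def levenshtein(a, b):
    # Full-matrix DP: Theta(n*m) time and space regardless of how
    # similar a and b are, because every cell is filled in.
    n, m = len(a), len(b)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i             # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j             # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d[i][j] = min(
                d[i - 1][j] + 1,                       # deletion
                d[i][j - 1] + 1,                       # insertion
                d[i - 1][j - 1] + (a[i - 1] != b[j - 1]),  # substitution
            )
    return d[n][m]
```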
The algorithm is quadratic in the input size. A gigabyte is 1024^3 bytes, so for a gigabyte of data that's (1024^3)^2 = 1024^6 ≈ 1.15 × 10^18 operations. Dividing that by 3 × 10^9 operations/second (assuming a 3GHz CPU), by 86,400 (the number of seconds in a day), and by 365 (the number of days in a year), we obtain a runtime of about 12 years, assuming that comparing a single byte takes exactly one operation. Dividing 1000 by that number, we get ~82 operations to compare a single byte, and that doesn't look unreasonable.
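The arithmetic, spelled out:

```python
n = 1024 ** 3                    # bytes in a gigabyte (GiB)
ops = n ** 2                     # quadratic algorithm: ~1.15e18 operations
seconds = ops / 3e9              # 3 GHz CPU, one operation per cycle
years = seconds / (86400 * 365)
print(years)                     # ~12.2 years at one op per byte comparison
print(1000 / years)              # ~82 ops per comparison to reach 1000 years
```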
They're quoting exponential (2^N), not quadratic (N^2) time.
If on some machine a quadratic-time algorithm took, say, a hundredth of a second to process 100 elements, an exponential-time (2^N) algorithm would take on the order of 10^16 years, roughly 40 quadrillion years.
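Plugging the numbers in (0.01 s for 100^2 operations implies 10^6 operations per second on that machine):

```python
ops_per_sec = 100 ** 2 / 0.01        # implied speed: 1e6 ops/sec
seconds = 2 ** 100 / ops_per_sec     # exponential workload at N = 100
years = seconds / (86400 * 365)
print(f"{years:.2e}")                # ~4.0e16 years
```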