Hacker News

Also, I think he's picturing two almost-equal files. In that case the average running time should be way lower, no? (I believe the quadratic time is worst case)


> two almost-equal files. In that case the average running time should be way lower, no?

Yes, indeed! What you're thinking of is "output-sensitive" algorithms. There are some output-sensitive algorithms for edit distance. The fastest one I found is in "Improved Algorithms for Approximate String Matching" by Dimitris and Georgios Papamichail. They note:

"We designed an output sensitive algorithm solving the edit distance problem between two strings of lengths n and m respectively in time O((s-|n-m|)min(m,n,s)+m+n) and linear space, where s is the edit distance between the two strings."

http://arxiv.org/abs/0807.4368
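The paper's algorithm is more involved, but the basic output-sensitive idea is easy to sketch: restrict the dynamic program to a diagonal band of half-width k and double k until the answer fits inside it. This is Ukkonen's classic O(s·min(n,m)) trick, not the Papamichail algorithm; the function names below are mine:

```python
def banded_distance(a, b, k):
    """Edit distance if it is <= k, else None. Only fills a band of ~2k+1 diagonals."""
    n, m = len(a), len(b)
    if abs(n - m) > k:
        return None  # the distance is at least |n - m|
    INF = k + 1  # any cell outside the band has true distance > k
    prev = [j if j <= k else INF for j in range(m + 1)]
    for i in range(1, n + 1):
        cur = [INF] * (m + 1)
        cur[0] = i if i <= k else INF
        for j in range(max(1, i - k), min(m, i + k) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            cur[j] = min(prev[j] + 1,        # delete a[i-1]
                         cur[j - 1] + 1,     # insert b[j-1]
                         prev[j - 1] + cost) # match / substitute
        prev = cur
    return prev[m] if prev[m] <= k else None

def edit_distance(a, b):
    """Output-sensitive: roughly O(s * min(n, m)), where s is the edit distance."""
    k = 1
    while True:
        d = banded_distance(a, b, k)
        if d is not None:
            return d
        k *= 2  # distance exceeds k; widen the band and retry
```

On two almost-equal strings this touches O(s·n) cells rather than the full n² matrix, which is exactly why the average case for near-identical files is so much cheaper.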


Cool. The naive approach yields O(exp(s)*n); this looks like a nice improvement.



Yes, but complexity theorists are mostly interested in worst-case analysis, I believe (which I think is rather unfortunate) -- so the quoted numbers are probably what you get from plugging in 100,000^2 * dt or so.


It also has a quadratic lower bound: the simple implementation has to fill in the whole matrix of m*n cells. There may well be some optimisation for almost-identical strings.
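A minimal sketch of that simple implementation (standard Wagner-Fischer dynamic programming; the function name is mine). Both loops always run to completion, so it is Theta(n*m) no matter how similar the strings are:

```python
def levenshtein(a, b):
    """Edit distance via the full (n+1) x (m+1) DP table."""
    n, m = len(a), len(b)
    # d[i][j] = edit distance between a[:i] and b[:j]
    d = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(m + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost) # match / substitution
    return d[n][m]
```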


1,000 years? Really?

Isn't it more likely that somebody misquoted "slow, like a half an hour" as "slow, like a THOUSAND YEARS"?


  >>> 1000 / ((1024. ** 6) / 3e9 / 86400 / 365)
  82.0593593076069
The algorithm is quadratic in the input size. For a gigabyte of data, that's (2^30)^2 = 1024^6 operations. Dividing that by 3 * 10^9 operations/second (assuming a 3GHz CPU), then by 86400 (the number of seconds in a day) and 365 (the number of days in a year), we obtain the runtime in years -- about 12 -- assuming that comparing a single byte takes exactly one operation. Dividing 1000 by that number, we get ~82 operations to compare a single byte, and that doesn't look unreasonable.


They're quoting exponential (2^N), not quadratic (N^2) time.

If on some machine a quadratic-time algorithm took, say, a hundredth of a second to process 100 elements, an exponential-time algorithm would take about 40 quadrillion (4 * 10^16) years.
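Checking that arithmetic, assuming the same per-operation cost for both algorithms:

```python
# Quadratic algorithm: n^2 operations in 0.01 s at n = 100.
n = 100
quad_time = 0.01                      # seconds
ops_per_sec = n ** 2 / quad_time      # implied machine speed: 1e6 ops/s
# Exponential algorithm: 2^n operations on the same machine.
exp_seconds = 2 ** n / ops_per_sec
exp_years = exp_seconds / (86400 * 365)
print(f"{exp_years:.2e} years")       # on the order of 10^16 years
```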


That is a different section of the article; the thousand-years figure clearly refers to the edit distance, which is quadratic.



