So, if we have an array A and need more
space, we allocate a larger array B, copy
the contents of A into B, and continue
with B.
A guess is that in nearly all cases a
better solution would be a tree where the
array subscripts are used as keys and the
array elements as leaves.
Maybe the main reason to include dynamic
arrays is the chance to do some applied
math to analyze how much bigger array B
should be than array A.
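As a quick sketch (my own illustration in
Python, not from the course), we can count
the element copies under the usual
doubling rule and see the amortized O(1)
claim directly:

```python
# Dynamic array that doubles capacity when full, counting
# element copies to illustrate amortized O(1) appends.
class DynArray:
    def __init__(self):
        self.capacity = 1
        self.size = 0
        self.data = [None] * self.capacity
        self.copies = 0  # total elements copied during regrowth

    def append(self, x):
        if self.size == self.capacity:
            self.capacity *= 2               # growth factor 2
            bigger = [None] * self.capacity  # the larger array B
            for i in range(self.size):       # copy A into B
                bigger[i] = self.data[i]
            self.copies += self.size
            self.data = bigger
        self.data[self.size] = x
        self.size += 1

a = DynArray()
n = 1_000_000
for i in range(n):
    a.append(i)
# With factor-2 growth the total copy work is under 2n, so the
# average cost per append is bounded by a constant.
print(a.copies, a.copies < 2 * n)  # → 1048575 True
```

A growth factor of 2 keeps total copying
under 2n; smaller factors trade copying
for less wasted space, which is just the
applied-math question above.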
(2) Heaps
I like heaps.
At one point in the software of my
startup, I had to search through maybe 1
million numbers and end up with, say, the
20 largest. For that I programmed a heap,
and it has worked out great.
There are versions of the heap algorithm
that have better locality of reference
and that help when the heap is carried
mostly on slow, secondary storage.
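For the top-20-of-a-million job, a minimal
sketch (my illustration; the standard
Python heapq stands in for a hand-rolled
heap) keeps a size-20 min-heap whose root
is the smallest of the current top 20:

```python
import heapq
import random

# Keep the k largest of a stream of numbers with a size-k min-heap.
def top_k(numbers, k=20):
    heap = []
    for x in numbers:
        if len(heap) < k:
            heapq.heappush(heap, x)
        elif x > heap[0]:
            heapq.heapreplace(heap, x)  # drop smallest, keep x
    return sorted(heap, reverse=True)

nums = [random.randrange(10**9) for _ in range(1_000_000)]
best = top_k(nums)
print(best == sorted(nums, reverse=True)[:20])  # → True
```

Each of the million numbers costs at most
O(log 20) heap work, and the heap never
holds more than 20 items.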
(3) Balanced Binary Search Trees
AVL (Adelson-Velskii, Landis) trees are in
the Knuth reference, and they are
terrific. An alternative is red-black
trees. One of those two likely underlies
the .NET collection classes, and my
startup uses two instances as a simple,
light, fast key-value store instead of
Redis.
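As a sketch of the balanced-tree idea (my
own toy AVL insertion and lookup in
Python; the .NET internals are of course
their own code):

```python
# Toy AVL tree: insert and lookup, with heights maintained so the
# tree stays within a constant factor of the minimum height.
class Node:
    __slots__ = ("key", "val", "left", "right", "height")
    def __init__(self, key, val):
        self.key, self.val = key, val
        self.left = self.right = None
        self.height = 1

def _h(n): return n.height if n else 0
def _fix(n): n.height = 1 + max(_h(n.left), _h(n.right))

def _rot_right(y):
    x = y.left
    y.left, x.right = x.right, y
    _fix(y); _fix(x)
    return x

def _rot_left(x):
    y = x.right
    x.right, y.left = y.left, x
    _fix(x); _fix(y)
    return y

def insert(node, key, val):
    if node is None:
        return Node(key, val)
    if key < node.key:
        node.left = insert(node.left, key, val)
    elif key > node.key:
        node.right = insert(node.right, key, val)
    else:
        node.val = val            # overwrite an existing key
        return node
    _fix(node)
    balance = _h(node.left) - _h(node.right)
    if balance > 1:               # left-heavy
        if key > node.left.key:   # left-right case
            node.left = _rot_left(node.left)
        return _rot_right(node)
    if balance < -1:              # right-heavy
        if key < node.right.key:  # right-left case
            node.right = _rot_right(node.right)
        return _rot_left(node)
    return node

def get(node, key):
    while node:
        if key == node.key:
            return node.val
        node = node.left if key < node.key else node.right
    return None

root = None
for i in range(1000):             # sorted order: worst case for a plain BST
    root = insert(root, i, str(i))
print(root.height, get(root, 500))
```

Inserting 1000 keys in sorted order would
give a plain binary search tree height
1000; the rotations keep this one near
log2(1000).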
(4) Hash Tables
Those are also in Knuth. Hashing usually
leaves me in doubt due to its various
possible problems. But better still
sometimes is perfect hashing as I recall
also in Knuth.
For hashing in general, a good step
forward is in
Ronald Fagin, Jurg Nievergelt, Nicholas
Pippenger, H. Raymond Strong, "Extendible
hashing: a fast access method for dynamic
files", ACM Transactions on Database
Systems, ISSN 0362-5915, Volume 4, Issue
3, September 1979, pages 315-344.
We used that in an AI (artificial
intelligence) product we shipped.
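To make the directory-doubling idea
concrete, here is a toy Python sketch in
the spirit of extendible hashing -- much
simplified from the paper (no overflow
pages, a made-up tiny bucket capacity):

```python
# Toy extendible hashing: a directory of 2**global_depth slots
# points into buckets; a full bucket splits, and the directory
# doubles only when the splitting bucket is as deep as the
# directory itself.
BUCKET_CAP = 4

class Bucket:
    def __init__(self, depth):
        self.depth = depth   # local depth: low-order bits this bucket owns
        self.items = {}

class ExtendibleHash:
    def __init__(self):
        self.global_depth = 1
        self.dir = [Bucket(1), Bucket(1)]

    def _bucket(self, key):
        return self.dir[hash(key) & ((1 << self.global_depth) - 1)]

    def get(self, key):
        return self._bucket(key).items.get(key)

    def put(self, key, val):
        b = self._bucket(key)
        if key in b.items or len(b.items) < BUCKET_CAP:
            b.items[key] = val
            return
        # Split the full bucket.
        if b.depth == self.global_depth:
            self.dir = self.dir + self.dir   # double the directory
            self.global_depth += 1
        b.depth += 1
        b2 = Bucket(b.depth)
        # Directory slots whose new distinguishing bit is 1 move to b2.
        for i, slot in enumerate(self.dir):
            if slot is b and (i >> (b.depth - 1)) & 1:
                self.dir[i] = b2
        for k in list(b.items):              # rehash the bucket's items
            if self._bucket(k) is b2:
                b2.items[k] = b.items.pop(k)
        self.put(key, val)                   # retry the insert

h = ExtendibleHash()
for i in range(100):
    h.put(i, i * i)
print(h.get(7))  # → 49
```

The point of the scheme is that growth is
incremental: a split touches one bucket,
and only occasionally does the directory
itself double.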
(5) Sorting
Knuth covers heap sort and shows that it
meets the Gleason bound for sorting by
comparing pairs of keys.
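A sketch of heapsort itself (my
illustration): build a max-heap in place,
then repeatedly swap the root to the end
and sift down, for O(n log n) comparisons:

```python
import random

# Restore the max-heap property for the subtree rooted at `start`,
# within a[0..end].
def sift_down(a, start, end):
    root = start
    while 2 * root + 1 <= end:
        child = 2 * root + 1
        if child + 1 <= end and a[child] < a[child + 1]:
            child += 1                    # take the larger child
        if a[root] >= a[child]:
            return
        a[root], a[child] = a[child], a[root]
        root = child

def heapsort(a):
    n = len(a)
    for start in range(n // 2 - 1, -1, -1):   # heapify, O(n)
        sift_down(a, start, n - 1)
    for end in range(n - 1, 0, -1):
        a[0], a[end] = a[end], a[0]           # extract the max
        sift_down(a, 0, end - 1)

xs = [random.randrange(1000) for _ in range(5000)]
ref = sorted(xs)
heapsort(xs)
print(xs == ref)  # → True
```

It sorts in place with O(1) extra space,
which quicksort and mergesort do not
quite match.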
(6) Graph Searching
Looking at their lecture notes, it appears
that they mean versions of shortest paths
on networks.
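For the shortest-path case, a hedged
sketch of Dijkstra's algorithm (my
illustration; the little graph and its
weights are made up):

```python
import heapq

# Dijkstra's single-source shortest paths over a graph given as
# {node: [(neighbor, nonnegative_weight), ...]}.
def dijkstra(graph, source):
    dist = {source: 0}
    pq = [(0, source)]
    while pq:
        d, u = heapq.heappop(pq)
        if d > dist.get(u, float("inf")):
            continue                      # stale queue entry
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(pq, (nd, v))
    return dist

g = {"a": [("b", 2), ("c", 5)],
     "b": [("c", 1), ("d", 4)],
     "c": [("d", 1)],
     "d": []}
print(dijkstra(g, "a"))  # → {'a': 0, 'b': 2, 'c': 3, 'd': 4}
```

With a binary heap as the priority queue
this runs in O((V + E) log V).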
These are all fairly simple except for
minimum cost single commodity network
flows where each arc has a maximum flow
and also a cost per unit of flow. The
problem is linear programming, and the
classic simplex algorithm applies and
takes on an especially simple form -- a
basic solution corresponds to a spanning
tree of arcs.
Some good news is that if the arc
capacities are all integers and we start
the algorithm with an integer solution,
then the simplex algorithm maintains an
integer solution and will terminate with
one.
It is tempting to see that "good news" as
a case of progress in NP-complete integer
linear programming.
For such content, I recommend
Mokhtar S. Bazaraa and John J. Jarvis,
Linear Programming and Network Flows,
ISBN 0-471-06015-1, John Wiley and Sons,
New York, 1977.
(7) Dynamic Programming
That can be a big subject but does not
have to be. I got a good introduction
from an expert in about 90 seconds while
my cab was waiting to take me to the
airport. I ended up writing my Ph.D.
dissertation in dynamic programming. For
an easy introduction, one can have fun,
like eating from the appetizer plate at
Thanksgiving, say a half hour at a bite,
from
Stuart E. Dreyfus and Averill M. Law, The
Art and Theory of Dynamic Programming,
ISBN 0-12-221860-4, Academic Press, New
York, 1977.
One of the amazing advantages of dynamic
programming is how well it handles
randomness -- then one has stochastic
optimal control, Markov decision theory,
etc.
The flavor of dynamic programming in
computer science can be a bit different,
e.g., applied to search for some string A
in some string B.
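For that string flavor, a sketch of the
classic edit-distance dynamic program (my
example, not from the course notes):

```python
# Edit distance by dynamic programming: d[i][j] is the cost of
# turning the first i characters of s into the first j of t.
def edit_distance(s, t):
    m, n = len(s), len(t)
    d = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        d[i][0] = i                       # delete all of s[:i]
    for j in range(n + 1):
        d[0][j] = j                       # insert all of t[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            sub = 0 if s[i - 1] == t[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # delete
                          d[i][j - 1] + 1,        # insert
                          d[i - 1][j - 1] + sub)  # substitute/match
    return d[m][n]

print(edit_distance("kitten", "sitting"))  # → 3
```

The same table, with the substitution
cost set to infinity, gives the longest
common subsequence style of matching.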
Maybe another purpose of the course is to
get everyone all wound up and fired up
about the question of
P versus NP
My recommendation is to try quite hard to
ignore that question.
You should read about dynamic arrays more carefully. They have amortized O(1) insertion, which is better than a tree, and their data is contiguous in memory, which gives better cache locality than a tree. They are one of the most popular data structures.
Parts of your post also seem to me to be quite boastful and low-value: (paraphrasing) "the course takes itself very seriously", "why spend so much time teaching so little material", "these topics are mostly old; just read Knuth", and "dynamic programming is easy; I learned it in 90 seconds and then did my PhD in it".
I had a revision of that post, but I had
it ready only just after the end of the 2
hour window for revisions.
The course pressed hard on the students to
devote, what was it, 4 hours of time a
week in group sessions with more hours
in independent study. That's asking a lot
from the students.
In response I had a lesson and purpose
in that post: (A) That collection of
fundamental algorithms hasn't changed very
fast; it was much the same 40 years ago.
(B) Nearly all the algorithms are quite
simple and each one can be learned
quickly, including the derivation of its
big-O performance, and coded, running, and
tested in 2 hours or so. (C) I mentioned
Knuth v.3 as a reference: Tough to
improve on that Knuth volume as a
reference for such algorithms. (D) For
hashing, network flows (graph search),
and dynamic programming I gave really good
references -- tough to compete with any of
them. I used some of my experience to
illustrate (A) -- (D).
That lesson should be some quite good news
for any HN readers considering studying
the algorithms.
> Parts of your post also seem to me to be
quite boastful and low-value:
No, I just used some of my experience to
give examples of my points.
> You should read about dynamic arrays
more carefully. They have amortized O(1)
insertion ....
I saw all that. The O(1) property holds
only under some assumptions and some math
derivations, and I mentioned the math.
Two obvious problems:
(1) It is trivial for any of us to think
of applications where the new allocations
and copying would be wildly wasteful.
(2) For the assumptions, we will rarely
have enough information to have much
confidence in our math derivation.
We also can think of
(3) The reallocations, when there are a
lot of them, will create problems for
memory management and garbage collection.
Sure, any of us can think of niche
situations where (a) we do a few
reallocations and (b) then go for hours,
..., months with no more reallocations and
with the advantages of arrays.
Dynamic arrays don't belong on a list of
Best Algorithms.
Again, my guess is that the interest of
the course in dynamic arrays is an
opportunity to do the math derivations.
The people that MIT course is intended for
are maybe good HN readers, so an issue for
HN readers is, should they devote a lot of
time to that course? We review movies,
restaurants, etc., and we might also
review courses.
Your attack on my review was mostly an
attack on me: You resented my post
because I mentioned some of my background
and in response attacked me. Instead,
make a good contribution of your own,
maybe as a review of that MIT course.
I'll state the basic lesson again:
The algorithms in that course are nearly
all quite good but old, with some really
good old references, and can be learned
quickly, say, the whole course in a few
weekends.
Thanks for taking the time to read and respond. I admit the second paragraph of my post was a bit aggressive and I was on the fence about posting it. I don't have a problem with you sharing your background but the parts I mentioned previously came off in a certain way to me.
I found your initial argument on dynamic arrays dismissive because you admitted you had never heard of it, then implied that they don't make sense as if to justify why you had never heard of them. I find that intellectually dishonest and it really ticked me off; it's just confirming one's own bias. I still find your argument a bit dismissive although we can agree to disagree. It's not a case of worthiness to be on a list of best algorithms or of fancy math derivations. They are widely used in practice, are O(1) for many operations, work well with caches, and are worth studying for that reason.
As for making a "good contribution of my own" by reviewing the course, I don't feel the need. It's a standard undergrad algorithms course of the kind that most CS students would take. I don't think there's any value in reviewing the syllabus when they all tend to cover the same material.
I probably won't reply again, so (sincerely) have a good day. I realize you feel attacked, but if you're going to opine on something, then other people might opine on your opinion. You don't hesitate in your writing style, so I didn't either. I do apologize if I made it too personal. I read some things that I couldn't let slide.
https://ocw.mit.edu/courses/6-006-introduction-to-algorithms...
The course takes itself very seriously and seems to ask each student to devote a lot of time to the course.
My summary reaction is that it would be a shame to devote that much time to what is basically so little material.
Yes, the "Syllabus" mentions
Introduction to Algorithms, Cormen, Leiserson, Rivest, and Stein, CLRS.
Years ago I downloaded a PDF and didn't see much beyond what I'd gotten from Knuth, etc.
I can give a fast overview here:
There I see their lists of topics:
dynamic arrays, heaps, balanced binary search trees, hash tables
and
sorting, graph searching, dynamic programming
I'm a little surprised at how old these topics are: I first learned several of the topics almost entirely from
Donald E. Knuth, The Art of Computer Programming, Volume 3, Sorting and Searching, 1973.
(1) Dynamic Array
A glance at how it works explains why I never heard of it:
One can see
"Dynamic array"
at
https://en.wikipedia.org/wiki/Dynamic_array