I took that class on the Stanford campus in the Summer of 2011. Keith was one of the most engaging instructors I've ever had. His enthusiasm and knowledge were boundless.
The course was video-recorded and I downloaded and saved most of the lectures (and I gave Keith a DVD with a copy of them.) I still have them, but I don't know if it's okay to put them on YouTube or not.
Currently taking CS 103 (Mathematical Foundations of Computing) under Keith at Stanford. Truly one of the best-taught classes I've ever taken - the man has a love for teaching Computer Science that really makes all the difference.
This midterm exam is open-book, open-note, open-computer, but closed-network. This means
that if you want to have your laptop with you when you take the exam, that's perfectly fine, but you must not use a network connection. You should only use your computer to look at notes you've downloaded from online in advance.
If you are an SCPD student, you should take this exam in any two-hour period between 11:00
AM Pacific time, July 20 and 11:00 AM Pacific time, July 21. If you have any questions, please feel free to call me at _ any time in this window except between midnight and 8AM on July 21, Pacific time. You may submit the exam either by faxing it to _ or by scanning and emailing it
I'm not familiar with how these things work in the US - are the students trusted to take the exam away and not use the internet to get help?
At Stanford, faculty are prohibited from proctoring. So there is no professor or grad student that sits in the exam hall with the students while they are taking the test:
"The faculty on its part manifests its confidence in the honor of its students by refraining from proctoring examinations and from taking unusual and unreasonable precautions to prevent the forms of dishonesty mentioned above."
A professor or grad student walks into the room every 30 minutes or so just to make sure everything looks okay. They otherwise sit outside the exam hall so you can ask them questions, ask to go to the bathroom, etc. But it's up to the students to police each other and ultimately themselves.
There are definitely other universities in the US that have this policy, but I would say it's more the exception than the norm.
That's really interesting. I did a CS degree in the UK and all exams were strictly supervised and we were never allowed computers.
The exception to this was programming exams for which the department would boot all the lab workstations into a locked-down mode with no internet & new homedir containing only the exam files. At the end of the exam everything got collected and marked by an automated test suite.
I can't speak for other schools in the US but I am an undergraduate in the CS department at Stanford, and it's fairly common to have an open-computer test to refer to notes and the textbook. Other students sit around you and it's most likely fear of another student approaching a TA that makes this enforceable.
I'm taking my university's offering of this course in my final semester. It's fantastic. I'm using all of the separate skills I've developed over the last four years: mathematical thinking (via parsing), algorithms and data structures (ASTs), programming languages, software design. It's also the first course where big design decisions are non-trivial and really affect things further down the pipeline, an experience that is rare in school but, from what I can tell, ubiquitous in the field. I really recommend it to any students out there.
I'm going to be taking a Compilers course next quarter and I was speaking to the prof, who told me that she doesn't plan on using the Dragon book; instead she plans to use the Louden book, which seems a lot more approachable in a single quarter.
I was lucky enough to buy a used copy on amazon for $8.
I remember while learning data structures and implementing them, I landed on http://www.keithschwarz.com/interesting/. It was enlightening for me. By browsing through these examples I learned about good programming practices. Most importantly, how commenting your code can help you even when you return after a few years to fix something in it.
And again, 90% about the parsing, and a couple of words about "code optimisation". That pitiful so-called "Dragon Book" really keeps taking its toll of souls.
I'm terribly sorry to break the news for you, but no, compilers are not about parsing. Nowhere near that.
Right, but while parsing is important, it's the least interesting aspect of implementing a compiler. It would be better if more time were spent on starting with an AST and generating code. Then, at the end of the course, implement a parser to generate the AST.
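The AST-first approach suggested above can be sketched in a few lines: build an expression AST directly (no parser yet) and compile it to stack-machine instructions. All class names and the instruction set here are invented for illustration, not taken from any course.

```python
# Sketch of the AST-first approach: compile a tiny expression AST to
# stack-machine instructions, deferring parsing entirely.
from dataclasses import dataclass

@dataclass
class Num:
    value: int

@dataclass
class Add:
    left: object
    right: object

@dataclass
class Mul:
    left: object
    right: object

def compile_expr(node):
    """Emit postfix stack-machine instructions for an expression AST."""
    if isinstance(node, Num):
        return [("PUSH", node.value)]
    if isinstance(node, Add):
        return compile_expr(node.left) + compile_expr(node.right) + [("ADD",)]
    if isinstance(node, Mul):
        return compile_expr(node.left) + compile_expr(node.right) + [("MUL",)]
    raise TypeError(node)

def run(program):
    """Interpret the instructions to check the generated code."""
    stack = []
    for op, *args in program:
        if op == "PUSH":
            stack.append(args[0])
        elif op == "ADD":
            b, a = stack.pop(), stack.pop()
            stack.append(a + b)
        elif op == "MUL":
            b, a = stack.pop(), stack.pop()
            stack.append(a * b)
    return stack[-1]

# (1 + 2) * 4, built directly as an AST -- no parser needed yet
ast = Mul(Add(Num(1), Num(2)), Num(4))
code = compile_expr(ast)
print(run(code))  # 12
```

A parser bolted on at the end of the course would only need to produce these same node objects.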
That's exactly how I used to teach compilers. But the weaker students had a really hard time understanding the purpose of ASTs ("why don't you give the program to the code generator as a string?") no matter how often I pointed out why ASTs are a better data structure than strings from the POV of efficient code generation. And some students complained about this "unnatural" sequence of explaining compilers in the course evaluations. So I changed to teaching compilers in the standard order, lexing -> parsing -> type-checking -> code gen -> optimisation. No complaints any more.
There is another dimension to this. While code gen and optimisations are more interesting than lexing/parsing, almost no student will ever again write a code generator, let alone optimisations. OTOH, most programmers need to write little lexers and little parsers for little languages all the time. That's why a bit of facility with lexing/parsing and associated tools is highly useful.
Then my students would have been puzzled by Scheme, which they don't know.
Incidentally, Scheme has a syntax too, so you need lexing and parsing to handle Scheme. The only advantage is that Scheme's syntax is simple.
Moreover the problem of language representation doesn't go away. You still need to represent programs. The real problem is to understand that there are two levels: programs and representations of programs. Avoiding confusion between the two is the real challenge, and it remains a challenge in Scheme.
A truly efficient concurrent language is always very much platform-specific (think of Occam on one side of the spectrum and OpenCL and Cuda on another).
> nobody really knows what exactly the memory model is
Really? I've seen a number of Coq/HOL/ACL2 precise memory models of certain architectures.
> idealised compilers have a beautiful pipeline structure
A very practical compiler can also be built as a beautiful and simple pipeline.
> By this reasoning, any computable problem is trivially simple, because I can translate it into ARM/x86/MIPS/... machine code, and machine code is just a linear list of trivial commands, i.e. I can "turn it into a laughably simple chain".
You did not get it at all.
You cannot translate "any code" into a linear trivial sequence with each step being nothing but some primitive form of term rewriting of a single input tree into a single output tree.
A trivial proof for how badly wrong you are: you do not need a Turing-complete language to write a compiler. Now try to apply this to your "any computable problem" reasoning.
> if you chain enough simple things, the result stops being simple.
Incorrect. Such a chain is as simple as its most complex building block. Complexity does not add up.
> Look for example at modern JIT compilers, tracing or otherwise.
Firstly, we're talking about proper compilers here. JIT is a totally different beast; it's got a very severe performance constraint.
Secondly, even JIT can be structured this way, and most of the existing JITs are idiotically, needlessly overcomplicated.
> They are not simple by any stretch of the imagination.
Sorry, but no, they are.
Yes, the non-Turing-complete property breaks there, as well as with any other interpreter, but a Turing-complete subset of a JIT compiler can be very easily isolated.
> Optimisations are easy only if you can accept the optimisation process to be slow, and/or the resulting code buggy.
And this is nothing but a FUD.
Firstly, nanopass compilers are not any slower than the fused ones. In some cases they're faster, due to some of the nice properties of the immutability. Also, you can fuse some passes automatically, while still keeping your code trivial.
Secondly, the simpler your passes are, the easier it is to reason about their correctness. Compare it to the notoriously shitty instcombine pass in LLVM, as it is an example of the opposite approach (the one you're apparently advocating).
I, too, can't post as many messages as I'd like. Not sure why, or how
to change it, or how even to see if there's some limit imposed on
me. Maybe < 1000 rep users are throttled?
> Maybe they're not ready for a compilers course then?
We've been teaching compilers to 2nd years ever since the department
was founded as far as I'm aware. The 2nd year is the right time to see how the
programmer's most important tool works.
> you don't have to roll out your own.
I want students to write their own lexer and parser, and familiarise
themselves with lexer/parser generators. This is an important part of
a programmer's education. First, it lets them see the purpose of all
that formal language theory they had to do in the first year. Second,
and more important, working programmers have to design languages and
write lexers/parsers all the time. A lot of commercial programming is
about transforming data from one representation to another. In
contrast, few programmers will ever write a real code generator.
> you're teaching in a lunatics asylum
I teach at a large, well-known and decidedly mediocre university. The
kind of place that most working programmers graduate from. Needless to
say that this is perfectly compatible with being a lunatics asylum.
> truly efficient concurrent language is always very much
> platform-specific
Sure, but so what? That's like saying for truly efficient concurrent
programming you need to program in assembly, because that's the only
way to exploit all platform specific trickery.
The purpose of high-level languages is to abstract away from the
target platform. The additional goal of languages like C is to
abstract as little as possible, so as to be platform independent while
still allowing fast executables.
In practice compilers for high-performance languages like C/C++ and
Rust need to balance 3 contradictory goals.
- Platform independence.
- Fast executables.
- Low cost of compiler development and evolution.
Note that optimisations are by far the most expensive part of a
compiler. Memory models are used in this balance: you want MMs that
are platform agnostic, while still enabling powerful
compiler optimisations. The CPU vendor has two more objectives (both
closely related to platform agnosticism):
- Not giving away too many CPU internals.
- Allowing CPU evolution.
> Coq/HOL/ACL2 precise memory models
Are they low-level models of CPUs that can also be used to reason
about read/write reordering? Or are they genuine memory models,
coming from the CPU manufacturers?
As far as I'm aware, the construction of memory models these days is
an empirical science like zoology, where people run experiments to
confirm hypotheses about CPU behaviour. See e.g. [1]
> A very practical compiler can also be built as a beautiful and simple pipeline.
Yes, and GHC is probably an example of this. But GHC is not simple by any stretch of the imagination.
M Odersky once gave a talk (sorry, no reference handy) where he said
that scalac, the Scala compiler, is no longer a pipeline, but has
a database structure. That's because scalac's many stages cannot just live on
the data fed by the immediately preceding stage, and sometimes need to go back several stages. Interesting.
> you do not need a Turing-complete language to write a compiler.
Type checking / inference in Haskell, Scala and C++ (among others) is
Turing complete. Haskell's compile-time meta-programming extension
Template Haskell, and scala.meta, let you perform arbitrary
computations as part of the code generation process, recursively
invoking the compiler itself.
On top of this, the control of a Turing machine is a finite state
automaton. So it's very simple too.
Finally, I don't see the significance of Turing completeness here.
You can write highly complicated programs in small fragments of Turing
complete languages (e.g. Calculus of Constructions). This is the point
of "Total Functional Programming" [2]. Indeed almost everything the
typical working programmer writes is primitive recursive. Why do we
have Turing complete programming languages? Because using just
primitive recursion is really inconvenient. Try writing a trivial
4-line program like the ancient algorithm computing the greatest
common divisor using just primitive recursion [3]. It's already
annoying. So unrestricted recursion is a convenience, making programs
more readable and shorter. Imagine you had to write type inferencers,
graph-colouring based register allocators or program analyses based on
fix-point iteration using just primitive recursion. Ugh!
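The gcd example can be made concrete. Below is a sketch: Euclid's algorithm with unrestricted recursion next to a version that threads an explicit decreasing "fuel" counter, which only gestures at the inconvenience of primitive recursion rather than being a formal encoding of it.

```python
# Euclid's algorithm with unrestricted recursion is a one-liner; a
# primitive-recursive rendering must thread an explicit decreasing
# counter ("fuel"), since each recursive call has to consume one step.
# This is a sketch of the inconvenience, not a formal translation.

def gcd(a, b):
    return a if b == 0 else gcd(b, a % b)

def gcd_fuel(a, b, fuel):
    # fuel bounds the recursion depth; a + b always suffices, because
    # the second argument strictly decreases at every step.
    if fuel == 0 or b == 0:
        return a
    return gcd_fuel(b, a % b, fuel - 1)

print(gcd(48, 18), gcd_fuel(48, 18, 48 + 18))  # 6 6
```

Even in this toy, the caller now has to supply (and justify) a termination bound, which is exactly the annoyance the comment describes.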
> some primitive form of term rewriting of a single input tree into a single output tree.
Term rewriting is Turing complete. So yes, you can transform any
compiler into a term rewriting system. So what? If you transform a
sophisticated, industrial-strength compiler like GHC or scalac into a
TRS, you end up with a very complex TRS that is difficult to
understand. You cannot hide intrinsic complexity. You can kick the can
down the road, and change the form in which the complexity manifests
itself.
> Such a chain is as simple as its most complex building block.
Consider the Turing complete language {S, K} of combinators. By your
reasoning any program written as SK combinators should be simple,
because the building blocks are trivial.
> JIT is a totally different beast
Not really. In a way, JITs are simpler than AOTs in that the optimisations they
run are simpler (eg register alloc by linear scan rather than graph
colouring), and tracing JIT optimisations are simpler still since the
traces they work on have linear control flow, which drastically
simplifies the optimisation. Indeed that's the point of traces. The
main complication of JITs is the seamless switching between
interpretation and execution of compiled fragments.
> existing JITs are idiotically, needlessly overcomplicated.
The JIT teams at Oracle, Google, or Microsoft would like to hear from
you if you can lower the cost of JIT compiler construction.
> nanopass compilers are ...
I have nothing against nano-pass compilers, and using a lot of
immutable data structures is fine. IIRC ocamlopt is heavily based on
immutability, and it's one of the fastest compilers around. Whether
nanopass is better for formal reasoning about compiler correctness
remains to be seen. In formal reasoning it doesn't actually matter
that much if information is passed explicitly by argument, or is
global state, since either way, your logic has to keep track of it.
But whether nano- or otherwise, the complexity is intrinsic in the
problem of compilation, which is about bridging the semantic gap
between two languages. And the bigger the semantic gap, the more
complicated the task.
> Needless to say that this is perfectly compatible with being a lunatics asylum.
I see... Yes, I've been exposed to the quality of the intake at the so-called "new" universities (in the UK). I just did not think they were learning anything beyond Excel.
> I want students to write their own lexer and parser, and familiarise themselves with lexer/parser generators. This is an important part of a programmer's education.
I agree. It should follow the compilers course, once everything of importance there is properly absorbed.
> In contrast, few programmers will ever write a real code generator.
And this is where I cannot agree at all. I believe that everyone must build multiple compilers, routinely, on a daily basis.
The reason to think so is:
1) DSLs are the ultimate answer to the complexity problem. And compilation is the best way to implement them easily.
2) Compilers are often present where you won't even expect to find anything like compilation. Protocols, UIs, business logic, databases, whatever. And, given the property of a compiler that it can be reduced to a tiny complexity (we'll get back to it later), it is extremely important to be able to spot such patterns early and convert your architecture to a compiler.
And, of course, all such incidental compilers have absolutely nothing to do with any kind of parsing. Their source languages are already data structures (even graphs), almost never a plain boring stream of characters.
> The purpose of high-level languages is to abstract away from the target platform.
Impossible if your target platforms' concurrency models are incompatible. You cannot fit message passing into SPMD, and vice versa.
> Note that optimisations are by far the most expensive part of a compiler.
Not necessarily. For C++, for example, all the optimisations, even with polyhedral vectorisation, abstract interpretation, partial specialisation and all the bells and whistles, are nothing compared to mere template expansion. See the clang+llvm profiling data.
> Or are they genuine memory models, coming from the CPU manufacturers?
Not "coming" anywhere, unfortunately. They're all internal models. Not available even under an NDA.
> But GHC is not simple by any stretch of the imagination.
It's got quite a few gotchas, to be honest. I'd definitely design it differently. And Haskell is not the best language for this sort of thing - see the "expression problem". The Scrap Your Boilerplate library fixes part of these issues, but it is not used inside GHC.
> That's because scalac's many stages cannot just live on the data feed by the immediate preceeding stage, and need to go back serveral stages sometime. Interesting.
It is exceptionally over-engineered. But, yes, there is a reason to have backtracking in a compiler pipeline, while still having a linear pipeline (at the end of the day it is linear, but it may branch and go back to try out certain hypotheses). I've been experimenting with this approach a lot. It does not increase complexity in any way.
> Type checking / inference in Haskell, Scala and C++ (among others) is Turing complete.
Firstly, for most of your DSLs you don't need such a type system. Secondly, all the implementations I've seen are extremely overengineered, and I've got no idea why.
In fact, any Hindley-Milner like type system is laughably trivial, and the Turing-complete part of it is very much isolated (and you don't have to code it, just use an existing library). Hindley-Milner unification is not any different from Prolog unification, so my way of implementing these type systems is to simply compile your AST into a flat list of Prolog equations (see the recent history of my posts for this method explained in depth).
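The claim that Hindley-Milner unification is essentially Prolog unification can be sketched with a toy solver. The representation below (variables as quoted strings, arrow types as tuples) and the equations for `\x. x + 1` are all invented for illustration; the occurs check is omitted for brevity.

```python
# A toy first-order unifier in the Prolog style, applied to a flat
# list of type equations, sketching the claim that HM inference
# reduces to solving such equations. Occurs check omitted.

def walk(t, subst):
    """Chase a type variable through the substitution."""
    while isinstance(t, str) and t.startswith("'") and t in subst:
        t = subst[t]
    return t

def unify(a, b, subst):
    a, b = walk(a, subst), walk(b, subst)
    if a == b:
        return subst
    if isinstance(a, str) and a.startswith("'"):
        return {**subst, a: b}
    if isinstance(b, str) and b.startswith("'"):
        return {**subst, b: a}
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            subst = unify(x, y, subst)
            if subst is None:
                return None
        return subst
    return None  # constructor clash, e.g. int vs bool

def solve(equations):
    subst = {}
    for lhs, rhs in equations:
        subst = unify(lhs, rhs, subst)
        if subst is None:
            return None
    return subst

def resolve(t, subst):
    """Fully substitute a type for printing."""
    t = walk(t, subst)
    if isinstance(t, tuple):
        return tuple(resolve(x, subst) for x in t)
    return t

# Flat equations for (\x. x + 1):
#   'f = 'x -> 'r     (the lambda's type)
#   'x = int          (x is an operand of +)
#   'r = int          (the body's type is the result of +)
equations = [
    ("'f", ("->", "'x", "'r")),
    ("'x", "int"),
    ("'r", "int"),
]
subst = solve(equations)
print(resolve("'f", subst))  # ('->', 'int', 'int')
```

An unsatisfiable equation such as `int = bool` makes `solve` return `None`, which is the type error.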
> So it's very simple too.
It's not. You cannot reason about it.
> This is the point of "Total Functional Programming"
Exactly. You don't need anything beyond a simple total language for your entire compiler (or, if you're implementing some metaprogramming, abstract interpretation or a Turing-complete type system, a 99.99% of your compiler, with the rest being well-isolated and reused over and over again).
> Term rewriting is Turing complete.
You only need a subset of term rewriting, which you can express in a total language.
> In a way, JITs are simpler than AOTs in that the optimisations they run are simpler
Yet, they're interpreters, especially the tracing ones. And interpreters are inherently orders of magnitude more complex than compilers.
> By your reasoning any program written as SK combinators should be simple, because the building block are trivial.
Not really. You're combining them in a deep, horrible tree, not a linear pipeline of isolated things.
I do not see anything extraordinary in an obvious notion of corporate coding being thoroughly broken. It's a common knowledge.
> In formal reasoning it doesn't actually matter that much if information is passed explicitly by argument, or is global state, since either way, your logic has to keep track of it.
It matters. If all you have is a language as an input and another language as an output, without any global state, then every chunk of your pipeline is isolated and you can comprehend it without thinking about the rest. That's exactly why the complexity of the passes does not add up, and the total complexity is only the complexity of the worst of the passes.
> the complexity is intrinsic in the problem of compilation
I've never seen anything complex there. After over 20 years in the field.
> which is about bridging the semantic gap between two languages.
It's not a chasm; you don't have to cross it in one step. Just build a hundred trivial passes in between, one step at a time.
> And the bigger the semantic gap, the more complicated the task.
No, complexity does not grow. Mind giving an example of the opposite?
P.S. Also, one of the nice properties of compilers is that you can abstract out and reuse a lot of routine stuff. Register allocation? Write it once and reuse it everywhere. SSA? Implement it once and reuse it for any imaginable IR. Typing? See above. Constant folding, ADCE, inlining, etc.? Write them once and reuse them for all possible IRs.
My point is exactly the opposite - compilers are easy, probably easier than pretty much anything else. And courses like this do not help by pretending this is a complex topic and by obscuring what matters with unimportant minutiae (like parsing and such).
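The "write it once, reuse everywhere" point can be sketched with constant folding: writing the pass against a generic tuple-shaped IR `("op", child, ...)` means the same function serves any IR encoded that way. The operator table and node encoding below are invented for illustration.

```python
# A reusable constant-folding pass over a generic tuple IR.
import operator

FOLDABLE = {"add": operator.add, "sub": operator.sub, "mul": operator.mul}

def fold(node):
    if not isinstance(node, tuple):      # literal or variable leaf
        return node
    op, *kids = node
    kids = [fold(k) for k in kids]       # fold children bottom-up
    if op in FOLDABLE and all(isinstance(k, int) for k in kids):
        return FOLDABLE[op](*kids)       # all-constant node: evaluate now
    return (op, *kids)

# x + (2 * 3) - 4: only the constant subtree collapses
print(fold(("sub", ("add", "x", ("mul", 2, 3)), 4)))
```

Because the pass never inspects anything beyond the `("op", child, ...)` shape, it can be dropped into any front end that lowers to this representation.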
None of what you said means writing compilers is hard: just that the first ones or techniques are hard to do. Once it's done, it's another building block that can be chained together to solve similar problems. Using an ML or Lisp with the right primitives makes that even easier. Designing the right language and implementing modularly makes that easier still.
So, I'd say the industrial ones are as tough as they are because they support hard-to-compile languages with many dark corners while being implemented in those same languages themselves. Make the problem hard for yourself and it's hard. Make it easier and it's... still work, but only a tiny fraction of what you alluded to.
It's far from trivial, but it is presented as more complex than it should be. The parsing point is valid: once in AST land, the rewrite/transform/optimize part, which could be seen as the most important one, seems more enjoyable and composable than the parsing stage, which is often covered in great detail (courses making you write BNF state matrices?).
Compilers are not really complicated compared to plenty of other hard topics. They're large, but they always separate out into different passes that you just run one after the other. You don't have to manage the program as a whole.
And optimizations don't even matter anyway. x86 runs your -O1 program as fast as your -O3, and nobody has put a real accurate model of an x86 into a production compiler for the last decade.
That's a pretty good indication that there is a big problem. Deep at the heart of it is that processor designers don't quite understand the performance characteristics of their processors any more, so they cannot produce e.g. an abstract memory model that compiler writers could base their optimisations on.
Optimisations for parallel processing, e.g. auto-parallelisation are not well developed either.
Oh, I think Intel knows it, but they consider it too much of a trade secret to even tell the compiler team about it.
There are actually numerous performance counters exposed by the CPU too, which are good for dynamic benchmarking, but hardly anyone cares to use them.
BTW, even though the ALU part of the CPU is extremely wide and dynamic, you still have to fit your instructions through the x86 decoder at the front, and it's worth writing a scheduler just to model that part.
I understand that they need to keep the low-level details
of their CPU secret, but a memory model? All it says is when/how reads and writes commute. If a CPU manufacturer were to
provide a nice, high-level memory model, this would make the CPU more sellable. Why? Because compiler writers could optimise more aggressively when compiling for that CPU.
I suspect that the reason they don't give out memory models is that
they don't know how to do it. That's also the vibe I've been getting
when I talked to people who would know.
Tbh, it is really hard to have an "accurate" model of a very wide OoO core. It is far too dynamic; your instruction scheduler cannot predict the occupancy of the pipelines.
In order to determine the meaning of a programming language, you need a platform independent memory model. This is increasingly important as CPUs are more and more concurrent and programming languages are exposing some of that concurrency to the programmer.
Of course the language designer can always play it safe and choose a ridiculous memory model like sequential consistency, but that prevents many useful compiler optimisations. At the other extreme you could explicitly state that the memory model is a specific processor, but then
(a) the language becomes too platform dependent and program semantics will change with every processor upgrade.
(b) nobody really knows what exactly the memory model is in any meaningful sense because even the manufacturers don't know (other than saying: it is what the CPU does). That means the semantics of the language is unclear.
What you want is something between the Scylla of inefficiency and the Charybdis of platform dependence. This is a difficult problem, and unsolved even for as simple a language as C, see e.g. [1]. Indeed it is unclear if memory models possessing the required attributes even exist, maybe it's contradictory to assume that they can.
Again, compilers are easy, easier than anything else.
If the entire thing is "complicated" (although it should not be), it can still be broken into a chain of laughably trivial passes. This is the most beautiful property of compilers - the ability to break complexity all the way down to the trivial.
If you do not agree for some reason just name a part of a compiler you perceive as inherently complex, and I will show you how to turn it into a laughably simple chain.
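As a sketch of what such a chain can look like, here are three one-idea tree rewrites composed into a linear pipeline, each pass seeing only the previous pass's output. The mini-language, pass names, and IR shape are all invented for illustration.

```python
# A "chain of trivial passes": desugar, simplify, emit.

def desugar_neg(node):
    """Rewrite neg(e) into sub(0, e), shrinking the language by one form."""
    if not isinstance(node, tuple):
        return node
    op, *kids = node
    kids = [desugar_neg(k) for k in kids]
    if op == "neg":
        return ("sub", 0, kids[0])
    return (op, *kids)

def simplify(node):
    """Drop arithmetic identities: e + 0 -> e, e * 1 -> e."""
    if not isinstance(node, tuple):
        return node
    op, *kids = node
    kids = [simplify(k) for k in kids]
    if op == "add" and kids[1] == 0:
        return kids[0]
    if op == "mul" and kids[1] == 1:
        return kids[0]
    return (op, *kids)

def emit(node):
    """Linearise the tree into postfix stack-machine 'instructions'."""
    if not isinstance(node, tuple):
        return [("PUSH", node)]
    op, *kids = node
    out = []
    for k in kids:
        out += emit(k)
    return out + [(op.upper(),)]

def pipeline(program, passes=(desugar_neg, simplify, emit)):
    for p in passes:          # each pass sees only the previous output
        program = p(program)
    return program

print(pipeline(("add", ("neg", "x"), 0)))
```

Each pass is independently testable and can be understood in isolation, which is the property the chain-of-passes argument rests on; whether this stays true for an industrial-strength language is exactly what the rest of the thread disputes.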
I agree with you that idealised compilers have a beautiful pipeline structure. Finding this structure Is one of the great achievement of the early computing pioneers.
> turn it into a laughably simple chain.
By this reasoning, any computable problem is trivially simple, because I can translate it into ARM/x86/MIPS/... machine code, and machine code is just a linear list of trivial commands, i.e. I can "turn it into a laughably simple chain".
My point is: if you chain enough simple things, the result stops being simple.
Look for example at modern JIT compilers, tracing or otherwise. They are not simple by any stretch of the imagination. To see that just look at a program, and predict how long the warm-up will take. Whoops ... unknown [1].
That's true. There continues to be interesting work there. Most people don't want to improve parsers, though. They just want to translate one programming language into another form. So our focus should be on that stuff by default, instead of on the part that is mostly solved for most use cases, even with automated tools.
I agree that it's also better to keep the parsing part small and focus on the more "interesting" things like code generation and optimisation. Recursive descent/precedence climbing seems to be the preferred parsing technique in a lot of production-quality, modern compilers, and I'd argue that knowing about RD is more than sufficient for most of what anyone wanting to play around with compilers needs.
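The precedence-climbing technique mentioned above fits in a handful of lines. This is a deliberately minimal sketch: single-character tokens, one-digit numbers, no whitespace handling, and invented names throughout.

```python
# A minimal precedence-climbing expression parser.

BINDING = {"+": 10, "-": 10, "*": 20, "/": 20}  # binding powers

def parse(tokens):
    expr, rest = parse_expr(list(tokens), 0)
    assert not rest, f"trailing input: {rest}"
    return expr

def parse_expr(tokens, min_bp):
    left = int(tokens.pop(0))           # atom: a one-digit number
    while tokens and tokens[0] in BINDING and BINDING[tokens[0]] >= min_bp:
        op = tokens.pop(0)
        # +1 makes same-precedence operators associate to the left
        right, tokens = parse_expr(tokens, BINDING[op] + 1)
        left = (op, left, right)
    return left, tokens

print(parse("1+2*3-4"))  # ('-', ('+', 1, ('*', 2, 3)), 4)
```

Adding parentheses, unary operators, or multi-character tokens is a local change to the atom case, which is why this style scales well in hand-written production parsers.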
I actually think many useful languages are mostly syntax. If you compile to another source language (like C or SQL) you might not need much optimization.
Parsing some syntaxes is difficult, but that doesn't imply we should just ignore it and let libraries handle everything. Thinking about parsing is an opportunity to learn about tradeoffs and to develop good taste in designing syntaxes which are easy to parse (and, possibly correlated, manipulate) for both computers and humans.
> You have to start somewhere. Code optimization is utterly useless if you can't parse code.
It's absolutely not uncommon for architecture experts to plug into the optimization stage of an existing compiler; expecting them to understand the first stages is not useful at all. They need to understand the AST / IR, but not how to arrive at it.
So write up a grammar, feed it to a parser generator like GOLD, run the result through a premade parser engine, and BOOM, you're done. Then you can start on the stuff in the middle that really needs more people working on it.
The problem is that writing a good grammar requires understanding of parsers.
(Also, real parsers tend to be hand-written, judging from open source language implementations (some have actually migrated from generated parsers to recursive descent in the past). There must be a reason).
The reason basically boils down to generated parsers being very brittle. If you need to tweak it slightly (for example, to handle errors on missing semicolons better), it's difficult or maybe even impossible to get the generated parser to do it well.
Alarm bells ring in my head whenever someone claims "it's a solved problem" in the context of an engineering (as opposed to mathsy) problem. Unless the context is really narrow (and the problem becomes basically a math problem) that claim is usually wrong.
Those of us who care about syntax (that is optimized for the user, not the programmer), will have to keep thinking about parsers for the foreseeable future.
PEG is basically a thin abstraction over recursive descent parsing. Everything you can do in a handwritten parser can be expressed mostly declaratively in PEG.
And, no, there is no single reason not to use PEG. Especially when the choice is between a handwritten parser and a PEG. There is a lot of stupid FUD about PEGs, you should ignore it. Your links are pointing to the ignorant, incompetent mumbling of those who never tried implementing a proper PEG.
And I never met a language for which writing a PEG parser was not totally trivial. A pro tip: use PEG alongside a Pratt parser for the binary expressions.
Quite a few very usable languages are just fine without any dedicated syntax at all. Think of Lisp and Forth, for example.
And the backend is not just about optimisations; they're mostly insignificant. The backend is about gradually translating one semantics into the other. An example of such is compiling a regular expression into a DFA, then into a C array and a switch.
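The regex-to-DFA example can be sketched concretely: below, the DFA for `a(b|c)*` is hand-built as a transition table (playing the role of the C array), and a loop drives it (playing the role of the switch). Everything here is illustrative.

```python
# DFA for the regular expression a(b|c)*, as a transition table.
# States: 0 = start, 1 = accepting (seen 'a', then any mix of b/c).
# Missing entries are the dead state.
TABLE = {
    (0, "a"): 1,
    (1, "b"): 1,
    (1, "c"): 1,
}

def matches(s):
    state = 0
    for ch in s:
        state = TABLE.get((state, ch), -1)  # -1: dead state
        if state == -1:
            return False
    return state == 1

print(matches("abcb"), matches("ba"))  # True False
```

In a real backend the table would be emitted as a C array and the loop as a `switch` over the state, but the semantics-preserving translation steps are the same.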
I am talking of the languages that are built on top of Lisp/Forth. I.e., you do not really need any syntax if your underlying implementation language is homoiconic.
I believe the most difficult part of a compiler/language is actually the runtime environment, in case it contains a concurrent garbage collector. Code optimisation is easy compared to that, especially if done a priori.
It is an amazing archive of all kinds of neat algorithms and data structures.