Literate programming: Two beefs with the classic version (2014) (akkartik.name)
43 points by kiyanwang on May 14, 2016 | 33 comments


#includes are a totally broken idea. They are the canonical violation of the DRY (Don't Repeat Yourself) principle. With #includes, whenever you want to make a change in the signature of a C function you have to do it in two places. The only reason we have them at all is because early C compilers were one-pass compilers, and so they needed to have declarations spoon-fed to them in the right order so that they could run in the 64K or so of RAM that was available on the PDP-11. There is no good reason to have includes today except that the C legacy is so strong and the idea is so deeply entrenched in people's minds and in the structure of C that they are nearly impossible to get rid of.
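
To make the duplication concrete, here's a minimal sketch with a hypothetical function area():

    /* shape.h -- the declaration, spoon-fed to the compiler */
    int area(int w, int h);

    /* shape.c -- the definition repeats the same signature verbatim */
    #include "shape.h"
    int area(int w, int h) { return w * h; }

Change area() to take a struct instead, and you get to make the same edit in both files.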


Isn't the problem here imports in general? Imagine this example written in a literate Haskell (.lhs) file: 'Let us discuss graph search... > import Control.Monad'


> Isn't the problem here imports in general?

That's a very complicated question.

In a complex world, you have to do something to establish the context in which a particular piece of code is being evaluated. It's a matter of taste whether that context establishment should come before or after the code for which the context is being established. Personally, I prefer the context establishment to come before, but reasonable people can disagree. This extends even to the language level, where you have some languages (like the C family, and Lisp) where variable bindings always precede their usage, and others like ML and Haskell, where the bindings can come after they are used.

But what I was complaining about was specific to #includes, which use a very particular (and badly broken) mechanism to establish context, namely, textual inclusion. It was a horrible hack invented to appease a brain-damaged compiler which could only deal with one file at a time. In 1970 this was an understandable engineering trade-off. In 2016, not so much. The right way to fix it is to get rid of it and replace it with something less brain-damaged rather than layering more hacks on top of it (IMHO).
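
To see just how textual the mechanism is, run the preprocessor by itself on a hypothetical pair of files:

    /* point.h */
    struct point { int x, y; };

    /* main.c -- the preprocessor literally pastes point.h in here */
    #include "point.h"
    struct point origin = { 0, 0 };

    $ cc -E main.c    # prints main.c with point.h spliced in verbatim

There is no notion of a module or an interface anywhere in that pipeline, just string splicing.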


Note that in Knuth's formulation of literate programming, you could freely reorder the program source. So those imports needn't be at the start of the literate program - they could be off in a parenthetical appendix.
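
A minimal CWEB sketch of what that looks like (the section text and chunk names here are mine):

    @* Graph search. The interesting code comes first; the boilerplate
    is referenced by name and defined later.
    @c
    @<Header files@>@;
    int main(void)
    {
      printf("searching...\n");
      return 0;
    }

    @* Appendix. The includes, banished to the end.
    @<Header files@>=
    #include <stdio.h>

ctangle stitches the chunks back into compilable order, while cweave typesets them in the order you wrote them.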


Which basically nullifies one half of the beef this blog post has regarding canonizing typesetting over organization.


Hey, give me some credit here. My beef wasn't that CWeb doesn't let you reorder code. I was talking about policy rather than mechanism, that the ability of CWeb wasn't being put to best use.


Can't blame me for misunderstanding when you have clearly written:

>These are systems that focus on generating beautifully typeset documentation without allowing the author to arbitrarily order code.

As CWEB does allow the author to arbitrarily order code.

There's also this statement:

>I speculate that nobody has actually read anybody else's literate programs in any sort of detail.

Which suggests you yourself have never read any literate programs in detail. Thankfully this statement is provably false, since (the MMIX Group at MUAS exists) == true, so I can merely comment that buggy line out and read the second half of your argument without it being entirely invalidated.


> "These are systems that focus on generating beautifully typeset documentation without allowing the author to arbitrarily order code."

The previous sentence was: "When I look around at the legacy of literate programming, systems to do so-called semi- or quasi-literate programming dominate." Semi- or quasi- was intended to distinguish them from classic literate programming as in CWeb.

I usually try to go out of my way to assume that if someone misunderstands me it's a failure of my writing, but in this case my post is filled with Knuth's original CWeb programs. It didn't seem realistic that you missed them all, or that you thought I missed that CWeb permits reordering in spite of having read so many CWeb programs.

And then you make this new jab about me not having read any literate programs, in spite of me showing repeatedly in the article that I went all the way down the list at http://www-cs-faculty.stanford.edu/~uno/programs.html. Are you sure you read my post?

---

Thanks for the pointer to http://mmix.cs.hm.edu/local. Not immediately obvious that it cares about literate programming, but I'll take your word for it. I'd appreciate intros to anybody in that group with experience reading literate programs (email address in profile).


A noweb file is a sequence of chunks, which may appear in any order. Nuweb allows the programmer to present elements of the program in any desired order. Babel/org-mode allows for arbitrary order as well. Here's an example of Babel: http://www.howardism.org/Technical/LP/introduction.html
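
To make the ordering freedom concrete, here's a minimal noweb sketch (the chunk names and helper functions are made up):

    @ First the part worth reading.
    <<search the graph>>=
    while (!queue_empty(q)) {
      <<visit the next node>>
    }
    @ The details come later, in whatever order reads best.
    <<visit the next node>>=
    node *n = queue_pop(q);
    printf("visiting %d\n", n->id);

notangle expands the <<...>> references to produce the compilable file in whatever order the language demands.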

I'm not sure what LP implementations you have looked at; it would again have helped avoid misunderstanding on my part if you had named the implementations that supposedly "dominate" the legacy of LP.

The jab was to point out how your argument returns false with such statements as "I speculate nobody has ever read a literate program in detail", and "(typeset) can't render inside just about any actual programming environment (editor or IDE) on this planet, and so we can't make changes to it while we work on the codebase", yet Emacs exists and can render the typeset output in a split-screen buffer. DrRacket also renders TeX and any other typesetting you want: http://lists.racket-lang.org/users/archive/attachments/20120...


In my experience, when you say "literate programming", most people think of Perl's POD or literate Haskell before CWEB and similar. This was certainly the case for me before I made a point of reading what Knuth had actually written (some years back). I think that's unfortunate, and I think it's valuable to educate people that there are more ideas there than are present in some of these systems, and that there are systems supporting more. But I think there are more constructive ways than ascribing malfeasance or incompetence to what seems itself a useful (if not maximally useful) piece of education.


Yeah, it's a couple of years old, and I too am a bit embarrassed when I read it now. It was a rant, read it as such. I find Literate Programming to be strictly an improvement over conventional programming. But "quasi-literate" systems to beautify programs aren't really Literate Programming, and their benefits are more questionable, since they pollute what the term means and keep people from realizing what Literate Programming can be. That's my frustration.


Ok, I admit I hadn't considered Babel (though I was vaguely aware of it) when I looked at literate programming systems. I considered CWeb, Nuweb, Noweb, and Literate Haskell and a bunch of other semi-literate systems. I think my blind spot was to only consider approaches that were oblivious to tooling, things that I could open and read in any editor. That caused me to also miss LEO, as a commenter on the OP pointed out. I'm absolutely aware of LEO and like it, it just wasn't on my mind when writing this.

I still find your approach of interpreting my statements as strictly logical entities to be pointless at best. For the last bloody time, I know that literate programs operate on chunks that can be reordered. I still don't understand how you can think I don't know that after reading the post. Also, was it not obvious that "nobody" meant "nobody else"? And yes, of course I can't be sure that none of the seven billion people on the planet has ever read a literate program. (Though if they did and they didn't say anything about it, and they didn't push back against people building crappy quasi-literate systems, are they not like the proverbial falling tree in the forest with nobody to hear it?) It's a rant, dammit. Do you know what that is? :) I'm pissed off at quasi-literate systems that keep people from realizing what Literate Programming can be, and I'm sorry that wasn't clear to you from my writing.

Are there any good example Babel programs you can recommend? I went looking and didn't find anything better/meatier than https://github.com/limist/literate-programming-examples. I'm still skeptical that any sort of typesetting system is of much benefit; if I have to render in a split screen, that's one less window for me to open code in, and most likely I'll end up just reading the un-rendered code. I think code should be WYSIWYG, because even the slightest rendering step is overhead at the scale at which programmers deal with text. If I had to constantly render λ out of \lambda I'd never use it. But now that my terminal can render unicode sanely I use it all the time. Racket's Scribble system suffers the same drawback (Racket's ability to render LaTeX that you pointed out is utterly irrelevant in the context of this thread, without Scribble's ability to reorder code).

Anyways, if you'd send me some example Babel programs I'd love to give it a try.


I think lisper was raising a tangential but still valid point. Haskell imports don't suffer from the problem he was pointing out; we only see the function type in one place.

It's not the fault of Haskell imports that literate programs put them first.


But it is a fault of Literate Haskell, which doesn't provide mechanisms allowing you to put them elsewhere.


Yeah, absolutely.


#include repetition is not really that much of a problem these days. Put all your #includes in one include.h file, and include only that include.h in the rest of your code. Then make a precompiled header for include.h (gcc, clang, and even msvc can do this), which will vastly speed up compile times and compensate for the overhead of pulling unnecessary headers into files that don't need everything.
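
For the record, the gcc incantation is minimal (this assumes the monolithic header is literally named include.h):

    $ gcc include.h        # emits a precompiled include.h.gch alongside it
    $ gcc -c foo.c         # any '#include "include.h"' now picks up the .gch

clang understands the same workflow; msvc does it with /Yc and /Yu instead.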


I think lisper is (rightly) criticizing the repetition of the same code in multiple places due to the includes (not the also-irritating need to write the includes in multiple places). The problem goes from bad to worse in C and C++ when you work through what happens when static declarations and method bodies make it into ".h" files. A lot of rules are in place to make these multiple code units appear as a coherent single unit.


Exactly right.


> Put all your #includes in one include.h file

RIP incremental build times.

> Then make a precompiled header for include.h (gcc, clang, and even msvc can do this), which will vastly speed up compile times, and compensate for the overhead of including unnecessary headers in your code that doesn't need to include everything.

Speaking from experience, god no it doesn't. Those are the codebases where fixing a single typo may take an hour to fully rebuild a single build configuration. As the poor bastard charged with porting things and managing said build configurations, this means I either break the build from time to time, or have to wait overnight to have a full set of builds to test before checking anything in.


> ...vastly speed up compile times, ...

Just use ccache if build times are getting atrocious, then focus on getting link times down :)


ccache won't help much if every translation unit changes with every typo fix: nothing but cache misses. But yes, use ccache or similar to get proper incremental builds.

As for link times: switch from bfd to gold. And maybe play with --incremental. I didn't - just switching to gold shaved minutes off my link times for free, and suddenly link times were no longer my build bottleneck :)
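
For anyone following along, the switch is a single flag on a reasonably recent gcc or clang:

    $ gcc -fuse-ld=gold -o app main.o util.o   # link with gold instead of bfd
    $ gcc -fuse-ld=gold -Wl,--incremental ...  # gold's incremental mode, YMMV

and ccache drops in as CC="ccache gcc" in your build.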



This doesn't address the objection, though. If you change a function signature in library.c, you have to make the same change again in library.h, because the two are coupled. Signature-only header files just make your code more brittle.


The problem with literate programming, as with many of these systems, is that it presupposes a text file on disk with a parsing order. Fundamentally, code is a graph; we've just flattened it badly into a poor on-disk format.

If we could work, and annotate, at the graph level we'd see far more literate programming.


That's true, code is rarely linear. But the whole idea of encapsulation, separation of concerns, and many, many other practices is to increase the locality of code. In that sense, well-written and well-structured code has local, mostly linear parts that you can analyze in isolation to get insight into the larger system.

With this in mind it doesn't matter that the overarching structure is much more complex. The translation unit you're analyzing is (or should be) mostly human-parsable in a linear fashion.

What I found eye-opening in this article is that it challenged my concept of a well-structured code file. I was all "what?! this is blasphemy!" from the very beginning, but at the end I was more "huh, I finally understand why sometimes #defines feel better in the center of a file than they feel at the top"[1] (a quick sketch below). It's rare for a blog entry to stir me like this. :)

[1] amongst other things that went through my head
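
A hypothetical sketch of the kind of thing I mean, with made-up names:

    /* ...three hundred lines of unrelated code above... */

    /* the constant lives right next to its only consumer */
    #define MAX_RETRIES 3

    static int fetch_with_retries(int (*try_fetch)(void))
    {
        for (int i = 0; i < MAX_RETRIES; i++)
            if (try_fetch() == 0)
                return 0;
        return -1;
    }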


Thanks for the kind words!


This is why programming languages that support nesting really help in some scenarios. It's great to be able to write functions inside functions, with or without captured variables/bindings. (I know a tree isn't a general graph, but it works well for many scenarios.)

When I start on a new program in F#, I usually just start with one main function and keep typing straight on down. When I want to re-organize it into a module, it's usually nearly as easy as writing "module" at the top.

Without this, you're encouraged either not to abstract properly, because the overhead and ceremony of a new top-level function is too much, or to end up with free-floating top-level functions that don't belong there and have only a single caller.


Anytime I hear about literate programming I think about how mathematicians worked for centuries to encode math symbolically. If you look at ancient math texts, it was all word problems, and it was just awful. One of the coolest things about math is how the symbolic representation of a problem lets your mind work out solutions with less effort. I think the future of programming looks more like math than English.


Let's look at some actual math.

The first article from the most recent issue of the first journal listed on the AMS page is here: http://www.ams.org/journals/ecgd/2016-20-01/S1088-4173-2016-...

To me that looks like quite a bit of English, organized for humans. It is true that there are also quite a few symbols, but I contend it looks a whole lot more like the CWEB implementation of wc than it does like the C implementation.


Ah, finally a break from pure Knuth worship for something objective.

http://unisonweb.org/ doesn't espouse the term "literate programming" itself, but its ideals are comparable. Oh, and: no imports whatsoever :).


Lately I have been working on "hyper hypertext" which involves mashing up code with supporting documentation, making the most of things like Javadoc, etc.


I disagree with the author: literate programming is great, but it is true that it gives better results for small codebases. I have a few tiny projects that are just a README.md containing code and docs, and I am really happy with this solution, which I call KISS Literate Programming. As an example, see http://g14n.info/bb-clone/


I found this talk about restructuring and bringing back to life a 1.2 MLOC Lisp codebase with classical LP interesting (it's a bit meandering, but IMHO well worth a watch):

"Literate Programming in the Large": https://www.youtube.com/watch?v=Av0PQDVTP4A

The reorganization is still ongoing, as can be seen at: http://www.axiom-developer.org/axiom-website/currentstate.ht...



