Charles Moore: From Forth to Stack Processors and Beyond (2013) (cpushack.com)
60 points by fogus on April 14, 2020 | 24 comments



Anyone know how Chuck is doing these days? His site hasn't been updated since 2013, but I did see him in an interview 2 or 3 years ago where he seemed well.


He is retired, but he still messes with computers:

https://www.youtube.com/watch?v=SASQMl0rvYg

The sound is terrible at the beginning, but it gets a little better later on. You can watch his demo if you want more details:

https://www.youtube.com/watch?v=3ML-pJFa8lY


> In 1983 Chuck founded Novix, a company whose goal was to design a processor that was optimal for use with FORTH, a true stack processor.

C started as a language that was designed to take advantage of an existing processor to produce fast code. That has worked spectacularly.

It seems, however, that going the other way, first designing a language and then making a processor to run it fast, does not work out. Other examples are the Lisp machines and Java processors.


I would guess that's for economic reasons rather than technical ones. I think that lispm were considered a great technical success by many people. The problem was that they were a lot more expensive, and much more limited in what you could do with them (not quite "you can program in any language as long as it's Lisp", but you get the idea). That constrained them to a relatively tiny market, which in turn limited the amount of money that could go into R&D on the technology. So it was doomed to fall behind.

Java processors, I'm still not sure what the target market there was. It's easy to imagine at least an academic market for lisp machines, particularly during the 1980s and perhaps the early 1990s. But, particularly in a post-1995 software environment, it's hard to imagine a large-scale (as in, not embedded) computing project that can reasonably commit itself to Java, and it's also hard for me to imagine an embedded application where Java would be preferable to a low-level language. Which is also more of an economic problem - they might have worked really well from a technical perspective, but that doesn't mean they weren't a solution in search of a problem.


I joined the RTX team at Harris Semi after grad school. The only development environment they could offer customers was (as I remember it) completely Forth-based; the barrier to entry was awfully steep, even for greenfield projects.


I learned Forth by myself with Pygmy Forth (no Internet, no book, just the docs of that system) when I was 15 or so. It was my third language after Basic and assembler.

I believe that the challenge of Forth is not to learn it, but to unlearn the rest. It is difficult to forget what you know. It's difficult to forget about dynamic memory allocation, classes, local variables, type checking... And frustrating too.

It's like video games. When I was a kid/teen, I could play a level over and over again until I succeeded. Not today. I would lose patience and bitch about the poor game/level design because I think I know better. I am more often wrong than right on this.

Adult programmers and engineers are like that. They think they know better, so when something is difficult for them, they blame it on the language. They don't have the really open mind kids have. If there's a barrier to entry, it is not in Forth.


I think I agree with you. Forth was (fortunately IMO) the first language I really grokked after undergrad. Before that it was assembly, Apple Basic & (shudder) Fortran. I had zero exposure to CS 101. So Forth warped my brain in all the right ways.


> I would guess that's for economic reasons rather than technical ones. I think that lispm were considered a great technical success by many people.

Hey, an economic failure's a failure. The technically perfect machine that doesn't exist because no market would buy it... doesn't exist.


The opposite happened once C became a thing. Processors started offering features to make C programs faster. The stack frame support that first showed up on 8086 processors comes to mind.

At any rate, what we run these days are for all practical purposes C machines...


Frame pointers help Pascal. While C can also use them to make nicer assembly code (a given variable always has the same offset), it can work well without them.

RISCs are very good C processors, but it is possible to do even better: https://en.wikipedia.org/wiki/AT%26T_Hobbit
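
To illustrate the fixed-offset point, here is a minimal hypothetical sketch in C (the function, register names, and offsets are illustrative only; actual layout is up to the compiler and ABI):

  /* Hypothetical example. In a frame-pointer build the compiler typically
     emits a prologue such as "push ebp / mov ebp, esp" and then addresses
     each local at a fixed offset like [ebp-4] or [ebp-8] for the whole body.
     Without a frame pointer, the same locals are addressed relative to the
     stack pointer, whose distance to each local changes as values are
     pushed, so the compiler has to re-derive the offsets at every point. */
  int sum_of_squares(int x, int y)
  {
      int a = x * x;   /* e.g. [ebp-4] in a frame-pointer build */
      int b = y * y;   /* e.g. [ebp-8] */
      return a + b;
  }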


Stack frame support was around long before the x86 - VAXes are a great example (B6700s a more extreme one) - the first VAXes were created before C had become popular or widespread.


Chuck never hesitated to change the language to fit the CPU. His various Machine Forth dialects definitely co-evolved with the hardware they targeted. In terms of making it fast, I think he was fairly successful within the constraints that these are small CPUs, not superscalar, no giant caches, etc.


It's probably a stretch, but the closest thing to this today is TPUs being built for TensorFlow.


Has anybody done a study of dependencies of computations in real programs and compared the efficiency of a stack representation and a register representation?


There's https://www.usenix.org/legacy/events/vee05/full_papers/p153-...

But it's more specifically focused on virtual machines than physical hardware.


Koopman's Stack Computers: The New Wave is the only treatise that I have seen that really digs into the question.

http://users.ece.cmu.edu/~koopman/stack_computers/index.html

My memory of the conclusion is that register machines are a bit faster for general programs. Stack machines are better in some special niches (high-volume/rapid interrupt handling comes to mind).

I should read the book again myself...


A stack processor will always be less efficient than a two- or three-operand register-based processor. This is because all of the registers can be used directly, without any stack manipulations to access them, and the two- or three-operand operations usually include a move.

If you examine any stack language code, you should consider any stack manipulation to be a NOP type of inefficiency. And when a virtual stack machine is implemented on non-stack hardware, these inefficiencies are compounded.
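
To make the "compounded" part concrete, here is a minimal sketch of a switch-dispatched stack VM in C (opcodes and names are invented for illustration): a DUP or SWAP performs no arithmetic, yet on the host it still costs a full fetch/decode/dispatch round plus loads and stores on the simulated stack.

  #include <stdio.h>

  /* Toy bytecode for a hypothetical stack VM (opcodes invented for illustration). */
  enum { OP_PUSH, OP_DUP, OP_SWAP, OP_ADD, OP_MUL, OP_HALT };

  long run(const long *code)
  {
      long stack[64];
      long *sp = stack;                /* sp points one past the top of stack */

      for (const long *ip = code; ; ) {
          switch (*ip++) {             /* every opcode pays this fetch/dispatch cost */
          case OP_PUSH: *sp++ = *ip++;                                    break;
          case OP_DUP:  sp[0] = sp[-1]; sp++;                             break; /* pure shuffle */
          case OP_SWAP: { long t = sp[-1]; sp[-1] = sp[-2]; sp[-2] = t; } break; /* pure shuffle */
          case OP_ADD:  sp[-2] += sp[-1]; sp--;                           break;
          case OP_MUL:  sp[-2] *= sp[-1]; sp--;                           break;
          case OP_HALT: return sp[-1];
          }
      }
  }

  int main(void)
  {
      /* Computes (3 + 4) * 3 by duplicating the 3 instead of reloading it.
         The DUP and SWAP do no useful work, yet each one costs a dispatch
         iteration and stack memory traffic on the host machine. */
      const long prog[] = { OP_PUSH, 3, OP_DUP, OP_PUSH, 4, OP_ADD,
                            OP_SWAP, OP_MUL, OP_HALT };
      printf("%ld\n", run(prog));      /* prints 21 */
      return 0;
  }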


I think you have to take a larger view of what is 'efficient'. Historically, instruction fetch bandwidth was scarce and caches were expensive, often tiny or non-existent - stack machines with one-byte opcodes (look at the Burroughs large systems) were efficient in context.

We've switched to RISC these days (despite the above, I'm a big fan and have built several) largely because there came a point where we could push everything onto a chip. RISC started to make sense at about the time cache went on-chip (or was SRAM closely coupled to a single die). And for the record, I think x86 has survived because its ISA was the most RISCy of its original stablemates (68k, 32k, z8k, etc.): x86 instructions make at most one memory access (with one exception) and have simple operands.


Two- and (especially) three-operand processors spend significantly more space encoding register indexes, though. It's worth it to avoid Forth-style pop swap rot, tuck u* spaghetti, but register machines just push the NOP-type inefficiency somewhere else. E.g., obviously most 3-operand ops are the last use of at least one source operand, so there's usually no point encoding separate src1 and dst fields; also, many instructions immediately reuse (often the last and only use) the destination register of the previous instruction as a source.

I'd kinda like to see a machine with an intermediate, one-operand style of instructions. Eg:

  add tos stN # *sp += sp[N]
  add stN pop # sp[N] += *sp++
  ld [stN] # *--sp = mem[sp[N]]


I think Henry Baker argued otherwise, saying that at the circuit level a stack processor can be designed to operate faster (e.g. shorter critical path, fewer clock cycles) than register-based ones.


The problem with trivial ops like stack manipulations is that they don't really do anything useful, but they can take as long as a multiply in the pipeline.

Stack machines made more sense when memory was limited (small opcodes) and there were no hardware multiplies, but these days they make no sense.

I don't understand all the vague nostalgia for something that never could have panned out outside of the creaky old Apollo flight computer or something.


Fair point. That said, it's not nostalgia: Baker's article was about linear logic and how Forth is a natural fit for it. Cue Rust's borrowing and you see why it may be of interest (if Baker was right, of course).


It's too bad you can't get a single chip conveniently soldered on a schmartboard ready to be used anymore [1].

That was a great deal at only $35.

The only option now is to buy the evaluation board for close to $500, or 10 chips for $200 and do QFN soldering, which can be a PITA.

[1] https://schmartboard.com/schmartboard-ez-qfn-88-pins-0-4mm-p...



