Charles Moore: From Forth to Stack Processors and Beyond (2013) (cpushack.com)
60 points by fogus on April 14, 2020 | 24 comments



Anyone know how Chuck is doing these days? His site hasn't been updated since 2013, but I did see him in an interview 2 or 3 years ago where he seemed well.


He is retired, but he still messes with computers:

https://www.youtube.com/watch?v=SASQMl0rvYg

The sound is terrible at the beginning, but it gets a little better later on. You can watch his demo if you want more details:

https://www.youtube.com/watch?v=3ML-pJFa8lY


> In 1983 Chuck founded Novix, a company whose goal was to design a processor that was optimal for use with FORTH, a true stack processor.

C started as a language that was designed to take advantage of an existing processor to produce fast code. That has worked spectacularly.

It seems, however, that going the other way, first designing a language and then making a processor to run it fast, does not work out. Other examples are the Lisp machines and Java processors.


I would guess that's for economic reasons rather than technical ones. I think that lispm were considered a great technical success by many people. The problem was that they were a lot more expensive, and much more limited in what you could do with them (not quite "you can program in any language as long as it's Lisp", but you get the idea). That constrained them to a relatively tiny market, which in turn limited the amount of money that could go into R&D on the technology. So it was doomed to fall behind.

Java processors, I'm still not sure what the target market there was. It's easy to imagine at least an academic market for lisp machines, particularly during the 1980s and perhaps the early 1990s. But, particularly in a post-1995 software environment, it's hard to imagine a large-scale (as in, not embedded) computing project that can reasonably commit itself to Java, and it's also hard for me to imagine an embedded application where Java would be preferable to a low-level language. Which is also more of an economic problem - they might have worked really well from a technical perspective, but that doesn't mean they weren't a solution in search of a problem.


I joined the RTX team at Harris Semi after grad school. The only development environment they could offer customers was (as I remember it) completely Forth-based; the barrier to entry was awfully steep, even for greenfield projects.


I learned Forth by myself with Pygmy Forth (no Internet, no book, just the docs of that system) when I was 15 or so. It was my third language after Basic and assembler.

I believe that the challenge of Forth is not to learn it, but to unlearn the rest. It is difficult to forget what you know. It's difficult to forget about dynamic memory allocation, classes, local variables, type checking... And frustrating too.

It's like video games. When I was a kid/teen, I could play a level over and over again until I succeeded. Not today. I would lose patience and bitch about the poor game/level design because I think I know better. I am more often wrong than right on this.

Adult programmers and engineers are like that. They think they know better, so when something is difficult for them, they blame it on the language. They don't have the really open mind kids have. If there's a barrier to entry, it is not in Forth.


I think I agree with you. Forth was (fortunately IMO) the first language I really grokked after undergrad. Before that it was assembly, Apple Basic & (shudder) Fortran. I had zero exposure to CS 101. So Forth warped my brain in all the right ways.


> I would guess that's for economic reasons rather than technical ones. I think that lispm were considered a great technical success by many people.

Hey, an economic failure's a failure. The technically perfect machine that doesn't exist because no market would buy it... doesn't exist.


The opposite happened once C became a thing. Processors started offering features to make C programs faster. The stack frame support that first showed up on 8086 processors comes to mind.

At any rate, what we run these days are for all practical purposes C machines...


Frame pointers help Pascal. While C can also use them to make nicer assembly code (a given variable always has the same offset), it can work well without them.

RISCs are very good C processors, but it is possible to do even better: https://en.wikipedia.org/wiki/AT%26T_Hobbit
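
To illustrate the fixed-offset point, here is a minimal hypothetical sketch in C (the function, register names, and offsets are illustrative only; actual layout is up to the compiler and ABI):

  /* Hypothetical example. In a frame-pointer build the compiler typically
     emits a prologue such as "push ebp / mov ebp, esp" and then addresses
     each local at a fixed offset like [ebp-4] or [ebp-8] for the whole body.
     Without a frame pointer, the same locals are addressed relative to the
     stack pointer, whose distance to each local changes as values are
     pushed, so the compiler has to re-derive the offsets at every point. */
  int sum_of_squares(int x, int y)
  {
      int a = x * x;   /* e.g. [ebp-4] in a frame-pointer build */
      int b = y * y;   /* e.g. [ebp-8] */
      return a + b;
  }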


Stack frame support was around long before the x86 - VAXes are a great example (B6700s a more extreme one) - the first VAXes were created before C had become popular or widespread.


Chuck never hesitated to change the language to fit the CPU. His various Machine Forth dialects definitely co-evolved with the hardware they targeted. In terms of making it fast, I think he was fairly successful within the constraints that these are small CPUs, not superscalar, no giant caches, etc.


It's probably a stretch, but the closest thing to this today is TPUs being built for TensorFlow.


Has anybody done a study of dependencies of computations in real programs and compared the efficiency of a stack representation and a register representation?


There's https://www.usenix.org/legacy/events/vee05/full_papers/p153-...

But it's more specifically focused on virtual machines than physical hardware.


Koopman's Stack Computers: The New Wave is the only treatise that I have seen that really digs into the question.

http://users.ece.cmu.edu/~koopman/stack_computers/index.html

My memory of the conclusion is that register machines are a bit faster for general programs. Stack machines are better in some special niches (high-volume/rapid interrupt handling comes to mind).

I should read the book again myself...


A stack processor will always be less efficient than a two- or three-operand register-based processor. This is because all of the registers can be used directly, without any stack manipulations to access them, and the two- or three-operand operations usually include a move.

If you examine any stack language code, you should consider any stack manipulation to be a NOP type of inefficiency. And when a virtual stack machine is implemented on non-stack hardware, these inefficiencies are compounded.
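
To make the "compounded" part concrete, here is a minimal sketch of a switch-dispatched stack VM in C (opcodes and names are invented for illustration): a DUP or SWAP performs no arithmetic, yet on the host it still costs a full fetch/decode/dispatch round plus loads and stores on the simulated stack.

  #include <stdio.h>

  /* Toy bytecode for a hypothetical stack VM (opcodes invented for illustration). */
  enum { OP_PUSH, OP_DUP, OP_SWAP, OP_ADD, OP_MUL, OP_HALT };

  long run(const long *code)
  {
      long stack[64];
      long *sp = stack;                /* sp points one past the top of stack */

      for (const long *ip = code; ; ) {
          switch (*ip++) {             /* every opcode pays this fetch/dispatch cost */
          case OP_PUSH: *sp++ = *ip++;                                    break;
          case OP_DUP:  sp[0] = sp[-1]; sp++;                             break; /* pure shuffle */
          case OP_SWAP: { long t = sp[-1]; sp[-1] = sp[-2]; sp[-2] = t; } break; /* pure shuffle */
          case OP_ADD:  sp[-2] += sp[-1]; sp--;                           break;
          case OP_MUL:  sp[-2] *= sp[-1]; sp--;                           break;
          case OP_HALT: return sp[-1];
          }
      }
  }

  int main(void)
  {
      /* Computes (3 + 4) * 3 by duplicating the 3 instead of reloading it.
         The DUP and SWAP do no useful work, yet each one costs a dispatch
         iteration and stack memory traffic on the host machine. */
      const long prog[] = { OP_PUSH, 3, OP_DUP, OP_PUSH, 4, OP_ADD,
                            OP_SWAP, OP_MUL, OP_HALT };
      printf("%ld\n", run(prog));      /* prints 21 */
      return 0;
  }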


I think you have to take a larger view of what is 'efficient'. Historically, instruction fetch bandwidth was scarce and caches were expensive, often tiny or non-existent - stack machines with one-byte opcodes (look at the Burroughs large systems) were efficient in context.

We've switched to RISC these days (despite the above, I'm a big fan and have built several) largely because there came a point where we could push everything onto a chip. RISC started to make sense at about the time cache went on-chip (or was SRAM closely coupled to a single die). And for the record, I think x86 has survived because its ISA was the most RISCy of its original stablemates (68k, 32k, z8k, etc.): x86 instructions make at most one memory access (with one exception) and have simple operands.


Two- and (especially) three-operand processors spend significantly more space encoding register indexes, though. It's worth it to avoid Forth-style pop swap rot, tuck u* spaghetti, but register machines just push the NOP-type inefficiency somewhere else. E.g., obviously most 3-operand ops are the last use of at least one source operand, so there's usually no point encoding separate src1 and dst fields; also, many instructions immediately reuse (often the last and only use) the destination register of the previous instruction as a source.

I'd kinda like to see a machine with an intermediate, one-operand style of instructions. Eg:

  add tos stN # *sp += sp[N]
  add stN pop # sp[N] += *sp++
  ld [stN] # *--sp = mem[sp[N]]


I think Henry Baker argued otherwise, saying that at the circuit level a stack processor can be designed to operate faster (e.g. shorter critical path, fewer clock cycles) than register-based ones.


The problem with trivial ops like stack manipulations is that they don't really do anything useful, but they can take as long as a multiply in the pipeline.

Stack machines made more sense when memory was limited (small opcodes) and there were no hardware multiplies, but these days they make no sense.

I don't understand all the vague nostalgia for something that never could have panned out outside of the creaky old Apollo flight computer or something.


Fair point. That said, it's not nostalgia: Baker's article was about linear logic and how Forth is a natural fit for it. Cue Rust's borrowing and you see why it may be of interest (if Baker was right, of course).


It's too bad you can't get a single chip conveniently soldered on a schmartboard ready to be used anymore [1].

That was a great deal at only $35.

The only option now is to buy the evaluation board for close to $500, or 10 chips for $200 and do QFN soldering, which can be a PITA.

[1] https://schmartboard.com/schmartboard-ez-qfn-88-pins-0-4mm-p...



