Show HN: Easy Forth

richdougherty · on Nov 27, 2015

Nice!

If you like Forth, there's information about similar languages over at http://concatenative.org/.

vive-la-liberte · on Nov 27, 2015

I often use dc(1) so my first instinct was to type "1 2 3 +" into the input without intermediate linebreaks. I was delighted to see that this was correctly understood to mean what I wanted to express instead of it doing something silly like taking just the first int of my string or trying to parse the string as a whole straight to int.

These are the sort of things which encourage readers to read on :)

Edit: Unfortunately, when I try to scroll and read the rest, the input field steals focus, rendering me unable to continue. Firefox on Android.

brudgers · on Nov 27, 2015

Not directly a solution, but there is GNU Forth for Android.

https://play.google.com/store/apps/details?id=gnu.gforth&hl=...

danbolt · on Nov 27, 2015

This is an enjoyable read! I've been coming across mentions of Forth much more often lately, so it feels satisfying being able to get a sense of it.

One thing that caught my attention is that Forth's boolean value for "false" is 0, and for "true" is -1. This makes sense if you look at their binary values, being 00000000 and 11111111, respectively. Does anyone know if there was an underlying design decision for this? Fast hardware checking? Bit masking tricks?

TazeTSchnitzel · on Nov 27, 2015

If you define it like that, you don't need separate boolean and bitwise AND and OR. 11111111 & 00000000 = 00000000, 11111111 | 00000000 = 11111111, ~11111111 = 00000000 etc.

QBASIC (and possibly other Microsoft BASIC dialects?) did the same thing. There's no &&, || or ! in QBASIC.

If you define "true" as just 00000001 then it works for & and |, but not for ~.
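
A minimal Python sketch of that point, assuming 8-bit cells purely for display (the names `MASK`, `TRUE`, `FALSE`, `ONE`, and `invert` are illustrative, not Forth words). With true as all ones, the bitwise operators give the logical answers; with true as 1, complement stops working:

```python
MASK = 0xFF   # pretend cells are 8 bits wide (an assumption for illustration)
TRUE = 0xFF   # all bits set, i.e. -1 in two's complement
FALSE = 0x00

def invert(x):
    """Bitwise complement within the 8-bit cell."""
    return ~x & MASK

# Bitwise AND / OR / complement double as logical AND / OR / NOT.
assert TRUE & FALSE == FALSE
assert TRUE | FALSE == TRUE
assert invert(TRUE) == FALSE
assert invert(FALSE) == TRUE

# With "true" as 1, AND and OR still behave, but complement does not:
ONE = 0x01
assert ONE & FALSE == FALSE
assert ONE | FALSE == ONE
not_one = invert(ONE)          # nonzero, so it would still count as "true"
print(format(not_one, "08b"))  # 11111110
```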

abecedarius · on Nov 27, 2015

Bit masking tricks (though Leo Brodie objected to calling that kind of code a 'trick'). Kind of amusing history: in FORTH-79 tests returned 1 for true, then FORTH-83 changed it, and also changed NOT to mean bitwise complement because you could use 0= for logical complement. Confusion! In the next standard NOT was left undefined, or rather they changed the name to INVERT.
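
As a rough illustration of the kind of masking an all-ones flag allows, here is a Python sketch. Python integers behave like unbounded two's complement, so -1 stands in for the all-ones flag; `flag`, `select`, and `positive_part` are hypothetical helper names, not standard Forth words:

```python
TRUE, FALSE = -1, 0   # -1 is "all bits set" in two's complement

def flag(condition):
    """Turn a Python bool into a Forth-style flag."""
    return TRUE if condition else FALSE

def select(f, a, b):
    """Return a when f is all ones and b when f is zero, using only masks."""
    return (f & a) | (~f & b)

def positive_part(n):
    """Keep n if it is positive, else 0 (roughly DUP 0> AND in Forth)."""
    return n & flag(n > 0)

print(select(flag(3 < 5), 10, 20))  # 10
print(select(flag(3 > 5), 10, 20))  # 20
print(positive_part(7))             # 7
print(positive_part(-7))            # 0
```

With true defined as 1 the same expressions would keep only the lowest bit of `a`, which is why the all-ones convention matters for this style.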

peter303 · on Nov 27, 2015

Forth programs are very compact. And so are Forth interpreters. Great for 1970s PCs with memories as small as 8K. But postfix programs are hard to understand. An hour later and you've forgotten what you have written.

kazinator · on Nov 27, 2015

Postfix isn't hard to understand.

Postfix with no syntactic delimitation is hard to understand.

The problem with Forth is that although we understand what each word does in isolation, when we look at a word in the middle of the program, we do not have an instant idea about what material to the left of the word produces its arguments and how, and what material to the right consumes its result. We have to go back and simulate the stack machine in our mind to work this out.

What does this do? w1 w2 w3 w4?

We have to know what w1 through w4 do to understand how their data flows are connected together! That's just wrong. In any sane language, we don't have to know what w4 exactly does in order to understand that, for instance, it takes two inputs produced by w1 through w3.

The above words could be totally independent from each other, and produce four operands on the stack. Or it could be that they thread a value through them: each one transforms the top word on the stack.
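
The two readings can be sketched with a toy stack machine in Python. The word definitions here are invented for illustration; only the program text `w1 w2 w3 w4` comes from the comment:

```python
def run(program, words):
    """Run whitespace-separated words against a fresh stack; return the stack."""
    stack = []
    for name in program.split():
        words[name](stack)
    return stack

def push(value):
    """Make a word that pushes a constant."""
    return lambda stack: stack.append(value)

def apply1(fn):
    """Make a word that replaces the top of the stack with fn(top)."""
    def word(stack):
        stack.append(fn(stack.pop()))
    return word

program = "w1 w2 w3 w4"

# Reading 1: four independent words, each leaving one operand.
independent = {"w1": push(1), "w2": push(2), "w3": push(3), "w4": push(4)}

# Reading 2: one value threaded through w2, w3, and w4 in turn.
threaded = {
    "w1": push(1),
    "w2": apply1(lambda x: x + 10),
    "w3": apply1(lambda x: x * 2),
    "w4": apply1(lambda x: x - 1),
}

print(run(program, independent))  # [1, 2, 3, 4]
print(run(program, threaded))     # [21]
```

The program text is identical in both runs; only the definitions reveal how the data flows.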

The expressions in a nice, nested, functional notation give us the functional tree at a glance. We can achieve this while keeping the notation postfix, by adding parentheses:

   (w1 w2 w3)w4

Now w4 is a three-argument function. w1 through w3 take no arguments.

   ((w1)w2 w3)w4

Now w1 produces the argument value for w2. This is the left argument of w4, and w3 is the right argument.

Moreover, now we can reason about different evaluation orders, as long as we know there are no side effects. I can think about w3 and then w2 and w1, or vice versa.

Moreover, if w4 is declared as requiring exactly one argument, the error is now statically obvious.

klibertp · on Nov 27, 2015

Also great for myriads of devices with constrained memory and computing power we use every day in 2015.

> But postfix programs are hard to understand

No.

Badly written programs are hard to understand. That's true for every language. Also, programs written in a language you don't know are hard to understand. That's obvious, right? Postfix, prefix or infix notations have little to do with this.

There are natural languages written from right to left, and also ones written from right to left and vertically at that. Are they "hard to understand" for people who use them? What do you think?

TazeTSchnitzel · on Nov 27, 2015

The direction a language is written in has no bearing on how easy it is to understand. Postfix is a different matter altogether.

klibertp · on Nov 27, 2015

Wait, why? Could you elaborate? How is:

    + 1 2

fundamentally different from:

    1 2 +

? You just look for the operator on the other side of arguments.

I know prefix, postfix and "crazy infix" (J, some APL) languages and I really don't see a qualitative difference. Of course, that's after you overcome the initial hurdle and get used to the notation.

EDIT: ok, J/K/APL are a special case and I shouldn't mention them.

TazeTSchnitzel · on Nov 27, 2015

That's too trivial an example. When you get something more complex, the difference becomes clearer.

Can you parse the following easily?

c_z x * s_z y * s_y * c_y z * + c_x * c_z y * s_z x * - s_x * - d_z =

Compare that to the infix form:

d_z = c_x * (c_y * z + s_y * (s_z * y + c_z * x)) - s_x * (c_z * y - s_z * x)

With infix, the operator sits between its operands, so you can easily see what expression makes up the first operand, and which makes up the second: you just look to its left and its right. Postfix, however, requires you to read the entire expression, because the operator doesn't show which operands it has, they're just whatever the last two were, and those last two might themselves be complex expressions. This gets worse the longer the total length of the expression.

Postfix also suffers from not making it clear how many operands a given operator takes. With infix this is always clear.

I like postfix languages for their simplicity, but I refuse to pretend they are as easy to read as infix languages.

(example was taken from https://en.wikipedia.org/wiki/3D_projection#Perspective_proj...)

klibertp · on Nov 27, 2015

Well, for me both examples are totally unreadable. I think math's is the most unreadable notation ever and even a cross between PERL and Brainfuck would be better. Mathematicians are masochists, and I refuse to follow their lead. I prefer meaningful variable names, good use of horizontal and vertical whitespace, context-free grammars, and the like. So maybe this is the difference and the reason for me perceiving the notations as more or less equivalent: I'm not biased in favor of infix, as most people are, but rather biased the other way.

Anyway, let's try doing something with your postfix example (I hope it displays alright; also, I think you made a mistake in your translation to postfix, but I left it as it is):

    c_z   x *
    s_z   y *
        s_y *
    c_y   z *
    +
        c_x *

    c_z   y *
    s_z   x *
    -
        s_x *
    -

        d_z =

Now, this is much more readable than the unformatted infix version you gave, and that's before factoring this into smaller parts. I haven't read that much Forth, but I'm 97% sure that it would be factored into 3 or 4 words. It also has the advantage that you read the operations in the order they're going to be executed, while with infix you need to read the entire expression to know which computation occurs first.
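
The suspected translation slip can be checked mechanically. Below is a small Python sketch that evaluates the quoted postfix string with arbitrary test values and compares it with the infix formula. The `repaired` string is only a guess at the intended translation (one extra `+` after `s_z y *`), and the trailing `d_z =` is left out because the evaluator has no assignment:

```python
import operator

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}

def eval_postfix(source, env):
    """Evaluate postfix tokens; return the whole stack so leftovers show up."""
    stack = []
    for tok in source.split():
        if tok in OPS:
            right = stack.pop()
            left = stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(env[tok])
    return stack

# Arbitrary distinct integers, chosen only to make mismatches visible.
env = dict(c_x=2, c_y=3, c_z=5, s_x=7, s_y=11, s_z=13, x=17, y=19, z=23)

quoted = "c_z x * s_z y * s_y * c_y z * + c_x * c_z y * s_z x * - s_x * -"
repaired = "c_z x * s_z y * + s_y * c_y z * + c_x * c_z y * s_z x * - s_x * -"

e = env
infix = (e["c_x"] * (e["c_y"] * e["z"]
                     + e["s_y"] * (e["s_z"] * e["y"] + e["c_z"] * e["x"]))
         - e["s_x"] * (e["c_z"] * e["y"] - e["s_z"] * e["x"]))

print(len(eval_postfix(quoted, env)))          # 2: one operand is never consumed
print(eval_postfix(repaired, env) == [infix])  # True
```

Under these assumptions the quoted string leaves `c_z x *` stranded on the stack, which is consistent with a missing `+`.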

Of course, it's only more readable for me, with my particular background knowledge and habits; I'm not saying this is or should be the same for anyone else. But, if there are people who see one form as more readable and people who see the other form as more readable, then I think that's a solid argument in favor of both forms not being drastically, qualitatively different.

The problems you point to are definitely real; it's true "the operator doesn't show which operands it has" by default (for example) and you need to go out of your way to show it. But infix notation has its problems too, and you need to work around them as well. Like operator precedence, which is frankly a terrible idea.

> Postfix also suffers from not making it clear how many operands a given operator takes. With infix, this is always clear.

No, not always. J has words which take one argument on the left and, for example, two on the right. Or three. Or a variable number of arguments on the left (granted, they are `tied` with ` word, but still) and nothing on the right... And Ken Iverson says it's very readable! (To him, at least).

To summarize: I'm still not convinced that there is a major and unfixable difference in readability between the notations, and I still think you can make all 3 notations as readable as any other.

kazinator · on Nov 27, 2015

It's not the direction, it's the lack of delimiting.

  ((1 2 +) (3 4 +) *)

is fundamentally different from

  1 2 3 4 + + *

Each word has a clear arity, and the arguments are delimited. We can follow the evaluation of the parenthesized expression in multiple orders and they come up with the same answer.
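
A short Python sketch of the difference, assuming every operator is binary and that the last item in each parenthesized group is the operator (the function names are made up for this example). The delimited form can be checked for arity before anything runs; the flat form can only be executed:

```python
import operator

OPS = {"+": operator.add, "*": operator.mul}

def tokenize(text):
    return text.replace("(", " ( ").replace(")", " ) ").split()

def parse(tokens):
    """Parse one item: a number, an operator name, or a parenthesized group."""
    tok = tokens.pop(0)
    if tok == "(":
        group = []
        while tokens[0] != ")":
            group.append(parse(tokens))
        tokens.pop(0)  # discard the closing ")"
        return group
    return tok if tok in OPS else int(tok)

def evaluate(node):
    """In a group, the last item is the operator; the rest are its arguments."""
    if isinstance(node, int):
        return node
    *args, op = node
    if len(args) != 2:
        raise ValueError(f"{op} expects 2 arguments, got {len(args)}")
    left, right = (evaluate(a) for a in args)
    return OPS[op](left, right)

def eval_flat(text):
    """Plain stack evaluation of undelimited postfix; returns the final stack."""
    stack = []
    for tok in text.split():
        if tok in OPS:
            right, left = stack.pop(), stack.pop()
            stack.append(OPS[tok](left, right))
        else:
            stack.append(int(tok))
    return stack

print(evaluate(parse(tokenize("((1 2 +) (3 4 +) *)"))))  # 21
print(eval_flat("1 2 + 3 4 + *"))                        # [21]
print(eval_flat("1 2 3 4 + + *"))                        # [9]
```

A group such as `(1 2 3 +)` raises `ValueError` in `evaluate`, whereas the flat evaluator would run `1 2 3 +` and silently leave an extra item on the stack.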

klibertp · on Nov 27, 2015

If you take a look at my reply to TazeTSchnitzel you'll see that I "solved" (I think? at least tried to) this problem with indentation (generally whitespace). You can delimit expressions in many ways.

And in Forth's case, you can very easily define your own delimiters:

    : [ ;
    : ] ;

which should make you able to write:

    [ [ 1 2 + ] [ 3 4 + ] * ] . 
    21  ok

(tested with gforth and works [EDIT: but of course breaks Forth! Both [ and ] are already defined, so in practice you'd choose other characters])

I mean, there is no rule saying that postfix notation cannot provide grouping constructs. I still fail to see a fundamental difference here :)

BTW, it's not going as fast as I'd like, but I managed to parse TXR man page and use it for displaying docs along auto-completion: https://github.com/piotrklibert/txr-mode/blob/master/screen....

I think I'll be able to find some time this weekend (or next weekend) to clean up the code and make it usable for others as well :)

kazinator · on Nov 27, 2015

Whitespace that is not significant to the machine does nothing to help me convince myself that the code is correct. Indentation could be wrong.

If I already know that the code is correct and properly indented, then it helps the readability.

> And in Forth case, you can very easily define your own delimiters:

Those delimiters do nothing but occupy interpreter cycles. Hopefully they get recognized as noops and optimized away by a Forth compiler.

The machine will accept garbage like:

  ] 3 2 + [ 4 /

The fake syntax you've created there is sort of like a cargo cult airplane made out of bamboo sticks and palm leaves. It has some value as an annotation of correct code, that is all. It could be a useful annotation tool in the process of verifying a piece of code and convincing myself that it's correct. Forth should have these markers built-in so they don't have to be defined as words, and it should check their pairing and nesting. (A syntax highlighting engine can be taught to do that, of course.)
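
The pairing and nesting check described here is small. A Python sketch of what such a checker might look like, run over whitespace-separated tokens (`check_grouping` is a hypothetical name, and this is not how any existing Forth or highlighter does it):

```python
def check_grouping(source, opener="[", closer="]"):
    """Return None if groups pair and nest correctly, else an error message."""
    depth = 0
    for position, token in enumerate(source.split()):
        if token == opener:
            depth += 1
        elif token == closer:
            if depth == 0:
                return f"unmatched {closer!r} at token {position}"
            depth -= 1
    if depth:
        return f"{depth} unclosed {opener!r}"
    return None

print(check_grouping("[ [ 1 2 + ] [ 3 4 + ] * ]"))  # None
print(check_grouping("] 3 2 + [ 4 /"))              # unmatched ']' at token 0
```

This only validates the delimiters themselves; it says nothing about whether the words inside each group consume and produce the right number of stack items.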

klibertp · on Nov 27, 2015

> Whitespace that is not significant to the machine does nothing to help me convince myself that the code is correct. Indentation could be wrong.

Sloppily written code is hard to understand correctly in every language (to different degrees, of course). You can easily mistake

    if (...)
       do1();
       do2();

for

    if (...) {
       do1();
       do2();
    }

right? This is C - an infix language - and it's arguably its fault for providing this stupid form, but you usually don't judge how readable infix notation is based on this.

> Those delimiters do nothing but occupy interpreter cycles. Hopefully they get recognized as noops and optimized away by a Forth compiler.

I'd expect so.

> The machine will accept garbage like:
>
>   ] 3 2 + [ 4 /

Yup. And by the way, this garbage is actually nearly correct J code. One possible fix would look like:

       ] 3 2 + [ 4 (+ /) [ 4

this gives `10 11` as an answer (and please don't ask me why...). ] and [ are left and right identity functions, they simply do nothing, so they may be inserted in many places, in many cases interchangeably.

> Forth should have these markers built-in so they don't have to be defined as words, and it should check their pairing and nesting.

That's unnecessary in Forth, I'm sure you can implement a "real" - with a semantic meaning - grouping in Forth yourself. Comments in Forth are enclosed in parens, and the parens are normal Forth words, defined in Forth. I mean, it's already a grouping syntax with a DROP semantics; you can probably define grouping syntax with other semantics as well.

Also, in Forth there are only words, and the syntax is that simple, but that isn't a property of all postfix languages. We're talking about postfix as a notation in general, not about its particular brand practiced by Forth. This is why I keep mentioning J, again and again: it's an infix language. I could bash infix notation forever had I used J as an example! It's the same with postfix notation and Forth.

zatkin · on Nov 26, 2015

Everything is great, simple, and easy to understand, and then I get to that Snake example and the code is nearly unreadable.

abecedarius · on Nov 27, 2015

The Snake code is more complex than it needs to be -- example: defining directions as constants from 1 to 4 and then using IF on the direction, instead of defining directions as offsets in the coordinates. But simpler code would not be that much simpler -- Forth just takes getting used to, and I guess the step up to a whole game was too steep. Add some smaller exercises first? Like Sokoban?

(Generally the main thing that'd make Forth more readable is local variables in place of stack manipulations.)

voltagex_ · on Nov 27, 2015

Raise an issue about it? https://github.com/skilldrick/easyforth/issues

to3m · on Nov 27, 2015

If you really insist on having an enum for your directions, what you could do is something like this: (assuming 2s complement and TRUE being ~0)

    : DIREQ DIRECTION @ = ;
    : DELTA DIREQ SWAP DIREQ NEGATE + ;
    : MOVE-SNAKE-HEAD
      LEFT RIGHT DELTA SNAKE-X-HEAD +!
      UP DOWN DELTA SNAKE-Y-HEAD +! ;

It's an open question whether this approach is any better but I think it looks a bit more Forthlike, insofar as I have any grasp of what that is.

I think having offsets would make more sense - e.g., suppose your playfield is 80 units wide, then you'd have left and right as -1 and +1, and up and down as -80 and +80 (which I assume is what you mean).
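
A minimal Python sketch of the offset idea, assuming the playfield is stored as one flat array 80 cells wide. The names and the starting position are made up for illustration, and wrapping at the edges is not handled:

```python
WIDTH = 80  # playfield width taken from the comment above

# Each direction is simply the amount to add to the head's flat index.
LEFT, RIGHT, UP, DOWN = -1, +1, -WIDTH, +WIDTH

def move(head, direction):
    """Moving is a single addition; no IF on the direction is needed."""
    return head + direction

def to_xy(index):
    """Convert a flat index back to (column, row) for display."""
    return index % WIDTH, index // WIDTH

head = 10 * WIDTH + 40   # column 40, row 10
head = move(head, UP)
head = move(head, LEFT)
print(to_xy(head))       # (39, 9)
```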

david-given · on Nov 27, 2015

A couple of months ago I wrote a Forth interpreter, because I'd always wanted to. I haven't really used it in anger, but it passes the basic ANS Forth tests, so it should be reasonably complete.

(It's here: https://github.com/EtchedPixels/FUZIX/blob/master/Applicatio... It's a single, portable C file which is also an executable shell script containing an awk script! It compiles to about 8kB of code on a microcontroller.)

From the experience I learnt two main things about Forth:

(a) the realisation of how Forth works, and the way in which the language bootstraps itself out of nothingness, and the way in which it takes about two basic principles and then builds an entire language out of them, is truly mind expanding. The process was full of 'aaah!' moments when it all came together and I realised just how elegant it was.

(b) actually engineering a Forth interpreter, and dealing with the ANS spec, was an exercise in frustration. Those elegant principles are compromised at every stage of the process. The spec defines things which no sane person would define. I'd implement a word, and it'd be clean and work, and then the ANS tests would fail and I would realise that the specification dictates a particular half-arsed implementation which makes no sense whatsoever. The process was full of 'uuugh!' moments when I saw a thing in the spec and realised how much more complicated it would make my life.

Examples follow:

- double words are pushed onto the stack in high word / low word order. Regardless of whether your architecture is big or little endian. Good luck with using 64 bit load/store instructions!

- DO...LOOP is defined to use the return stack for temporary storage. Valid Forth programs can't call EXIT from inside a DO...LOOP structure. If you try, your program does a hyperspace jump and crashes.

- BEGIN...REPEAT is defined not to use the return stack for temporary storage. Valid Forth programs are allowed to call EXIT from inside a BEGIN...REPEAT structure.

- DO...LOOP has different termination characteristics depending on whether you're counting up or down.

- Mismatched control flow structures are not just not detected, but they are actually, in certain combinations, defined to work. The spec actually defines what some of them do --- IF, THEN, BEGIN, WHILE, REPEAT, if I recall correctly --- and lets you mix and match them. Good luck if you want to use different, more efficient implementations.

- The memory model assumes that Forth is the sole owner of the memory. It starts at the bottom and works up. When compiling a word, you have to decide what address it's being written to before you know how long it's going to be. Want to share the heap with something else? Good luck with that.

- Division. How overcomplicated can it be? Answer: extremely.

- Rearranging values on the stack gets old very, very, very quickly.

- Forth isn't typed! Except where it is, and it doesn't check them, and if you get them wrong may the gods have mercy on your soul, because the interpreter surely won't.
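
On the division point, one likely source of the complexity is that ANS Forth specifies both floored division (FM/MOD) and symmetric division (SM/REM). The two disagree as soon as an operand is negative, as this Python sketch shows. The function names are illustrative, and results are returned as (quotient, remainder) rather than in Forth's stack order:

```python
def floored_divmod(n, d):
    """Quotient rounds toward negative infinity (FM/MOD style)."""
    return divmod(n, d)   # Python's own integer division is floored

def symmetric_divmod(n, d):
    """Quotient rounds toward zero (SM/REM style)."""
    q = abs(n) // abs(d)
    if (n < 0) != (d < 0):
        q = -q
    return q, n - q * d

print(floored_divmod(-7, 2))    # (-4, 1)
print(symmetric_divmod(-7, 2))  # (-3, -1)
print(floored_divmod(7, 2))     # (3, 1)
print(symmetric_divmod(7, 2))   # (3, 1)
```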

I would still say that anybody with any interest in programming should learn at least the basics of Forth, and should write at least the core of a Forth interpreter. (It won't take long, and you'll learn a hell of a lot.) But I'd be really hesitant about recommending it for real programming use, other than for the special niches where it excels, such as embedded systems. Most of the problem is that it's overspecified; it would be so much simpler, faster, and easier to understand if the spec had more undefined behaviour in it. I now understand why so many people just ignore it and write their own dialect. Strong type checking would really help, too.

tonyonodi · on Nov 27, 2015

This looks amazing! Thank you. I've been meaning to learn Forth since forever now.

berntb · on Nov 27, 2015

Fun environment.

A couple of weeks ago I installed 8th to learn a Forth as a hobby. (It promised iOS integration, so it might even be useful. I'm not there yet. :-) )