
How does llama.cpp’s grammar adherence work?

Does it keep validating the predicted tokens and backtrack when it’s not valid?



It's essentially an Earley parser[0]. It maintains a set of all currently valid partial parses, and zeroes out the probability of any token that isn't valid in at least one of the current potential parse trees.

There are contrived grammars you can give it that will make it use exponential memory, but in practice most real-world grammars aren't like this.

[0] https://en.wikipedia.org/wiki/Earley_parser
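Not llama.cpp's actual code, but the idea above can be sketched with a toy nondeterministic automaton standing in for the grammar: keep a set of live parse states, mask the logits of tokens no state can consume, then advance every state that accepted the sampled token.

```python
import math

# Toy "grammar" as an NFA: state -> {token: {next states}}.
# It accepts "a b" or "a c", so the prefix "a" leaves two live parses.
NFA = {
    0: {"a": {1, 2}},
    1: {"b": {3}},
    2: {"c": {3}},
    3: {},
}

def allowed_tokens(states):
    """Tokens valid in at least one current parse state."""
    return {tok for s in states for tok in NFA[s]}

def mask_logits(logits, states):
    """Set the logit of any token no parse state accepts to -inf."""
    ok = allowed_tokens(states)
    return {tok: (lp if tok in ok else -math.inf) for tok, lp in logits.items()}

def advance(states, token):
    """Advance every parse state that can consume `token`."""
    return {ns for s in states if token in NFA[s] for ns in NFA[s][token]}

states = {0}
logits = {"a": 0.1, "b": 2.0, "c": 1.5}
print(mask_logits(logits, states))     # only "a" survives at the start
states = advance(states, "a")          # both parses {1, 2} are now live
print(sorted(allowed_tokens(states)))  # ['b', 'c']
```

No backtracking is needed: an invalid token is never sampled in the first place, because its probability is zeroed before sampling.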


Earley parsers should not need more than O(n^2) memory.


You don't even need that for JSON. JSON can be expressed using a LR(1) grammar, so you can do it in linear time and space.
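To illustrate the determinism point: because JSON is LR(1), a single parse stack suffices, so validation runs in linear time with space proportional to nesting depth, with no set of competing parses. This toy checker (my sketch, not a full JSON parser) only verifies the bracket structure of pre-tokenized input:

```python
def balanced(tokens):
    """Linear-time check of {} / [] nesting with one deterministic stack."""
    pairs = {"}": "{", "]": "["}
    stack = []
    for t in tokens:
        if t in "{[":
            stack.append(t)
        elif t in "}]":
            if not stack or stack.pop() != pairs[t]:
                return False
    return not stack  # every opener must be closed

print(balanced(list("{[][{}]}")))  # True
print(balanced(list("{[}")))       # False
```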


Yes, but the llama.cpp work supports arbitrary CFGs, not just JSON.



