
How does llama.cpp’s grammar adherence work?

Does it keep validating the predicted tokens and backtrack when it’s not valid?



It's essentially an Earley parser[0]. It maintains a set of all currently valid partial parses, and zeroes out the probability of any token that isn't valid in at least one of the current potential parse trees.

There are contrived grammars you can give it that will make it use exponential memory, but in practice most real-world grammars aren't like this.

[0] https://en.wikipedia.org/wiki/Earley_parser
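Not llama.cpp's actual code, but the idea above can be sketched with a toy nondeterministic automaton standing in for the grammar: keep a set of live parse states, mask the logits of tokens no state can consume, then advance every state that accepted the sampled token.

```python
import math

# Toy "grammar" as an NFA: state -> {token: {next states}}.
# It accepts "a b" or "a c", so the prefix "a" leaves two live parses.
NFA = {
    0: {"a": {1, 2}},
    1: {"b": {3}},
    2: {"c": {3}},
    3: {},
}

def allowed_tokens(states):
    """Tokens valid in at least one current parse state."""
    return {tok for s in states for tok in NFA[s]}

def mask_logits(logits, states):
    """Set the logit of any token no parse state accepts to -inf."""
    ok = allowed_tokens(states)
    return {tok: (lp if tok in ok else -math.inf) for tok, lp in logits.items()}

def advance(states, token):
    """Advance every parse state that can consume `token`."""
    return {ns for s in states if token in NFA[s] for ns in NFA[s][token]}

states = {0}
logits = {"a": 0.1, "b": 2.0, "c": 1.5}
print(mask_logits(logits, states))     # only "a" survives at the start
states = advance(states, "a")          # both parses {1, 2} are now live
print(sorted(allowed_tokens(states)))  # ['b', 'c']
```

No backtracking is needed: an invalid token is never sampled in the first place, because its probability is zeroed before sampling.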


Earley parsers should not need more than O(n^2) memory.


You don't even need that for JSON. JSON can be expressed using a LR(1) grammar, so you can do it in linear time and space.
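To illustrate the determinism point: because JSON is LR(1), a single parse stack suffices, so validation runs in linear time with space proportional to nesting depth, with no set of competing parses. This toy checker (my sketch, not a full JSON parser) only verifies the bracket structure of pre-tokenized input:

```python
def balanced(tokens):
    """Linear-time check of {} / [] nesting with one deterministic stack."""
    pairs = {"}": "{", "]": "["}
    stack = []
    for t in tokens:
        if t in "{[":
            stack.append(t)
        elif t in "}]":
            if not stack or stack.pop() != pairs[t]:
                return False
    return not stack  # every opener must be closed

print(balanced(list("{[][{}]}")))  # True
print(balanced(list("{[}")))       # False
```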


Yes, but the llama.cpp work supports arbitrary CFGs, not just JSON.



