I am interested in this problem as well. Please share any notes.
I am attempting to create parameterized "logic" problems (similar to the zebra puzzle) that LLMs cannot solve, even when they are trained on them or when they "reason" about them.
Meanwhile, this approach is even simpler: it demonstrates that LLMs cannot recognize 3-state DFAs. https://arxiv.org/pdf/2501.02825
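For anyone who wants to try this kind of test, here is a rough sketch of what a 3-state DFA recognition task could look like. The automaton below (accept strings over {a, b} whose count of 'a' is divisible by 3) and the helper names `accepts` and `sample_examples` are my own illustration, not the setup from the linked paper.

```python
import random

# Hypothetical 3-state DFA over the alphabet {a, b}. It accepts strings
# whose number of 'a' symbols is divisible by 3. This is an illustrative
# choice, not the automaton used in the linked paper.
TRANSITIONS = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 1,
    (2, "a"): 0, (2, "b"): 2,
}
START = 0
ACCEPTING = {0}


def accepts(s):
    """Run the DFA on s and report whether it ends in an accepting state."""
    state = START
    for ch in s:
        state = TRANSITIONS[(state, ch)]
    return state in ACCEPTING


def sample_examples(n, max_len=12, seed=0):
    """Generate n random (string, label) pairs, labeled by the DFA."""
    rng = random.Random(seed)
    examples = []
    for _ in range(n):
        length = rng.randint(0, max_len)
        s = "".join(rng.choice("ab") for _ in range(length))
        examples.append((s, accepts(s)))
    return examples


if __name__ == "__main__":
    for s, label in sample_examples(5):
        print(repr(s), label)
```

The labeled pairs could then be placed in a prompt as few-shot examples, with the model asked to label held-out strings. Changing `TRANSITIONS` gives a different parameterized instance of the same task.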