I've seen some studies that call what I mean a "chain of thought" approach (just "higher-level reasoning" is also reasonable, but maybe we need a term for the engineered implementation), and I like that terminology. I don't know, it's a gut feeling, but a one-shot prompt completion feels intuitively too simple to me - it seems like the next step is to build up more elaborate "thought processes", where these methods handle individual steps, but other steps can carry out a computation or go on a fact search.
A bit similar, again, to how humans will answer plenty of questions on the spot from working knowledge, but are also able to classify when they need to drop into a slower "let's do the math" mode and then carry that out.
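A minimal sketch of what such a two-mode "thought process" might look like. Everything here is hypothetical - `needs_computation`, `compute`, and `answer_from_knowledge` are stand-ins for whatever model calls or heuristics one would actually use:

```python
# Hypothetical sketch: answer from "working knowledge" when possible,
# drop into a slower compute/search step when one seems needed.

def needs_computation(question: str) -> bool:
    # Stand-in classifier: in a real system this would itself be a
    # model call, not a digit check.
    return any(tok.isdigit() for tok in question.split())

def compute(question: str) -> str:
    # Stand-in for the slow "let's do the math" mode (calculator,
    # search tool, ...). Illustrative only; never eval untrusted input.
    expr = "".join(ch for ch in question if ch in "0123456789+-*/. ")
    return str(eval(expr))

def answer_from_knowledge(question: str) -> str:
    # Stand-in for a one-shot completion from memorized knowledge.
    known = {"what is the capital of mexico?": "Mexico City"}
    return known.get(question.lower(), "I don't know offhand.")

def answer(question: str) -> str:
    # The "thought process": classify, then route to fast or slow mode.
    if needs_computation(question):
        return compute(question)
    return answer_from_knowledge(question)
```

The interesting part is of course the classifier, not the two branches; here it is deliberately trivial.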
We may then be able to generate, from these "thought processes", training data sets that again reasonably cache "all possible accurate answers" as you demand, but it's not clear to me whether that caching will really save computational cost or just redistribute it to different times. At that point it's probably a latency/throughput question of when you do vs. defer the "thinking".
That's undoubtedly true for "reasoning". I don't think it is true for "search".
When I ask <search engine> a question that is "demonstrably" answerable based on "reliable" sources (e.g. what is the capital of Mexico), I don't need or expect it to reason. I want it to (a) tell me the most likely answer, and probably (b) provide a list of URLs that also provide an answer, probably the same one.
And that's what Google and others do right now; ignoring the result presentation being skewed by commercial interests, they do it rather well.
Figuring out when a question is not answerable in this way, but requires reasoning, is certainly a part of the challenge.
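As a toy illustration of the lookup path in (a) and (b): aggregate the answers extracted from retrieved sources, report the most frequent one, and list the URLs that support it. The results below are made up, including the URLs:

```python
from collections import Counter

# Hypothetical retrieval results: (url, extracted_answer) pairs, as a
# search engine might produce for "what is the capital of Mexico".
RESULTS = [
    ("https://example.org/a", "Mexico City"),
    ("https://example.org/b", "Mexico City"),
    ("https://example.org/c", "Ciudad de Mexico"),
]

def lookup(results):
    """Return (a) the most likely answer and (b) the URLs supporting it."""
    counts = Counter(answer for _, answer in results)
    best, _ = counts.most_common(1)[0]
    supporting = [url for url, answer in results if answer == best]
    return best, supporting
```

No reasoning step anywhere - just retrieval and a majority vote, which is roughly what the "answerable from reliable sources" case calls for.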