The first thing I want AGI to do is to be able to tell me when it doesn’t know something, or when it’s not certain, so it at least gives me a heads-up and sets my expectations correctly. I ran my own personal “benchmark” on Gemini 2.5 and it failed just like all the others. I told it I was playing an old point-and-click adventure game from the mid-90s and was stuck on a certain part, and asked for spoiler-light hints on what to do next. Not only can these models not give me hints, they completely hallucinate the game and invent weird, nonsensical solutions. Every single model does this. Even if I tell them to give up and just give me the solution, they come up with some non-existent solution.
I wonder how hard it is to objectively use information that has been available online for 30 years. But the worst part is how it lies and pretends it knows what it’s talking about, and when you point that out, it simply veers in another direction and starts lying again. Maybe this use case is not the main focus of modern AI; maybe modern AI is about generating slop that does not require verification, because it’s “new” content. But to me that just sounds like believable slop, not AGI.
Context gathering - Attempting to answer question via LLM: Are there existing Conversation classes in the ecosystem this should extend?
Context gathering - LLM provided answer: "No"
Context gathering - Attempting to answer question via LLM: How should model selection work when continuing a previous conversation?
Context gathering - LLM answer was UNKNOWN, asking user.
Asking user: How should model selection work when continuing a previous conversation?
Context gathering - received user response to question: "How should model selection work when continuing a previous conversation?"
> The first thing I want AGI to do is to be able to tell me when it doesn’t know something,
In my demo, the LLM agent asks follow-up questions to understand the user's problem. It first attempts to answer those questions itself using context and function calling; when a question cannot be answered that way, it is forwarded to the user. In other words, it tells you when it doesn't know something.
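Roughly, that loop looks like the sketch below. This is a simplified illustration, not the demo's actual code: the names (`answer_via_llm`, `ask_user`, `gather_context`) and the `UNKNOWN` sentinel are made up for this example, and a real implementation would call an actual model with the gathered context and its function-calling tools.

```python
# Sketch of the "answer via LLM first, ask the user only on UNKNOWN" loop
# described above. All names here are hypothetical; the real demo would call
# an actual model with the gathered context and function-calling tools.

UNKNOWN = object()  # sentinel: the model could not answer from available context


def answer_via_llm(question: str, context: str):
    """Try to answer from context / function calling; return UNKNOWN if it can't."""
    # Stub: pretend the model only "knows" what is literally in the context.
    if question.lower() in context.lower():
        return f"Answered from context: {question}"
    return UNKNOWN


def ask_user(question: str) -> str:
    """Fall back to the human when the model admits it doesn't know."""
    print(f"Asking user: {question}")
    return input("> ")


def gather_context(questions: list[str], context: str) -> dict[str, str]:
    """Answer each follow-up question via the LLM first, then the user."""
    answers = {}
    for question in questions:
        answer = answer_via_llm(question, context)
        if answer is UNKNOWN:
            answer = ask_user(question)
        answers[question] = answer
    return answers
```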