Hacker News | akavel's comments

"due to their inherent stochastic nature, there would still be a small likelihood of producing output that contains errors"

This is the part I find challenging when trying to help my friends build a correct intuition. The probabilistic behavior here is counter-intuitive: in human experience, if you meet a random person, they may indeed tell you bullshit; but once you've successfully fact-checked them a few times, you can start trusting that they'll generally stay trustworthy. It's not so with "AIs", and I find it hard to come up with a real-world situation that makes a better analogy for "AI" problems.
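The counter-intuitive part can even be made concrete with a toy simulation (purely illustrative: a made-up, fixed 5% per-answer error rate, nothing measured from any real model). Since each answer is an independent draw, a streak of correct answers tells you nothing about the next one:

```python
import random

random.seed(42)

ERROR_RATE = 0.05  # assumed per-answer error probability, purely illustrative

def answer_is_wrong() -> bool:
    """Each answer is an independent draw; past successes don't matter."""
    return random.random() < ERROR_RATE

# Simulate many "sessions" that start with 10 correct answers in a row,
# then check how often answer number 11 is wrong anyway.
sessions = 0
wrong_after_streak = 0
while sessions < 100_000:
    if all(not answer_is_wrong() for _ in range(10)):  # a lucky streak of 10
        sessions += 1
        if answer_is_wrong():  # the 11th answer
            wrong_after_streak += 1

print(f"error rate right after a 10-answer streak: {wrong_after_streak / sessions:.3f}")
```

The printed rate stays at roughly 5%: unlike with the fact-checked stranger, the streak bought no extra trust.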

In my family, what worked (due to their personal experiences) was the example of asking a tourist guide: even if the guide doesn't know the answer, there's a high chance they'll invent something on the spot, it'll be very plausible and convincing, and you'll never know. I'm not sure that example would work for other listeners, though.

I also tried asking them to imagine that they're asking each subsequent question not to the same person as before, but each time to a new random person taken from the street / a church / a queue in a shop / whatever crowded place. I thought this was a really cool and technically accurate example, but sadly it seemed to get blank stares from them. (Hm, now I think I could have tried asking why.)

Yet another example I tried was to imagine a country where it's dishonorable, when asked for directions in a city, to say that you don't know the way. (I remember we read such an anecdote in some book in the past and shared a laugh at it.) Thus, again, you'll always get an answer, and it'll sound convincing, even if the answerer doesn't know. But this one didn't seem to work as well as the tourist guide one; for now I'm still keeping it to try with others in the future if needed.

PS. Ah, ok, yet another one I tried was to ask them to think of the "game" of Russian roulette. You spin the cylinder, you pull the trigger, nothing happens. After a few lucky tries, you may get a dangerous, false feeling of safety. But eventually you will suddenly hit the loaded chamber.
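For what it's worth, the arithmetic behind that false feeling of safety is easy to show (assuming a six-chamber revolver that is re-spun before every pull, so each pull is an independent 1-in-6 risk):

```python
# Chance of surviving n independent trigger pulls with one round in six
# chambers (re-spinning each time), versus the unchanged risk of the next pull.
p_fire = 1 / 6

for n in range(1, 11):
    p_survive = (1 - p_fire) ** n
    print(f"{n:2d} lucky pulls: {p_survive:5.1%} of players get this far; "
          f"the next pull is still {p_fire:.1%} deadly")
```

After six lucky pulls only about a third of players are still around, yet the next pull is exactly as dangerous as the first one was.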

I also tried describing "AIs" (i.e. LLMs) as taking a shelf of books, passing them through a blender, then gluing the shreds back together in some random order. The result may sound plausible, and even scientific (e.g. if you started with medical books or physics textbooks). The less you know about the domain the books were about, the more convincing it may sound, and the harder it is to catch the bullshit.
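The blender picture can even be demoed in a few lines (a toy bigram sampler, of course nowhere near how real LLMs actually work): shred a text into word pairs, then reassemble the shreds by what locally tends to follow what.

```python
import random
from collections import defaultdict

random.seed(7)

text = ("the patient should take the medicine twice daily and the doctor "
        "should monitor the patient closely and adjust the medicine as needed")

# "Blend" the text: record which words follow which.
follows = defaultdict(list)
words = text.split()
for a, b in zip(words, words[1:]):
    follows[a].append(b)

# Reassemble the shreds, picking a locally plausible next word each time.
out = ["the"]
for _ in range(15):
    nxt = follows.get(out[-1])
    if not nxt:
        break
    out.append(random.choice(nxt))

print(" ".join(out))  # locally plausible, globally nobody's medical advice
```

Every adjacent word pair in the output occurred somewhere in the source, so it sounds fine phrase by phrase, while the whole can be meaningless.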

The last two pictures may have gotten some reception, but I'm not super sure, and there was still arguing, especially around the books; again, they were less of a hit than the tourist guide story.

I'm super curious: do you have analogies of your own that you use with friends and family? I'd love to steal some and see if they might work with my friends!


Also, in the meantime, there's https://SWE-rebench.com, a nice riff on SWE-bench, as far as I understand.


There's a really nice, very low-power, 84x48 B&W LCD screen still widely available for electronics use: a clone of the Nokia 5110 screen. See e.g.:

- https://github.com/akavel/clawtype#clawtype

- mandatory "Bad Apple" vid (not mine): https://youtu.be/v6HidvezKBI

(for the "splash screen" linked above I used font u8g2_font_3x5im_te: https://docs.rs/u8g2-fonts/latest/u8g2_fonts/fonts/struct.u8... and a multilingual u8g2_font_tiny5_t_all: https://docs.rs/u8g2-fonts/latest/u8g2_fonts/fonts/struct.u8...)



Well, maybe the flamingo is a really good unicyclist...

https://youtu.be/Rrpgd5oIKwI


r/LocalLlama is now doing a horse in a racing car:

https://redd.it/1slz38i


AFAIU, their claim is that Mythos is, in reality, used within a framework that builds such contextual hints, and that their (Aisle's) own framework does the same:

"(...) a well-designed scaffold naturally produces this kind of scoped context through its targeting and iterative prompting stages, which is exactly what both AISLE's and Anthropic's systems do."


All evidence points to LLMs not being sufficient for the tasks everyone wants them to do; that the harnesses and agentic capabilities that shove them through JSON-shaped holes are utterly necessary, along with all the security; and that there's no great singularity happening here.

The current tech is a sigmoid, and even using the abilities of the AI, novelty and improvements don't appear to be happening at any exponential pace.


> The current tech is a sigmoid

What makes you say that? I'm only asking because the data I've seen looks pretty cleanly exponential still, e.g. https://metr.org.
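(To be fair to the sigmoid camp: below its inflection point a logistic curve is numerically almost identical to an exponential, so "looks cleanly exponential so far" doesn't by itself rule a sigmoid out. A toy illustration with made-up parameters, no claim about the real METR numbers:)

```python
import math

L, k = 100.0, 1.0  # arbitrary ceiling and growth rate, purely illustrative

def logistic(t: float) -> float:
    return L / (1 + math.exp(-k * t))

def early_exponential(t: float) -> float:
    # For t far below the inflection point, logistic(t) ~= L * exp(k*t):
    # the same growth law, until the ceiling starts to bite.
    return L * math.exp(k * t)

for t in range(-8, 1, 2):
    lg, ex = logistic(t), early_exponential(t)
    print(f"t={t:3d}: logistic={lg:10.5f}  exponential={ex:10.5f}  ratio={lg/ex:.4f}")
```

The two curves track each other closely until near the inflection point (t=0 here), where the logistic starts to flatten; before that, the data alone can't tell them apart.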


Lol, young padawan, check up those weird old programs that were called "VisiCalc" and "Lotus 1-2-3".

https://en.wikipedia.org/wiki/VisiCalc

https://en.wikipedia.org/wiki/Lotus_1-2-3


Which was before GUIs of any complexity were possible. There was no alternative at the time.

Related, see the insane success and excitement from the early GUI based operating systems.


In the classic FLOSS tradition, it would be cool if you'd still consider publishing such a "not-ready" repository: some people may (or may not!) still be interested, and also (sorry!) there's the bus factor... But, also in the classic FLOSS tradition, it's 100% your decision and you have every right to do whatever you like with it!


I'm trying to disable "thinking", but it doesn't seem to work (in llama.cpp). The usual `--reasoning-budget 0` doesn't seem to change it, nor `--chat-template-kwargs '{"enable_thinking":false}'` (both with `--jinja`). Am I missing something?

EDIT: Ok, looks like there's yet another new flag for that in llama.cpp, and this one seems to work in this case: `--reasoning off`.

FWIW, I'm doing some initial tries of unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL, and for writing some Nix I'm VERY impressed: it seems significantly better than qwen3.5-35b-a3b for me so far. Example command line on a MacBook Air M4 with 32 GB RAM:

  llama-cli -hf unsloth/gemma-4-26B-A4B-it-GGUF:UD-Q4_K_XL --temp 1.0 --top-p 0.95 --top-k 64 -fa on --no-mmproj --reasoning-budget 0 -c 32768 --jinja --reasoning off
(at release b8638, compiled with Nix)


Oh very cool! Will check the `--reasoning off` flag as well!

Yep the models are really good!

