baalimago · 2026-02-22T10:58:56 1771757936

Another approach is to spec functionality using comments and interfaces, then tell the LLM to first implement tests and finally make the tests pass. This way you also get regression safety and can inspect that it works as it should via the tests.
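
A minimal sketch of that workflow in Python. Everything here is hypothetical and only for illustration: the `Slugifier` interface carries the spec in its docstring, `check_slugifier` is the test written before any implementation, and `SimpleSlugifier` stands in for whatever the LLM would produce to make the tests pass.

```python
import re
from typing import Protocol


class Slugifier(Protocol):
    """Spec (hypothetical): turn a title into a URL slug.

    - lowercase the input
    - replace each run of non-alphanumeric characters with a single "-"
    - strip leading and trailing "-"
    """

    def slugify(self, title: str) -> str: ...


def check_slugifier(impl: Slugifier) -> None:
    """Tests derived from the spec, written before the implementation."""
    assert impl.slugify("Hello World") == "hello-world"
    assert impl.slugify("  Made in EU!  ") == "made-in-eu"
    assert impl.slugify("a--b") == "a-b"


class SimpleSlugifier:
    """Stand-in for the implementation the LLM is asked to write."""

    def slugify(self, title: str) -> str:
        # Collapse every run of non-alphanumerics into one dash, then trim.
        return re.sub(r"[^a-z0-9]+", "-", title.lower()).strip("-")


# Raises AssertionError if the implementation drifts from the spec.
check_slugifier(SimpleSlugifier())
```

The same `check_slugifier` can be rerun after every later change, which is where the regression safety comes from.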

baalimago · 2026-02-20T12:49:30 1771591770

I've never gotten incorrect answers faster than this, wow!

Jokes aside, it's very promising. For sure a lucrative market down the line, but definitely not for a model of size 8B. I think the lower bound on parameter count for useful intelligence is around 80B (but what do I know). Best of luck!

otabdeveloper4 · 2026-02-20T20:03:48 1771617828

Make it for Qwen 2.5 and I'd buy it.

You don't actually need "frontier models" for Real Work (c).

(Summarization, classification and the rest of the usual NLP suspects.)

SkyPuncher · 2026-02-21T02:49:16 1771642156

I completely agree. So many things can benefit from having "smart classifiers".

Like, give me semantic search that can detect the difference between SSL and TLS without needing to put a full LLM in the loop.

PlatoIsADisease · 2026-02-20T14:23:01 1771597381

As someone with a 3060, I can attest that there are really really good 7-9B models. I still use berkeley-nest/Starling-LM-7B-alpha and that model is a few years old.

If we are going for accuracy, the question should be asked multiple times across multiple models to see whether the answers agree.
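
A rough sketch of that agreement check, assuming each model is just a callable that takes a question and returns an answer string. The `agreement` helper and the stub models are made up for illustration; a real setup would call actual model endpoints and probably need fuzzier answer matching than exact string equality.

```python
from collections import Counter
from typing import Callable, Sequence, Tuple

# Hypothetical model type: question in, answer text out.
Model = Callable[[str], str]


def agreement(question: str, models: Sequence[Model], runs: int = 3) -> Tuple[str, float]:
    """Ask every model `runs` times; return the most common answer and its share."""
    answers = [
        model(question).strip().lower()
        for model in models
        for _ in range(runs)
    ]
    top_answer, count = Counter(answers).most_common(1)[0]
    return top_answer, count / len(answers)


# Stub models standing in for real ones (assumption, not real APIs).
stub_models = [lambda q: "2", lambda q: " 2 ", lambda q: "3"]

answer, share = agreement("How many p's are in 'pepperoni'?", stub_models, runs=2)
```

A low share is the signal to distrust the answer, whichever one happens to win the vote.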

But I do think once you hit 80B, you can struggle to see the difference from SOTA.

That said, GPT4.5 was the GOAT. I can't imagine how expensive that one was to run.

Derbasti · 2026-02-20T13:16:16 1771593376

Amazing! It couldn't answer my question at all, but it couldn't answer it incredibly quickly!

Snarky, but true. It is truly astounding, and feels categorically different. But it's also perfectly useless at the moment. A digital fidget spinner.

anthonypasq · 2026-02-20T15:24:41 1771601081

does no one understand what a tech demo is anymore? do you think this piece of technology is just going to be frozen in time at this capability for eternity?

do you have the foresight of a nematode?

edot · 2026-02-20T13:52:11 1771595531

Yeah, two p’s in the word pepperoni …

baalimago · 2026-02-20T09:54:10 1771581250

There is an ongoing lobbying push for "Made in EU" [0], which is unrelated to OP's article. The winds sure are blowing towards European sovereignty. Thanks, Trump!

[0]: https://www.euronews.com/business/2026/02/19/made-in-europe-...

baalimago · 2026-02-20T09:51:03 1771581063

Inspiring! I'll likely pursue the same thing.

baalimago · 2026-02-17T19:59:31 1771358371

I don't see the point of these models, or the hype around them, anymore. Until the price is reduced significantly, I don't see the gain. They've been able to solve most tasks just fine for the past year or so. The only limiting factor is price.

reed1234 · 2026-02-17T20:02:18 1771358538

Efficiency matters too. If a smarter model solves the same task with fewer tokens, that matters more than $/Mtok.
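
To make the arithmetic concrete, here is a small sketch comparing cost per task rather than cost per token. The token counts and prices are invented purely for illustration and do not describe any real model.

```python
def cost_per_task(tokens_used: int, usd_per_mtok: float) -> float:
    """Cost of one task in USD: tokens consumed times the price per million tokens."""
    return tokens_used / 1_000_000 * usd_per_mtok


# Hypothetical numbers: a cheap model that needs many tokens
# versus a pricier model that finishes the same task in fewer.
cheap_but_verbose = cost_per_task(tokens_used=40_000, usd_per_mtok=2.0)  # about $0.08
pricey_but_terse = cost_per_task(tokens_used=10_000, usd_per_mtok=5.0)   # about $0.05

terse_model_is_cheaper = pricey_but_terse < cheap_but_verbose
```

With these made-up figures the model with the higher $/Mtok still ends up cheaper per task.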

baalimago · 2026-02-16T18:14:16 1771265656

Very cool! I imagine it'll be possible to start a static webserver + WebMCP app and then use the browser as the virtualization layer instead of npm/uvx.

The browser has tons of functionality baked in, everything from web workers to persistence.

This would also allow for interesting ways of authenticating/manipulating data from existing sites. Say I'm logged into image-website-x. I can then use WebMCP to let agents interact with the images I've stored there. WebMCP becomes a much more intuitive interface than interpreting the DOM elements.

baalimago · 2026-02-14T11:30:14 1771068614

> "Data is the new oil, but only if you know how to refine it."

Oil[0] is fairly useless without being refined as well. Perhaps: "Data is the new oil, you need to refine it"?

[0]: https://en.wikipedia.org/wiki/Petroleum

baalimago · 2026-02-13T20:10:51 1771013451

Well, anyone can derive a new result in anything. The question is usually whether the result makes any sense.

baalimago · 2026-02-13T13:29:19 1770989359

It's not an ASCII renderer, but an ASCII diagram drawing tool.

baalimago · 2026-02-13T13:20:14 1770988814

I'm a huge fan of asciiflow, this is better!