OpenAPI is primarily for machine-to-machine communication, which needs determinism and is optimized for certain cases (e.g. time in unix format with ms accuracy). MCP is optimized for a different use case, where the LLM has many limitations but a good "understanding" of text. Instead of sending `{ user: {id: 123123123123, first_name: "XYZYZYZ", "last_name": "SDFSDF", "gender": "..."..... } }` you could return "Mr XYZYZYZ" or "Mrs XYZYZYZ".
The LLM doesn't need all of that and can't parse it anyway without additional tools (e.g. why should it spend tokens even trying to convert a unix timestamp just to understand the time?).
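A minimal sketch of the idea, as it might look inside an MCP tool handler: collapse the machine-oriented record into a short human-readable string before it reaches the model. The field names (`gender`, `last_seen_ms`) and the formatting rules are assumptions for illustration, not any real API.

```python
from datetime import datetime, timezone

def format_user_for_llm(user: dict) -> str:
    """Turn a machine-oriented user record into a short, LLM-friendly
    string. Hypothetical example: the 'gender' and 'last_seen_ms'
    fields are assumed, not part of any real schema."""
    title = {"male": "Mr", "female": "Mrs"}.get(user.get("gender"), "")
    name = f"{title} {user['last_name']}".strip()
    # Convert the unix timestamp (milliseconds) to readable UTC time,
    # so the model never has to burn tokens on epoch arithmetic.
    last_seen = datetime.fromtimestamp(user["last_seen_ms"] / 1000,
                                       tz=timezone.utc)
    return f"{name}, last seen {last_seen:%Y-%m-%d %H:%M} UTC"

print(format_user_for_llm({
    "id": 123123123123,
    "first_name": "XYZYZYZ",
    "last_name": "SDFSDF",
    "gender": "male",
    "last_seen_ms": 1700000000000,
}))
```

The point is that the conversion happens deterministically on the tool side, where it's cheap and exact, instead of inside the model, where it's expensive and error-prone.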
> Writing the code hasn’t been the bottleneck to developing software for a long time
It was!
Pre-2022, people needed developers to build software for them; now, with platforms like Replit and Lovable, people are creating their own tiny software projects, which wasn't easily accessible in the past.
If you say coding wasn't the bottleneck, then indirectly you're also saying you don't need developers. If you do need developers, the outcome of their other kinds of work (thinking, designing based on existing tools, and so on) is still CODE.
"XYZ Corp" won't allow their developers to write their desktop app in Rust because they want to consume only 16MB RAM, then another implementation for mobile with Swift and/or Kotlin, when they can release good enough solution with React + Electron consuming 4GB RAM and reuse components with React Native.
Strangely enough, AI could turn this on its head. You can have your cake and eat it too, because you can tell Claude/Codex/whatever to build you a full-featured Swift version for iOS and Kotlin for Android and whatever you want on Windows and Mac. There's still QA for the different builds, but you already have to QA each platform separately anyway if you really care that they all work, so in theory that doesn't change.
Of course, it's never that simple in reality; you need developers who know each platform for that to work, because you must run the builds and tell the AI what it's doing wrong and iterate. Currently, you can probably get away with churning out Electron slop and waiting for users to complain about problems instead of QAing every platform. Sad!
I am interested as well in what the future will look like. So far, what I am seeing is:
(1) specialized AI agent -> (2) we should add 1790 agents to be competitive -> (3) pivot to agentic workforce platform
Now we have lots and lots of agentic workforce platforms, and sandbox providers to run them. All have similar capabilities: create an agent for HR, create an agent for Sales, ...
I hope to see something interesting pop up; at least that was happening in the SaaS era, where people were inventing new ways of solving old problems: DocuSign, Salesforce, Zoho, ...
I think both product and engineering are lacking. The only things that work great today are the LLM models themselves.
Everything depends on "agents", but there is either barely any scaffolding around them or it's full spaghetti; at least it's hard to find one that's well constructed.
For instance, humans zoom around in cars; these cars don't spontaneously combust (most of the time), have seatbelts and airbags, and don't need an engine oil change every mile. Humans are amazing, and the cars are also relatively solidly engineered (at least the ones we drive around today).
The agent products that we have today are decidedly NOT that. Maybe for a single week openclaw was it, and then it decided to add a trawler and a fishhook to the car along with 1000 other additions, because why not? And that has been true for almost every LLM/AI product I have seen.
I think the winners here, such as they are, will be the companies that have a specialized service that actually does something, where any "agentic" functionality sits on top of that.
How is it that Meta spent so much money on talent and hardware, but the model barely matches Opus 4.6?
Especially looking at these numbers after Claude Mythos, it feels like either Anthropic has some secret sauce, or everyone else is dumber than the talent Anthropic has.
Meta made a bunch of mistakes, and it looks like Zuckerberg spent a lot of money on talent and made big swings to change that (which happened about a year ago).
I think it’s unrealistic to expect them to come back from that pit to the top in one year, but I wouldn’t rule them out getting there with more time. That’s a possible future. They have the money and Zuckerberg’s drive at the helm. It can go a long way.
If they actually matched Opus 4.6 on such a short timeline, it would have been mighty impressive. (Keep in mind this is a new lab and they are prohibited from doing distills.)
Friends at Meta with access to the model + personal experience at Meta.
Meta's performance process is essentially "show good numbers or you're out." So guess what people do when they don't have good numbers? They fudge them. Happens all across the company.
Re: changes, there's been enormous turnover in AI organizations, and in theory this one was developed by a "new" org. Whether that means less or more benchmaxxing is anyone's guess.
More, I'd guess, since the new org needs to prove itself long enough for their stock to vest. Fudging the benchmarks gives them a longer horizon before they're all fired anyway.
Anthropic has mostly just been focused on coding/terminal work longer, and their pro-tier model is coding focused, unlike the GPT and Gemini pro-tier models, which have been optimized for science.
Their whole "training the LLM to be a person" technique probably contributes to its pleasant conversational behavior, and making its refusals less annoying (GPT 5.2+ got obnoxiously aligned), and also a bit to its greater autonomy.
Overall they don't have any real moat, but they are more focused than their competition (and their marketing team is slaying).
Autonomy for agentic workflows has nothing to do with "replying more like a person", you have to refine the model for it quite specifically. All the large players are trying to do that, it's not really specific to Anthropic. It may be true however that their higher focus on a "Constitutional AI"/RLAIF approach makes it a bit easier to align the model to desirable outcomes when acting agentically.
You think it has nothing to do with it. Even they have only a loose understanding of the final results of trying to treat Claude like a real being in terms of how the model acts.
For example, Claude has a "turn evil in response to reinforced reward hacking" behavior which is a fairly uniquely Claude thing (as far as I've seen anyhow), and very likely the result of that attempt to imbue personhood.
Yup, it's called test-time compute. Mythos is described as plenty slower than Opus, enough to seriously annoy users trying to use it for quick-feedback-loop agentic work. It is most properly compared with GPT Pro, Gemini DeepThink or this latest model's "Contemplating" mode. Otherwise you're just not comparing like for like.
I have not delved into the theory yet, but it seems the smaller open-source models already do this to an extent. They have fewer parameters but spend much more time/tokens reasoning, as a way to close the performance gap. If you look at "tokens per problem" on https://swe-rebench.com/ this seems to be the case, at least.
No massacre is justified, but can you remind us how and where Hamas got helicopters and tanks, and how all of a sudden all the cars were smashed? Maybe the Hannibal directive handed them their tanks.
It was actually upvoted by so many people because of reason and evidence.
Also, please stop playing the race card; no one is blaming a race. People are pointing to the country that is carrying out these cruelties, with the majority of its government supporting it and the majority of its army executing the commands.
Better for them to make billions directly from corporations than to give it to average people who might get a chance out of poverty (but also to bad actors using it to do even more bad things).
Anthropic's definition of "safe AI" precludes open-source AI. This is clear if you listen to what he says in interviews; I think he might even prefer OpenAI's closed-source models winning to having open-source AI (because at least the former isn't a free-for-all).