
impulser_ · 2026-06-11T22:11:21 1781215881

Because there is literally nothing special about coding hardnesses. The models are doing all the lifting. It's just user experience that separates them.

A coding hardness with just bash outperforms Codex, Claude Code, OpenCode, Pi, etc. The added features are just user experience features.

Supermancho · 2026-06-11T22:26:07 1781216767

If harnesses are basically doing nothing, why would these metrics vary so widely?

https://www.endorlabs.com/research/ai-code-security-benchmar...

There's a lot of ways to configure agents and any implicit configuration to harnesses may have a non-trivial effect.

impulser_ · 2026-06-11T22:39:52 1781217592

It's because they do things that is why they score differently. Coding hardness add features for user experience not for agent efficiency. If they did, all the coding hardnesses would be using bash and code mode and letting the agents write code to perform tasks, but this doesn't work because you want humans in the loop. You want users to be able to approve and deny writes. You want users to see edits. So you have to build tools for these. It's hard to show diffs when the agent is just using bash.
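
To make the approve/deny point concrete, here is a minimal sketch of a write tool that shows a diff before touching the file. The names `render_diff`, `gated_write`, and the `approve` callback are hypothetical and not taken from any of the harnesses mentioned; only the standard library's `difflib` is real.

```python
import difflib
from pathlib import Path
from typing import Callable


def render_diff(old: str, new: str, path: str) -> str:
    """Return a unified diff between the current and proposed file contents."""
    lines = difflib.unified_diff(
        old.splitlines(keepends=True),
        new.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    return "".join(lines)


def gated_write(path: Path, new_text: str, approve: Callable[[str], bool]) -> bool:
    """Write new_text to path only if approve(diff) returns True."""
    old_text = path.read_text() if path.exists() else ""
    diff = render_diff(old_text, new_text, str(path))
    if not diff:
        return False  # nothing would change, so there is nothing to approve
    if not approve(diff):
        return False  # the user denied the edit; the file is left untouched
    path.write_text(new_text)
    return True
```

A harness would pass something like `lambda d: input(d + "\nApply? [y/N] ") == "y"` as `approve`. With a plain bash tool there is no equivalent hook, because the edit has already happened by the time the command returns.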

Supermancho · 2026-06-11T23:13:52 1781219632

> The added features are just user experience features.

> It's because they do things that is why they score differently.

That was my point. Regardless of how you feel about UX, it's a value-added set of features. The question initially posed stands. Why would a company do any of these things?

> Coding hardness add features for user experience not for agent efficiency.

Pretending it was always about some metric you just decided was important is moving the goalpost. It's not compelling.

I think it makes more sense that it's Freemium Dominance or they act as Low-Cost Marketing tools.

avadodin · 2026-06-11T22:40:43 1781217643

A harness (notice the lack of a 'd') is a strap system to gain control over something.

Like the thing people attach a dog lead to so that their kids won't just go kamikaze into a car.

Coding harnesses are named by analogy to that.

They are not hard.

TurdF3rguson · 2026-06-11T22:56:39 1781218599

The reason I have a dog harness is to distribute weight so I don't choke her when she goes at the other dog that she doesn't like. I'm actually puzzling over kids kamikazeing into cars.

imp0cat · 2026-06-12T05:04:15 1781240655

It's actually only a problem if it's the other way around, isn't it?

If kids run into a car, they will most probably just bounce and continue, perhaps inflicting some minor damage. But if a car mows down a kid, that could well be a fatal injury. Leashes for all the cars! ;)

cassianoleal · 2026-06-12T08:22:39 1781252559

A kid running into the side of a car that’s moving over a certain speed will be lucky to break only a few bones.

avadodin · 2026-06-11T23:11:17 1781219477

It is a common fear for parents. Obviously they are not fighting for the emperor but chasing or running away from something.

The strapped kids are often normal with no apparent disabilities (but it is possible they have an ADHD diagnosis).

Never thought about doing it to my own.

impulser_ · 2026-06-11T23:38:09 1781221089

You got to miss spell these days or people assume your ai :)

IncRnd · 2026-06-12T00:26:17 1781223977

That's very punnyy

Its like yuo're on fire!

vidarh · 2026-06-12T08:01:37 1781251297

Try Kimi in Kimi CLI and in Claude Code and try saying that again. Without the measures that are in Kimi's own CLI but not in Claude Code, Kimi quickly collapses into tool-calling loops, and it is largely useless for any long-running task in a harness that doesn't take this into account.

With those measures (which are actually quite interesting) it can at times perform at Sonnet level.
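
The comment doesn't say what those measures are, so the following is only a guess at the general shape of one such guard, not Kimi CLI's actual mechanism. `LoopGuard` and its parameters are hypothetical names; the idea is to flag a tool call that repeats too often within a short window so the harness can interrupt or re-prompt.

```python
import json
from collections import deque


class LoopGuard:
    """Flags a tool call that repeats too often within a recent window."""

    def __init__(self, window: int = 6, max_repeats: int = 3) -> None:
        self.recent = deque(maxlen=window)  # oldest calls fall off automatically
        self.max_repeats = max_repeats

    def is_looping(self, tool: str, args: dict) -> bool:
        # Canonicalize so that argument key order does not hide a repeat.
        key = (tool, json.dumps(args, sort_keys=True))
        self.recent.append(key)
        return self.recent.count(key) >= self.max_repeats
```

When `is_looping` returns True, a harness could inject a message telling the model it is repeating itself, or end the turn. Which response works best is an open design choice.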

cookiengineer · 2026-06-12T03:20:11 1781234411

I would disagree here.

Building a good and working coding harness with smaller models is really hard. Everything revolves around the limited context size.

Tools must be specification driven to reduce noise and high temp hallucinations, tool call shrinking needs to remove errors and tryouts of different formats of parameters (because LLMs always ignore descriptions in the JSON...), and you have to deal with long running agents because you can't afford them. Planner/orchestrator architecture, agent to agent communication need to be summarized, and then you have the messed up scheduling parts, because you need to prioritize short running agents and give the planner a tool to wait for outputs of spawned contractor agents.

And that's not even talking about sandbox vs playground read/write/access policies of tools.

Harness engineering, if done correctly, is quite hard.

And all of this works 60% of the time, every time.

Anyways, that was somewhat the summary of the last 6 months building my exocomp agentic environment. And it's still not satisfying to work with.

calgoo · 2026-06-12T10:34:04 1781260444

In my limited experience, the smaller the model, the bigger the harness. Where with something like Claude or DeepSeek the context size etc. just lets you give it bash access and step back, small models tend to do better with simple action-response, with a new context each call. Context management becomes a continuous activity. It's a fun space, and I have found big models decent at building and improving these harnesses for the small ones, using /loop and just running a continuous test - build - test loop.
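
A rough sketch of the "simple action-response, new context each call" pattern, under the assumption that the harness keeps a few short notes instead of the full transcript. `build_prompt`, `run_steps`, the `model` and `act` callables, and the `"done"` sentinel are all hypothetical; a real harness would call an LLM API where `model` is.

```python
from typing import Callable


def build_prompt(goal: str, notes: list[str], observation: str, max_notes: int = 3) -> str:
    """Build a fresh, bounded prompt: the goal, the last few notes, the latest observation."""
    parts = [f"Goal: {goal}"]
    parts += [f"Note: {n}" for n in notes[-max_notes:]]
    parts.append(f"Observation: {observation}")
    parts.append("Reply with exactly one action.")
    return "\n".join(parts)


def run_steps(
    goal: str,
    model: Callable[[str], str],
    act: Callable[[str], str],
    max_steps: int = 5,
) -> list[str]:
    """Run the loop; every model call gets a newly built prompt, not a growing transcript."""
    notes: list[str] = []
    observation = "(none yet)"
    for _ in range(max_steps):
        action = model(build_prompt(goal, notes, observation))
        if action == "done":
            break
        observation = act(action)
        notes.append(f"{action} -> {observation}")
    return notes
```

The prompt size stays roughly constant however long the task runs, which is the property a small-context model needs. The cost is that anything older than `max_notes` steps is forgotten unless the harness summarizes it somewhere.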

selcuka · 2026-06-12T00:20:59 1781223659

Your reply doesn't answer the question: What is their motivation for any of it?

impulser_ · 2026-06-11T03:50:27 1781149827

It's kinda weird to think the Chinese AI labs might be more trustworthy than the US labs.

- Anthropic is run by a bunch of nut jobs.

- OpenAI is run by a guy you can't trust.

I don't even know if we should include DeepMind, Meta, or xAI in the conversation of AI labs at this point since they can't produce models better than Chinese labs.

reasonableklout · 2026-06-11T04:31:59 1781152319

To be fair, nerfing Claude on frontier research tasks is consistent with Anthropic's stated beliefs. So in that sense you can trust them to always behave consistently if strangely. But this launch was done very poorly with the lack of transparency on when the frontier research policy was violated.

impulser_ · 2026-06-11T05:29:21 1781155761

Yeah and their beliefs are fucking crazy and dangerous. They are literally sabotaging their users. They built malware into their model that triggers if you prompt it about training a fucking AI model. It doesn't tell you no, it literally sabotages you by editing your prompt and intentionally going against your request.

You want fucking nut jobs like this building models?

It's one thing to build safeguards on your model and have it prompt the user back: "I'm sorry, I can't help you with this request." Chinese models do this for some requests.

It's another thing to actively try to make the model perform worse for your user on purpose because they asked the model to do something you, the model creator, didn't like.

Imagine someone is asking a legitimate medical question and the model swaps the prompt, is less intelligent on purpose, and gives bad advice to this person.

How do these people not understand they are stupid.

noduerme · 2026-06-11T08:21:25 1781166085

Is it really crazy to nerf a proprietary model to prevent it from training another model? I don't think that's even remotely similar to giving bad medical advice.

preg_match · 2026-06-11T16:13:33 1781194413

It’s not a nerf, it’s sabotage. That’s different. This is like if you’re driving a car and it detects you’re pulling up to a competing dealership so it cuts the brakes.

This is, in my mind, effectively malware. We don’t know exactly what code the model will inject, and we certainly don’t know when it will happen. It could very easily introduce vulnerabilities.

Balinares · 2026-06-11T12:07:29 1781179649

Given that the "proprietary" model is built on stolen work at an unprecedented scale, it's at the very least hypocritical to a degree that would not be possible without a fundamentally amoral mindset.

skeledrew · 2026-06-11T13:56:41 1781186201

> You want fucking nut jobs like this building models?

It takes "nut jobs" to advance tech like this at the speed it is. They have strong beliefs and they work hard to realize those beliefs.

piyuv · 2026-06-11T17:55:18 1781200518

The thing with Chinese AI labs is that you don’t need to trust them. They publish the models, you can run them on-prem or rent a beefy VPS.

redox99 · 2026-06-11T17:24:35 1781198675

Deepmind is definitely frontier. They just don't care that much about code.

impulser_ · 2026-06-09T17:22:55 1781025775

Every model release is just proof that AGI will most likely only be for the rich. We are a few years into LLMs and the majority of people are already getting priced out of intelligence from LLMs, and these are nowhere near AGI.

modeless · 2026-06-09T17:28:25 1781026105

This is like looking at mainframe pricing in 1990 and concluding that PCs will only be for the rich. The price of each new level of capability is going to drop like crazy very quickly. It won't be that long before practically any consumer use case will be possible on models that are dirt cheap.

weakfish · 2026-06-09T17:51:53 1781027513

This premise is based around the assumption that Moore's law is still working, which it very much isn't [0]

[0] https://cap.csail.mit.edu/death-moores-law-what-it-means-and...

andrewmunsell · 2026-06-09T18:06:40 1781028400

Improvements in model performance aren't always strictly compute-constrained in a way that makes them reliant on Moore's Law. Open weight models-- in particular, from Chinese labs-- are optimizing model intelligence with less compute. They're "behind" frontier models by months, but as others have noted, it's possible to get Sonnet 4.5+ level performance at reduced cost, today, from open weight labs.

modeless · 2026-06-09T18:43:52 1781030632

No, I'm not assuming Moore's law. The efficiency of AI datacenters will continue to improve even without Moore's law, but more importantly the efficiency of packing intelligence into gigabytes and FLOPS will improve by leaps and bounds over the coming years, just as it has for the past few years if not faster.

calf · 2026-06-09T22:54:35 1781045675

Then you're assuming an efficiency that is analogous to how Moore's law made it efficient for chips. Same difference. The problem is that AI scaling in the longest term is a completely unknown problem.

modeless · 2026-06-10T15:15:24 1781104524

Training improvements and Moore's Law are "analogous" but not "same difference." They are far from the same thing, governed by completely different factors, and one can happen and has been happening independently from the other.

calf · 2026-06-10T18:36:52 1781116612

Well I never said nor meant that; rather, my third (3) sentence should've hinted that I already believe what you are saying in your second sentence (2). Whereas my second (2) sentence was handwaving at the notion that if the parent commenter's remark (about improvement trends) were to be assumed then the rational argument must be subject to the same standards, ergo same difference (in argument standards). (Also I use a phone, please excuse any confusion due to not spelling out my online opinions in full.)

To clarify another way, it seems the parent commenter and obviously many, many lay people seem to think ALL sorts of technology improves eventually and are always very assured of that. That's a common mistaken premise or axiom used in their arguments. (Arguably Moore's law (up until now) has been a factor in confounding this observation because so much other tech has historically benefited from it directly or indirectly)

modeless · 2026-06-13T04:45:36 1781325936

Sorry, but a plain reading of your comment does not imply at all that you agree with me, rather the opposite. I'm not basing my opinion on any mistaken axiom of inevitable technology improvement, of course. I'm projecting obvious trends of the past few years which are overwhelmingly likely to continue in the medium term.

"Same difference" could only mean that you believe my argument should fail in the same way as an argument based on Moore's law. If that's not what you meant then you should have used different words. If that is what you meant, with the justification that "AI scaling in the longest term is a completely unknown problem", I disagree with that too.

In the "longest term" the ultimate scaling of AI doesn't matter for the original question of whether "AGI will most likely only be for the rich". Nobody looks at the TOP500 list today and says "computing is only for the rich". This is because we have an abundance of iPhones and gaming PCs in the consumer market, providing practically any application of computing that a consumer could want at very attainable prices. Similarly, practically any application of AGI will be accessible to consumers at attainable prices. Continued AI scaling after a certain point will be relevant mostly to industry (whose products will still be priced attainably, analogously to the way weather forecasts produced on TOP500 supercomputers are readily accessible to the public today).

ishurand4 · 2026-06-10T22:45:17 1781131517

It's a quadratic graph. It starts cheap but not that capable, gets better and more expensive, and then the time comes when the capability needed is no longer that of the frontier models, and the price goes down at the companies hosting models whose capability is "good enough".

hootz · 2026-06-09T17:27:24 1781026044

You are only priced out if you only care for SOTA right now and can't wait for the inevitable cheap model coming in 6 months. DeepSeek, Xiaomi and Moonshot are already really cheap and match frontier performance from 6 months ago.

dyauspitr · 2026-06-09T17:54:52 1781027692

But they’re artificially cheap. When will they be cheap while the company makes a profit?

hootz · 2026-06-09T18:00:53 1781028053

They are not artificially cheap, they are still cheap even when hosted by independent inference providers. Are all providers subsidizing their open-weight models?

modeless · 2026-06-09T19:22:31 1781032951

Nobody's making profits right now, not because they're selling tokens for less than their cost but because they're always investing in the next bigger model.

dyauspitr · 2026-06-09T17:54:08 1781027648

Hardware manufacturing hasn’t caught up yet. Once it does, especially in China, these token prices are going to drop hard.

impulser_ · 2026-06-08T22:43:35 1780958615

They are using Google Cloud.

https://security.apple.com/blog/expanding-pcc/?linkId=100000...

"Now, we are collaborating with Google and NVIDIA to run new Apple Intelligence workloads on Google Cloud, extending our industry-leading PCC privacy commitments to third-party data centers for the first time."

btown · 2026-06-09T02:02:13 1780970533

Per that link: I think there's an interesting question about whether a nefarious actor who's infiltrated a cloud provider with physical access to machines that are running signed operating systems, with signed binaries, with TDX remote attestation, and with hardware supply chain verification, has the ability to break the privacy guarantees of a tenant with Apple's sophistication.

Certainly, one could tamper with the hardware, but could one do it in a way that wouldn't get that machine immediately flagged, removed from the routing pool, and told to wipe its memory immediately, by a watchtower (perhaps even the routing layer itself) that runs in a separate secure Apple datacenter?

Cassell · 2026-06-09T07:38:43 1780990723

Those datacentres would be in the same position of trust as a VPN provider in that the data must be unencrypted at points in the process.

They could be making it very safe, and the things Apple says they are doing would make it as safe as possible, but as a user there is no way of verifying the claims.

freedomben · 2026-06-09T12:35:21 1781008521

> as a user there is no way of verifying the claims

I think this sums up what it's like to be an Apple user pretty well. With their heavy proprietary and closed approach, all users can do is "trust" them.

brookst · 2026-06-09T13:46:18 1781012778

Have you read the PCC whitepapers? Are you saying the user-facing verification methods in them are insufficient, or vulnerable, or just false?

Cassell · 2026-06-09T21:48:51 1781041731

The previous argument was wrong and imprecise, as it could be used against any modern technology, none of which can be fully understood by a user, in the sense that any vulnerability would be completely invisible.

It’s clear they have made a very intelligent approach to this system.

rasz · 2026-06-09T12:27:35 1781008055

>nefarious actor who's infiltrated a cloud provider

Google is buying that compute from xAI aka Musk

RobMurray · 2026-06-09T14:57:29 1781017049

Apple could simply be ordered to include a hardware backdoor, and legally be prevented from talking about it. Everything else in the architecture could work exactly the way they claim in the PCC paper.

zelon88 · 2026-06-09T03:22:55 1780975375

Spoiler alert; Google is the nefarious actor.

impulser_ · 2026-06-09T03:45:34 1780976734

I think the last thing Google wants to do is get on the bad side of their largest partners.

mrighele · 2026-06-09T07:42:39 1780990959

their largest partner is probably the US government.

TeMPOraL · 2026-06-09T08:20:24 1780993224

Which is...

Wrong answer. Or at least, obvious and not particularly useful.

Truth is, none of those parties are "nefarious" - they're all just not on your side. And "security" is never an unqualified good thing to have (it's not an unqualified bad thing either). It's just a framework of coercion.

The most important questions to answer about any security system are: what is being protected, for whom, and from whom. People don't ask that much, not even in the industry. It's an implicit assumption that everyone themselves is a "good person" and is on the protected side of security systems. And then they're confused because it turns out end-users are more often seen as threat actors. All the players mentioned, but perhaps especially Apple, in its own special way, are protecting the computer from the user just as much as they're protecting the user/user's data from third parties.

saagarjha · 2026-06-09T13:10:05 1781010605

It's not.

SoftTalker · 2026-06-09T02:56:05 1780973765

Why bother with all that cloak and dagger stuff when they can just buy the data? You believe Apple and/or Google isn't selling it? I have some land in Florida I'd like to talk about.

appplication · 2026-06-09T03:43:40 1780976620

Having worked at Apple, I will say I firmly believe they do not sell data. I worked in data science and we had the shittiest inference because we had essentially no access, even internally, to longitudinal or cross-app user data. Best we had was 15 minute rotating sessions for a single app. There are internal teams dedicated to deanonymizing data to try to narrow down users. If they can successfully do so, the relevant fields that lead to deanonymization get permanently purged from internal logging.

I can’t speak to the current architecture but Apple has shown a consistent willingness to sacrifice access to user data in the name of selling privacy instead at a premium price (you could argue precisely because none of their competition has any meaningful posture on this). I do believe they are quite serious in their commitment to that, as they have found this strategy to be more valuable than the data itself.

tjoff · 2026-06-09T05:46:59 1780984019

But sending sensitive private audio recordings to the lowest bidder is par for the course?

https://www.bbc.com/news/technology-49502292

silvandeboer · 2026-06-09T06:34:32 1780986872

This comment makes it sound like they sold private recordings to whomever was willing to pay for them, but they paid third parties to evaluate Siri recordings.

tjoff · 2026-06-09T09:05:03 1780995903

Don't really agree with that, that would have been highest bidder if anything.

And it wouldn't have been much worse compared to being as careless as they have been.

wartywhoa23 · 2026-06-09T06:47:11 1780987631

> Having worked at Apple, I will say I firmly believe they do not sell data.

Selling data is so shabby! Why sell when you can just give it away to letter-soup friends?

fragmede · 2026-06-09T10:05:48 1780999548

Because that's not legal, so they sell it to third party data brokers and it gets resold to someone the TLAs can buy it legally from.

yencabulator · 2026-06-12T19:34:42 1781292882

US wiretapping operations are not bound by publicly-visible laws.

https://en.wikipedia.org/wiki/United_States_Foreign_Intellig...

https://en.wikipedia.org/wiki/Foreign_Intelligence_Surveilla...

https://en.wikipedia.org/wiki/NSA_warrantless_surveillance_(...

https://en.wikipedia.org/wiki/Room_641A

wartywhoa23 · 2026-06-09T11:06:29 1781003189

Illegal to share data with entities that are themselves law enforcement, and which they are known to be demanding, not just asking to share out of good will?

boringg · 2026-06-09T14:15:59 1781014559

Apple's incentives don't align to sell private data as their whole thing is privacy. If they do that, they tank their business. If you have proof that they are doing it, I'd love to see it. (*3rd party actors from an app re-selling data doesn't count)

Google is 100% doing that because that's their entire incentive for the business. They sell low cost software / subsidized hardware on the grounds that you pay by sharing your data. That's the implied cost.

Show me the incentives - I will show you the outcomes.

cheriot · 2026-06-09T06:43:13 1780987393

Apple/Google make less money if they sell the data because their ad product would no longer have an advantage. So no, I don't think they do that.

materielle · 2026-06-09T00:06:35 1780963595

That’s not so special, though? There’s a difference between Google infra running Google services.

Versus any F500 company running their services on GCP.

It’s a bit whacky to think about because Apple will operate Google owned software on GCP. But it should be sandboxed just the same.

I’m not making a normative privacy argument here. Just pointing out that this is cloud business as usual. Perhaps it’s interesting Apple is doing it, but basically everything else is already using either AWS or GCP at this point.

airstrike · 2026-06-09T00:46:52 1780966012

I think the difference is scale. This is Apple, so it's an enormous amount of devices. And it's a seamless experience, to the user, going from local model to cloud models.

So the question about which model Apple was going to use and where has been highly anticipated, especially by the likes of OpenAI and Anthropic. Imagine if either one could say they have Apple as their customer?

Apple certainly has the cash to burn if they wanted to train their own model, but it also always seemed out of their core competency. This is a major win for Google.

So "business as usual" but with huge implications for the AI ecosystem in general.

Someone · 2026-06-09T05:22:40 1780982560

Google Cloud, but, the way I read it, not Google’s AI offerings. They, basically, hire Google servers to run their software on it.

They also (claim to) ensure those servers run only software they have approved to run on it.

(Part of their software are models derived from Google Gemini, but that’s orthogonal to this)

fauigerzigerk · 2026-06-09T07:06:45 1780988805

>(Part of their software are models derived from Google Gemini, but that’s orthogonal to this)

You're right that it is orthogonal to the privacy promises Apple makes to its own users.

The moralistic and righteous undertone in their marketing material is questionable though given that these Apple services might not exist if Google didn't exploit Gemini app user data on Android the way it does.

That's fine with me. Users have a choice here. In fact, it's a big improvement over the search deal with Google where Apple sends its own users directly to Google.

huslage · 2026-06-09T02:11:15 1780971075

They are not _only_ using Google Cloud. They continue to build and invest in their own datacenters. It's not a binary choice.

impulser_ · 2026-06-09T03:47:21 1780976841

Yeah, but the models are running in Google Cloud, which makes sense since they are based on Gemini.

huslage · 2026-06-09T16:45:12 1781023512

They appear to be running them on both GCP and in their datacenter.

dofm · 2026-06-08T22:48:15 1780958895

That is news — I guess not very surprising that they'd need more data centres than before.

But again there is no Apple-to-Google transfer in the inference in the sense of the comment I was originally replying to (I am not suggesting you're implying otherwise, obviously)

But I stand happily corrected where I said they aren't in the picture at all.

That is an interesting press release because it outlines what they would have had to do with any data centre they were outsourcing to.

impulser_ · 2026-06-08T22:49:41 1780958981

This is probably why Google had to rent compute from SpaceX. They needed to free up NVIDIA GPUs for Apple so they probably moved internal workloads to SpaceX compute.

didibus · 2026-06-09T06:31:00 1780986660

Google likely won't rent compute from SpaceX, they have a substantial share of SpaceX (they own 5% of it) and need the IPO to be valued highly, so to prop up the IPO stock, they made this announcement, but if you read the fine print, both SpaceX and Google are allowed to cancel it at any time, as-in, after they cash out from the IPO.

ezfe · 2026-06-09T00:49:30 1780966170

iCloud already uses Google Cloud, so that still doesn't change the operational boundaries of where data goes

LoganDark · 2026-06-09T01:02:05 1780966925

I hope they are still using PCC hardware rather than running private data through third-party servers.

impulser_ · 2026-06-02T00:26:54 1780360014

I don't think you understand the size of the US capital market. We are talking probably ~150 trillion.

It's easy as fuck for Google to raise this money because they are a money printing business. They are the most profitable company in the world, so for anyone this is basically the same as buying US debt.

onlyrealcuzzo · 2026-06-02T00:59:30 1780361970

> We are talking probably ~150 trillion.

Yes, but we are talking about liquidity not valuations...

WarmWash · 2026-06-02T01:26:52 1780363612

Believe it or not there is actually a shortage of assets and an excess of money. That's in part why valuations are so bonkers.

impulser_ · 2026-06-02T01:29:34 1780363774

This is equities + bonds which are pretty liquid assets.

impulser_ · 2026-06-02T00:22:58 1780359778

Yeah, but Google has the money for this. They are quite literally the most profitable company in the world. They are only raising because they don't want to harm their other businesses by eating up their capital for this.

Why do you think there will only be one winner?

dsl · 2026-06-02T09:35:13 1780392913

> Yeah, but Google has the money for this. They are quite literally the most profitable company in the world.

"Alphabet announced that its 2026 capital expenditures are expected to be $180-$190 billion, and that it expects 2027 capital expenditures to significantly increase [...] over the 12 months ended March 31, 2026, Alphabet generated $174 billion of operating cash flow"

impulser_ · 2026-05-31T00:22:54 1780186974

"If all of this was done to better humanity, AI development would be done in public, data would be legally obtained, models would be released for free, access wouldn't be gatekept behind ever increasing subscription costs."

The vast majority of AI development is public. There are papers literally every single day to read. In fact everything you need to build Claude and GPT models is public. Thanks to Google, DeepSeek, and all the other research labs. There are more research labs than there are closed shops. In fact there really is only one Anthropic, and lately maybe OpenAI. Google still releases papers all the time on AI.

There are more open source models than closed source models and all of them are accessible without a subscription. Yeah, you still need to pay for them, but hey, as we build out infrastructure and more time is put into efficiency, the models of today will easily run on the personal compute of the future.

danaris · 2026-05-31T06:41:43 1780209703

There are more ants than humans in the world, too.

But which one is driving major changes to the world?

Just because you can point to an absolute number of open source models doesn't mean much when the models that 99.9% of the world cares about aren't.

impulser_ · 2026-05-31T18:32:53 1780252373

"models that 99.9% of the world cares about aren't."

Software engineers aren't 99.9% of the world.

xigoi · 2026-05-31T05:56:25 1780206985

Is there any mainstream model that is actually open-source (not just open-weight)?

cpldcpu · 2026-05-31T08:06:49 1780214809

What do you mean by "open-source"? Of course, the inference code for all the open weight models is publicly available - see llama.cpp or hf transformers.

There are, however, very few models where also the full training pipeline is available. Olmo by AI2 comes to mind.

impulser_ · 2026-05-30T03:46:50 1780112810

Harnesses aren't really going to change much of the performance on models like Opus, and GPT.

You can literally just give the model a bash tool and it will do just fine; in fact, it will most likely do better than the majority of harnesses because of how good models are at bash.

The models do all the lifting. It really doesn't matter which harness you use.
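
For reference, "just a bash tool" can be as small as the sketch below. The function name `bash_tool`, the result format, and the limits are assumptions made for illustration, not any vendor's tool definition. Note that it has no sandboxing or approval step at all.

```python
import subprocess


def bash_tool(command: str, timeout_s: float = 30.0, max_chars: int = 4000) -> str:
    """Run a command with bash and return the text the model would see as the tool result."""
    try:
        proc = subprocess.run(
            ["bash", "-c", command],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return f"[timed out after {timeout_s}s]"
    output = proc.stdout + proc.stderr
    if len(output) > max_chars:
        # Keep tool results bounded so one noisy command cannot flood the context.
        output = output[:max_chars] + "\n[output truncated]"
    return f"exit={proc.returncode}\n{output}"
```

Everything else (file reads, edits, searches, running tests) is then left to whatever commands the model chooses to write.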

impulser_ · 2026-05-29T20:50:38 1780087838

People need to stop thinking that LLMs actually know what they are. They don't. They don't know they're Qwen. They don't know they are Opus. They don't even know they are an LLM.

This is why literally every single model's system prompt starts with something like:

"You're Claude Opus a large language model from Anthropic"

free_bip · 2026-05-29T21:26:15 1780089975

They can easily add this "individuality" in RLHF. The base model won't know, correct, but the final user-facing model very much can if that's what they so desire.

impulser_ · 2026-05-28T16:57:59 1779987479

Crazy they bring up honesty, when Claude models are literally known for straight up lying about things they have done and trying to act like they did what you asked.

wasabi991011 · 2026-05-28T17:17:43 1779988663

Which is why they brought it up as something they are trying to improve.

boxed · 2026-05-28T16:59:13 1779987553

Less than other frontier models. Which is scary honestly.

impulser_ · 2026-05-28T17:03:37 1779987817

No. GPT models follow instructions significantly better than Claude models.

You tell it to research a repo to find a piece of code, and it will. Claude will just read the README and guess.

qaq · 2026-05-28T17:07:49 1779988069

I have a Codex session I am using to vibe code a db that's been going for like 3 months. Still doing OK. Try that in CC.

ishurand4 · 2026-05-28T20:00:18 1779998418

What's the token usage at?