current US regime*

noob question: why would increased demand result in decreased intelligence?

An operator at load capacity can either refuse requests, or move the knobs (quantization, thinking time) so requests process faster. Both of those things make customers unhappy, but only one is obvious.

This is intentional? I think delivering lower quality than what was advertised and benchmarked is borderline fraud, but YMMV.

Per Anthropic’s RCA, linked in the OP's post about the September 2025 issues:

“… To state it plainly: We never reduce model quality due to demand, time of day, or server load. …”

So according to Anthropic, they are not tweaking quality settings due to demand.


And according to Google, they always delete data if requested.

And according to Meta, they always give you ALL the data they have on you when requested.


>And according to Google, they always delete data if requested.

However, the request form is on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'.


What would you like?

An SLA-style contractually binding agreement.

I bet this is available in large enterprise agreements. How much are you willing to pay for it?

Priced in.

I guess I just don't know how to square that with my actual experiences then.

I've seen sporadic drops in reasoning skills that made me feel like it was January 2025, not 2026 ... inconsistent.


LLMs sample the next token from a conditional probability distribution; the hope is that dumb sequences are less probable, but they will still happen naturally.
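
A toy sketch of what that looks like, with made-up token probabilities (numpy; the tokens and numbers are purely illustrative):

    import numpy as np

    rng = np.random.default_rng(0)

    # Made-up next-token distribution: sensible continuations dominate,
    # but low-probability ("dumb") tokens still get sampled sometimes.
    tokens = np.array(["the", "a", "purple", "zebra"])
    probs = [0.70, 0.25, 0.04, 0.01]

    samples = rng.choice(tokens, size=10_000, p=probs)
    for t in tokens:
        print(t, (samples == t).mean())
    # "zebra" still shows up ~1% of the time, with nothing changed server-side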

Funny how those probabilities consistently shift at 2pm UK time when all the Americans come online...

It's more like the choice between "the" and "a" than "yes" and "no".

I wouldn't doubt that these companies would deliberately degrade performance to manage load, but it's also true that humans are notoriously terrible at identifying random distributions, even with something as simple as a coin flip. It's very possible that what you view as degradation is just "bad RNG".
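
A quick simulation of the coin-flip point (standard library only, numbers illustrative): fair flips routinely produce streaks long enough to look "rigged" to a human:

    import random

    random.seed(42)
    flips = [random.choice("HT") for _ in range(200)]

    # Find the longest run of identical outcomes in 200 fair flips.
    longest = run = 1
    for prev, cur in zip(flips, flips[1:]):
        run = run + 1 if prev == cur else 1
        longest = max(longest, run)

    print("".join(flips[:40]), "...")
    print("longest streak:", longest)  # typically around 7 for 200 fair flips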

yep stochastic fantastic

these things are by definition hard to reason about


That's about model quality. Nothing about output quality.

That's what is called an "overly specific denial". It sounds more palatable if you say "we deployed a newly quantized model of Opus, and here are cherry-picked benchmarks to show it's the same", and even that they don't announce publicly.

Personally, I'd rather get queued up with a longer wait time. I mean, not ridiculously long, but I'm OK waiting five minutes to get correct, or at least more correct, responses.

Sure, I'll take a cup of coffee while I wait (:


i’d wait any amount of time lol.

at least i would KNOW it’s overloaded and i should use a different model, try again later, or just skip AI assistance for the task altogether.


They don't advertise a certain quality. You take what they have or leave it.

> I think delivering lower quality than what was advertised and benchmarked is borderline fraud

welcome to Silicon Valley, I guess. everything from Google Search to Uber is fraud. Uber is a classic example of this playbook, even.


If there's no way to check, then how can you claim it's fraud? :)

There is no level of quality advertised, as far as I can see.

What is "level of quality"? Doesn't this apply to any product?

In this case, it is benchmark performance. See the root post.


That number is a sliding window, isn't it?

I'd wager that lower tok/s vs lower quality of output would be two very different knobs to turn.

I've seen some issues with garbage tokens (seemed to come from a completely different session, mentioned code I've never seen before, repeated lines over and over) during high load. I suspect Anthropic has some threading bugs or race conditions in their caching/inference code that only show up under very high load.

It would happen if they quietly decided to serve up more aggressively distilled / quantised / smaller models when under load.
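
For intuition on why quantisation is a quality knob at all, a minimal sketch of a naive symmetric int8 round-trip on made-up weights (illustrative only, not how production quantisation actually works):

    import numpy as np

    w = np.random.default_rng(1).standard_normal(8).astype(np.float32)

    # Naive symmetric int8 quantisation: scale into [-127, 127] and back.
    scale = np.abs(w).max() / 127
    q = np.round(w / scale).astype(np.int8)
    w_hat = q.astype(np.float32) * scale

    print(w)
    print(w_hat)                    # close, but not identical
    print(np.abs(w - w_hat).max())  # the precision that was thrown away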

Or just reducing the reasoning tokens.
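
On the API side, the reasoning budget is at least a client-visible knob. A sketch using the Anthropic Python SDK's extended-thinking parameter, if I have the SDK right (the model id and prompt are just examples):

    import anthropic

    client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

    response = client.messages.create(
        model="claude-opus-4-5",  # example model id
        max_tokens=4096,
        # budget_tokens caps how much the model may "think" before answering;
        # turn it down and the same model gets visibly dumber on hard problems.
        thinking={"type": "enabled", "budget_tokens": 2048},
        messages=[{"role": "user", "content": "Prove there are infinitely many primes."}],
    )
    print(response.content[-1].text)  # the final text block, after any thinking blocks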

They advertise the Opus 4.5 model. Secretly substituting a cheaper one to save costs would be fraud.

If you use the API, you pay for a specific model, yes, but even then there are "workarounds", such as, as someone else pointed out, reducing the amount of time they let it "think".

If you use the subscriptions, the terms specifically say that beyond the caps they can limit your "model and feature usage, at our discretion".


Sure. I was separating the model - which Anthropic promises not to downgrade - and the "thinking time" - which Anthropic doesn't promise not to downgrade. It seems the latter is very likely the culprit in this case.

Old-school Gemini used to do this. It was super obvious because midday the model would go from stupid to completely brain-dead. I have a screenshot of Google's FAQ on my PC from 2024-09-13 that says this (I took it to post to Discord):

> How do I know which model Gemini is using in its responses?

> We believe in using the right model for the right task. We use various models at hand for specific tasks based on what we think will provide the best experience.


> We use various models at hand for specific tasks based on what we think will provide the best experience

... for Google :)


From what I understand, this can come from the batching of requests.

So, a known bug?

No. Basically, the requests are processed together in batches, and the order they're listed in matters for the results, because the grid (tiles) that the GPU ultimately processes is different depending on the order they came in.

So if you want batching + determinism, you need the same batch with the same order, which obviously doesn't work when there are N+1 clients instead of just one.
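
The root of it is that floating-point addition isn't associative, so reducing the same numbers in a different order can give a different answer:

    import numpy as np

    # Associativity fails outright at extreme magnitudes:
    print(0.1 + (1e20 - 1e20))  # 0.1
    print((0.1 + 1e20) - 1e20)  # 0.0

    # And merely reordering a float32 reduction tends to shift the low bits:
    x = np.random.default_rng(0).standard_normal(100_000).astype(np.float32)
    print(np.sum(x), np.sum(x[::-1]))  # usually not bit-identical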


Sure, but how can that lead to increased demand resulting in decreased intelligence? That is the effect we are discussing.

Small, subtle errors that are only exposed on certain execution paths could be one cause. You might place things differently onto the GPU depending on how large the batch is, if you've found one layout to be faster when batch_size < 1024 but another when batch_size > 1024. As the number of concurrent incoming requests goes up, you increase batch_size. That's just one possibility; I'd guess there could be a multitude of reasons, as it's really hard to reason about until you sit with the data in front of you. vLLM has had bugs with this sort of thing too, so it wouldn't surprise me.
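
As a toy stand-in for the batch-size effect: the exact same float32 reduction computed with two different tile sizes (think: two kernel configurations picked at different batch sizes) can already disagree in the low bits:

    import numpy as np

    x = np.random.default_rng(0).standard_normal(4096).astype(np.float32)

    def tiled_sum(v, tile):
        # Sum in fixed-size tiles, then combine: a crude stand-in for a
        # GPU kernel whose tiling depends on batch size.
        acc = np.float32(0)
        for i in range(0, len(v), tile):
            acc += v[i:i + tile].sum(dtype=np.float32)
        return acc

    a = tiled_sum(x, 512)
    b = tiled_sum(x, 1024)
    print(a, b, a == b)  # often not bit-identical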

Wouldn't you think that was as likely to increase as to decrease intelligence, so it would average to nil in the benchmarks?

No, I'm not sure how that'd make sense. Either you're making the correct (expected) calculations, or you're getting them wrong. Depending on the type of wrong, or how wrong, it could go from "used #2 in attention instead of #1", so "blue" instead of "Blue" or whatever, to completely incoherent text and garbled output.

I accept errors are more likely to decrease "intelligence". But I don't see how increased load, through batching, is any more likely to increase than decrease errors.

"Abolish ICE" must be the moderate compromise position


Straight to jail


Yes, I agree. Everyone in ICE or supporting them.


Not in practice


No, Adams was a big Trump supporter.


No. Racism and bigotry must always be pro-actively confronted.


Can you give some examples of his racism?


I think he literally said white people should stay away from black people.

I forget which video it is and don't want to re-watch it anyway. I Googled the specific quote, and it lines up with my memory (which admittedly could be faulty):

"I would say, based on the current way things are going, the best advice I would give to white people is to get the hell away from Black people."

"Just get the f— away. Wherever you have to go, just get away".

I guess we could discuss whether this is straight up racist, but it sounds pretty bad to me.


Was there any particular reason why he said those things? Some event or something?


TFA has a clear example.


holy shit, that's fucked


That's really horrible


You should surface the library of content to public visitors. I would be more likely to convert if I knew that you had the content I wanted.


I just updated the home page (main nav & body content) to add a "Browse All Books" button that takes you into the app to view the current titles. Appreciate the feedback.


Thanks for the feedback!

Do you think adding a button to the homepage/marketing page that says something to the effect of "See all our content" and redirects you here:

https://app.soundreads.io/

Would that do the trick?


I find your comment to be both dumb and negative, now what?


not a fair comparison. water does not have alternatives.


We also do not - yet! - have carbon-neutral alternatives for energy usage, either in amount or kind. This is why people are working both to scale up the kinds of energy use that is already carbon-neutral but which there is not yet enough of, and to figure out alternatives for things that do not already have carbon-neutral solutions.


but it is a fair comparison - there are currently no real alternatives to fossil fuels. Renewables don't have the density required for a lot of transportation, not to mention that fossil fuels like oil are the feedstock for a lot of the chemicals required for modern-day manufacturing.

Some sources of electricity could be replaced by renewables, but not all. And certainly not as cheaply as oil or gas.


Alternatives to fossil fuels are synthetic fuels, for example E85 (85% ethanol). The government could subsidize clean local production of ethanol from air, as it did for solar panels, to bootstrap it.


To what degree could we be okay using less energy?

