An operator at load capacity can either refuse requests, or move the knobs (quantization, thinking time) so requests process faster. Both of those things make customers unhappy, but only one is obvious.
>And according to Google, they always delete data if requested.
However, the request form is on display in the bottom of a locked filing cabinet stuck in a disused lavatory with a sign on the door saying 'Beware of the Leopard'.
LLMs sample the next token from a conditional probability distribution. The hope is that dumb sequences are less probable, but they will still happen naturally from time to time.
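To make that concrete, here's a toy sketch of next-token sampling. The vocabulary and logits are made up (a real model has a vocabulary of tens of thousands of tokens), but it shows why a low-probability "dumb" token still gets picked every so often:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["the", "cat", "sat", "florp"]    # hypothetical 4-token vocabulary
logits = np.array([2.0, 1.5, 1.0, -2.0])  # "florp" is unlikely, not impossible

def sample_next_token(logits, temperature=1.0):
    # Softmax turns logits into the conditional distribution P(token | context)
    probs = np.exp(logits / temperature)
    probs /= probs.sum()
    return rng.choice(len(probs), p=probs)

draws = [vocab[sample_next_token(logits)] for _ in range(10_000)]
print(draws.count("florp") / len(draws))  # ~0.01: rare, but it happens
```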
I wouldn't doubt that these companies would deliberately degrade performance to manage load, but it's also true that humans are notoriously terrible at identifying random distributions, even with something as simple as a coin flip. It's very possible that what you view as degradation is just "bad RNG".
That's what's called an "overly specific denial". It sounds more palatable if you say "we deployed a newly quantized model of Opus and here are cherry-picked benchmarks to show it's the same", and even that they don't announce publicly.
Personally, I'd rather get queued up with a longer wait time. I mean, not ridiculously long, but I'm OK waiting five minutes to get correct, or at least more correct, responses.
I've seen some issues with garbage tokens during high load (they seemed to come from a completely different session, mentioned code I've never seen before, repeated lines over and over). I suspect Anthropic has some threading bugs or race conditions in their caching/inference code that only manifest under very high load.
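To be clear, I have no idea what their code actually looks like; this is just a toy illustration of the failure mode I'm imagining, where a cache shared between request threads without a lock hands one session's tokens to another, and it only shows up when requests overlap:

```python
import threading
import time

# Toy shared buffer, keyed too coarsely (one slot for everyone, no lock)
shared_cache = {"slot": None}

def handle_request(session_id, results):
    shared_cache["slot"] = f"tokens-for-session-{session_id}"
    time.sleep(0.001)  # stand-in for preemption under heavy load
    # Reads back whatever is in the slot now, possibly another session's data
    results[session_id] = shared_cache["slot"]

results = {}
threads = [threading.Thread(target=handle_request, args=(i, results))
           for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(results)  # most sessions get the last writer's tokens, not their own
```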
If you use the API, you pay for a specific model, yes, but even then there are "workarounds", such as, as someone else pointed out, reducing the amount of time they let it "think".
If you use the subscriptions, the terms specifically say that beyond the caps they can limit your "model and feature usage, at our discretion".
Sure. I was separating the model - which Anthropic promises not to downgrade - and the "thinking time" - which Anthropic doesn't promise not to downgrade. It seems the latter is very likely the culprit in this case.
Old-school Gemini used to do this. It was super obvious because midday the model would go from stupid to completely brain-dead. I have a screenshot of Google's FAQ on my PC from 2024-09-13 that says this (I took it to post to Discord):
> How do I know which model Gemini is using in its responses?
> We believe in using the right model for the right task. We use various models at hand for specific tasks based on what we think will provide the best experience.
No. Basically, the requests are processed together in batches, and the order they're listed in matters for the results, because the grid of tiles that the GPU ultimately processes differs depending on the order the requests came in.
So if you want batching + determinism, you need the same batch with the same order, which obviously doesn't work when there are N+1 clients instead of just one.
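A quick way to see why order matters at all: floating-point addition isn't associative, so reducing the same numbers in a different order can produce different bits, and a tiny difference in logits can flip which token wins. Toy NumPy demo:

```python
import numpy as np

rng = np.random.default_rng(42)
x = rng.standard_normal(100_000).astype(np.float32)

forward = np.sum(x)         # one reduction order
backward = np.sum(x[::-1])  # same values, different order
print(forward, backward, forward == backward)  # typically not bit-identical
```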
Small, subtle errors that are only exposed on certain execution paths could be one. You might place things differently onto the GPU depending on how large the batch is, if you've found one way to be faster when batch_size < 1024 but another when batch_size > 1024. As the number of concurrent incoming requests goes up, you increase batch_size. That's just one possibility; I guess there could be a multitude of reasons, as it's really hard to reason about until you sit with the data in front of you. vLLM has had bugs with this sort of thing too, so it wouldn't surprise me.
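A sketch of that dispatch pattern, with a made-up threshold and a made-up "fast path" (not any real kernel): the large-batch path combines partial sums in a different order, so the same request gets numerically different results depending on how full the batch happens to be.

```python
import numpy as np

def matvec_small(W, x):
    return W @ x  # single reduction

def matvec_large(W, x):
    # split-K style: two partial products combined afterwards,
    # i.e. the same math in a different rounding order
    half = W.shape[1] // 2
    return W[:, :half] @ x[:half] + W[:, half:] @ x[half:]

def matvec(W, x, batch_size):
    # hypothetical threshold: bigger batches take the "fast" path
    return matvec_small(W, x) if batch_size < 1024 else matvec_large(W, x)

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 4096)).astype(np.float32)
x = rng.standard_normal(4096).astype(np.float32)
print(np.array_equal(matvec(W, x, 512), matvec(W, x, 2048)))  # often False
```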
No, I'm not sure how that'd make sense. Either you're making the correct (expected) calculations, or you're getting it wrong. Depending on the type of wrong, or how wrong, it could go from "used #2 in attention instead of #1", so "blue" instead of "Blue" or whatever, to completely incoherent text and garbled output.
I accept errors are more likely to decrease "intelligence". But I don't see how increased load, through batching, is any more likely to increase than decrease errors.
I think he literally said white people should stay away from black people.
I forget which video it is and don't want to re-watch it anyways. I Googled the specific quote and it sounds about right with my memory (which admittedly could be faulty):
"I would say, based on the current way things are going, the best advice I would give to white people is to get the hell away from Black people."
"Just get the f— away. Wherever you have to go, just get away".
I guess we could discuss whether this is straight up racist, but it sounds pretty bad to me.
I just updated the home page (main nav & body content) to add a "Browse All Books" button that takes you into the app to view the current titles. Appreciate the feedback.
We also do not - yet! - have carbon-neutral alternatives for energy usage, either in amount or kind. This is why people are working both to scale up the kinds of energy use that are already carbon-neutral but which there is not yet enough of, and to figure out alternatives for things that do not already have carbon-neutral solutions.
but it is a fair comparison - there are currently no real alternatives to fossil fuels. Renewables do not have the density required for a lot of transportation, not to mention that fossil fuels like oil are the feedstock for a lot of chemicals required for modern-day manufacturing.
Some sources of electricity could be replaced by renewables, but not all. And certainly not as cheaply as oil or gas.
An alternative to fossil fuels is synthetic fuel, for example E85 (85% ethanol). Government can subsidize clean local production of ethanol from air, like it did for solar panels, to bootstrap it.