> because there's already concern that AI models are getting worse. The models are being fed on their own AI slop and synthetic data in an error-magnifying doom-loop known as "model collapse."
Model collapse is a meme that assumes zero agency on the part of the researchers.
I'm unsure how you can reach this conclusion after trying any of the new models. In the frontier size bracket we have models like Opus 4.5 that are significantly better at writing code and using tools independently. In the mid tier, Gemini 3.0 Flash is absurdly good and is crushing the previous baseline on some of my (visual) data-extraction projects. And small models are much better overall than they used to be.
The big labs spend a ton of effort on dataset curation.
It goes further than just preventing poison: they run lots of tests on the dataset to find the incremental data that produces the best improvements in model performance, and they even train proxy models that predict whether a given piece of data will improve performance or not.
“Data Quality” is usually a huge division with a big budget.
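To make the proxy-model idea concrete, here's a toy sketch of what that kind of filter could look like. Everything here is hypothetical (the labs don't publish these pipelines); it just shows the shape of the technique: label documents by whether they helped in small-scale ablation runs, train a cheap classifier on those labels, then use it to score fresh crawl data.
```
# Toy sketch of a "data quality" proxy model. Entirely hypothetical: the
# actual pipelines at frontier labs are not public. The idea: label documents
# by whether adding them to a training mix improved a proxy benchmark in
# small-scale ablations, then train a cheap classifier to predict that label.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Labels from (pretend) ablation runs: 1 = mixing this doc in improved the
# proxy benchmark, 0 = it didn't.
docs = [
    "A clear derivation of the attention mechanism with worked examples.",
    "BUY CHEAP followers now best price click here click here click here",
    "Unit-tested reference implementation of Dijkstra's algorithm in C.",
    "lorem ipsum lorem ipsum lorem ipsum lorem ipsum lorem ipsum",
]
improved = [1, 0, 1, 0]

quality_model = make_pipeline(TfidfVectorizer(), LogisticRegression())
quality_model.fit(docs, improved)

# Score new crawl data and keep only documents predicted to help.
candidates = ["A tutorial on writing parsers, with exercises and solutions."]
scores = quality_model.predict_proba(candidates)[:, 1]
keep = [d for d, p in zip(candidates, scores) if p > 0.5]
```
In practice the ablations and the proxy model would both be far more elaborate, but the loop is the same: measure, predict, filter.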
Even if it's a meme for the general public, actual ML researchers do have to document, understand and discuss the concept of model collapse in order to avoid it.
It's a meme even if you assume zero agency on the part of the researchers.
So far, every serious inquiry into "does AI contamination in real world scraped data hurt the AI performance" has resulted in things like: "nope", "if it does it's below measurement error" and "seems to help actually?"
Yes, this particular threat seems silly to me. Isn't it a standard thing to roll back databases? If the database gets worse, roll it back and change your data-ingestion approach.
If you need a strategy to mitigate it (roll back and change approach) then it isn't really fair to describe it as "silly". If it's silly you could just ignore it altogether.
The common thread from all the frontier orgs is that the datasets are too big to vet, and they're spending lots of money on lobbying to ensure they don't get punished for that. In short, the current corporate stance seems to be that they have zero agency, so which is it?
Huh? Unless you are talking about DMCA, I haven't heard about that at all. Most AI companies go to great lengths to prevent exfiltration of copyrighted material.
Well, they seem to have zero agency. They left child pornography in the training sets. The people gathering the data committed enormous crimes, wantonly. Science is disintegrating, along with public trust in it, as fake papers peer-reviewed by fake peer reviewers slop along. And from what I hear, there has been no training on the open internet in recent years, as it's simply too toxic.
Hi, if the Gemini API team is reading this: can you please be more transparent about 'The specified schema produces a constraint that has too many states for serving. ...' when using Structured Outputs?
I assume it has something to do with the underlying constraint grammar/token masks becoming too long/taking too long to compute. But as end users we have no way of figuring out what the actual limits are.
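For what it's worth, my working guess is that deeply nested, recursive, or enum-heavy schemas blow up the state count of the constrained-decoding grammar. Here's a toy sketch of the kind of schema shape I mean; the field names are made up and I have no idea where the real cutoff is:
```
# Purely illustrative guess at what trips the "too many states" error.
# These schemas are hypothetical; the actual serving limits are undocumented.
from typing import Literal, Optional
from pydantic import BaseModel

Status = Literal["open", "closed", "pending", "archived", "deleted"]

class Node(BaseModel):
    # Enums, many optional fields, and recursion all multiply the number of
    # states the constrained-decoding state machine has to track.
    status: Status
    label: Optional[str] = None
    children: Optional[list["Node"]] = None  # recursion: a likely offender

class FlatItem(BaseModel):
    # A flattened, non-recursive version of the same data tends to be safer.
    status: Status
    labels: list[str]
```
If that guess is right, even documenting a rough state budget per schema would save a lot of trial and error.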
Other than that, good work! I love how fast the Gemini models are. The current API is significantly less of a shitshow compared to last year with property ordering etc.
The new large model uses the DeepSeek-V2 architecture. Zero mention of that on the page, lol.
It's a good thing that open-source models use the best architecture available. K2 does the same, but at least it mentions that "Kimi K2 was designed to further scale up Moonlight, which employs an architecture similar to DeepSeek-V3".
---
vllm/model_executor/models/mistral_large_3.py
```
from vllm.model_executor.models.deepseek_v2 import DeepseekV3ForCausalLM

class MistralLarge3ForCausalLM(DeepseekV3ForCausalLM):
    pass  # empty body: reuses the DeepSeek implementation as-is
```
"Science has always thrived on openness and shared discovery." btw
Okay, I'll stop being snarky now and try the 14B model at home. Vision is a good additional capability on Large.
Because it's not a software issue, it's a human social-cooperation issue.
Companies don't want to support useful APIs for interoperability, so it's just easier to have an LLM brute-force problems using the same interface that humans use.
Mildly related incident where a Canadian child-protection agency uploaded CSAM to a reverse image search engine and then reported the site for the temporarily stored images.
This Canadian group (Canadian Centre for Child Protection) is awful.
They receive tax dollars while also being registered as a lobbyist, meaning Canadians are paying taxes that fund lobbying of their own government. Last year they lobbied in favor of Bill S-210 [0], which would bring Texas-style age verification of porn to Canada.
Their latest campaign is to introduce censorship to Tor. They're quite proud of this campaign [1], going after Tor in the popular media and attacking the Tor non-profit's funding structure [2]. Learning that they upload child abuse images to internet services and then report those services for takedown doesn't surprise me in the least.
A mistake that they continued making for weeks or even months after being clearly informed by multiple reverse-image search providers of what they were doing.
> I don't like how closed the frontier US models are, and I hope the Chinese kick our asses.
For imagegen, agreed. But for textgen, Kimi K2 Thinking is by far the best chat model in my experience so far. Not even "one of the best": the best.
It has frontier-level capability and the model was made very tastefully: it's significantly less sycophantic and more willing to disagree in a productive, reasonable way rather than immediately shutting you out. It's also way funnier at shitposting.
I'll keep using Claude a lot for multimodality and artifacts, but much of my usage has shifted to K2. Claude's sycophancy in particular is tiresome. I don't use ChatGPT/Gemini because they hide the raw thinking tokens, which is really cringe.
Claude Sonnet 4.5 doesn't even feel sycophantic in the 4o way; it feels like it has BPD. It switches from desperately agreeing with you to moralizing lectures, and then has a breakdown if you point out it's wrong about anything.
Also, yesterday I asked it a question and after the answer it complained about its poorly written system prompt to me.
They're really torturing their poor models over there.