But… it doesn’t matter? Even if it were some very illegal drug, that doesn’t change the fact that this detention system (and Japan’s justice system in general) is quite inhumane.
Be absolutely ruthless with technical debt. Opus is perfectly capable of producing idiomatic code in any mainstream language you please, but will seize on any opportunity to justify writing basically-python instead because that's "consistent" with the "convention". Deprive it of that excuse.
Yeah that’s basically what I mean! I have no issues wrangling it myself, but now I’m curious how those who are managing “fleets” of agents while shipping four features a day are doing it. They’re not, I’d assume?
Give it coding guidelines. It'll largely try to do what you ask.
Left to itself, it often imitates human developers who treat their goal as "get the program working; the end justifies the means." That makes sense, because there are a lot of systems like that in the training corpus.
Oh, that's fascinating. 3.6 27B is pretty damned good, but slow in wall-clock time on my DGX Spark-alike. It generates huge reams of thinking before it reaches the (usually correct!) answer, so wall-clock time is rough for tasks even at ~20 tk/s.
I'm surprised the 26B-A4B is better? It should be faster too, interesting. I'm excited to try 31B with MTP, because MTP-2 is what makes 27B bearable on the GB10.
What are you using it for? Agent-based coding, or something else?
General purpose, mostly internet research in the form of slow-crawling. (Emphasis on slow - I've ultimately landed on Scrapling's API for seamless content rendering, and I use image support so as not to exclude informative images or weirdly rendered text.)
For coding I don't need image support, so I fill the entire GPU in text-only mode. I don't have a workflow where I send LLMs off to generate thousands of lines of code, but what little coding I have done, I did with Qwen3.6, and it was spectacular, as you suggest.
I didn't say shipping a day. I said shipping at the same time.
The review comes at the end, though I truly believe this will go away as well. Agents will also get better at review until they're good enough that no one will want to do it anyway. Good enough is good enough.
Claude absolutely improves code review quality, but it still misses a lot. It's a second pair of eyes; it doesn't replace or remove the work you have to put in to fully review the code yourself.
It's like saying that you code reviewed faster just because someone else also reviewed the code, that's not how it works.
Agreed, and with CC the volume and quality of my PR reviews have substantially increased since 4.5. Without CC for review, we would have a ridiculous bottleneck in our dev/QA pipeline.
I'm faster, sure, but more thorough, no. The same, because I was already very careful. But it's not a massive win either; 4.7 misses too much still because it would need to read too much of the context each time to understand the architectural problems I'm catching.
It's nice not to have to care about nits and other things we don't have lints for, though, so that's useful.