Not only that, but they seem to have cut my plan's ability to use Sonnet too. I have a routine that used to use about 40% of my 5-hour max plan tokens; since yesterday it gets stopped because it uses the whole 100%. Anyone else experiencing this?
yeah it seems like Sonnet 4.6 burns through tokens crazy fast. I did one prompt, Sonnet misunderstood it as "generate an image of this" and used all of my free tokens.
It's the other way around. Cashiers spend their 4 percent, whereas the lawyers probably save it. Though of course the median salaries of the two categories mean a 4 percent change is different in absolute dollars.
HN is a pretty influential forum. Lots of tech journalists in mainstream media use it to get a pulse on what the SV/VC/startup/BigTech crowd and adjacencies are talking about.
I got Codex CLI running against it and was sadly very unimpressed - it got stuck in a loop running "ls" for some reason when I asked it to create a new file.
You probably have seen it by now, but there was a llama.cpp issue that was fixed earlier today(?) to avoid looping and other sub-par results. Need to update llama-server as well as redownload the GGUFs (for certain quants).
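For reference, updating usually means rebuilding llama.cpp from the latest source and re-fetching the fixed GGUF. A rough sketch (the repo and filename below are placeholders, not the actual model in question):

```shell
# Rebuild llama-server from the latest master to pick up the fix
cd llama.cpp
git pull
cmake -B build
cmake --build build --target llama-server -j

# Re-download the affected GGUF (org/model/quant are placeholders)
huggingface-cli download <org>/<model>-GGUF <model>-Q4_K_M.gguf
```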
Yes, sadly that sometimes happens - the issue is Codex CLI / Claude Code were designed specifically for GPT / Claude models, so it's hard for OSS models to fully utilize the spec / tools etc, and they can get stuck in loops. I would maybe try the MXFP4_MOE quant to see if it helps, and maybe try Qwen CLI (I was planning to make a guide for it as well).
I guess once we see the day OSS models truly utilize Codex / CC well, local models will really take off.
I would recommend you fiddle with the repeat-penalty flags. I use local models often, and almost all of the ones I've tried needed that to prevent loops.
I'd also recommend dropping temperature down to 0. Any high temperature value feels like instructing the model "copy this homework from me but don't make it obvious".
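Both suggestions map to llama-server sampling flags. A minimal sketch (the model path is a placeholder):

```shell
# --repeat-penalty 1.1  penalizes recently generated tokens, which helps break loops
# --temp 0              disables sampling randomness (effectively greedy decoding)
llama-server -m ./model.gguf --repeat-penalty 1.1 --temp 0
```

1.1 is just a common starting point; too high a penalty can degrade code output, so it's worth tuning per model.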