
Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast. Even if it takes more LLM calls to complete a task, those calls are all happening in a fraction of the time.


We must have very different workflows, I am curious about yours. What tools are you using and how are you guiding Qwen3-Coder? When I am using Claude Code, it often works for 10+ minutes at a time, so I am not aware of inference speed.


You must write very elaborate prompts for 10 minutes to be worth the wait. What permissions are you giving it and how much do you care about the generated code? How much time did you spend on initial setup?

I've found that the best way for me to do LLM-assisted coding at this point in time is in a somewhat tight feedback loop. I find myself wanting to refine the code and architectural approaches a fair amount as I see them coming in, and latency matters a lot to me here.


> When I am using Claude Code, it often works for 10+ minutes at a time, so I am not aware of inference speed.

Indirectly, it sounds like you're aware of the inference speed? Imagine if it took 2 minutes instead of 10 minutes; that's what the parent means.


2 minutes is the worst delay. With 10 minutes, I can and do context switch to something else and use the time productively. With 2 min, I wait and get frustrated and bored.


Context switching makes you less productive compared to completely finishing one task before moving to the next, though. In the limit, an LLM that responds instantly is still better.


Do you use cursor or what? Interested in how you set this up


I use it via the Kilo Code extension for VSCode, which is invoking Qwen3-Coder via a Cerebras Code subscription.

https://github.com/Kilo-Org/kilocode https://www.cerebras.ai/blog/introducing-cerebras-code


> Sonnet/Claude Code may technically be "smarter", but Qwen3-Coder on Cerebras is often more productive for me because it's just so incredibly fast.

Saying "technically" is really underselling the difference in intelligence in my opinion. Claude and Gemini are much, much smarter and I trust them to produce better code, but you honestly can't deny the excellent value that Qwen-3, the inference speed and $50/month for 25M tokens/per day brings to the table.

Since I paid for the Cerebras Pro plan, I've decided to force myself to use it as much as possible for the duration of the month for developing my chat app (https://github.com/gitsense/chat), and here are some of my thoughts so far:

- Qwen3 Coder is a lot dumber when it comes to prompting; Gemini and Claude are much better at reading between the lines. However, since the speed is so good, I often don't care, as I can go back to the message, make some simple clarifications, and try again.

- The max context window size of 128k for Qwen 3 Coder 480B on their platform can be a serious issue if you need a lot of documentation or code in context.

- I've never come close to the 25M tokens per day limit on their Pro plan. The most I've used is about 5M/day.

- The inference speed + a capable model like Qwen 3 will open up use cases most people might not have thought of before.

I will probably continue to pay for the $50 plan for these use cases:

1. Applying LLM generated patches

Qwen3 Coder is perfectly capable of applying patches generated by Sonnet and Gemini. It is slower than what https://www.morphllm.com/ provides, but it is fast enough that most people won't care. The cost savings can be quite significant depending on the work.
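To illustrate the pattern (a frontier model proposes an edit, a fast cheap model rewrites the file with it applied), here's a minimal sketch. The `call_model` callable stands in for whatever chat-completion client you use; the prompt wording and helper names are my own assumptions, not anything Cerebras or Morph ships:

```python
# Sketch: delegate "apply this edit" to a fast, cheap model.
# A frontier model (Sonnet/Gemini) produces a high-level edit description;
# the fast model outputs the complete updated file.

def build_apply_messages(original: str, edit: str) -> list[dict]:
    """Build a chat request asking the model for the full updated file."""
    return [
        {"role": "system",
         "content": "You apply code edits. Reply with the complete updated "
                    "file only: no commentary, no markdown fences."},
        {"role": "user",
         "content": f"Original file:\n{original}\n\nEdit to apply:\n{edit}"},
    ]

def apply_edit(original: str, edit: str, call_model) -> str:
    """call_model: any callable taking a messages list, returning a string."""
    return call_model(build_apply_messages(original, edit))
```

In practice `call_model` would wrap an OpenAI-compatible chat completion call pointed at the fast provider, with temperature kept low so the rewrite stays faithful to the edit.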

2. Building context

Since it is so fast, and since the 25M tokens/day limit is far more than I need, I find myself loading more files into context and just asking Qwen to identify the files I will need and/or summarize things, so I can feed the result into Sonnet or Gemini and save significant money.
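That "build context cheaply" step might look something like this sketch. The character budget, file-packing format, and triage prompt are all illustrative assumptions, sized so the packed context stays under a 128k-token window:

```python
from pathlib import Path

CHAR_BUDGET = 400_000  # ~100k tokens at ~4 chars/token; fits a 128k window

def pack_files(paths: list[str], budget: int = CHAR_BUDGET) -> str:
    """Concatenate files with headers until the character budget is spent."""
    parts, used = [], 0
    for p in paths:
        text = Path(p).read_text(errors="replace")
        block = f"=== {p} ===\n{text}\n"
        if used + len(block) > budget:
            break  # stop before overflowing the cheap model's context
        parts.append(block)
        used += len(block)
    return "".join(parts)

def triage_prompt(packed: str, task: str) -> str:
    """Ask the cheap model which of the packed files actually matter."""
    return (f"Task: {task}\n\n{packed}\n"
            "List only the file paths above that are relevant to the task, "
            "one per line, then a one-line summary of each.")
```

The cheap model's answer (a short list of paths plus summaries) is what gets forwarded to Sonnet or Gemini, instead of the full file dump.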

3. AI Assistant

Due to its blazing speed, you can analyze a lot of data quickly with deterministic searches, and because it reviews results at such a great speed, you can do multiple search-and-review loops without feeling like you are waiting forever.

Given what I've experienced so far, I don't think Cerebras can be a serious platform for coding if Qwen 3 Coder is the only available model. Having said that, given the inference speed and Qwen being more than capable, I can see Cerebras becoming a massive cost savings option for many companies and developers, which is where I think they might win a lot of enterprise contracts.



