Hacker News | wsxiaoys's comments

Hi HN,

I wrote a 4-part series on how we built the AI edit model behind Pochi’s coding agent.

It covers everything from real-time context management and request lifecycles to dynamically rendering code edits using only VS Code’s public APIs.

I’ve written this as openly and concretely as possible, with implementation details and trade-offs.

Full series:

1. The Edit Model Behind Tab Completion: https://docs.getpochi.com/developer-updates/how-we-created-n...

2. Real-Time Context Management in Your Code Editor: https://docs.getpochi.com/developer-updates/context-manageme...

3. Request Management Under Continuous Typing: https://docs.getpochi.com/developer-updates/request-manageme...

4. Dynamic Rendering Strategies for AI Code Edits: https://docs.getpochi.com/developer-updates/dynamic-renderin...


OP here - I've talked in detail about how we rendered NES suggestions using only VS Code public APIs.

Most tools fork the editor or build a custom IDE so they can skip the hard interaction problems.

Our NES is a VS Code–native feature. That meant living inside strict performance budgets and interaction patterns that were never designed for LLMs proposing multi-line, structural edits in real time.

Inside stock VS Code, surfacing enough context for an AI suggestion to be actionable without stealing the developer's attention is much harder.

That pushed us toward a dynamic rendering strategy instead of a single AI suggestion UI. Each path gets deliberately scoped to the situations where it performs best, aligning it with the least disruptive representation for a given edit.
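To make the idea concrete, here's a minimal sketch of what a rendering-strategy dispatcher could look like. The shape of `Edit` and the strategy names (`ghostText`, `inlineDiffDecoration`, `gutterHint`) are hypothetical labels for illustration, not Pochi's actual implementation:

```typescript
// Illustrative only: pick the least disruptive rendering path for a
// proposed edit. Thresholds and names are assumptions, not Pochi's code.
type Edit = {
  startLine: number;
  endLine: number;
  insertedText: string;
  isPureInsertionAtCursor: boolean;
};

type RenderStrategy = "ghostText" | "inlineDiffDecoration" | "gutterHint";

function chooseRenderStrategy(edit: Edit): RenderStrategy {
  // Single-line pure insertions at the cursor map cleanly onto VS Code's
  // native inline-completion (ghost text) API and are least disruptive.
  if (edit.isPureInsertionAtCursor && !edit.insertedText.includes("\n")) {
    return "ghostText";
  }
  // Small edits that touch existing lines need a diff-style preview,
  // which can be approximated with text decorations over the range.
  if (edit.endLine - edit.startLine <= 3) {
    return "inlineDiffDecoration";
  }
  // Large structural edits: only hint at the suggestion and wait for an
  // explicit gesture before showing the full preview.
  return "gutterHint";
}
```

Each branch corresponds to one "path" scoped to the situation where it performs best; the real decision logic would also consider cursor distance, syntax, and viewport visibility.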

If AI is going to live inside real editors, I think this is the layer that actually matters.

Happy to hear your thoughts!


OP here - happy to answer any questions.

This was one of the more unexpectedly tricky layers of building real-time LLM suggestions, and I’d love to hear how others have approached timing, cancellation, and speculative prediction in their editors or agents.
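For anyone comparing notes, the core timing-and-cancellation pattern can be sketched in a few lines. This is a generic debounce-and-abort scheduler, not Pochi's actual code; the 75 ms budget and the class name are assumptions for illustration:

```typescript
// A minimal sketch of debounce + cancellation for suggestion requests
// under continuous typing: every keystroke cancels the pending or
// in-flight request and schedules a fresh one after a quiet period.
class SuggestionScheduler {
  private timer: ReturnType<typeof setTimeout> | null = null;
  private controller: AbortController | null = null;

  constructor(
    private fetchSuggestion: (text: string, signal: AbortSignal) => Promise<string>,
    private debounceMs = 75, // hypothetical budget; tune per editor
  ) {}

  onType(text: string, onResult: (suggestion: string) => void): void {
    if (this.timer !== null) clearTimeout(this.timer);
    this.controller?.abort(); // cancel any request already in flight
    this.timer = setTimeout(() => {
      const controller = new AbortController();
      this.controller = controller;
      this.fetchSuggestion(text, controller.signal)
        .then((s) => {
          // Drop stale results: only surface if nothing newer superseded us.
          if (!controller.signal.aborted) onResult(s);
        })
        .catch(() => {
          /* aborted or failed request: drop silently */
        });
    }, this.debounceMs);
  }
}
```

Speculative prediction layers on top of this: you can kick off a request for the *expected* next state while the debounce window is still open, and keep it only if the user's actual keystrokes match the speculation.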


OP here - this is Part 2 of a series documenting how we built NES (Next Edit Suggestions), our real-time edit model inside the Pochi editor extension.

The real challenge (and what ultimately determines whether NES feels "intent-aware") was managing context in real time while the developer is editing live. The same problems apply to anyone building real-time AI inside editors, IDEs, or interactive tools.

I hope you find this interesting. Happy to answer any questions!


I’ve been experimenting with next-edit prediction for a while and wrote up how we trained the edit model that powers our Tab completion feature. This post is part of a broader series where we share how we built this feature from the low-level modeling right up to the editor extension.

The cool part is we fine-tuned Gemini Flash Lite with LoRA instead of an OSS model, helping us avoid all the infra overhead and giving us faster responses with lower compute cost.


I've spent the last few months working on a custom RL model for coding tasks. The biggest headache has been the lack of good tooling for tuning the autorater's prompt (that's the judge that gives the training feedback). The process is like any other quality-focused task, running batch rating jobs and doing SxS evaluations, but the tooling really falls short. I think I'll have to build my own tools once I wrap up the current project.


Appreciated! Fixed


> So using 2 NVLinked GPU's with inference is not supported?

To make better use of multiple GPUs, we suggest employing a dedicated backend for serving the model. Please refer to https://tabby.tabbyml.com/docs/references/models-http-api/vl... for an example.


I see. So this is like, I can have tabby be my LLM server with this limitation or I can just turn that feature off and point tabby at my self hosted LLM as any other OpenAI compatible endpoint?


Yes - however, the FIM model requires careful configuration to properly set the prompt template.
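For reference, a completion entry in Tabby's `~/.tabby/config.toml` pointing at an OpenAI-compatible backend looks roughly like the sketch below. The endpoint is a placeholder and the template shown is for CodeLlama-style FIM tokens; check the models-http-api docs for the exact schema and the sentinel tokens your model expects:

```toml
[model.completion.http]
kind = "openai/completion"
api_endpoint = "http://localhost:8000/v1"  # placeholder: your self-hosted server
api_key = ""
# FIM prompt template for a CodeLlama-style model; other models use
# different sentinel tokens, so this must match the model being served.
prompt_template = "<PRE> {prefix} <SUF>{suffix} <MID>"
```

Getting `prompt_template` wrong typically produces completions that look plausible but ignore the suffix context entirely.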


Tabby comes with built-in RAG support, so you can add this API framework to it.

Example: https://demo.tabbyml.com/search/how-to-configure-sso-in-tabb...

Settings page: https://demo.tabbyml.com/settings/providers/doc


Check https://www.reddit.com/r/LocalLLaMA/s/lznmkWJhAZ for a local setup with a 3090.


That thread doesn't seem to mention hardware. It would be really helpful to just put hardware requirements in the GitHub README.

