This reminds me of Amp's article last year [1]. I built my own coding agent [2] with two goals: understand real-world agent mechanics and validate patterns I'd observed across OpenAI Codex and contemporary agents.
The core loop is straightforward: LLM + system prompt + tool calls. The differentiator is the harness: CLI, IDE extension, sandbox policies, filesystem ops (grep/sed/find). But what separates effective agents from the rest is context engineering. Anthropic and Manus have published various research articles on this topic.
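To make "the core loop" concrete, here is a minimal Python sketch; `call_llm`, `TOOLS`, and the message shapes are illustrative placeholders rather than any particular SDK:

```python
# Minimal sketch of the agent loop: LLM + system prompt + tool calls.
# call_llm and TOOLS are hypothetical stand-ins, not a real API.

SYSTEM_PROMPT = "You are a coding agent. Use tools to inspect and edit files."

TOOLS = {
    "grep": lambda pattern, path: f"(matches for {pattern} in {path})",
    "read_file": lambda path: f"(contents of {path})",
}

def call_llm(messages):
    # Placeholder: a real harness would call a chat-completions API here
    # and return either tool calls or a final assistant message.
    return {"role": "assistant", "content": "done", "tool_calls": []}

def run_agent(user_task, max_steps=20):
    messages = [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": user_task},
    ]
    for _ in range(max_steps):
        reply = call_llm(messages)
        messages.append(reply)
        if not reply["tool_calls"]:          # no tool requested: final answer
            return reply["content"]
        for call in reply["tool_calls"]:     # execute each requested tool
            result = TOOLS[call["name"]](**call["arguments"])
            messages.append({"role": "tool", "name": call["name"], "content": result})
    return "(step limit reached)"
```

Everything interesting (context management, sandboxing, how results get summarized back into `messages`) happens around this loop, not inside it.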
After building vtcode, my takeaway is that agent quality reduces to two factors: context management strategy and model capability. Architecture varies by harness, but these fundamentals remain constant.
A few months ago, I had a frustrating experience with the Gemini API while building an AI chat app as a side project. I registered through AI Studio and set up billing via Google Cloud Console, which offered a free trial with $200 in credits or 3 months of API usage. After deploying the Gemini API for my project, I navigated through the numerous settings in Google Cloud Console but forgot to set a billing limit. That month, I was charged over $250 on my credit card, well beyond the free trial allowance. It was entirely my fault for not setting a limit and not reviewing the free trial terms more carefully.
That said, while setting up the Gemini API through AI Studio is remarkably straightforward for small side projects, transitioning to production with proper billing requires navigating the labyrinth that is Google Cloud Console. The contrast between AI Studio's simplicity and the complexity of production billing setup is jarring; it's easy to miss critical settings when you're trying to figure out where everything is.
It seems that cloud billing here (and elsewhere) is intentionally opaque. Nearly every client (at scale) ends up relying on a service provider to help manage and audit this usage.
Variable costs are great because they scale with the business, but visibility is a big (intentional?) challenge.
Jensen Huang is an incredible storyteller. Lots of wisdom and original stories about the start of NVIDIA, how Sega helped them, and the origins of Tesla FSD and OpenAI. Lots of personal detail and the growth of Jensen himself. Respectful.
The very end of the interview, where he talks about coming to America and going to school in Kentucky (a very rough and poor area), was a great story that most people don't know about him.
> Codex is extremely, painfully, doggedly persistent in following every last character of them
I think this is because gpt-5's (or gpt-5.1's) system prompt explicitly encourages persistence [0]; OpenAI emphasizes it to the model itself. If you search for the word `persistence` you will find multiple occurrences of it.
```
<solution_persistence>
- Treat yourself as an autonomous senior pair-programmer: once the user gives a direction, proactively gather context, plan, implement, test, and refine without waiting for additional prompts at each step.
- Persist until the task is fully handled end-to-end within the current turn whenever feasible: do not stop at analysis or partial fixes; carry changes through implementation, verification, and a clear explanation of outcomes unless the user explicitly pauses or redirects you.
- Be extremely biased for action. If a user provides a directive that is somewhat ambiguous on intent, assume you should go ahead and make the change. If the user asks a question like "should we do x?" and your answer is "yes", you should also go ahead and perform the action. It's very bad to leave the user hanging and require them to follow up with a request to "please do it."
</solution_persistence>
```
I'm curious, what made you think of that?