Also check out his nanochat repo. I used the repo, Claude and Shadeform to train my own mini model for about $300. It would have been less, but I screwed up and let the cloud GPU rental run for a few hours after the training run had errored out.
Of course, the model was dumber than GPT-2, but it was still a great learning experience.
The feedback loop is faster. But PR reviews are still useful because they are multiplayer: you and another human reviewer can discuss a specific agent comment directly on the diff, which is sometimes very useful.
It’s been a great way for me to better understand the cloud GPU industry, learn about data collection and normalization, and use agentic coding to build a side project.
One thing I’m working on is distinguishing spot vs. on-demand prices and listing them separately. I’m also adding inference pricing for non-text AI models.
What features or data would you like to see me add next?
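The spot vs. on-demand split mentioned above is mostly a data-modeling question: tag every quote with its pricing type at ingestion time and build separate listings from that tag. A minimal sketch, assuming a hypothetical `PriceQuote` record and `Pricing` enum (not the site's actual schema); the providers and prices are made-up placeholders, not real data:

```python
from collections import defaultdict
from dataclasses import dataclass
from enum import Enum


class Pricing(Enum):
    """Hypothetical pricing types; spot capacity can be reclaimed, on-demand cannot."""
    ON_DEMAND = "on_demand"
    SPOT = "spot"


@dataclass(frozen=True)
class PriceQuote:
    """Hypothetical normalized quote for one GPU type from one provider."""
    provider: str
    gpu: str
    usd_per_hour: float
    pricing: Pricing


def split_by_pricing(quotes):
    """Group quotes into one listing per pricing type, cheapest first."""
    listings = defaultdict(list)
    for q in quotes:
        listings[q.pricing].append(q)
    for bucket in listings.values():
        bucket.sort(key=lambda q: q.usd_per_hour)
    return dict(listings)


# Placeholder quotes for illustration only (not real provider prices).
quotes = [
    PriceQuote("provider-a", "H100", 2.50, Pricing.ON_DEMAND),
    PriceQuote("provider-b", "H100", 1.40, Pricing.SPOT),
    PriceQuote("provider-c", "H100", 2.10, Pricing.ON_DEMAND),
]

listings = split_by_pricing(quotes)
for pricing, bucket in listings.items():
    print(pricing.value, [(q.provider, q.usd_per_hour) for q in bucket])
```

Keeping the pricing type as an explicit field, rather than inferring it from the provider, means a single provider can appear in both listings without special cases.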
- I can't compare what I can't measure.
- I can't trust this "AI" tool to run on its own.
- That's automation, which is about intentionality (can I describe what I want?) and understanding the risk profile (what's the blast radius, the worst that could happen?).
Then I treat it as a kind of integration test / test-driven development exercise.
- I don't start by designing an entire cloud infrastructure.
- I make sure the "agent" lives where the users actually are, so that it can be the equivalent of an extra paid pair of hands.
- I ask questions or replicate user stories, and I use deterministic tests wherever I can. Don't just reach for LLM-as-a-judge (LLMaaJ). What's the simplest thing you can think of?
- The important things are rapid iteration and control. Just like with unit tests, it's not about writing 100 tests but about writing the ones that let you move as fast as possible.
- At this stage, with the space moving so fast and all of us learning so much, don't make assumptions or over-optimize parts that aren't hurting. Instead, favor minimalism, ease of change, parameterization, and ease of comparison, both against the other components that make up "the black box" and against itself.
- Once you have the benchmarks you want, you can make decisions such as picking the cheapest model/agent configuration that does the job within an acceptable timeframe.
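The loop in the list above (user stories as deterministic checks, then picking the cheapest configuration that clears the bar) can be sketched in plain Python. This is a minimal illustration under loud assumptions: `Story`, `AgentConfig`, `cost_per_run_usd` and the thresholds are hypothetical names and numbers, and the stub agents stand in for real model calls; nothing here is a real provider API.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Story:
    """A user story turned into a deterministic check (no LLM-as-a-judge)."""
    prompt: str
    check: Callable[[str], bool]


@dataclass(frozen=True)
class AgentConfig:
    """Hypothetical model/agent configuration; `run` stands in for a real call."""
    name: str
    cost_per_run_usd: float  # hypothetical per-call cost
    run: Callable[[str], str]


def evaluate(config, stories):
    """Return (pass_rate, worst_latency_seconds) for one configuration."""
    passed = 0
    worst = 0.0
    for story in stories:
        start = time.perf_counter()
        answer = config.run(story.prompt)
        worst = max(worst, time.perf_counter() - start)
        passed += story.check(answer)  # bool counts as 0 or 1
    return passed / len(stories), worst


def cheapest_passing(configs, stories, min_pass_rate=1.0, max_latency_s=5.0):
    """Pick the cheapest config meeting both thresholds, or None if none does."""
    ok = []
    for cfg in configs:
        pass_rate, latency = evaluate(cfg, stories)
        if pass_rate >= min_pass_rate and latency <= max_latency_s:
            ok.append(cfg)
    return min(ok, key=lambda c: c.cost_per_run_usd, default=None)


# Hypothetical stories with deterministic checks.
stories = [
    Story("What is 2 + 2?", lambda out: out.strip() == "4"),
    Story("Name the capital of France.", lambda out: "paris" in out.lower()),
]


# Stub agents: "small" fails one story, "large" passes both.
def small_agent(prompt):
    return "4" if "2 + 2" in prompt else "I don't know"


def large_agent(prompt):
    return "4" if "2 + 2" in prompt else "Paris"


configs = [
    AgentConfig("small", 0.001, small_agent),
    AgentConfig("large", 0.010, large_agent),
]

best = cheapest_passing(configs, stories)
print(best.name if best else "no config passed")  # prints "large"
```

The thresholds are where "does the job" and "acceptable timeframe" become explicit parameters: lowering `min_pass_rate` to 0.5 would select `small` instead, which makes the trade-off easy to compare and change.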
Happy to go deeper on these. I have some practical, runnable samples and write-ups on the topic that I can share after the weekend. I'll drop a link here when it's ready.
I just shared this on HN (https://news.ycombinator.com/item?id=47026263) to see whether it's possible to scale knowledge sharing around simple, good practices that keep people in control.
It may or may not cover the practical examples you need, but I'd love to hear your thoughts, and maybe we can come up with a more illustrative one.
I haven't gone for bubblewrap or similar containers yet because I didn't want to lose a specific kind of baseline newcomer (economists who do some coding). But I'll keep adding to it with the most elegant approaches I can find that don't leak too much complexity, for things like sandboxing, system testing, integration mocking (reverse proxying), observability with OpenTelemetry or otherwise, presenting benchmarks, etc.
I'm starting to add inference providers to computeprices.com, but even if you just look at per-hour GPU rentals, there are some reasonable options out there.
I've personally been enjoying Shadeform for building the GPU setup I like.