I hadn’t heard of libgit2. I wish more applications would expose library-style access, preferably available across different languages!
May not work for apps that want to launch their own threads and processes. But for almost everything else, I prefer function calls to launching processes, managing their lifecycle, communicating via stdout etc. If I wanted to do that, I’d be writing Bash ;)
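To make the trade-off concrete, here is a small Python sketch contrasting an in-process function call with spawning a child process, managing it, and parsing its stdout. The word-count task and both function names are hypothetical; this is not libgit2 or any real library's API.

```python
import subprocess
import sys


def word_count(text: str) -> int:
    """In-process: a plain function call that returns a typed result."""
    return len(text.split())


# Hypothetical child program: reads stdin and prints the word count to stdout.
CHILD = "import sys; print(len(sys.stdin.read().split()))"


def word_count_via_process(text: str) -> int:
    """Out-of-process: spawn, feed stdin, wait, check the exit status, parse stdout."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        input=text,
        capture_output=True,
        text=True,
        check=True,   # raise CalledProcessError on a non-zero exit code
        timeout=10,   # lifecycle management: don't wait forever on a hung child
    )
    return int(result.stdout.strip())  # stdout is just text, so we must parse it


if __name__ == "__main__":
    sample = "library calls beat parsing stdout"
    print(word_count(sample), word_count_via_process(sample))
```

Both paths give the same answer, but the second one needs error handling, a timeout, and a text protocol just to get an integer back.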
Have you been satisfied with the quality of code generated by the model? Or did you have to tweak some rule file or skill to improve it? Or is human-readable code not even a goal at this point?
We spent a lot of time tweaking skills, doc files, and prompts. I’d say that was our primary activity as engineers. Our job became tweaking the harness every time we got code or results we didn’t like. Eventually we were pretty happy with most agent runs, but we were always happy to just throw out ones that didn’t meet our standards. I think more than half didn’t.
Why? Because good stuff often falls through the cracks, and I like to see people discuss substantive material rather than just the "easy" submissions which tend to attract quick votes. This is one of the best ways to have a positive impact on the HN submission queue. And mods appreciate the suggestions (they can't keep up with the firehose either, and are well aware of the HN submission queue's weaknesses).
I agree with many of the points made by nimonian above (esp the one starting with 'make a single skill called "code" which describes the lifecycle'), based on my limited experience with these things.
My approach (with LLMs especially) aligns more with what's outlined in "Growing OO Software Guided by Tests" (https://growing-object-oriented-software.com/toc.html). Chapter 4 there says "First, Test a Walking Skeleton", and Chapter 5 has "Start Each Feature with an Acceptance Test". I think it comes down to: get something working end-to-end first in a verifiable way, and then keep refining both the feature and its tests (preferably with TDD).
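As a rough illustration of "walking skeleton first, acceptance test first", here is a minimal Python sketch. The feature (turning a raw order line into a receipt), the flat unit price, and every name in it are hypothetical. The point is that the thinnest end-to-end path exists and is verified before any single stage gets real logic.

```python
# Each stage starts as the thinnest thing that could possibly work.
def parse(raw: str) -> dict:
    name, qty = raw.split(",")
    return {"name": name.strip(), "qty": int(qty)}


def price(order: dict) -> dict:
    # Hypothetical flat unit price of 2; to be refined feature by feature.
    return {**order, "total": order["qty"] * 2}


def render(order: dict) -> str:
    return f"{order['qty']} x {order['name']} = {order['total']}"


def handle(raw: str) -> str:
    """End-to-end path: raw input -> parse -> price -> render -> output."""
    return render(price(parse(raw)))


# Acceptance test (pytest style), written first: it drives the whole pipeline
# through its public entry point rather than testing one stage in isolation.
def test_order_line_becomes_receipt():
    assert handle("widget, 3") == "3 x widget = 6"
```

From here, each refinement (real pricing rules, validation, formatting) would get its own failing test before the code changes.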
I've noticed that LLMs tend to generate multiple test cases in one shot (which is not how humans usually go about TDD), and they also don't start with integration tests unless instructed to do so.
I don't think the idea of skills is quite snake oil. It seems you can change what an LLM outputs next through what's called few-shot prompting or in-context learning: https://www.promptingguide.ai/techniques/fewshot
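A minimal sketch of what few-shot prompting looks like mechanically: worked input/output pairs are placed in the context ahead of the real query, so the model's continuation tends to follow the demonstrated pattern. The sentiment task, the `Input:`/`Output:` layout, and the `build_few_shot_prompt` helper are illustrative assumptions, not a format any particular model or skill system requires.

```python
def build_few_shot_prompt(instruction: str, examples: list[tuple[str, str]], query: str) -> str:
    """Concatenate an instruction, worked examples, and an open final slot."""
    parts = [instruction, ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")  # blank line between demonstrations
    parts.append(f"Input: {query}")
    parts.append("Output:")  # left open for the model to complete
    return "\n".join(parts)


examples = [
    ("This is awesome!", "positive"),
    ("This is bad!", "negative"),
]
prompt = build_few_shot_prompt("Classify the sentiment.", examples, "What a horrible show!")
print(prompt)
```

Nothing here calls a model; it only shows that the "learning" is just extra text in the context window that shapes the next tokens.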