To me it sounds like one way to do this would be to have LLMs write Cucumber test cases. Those are high level, natural language tests which could be run in a browser.
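For illustration, Cucumber scenarios are written in Gherkin, so an LLM-generated test might look something like the sketch below. The feature, page, and step names here are hypothetical, not from any real project:

```gherkin
Feature: Guest checkout
  Scenario: Guest user completes a purchase
    Given I am on the product page for "Blue Widget"
    When I add the item to my cart
    And I check out as a guest with valid payment details
    Then I should see an order confirmation
```

Each step would still need a human-written (or reviewed) step definition binding it to browser actions, e.g. via Selenium or Playwright, so the natural-language layer is only half the work.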
This is interesting, and I think worth trying. However:

> The process is iterative:
> Vibe code users <--> Vibe code software
> Step by step, you get closer to truly understanding your users
Do not fool yourself. This is not "truly" "understanding" your "users". This is a model which may be very useful, but should not be mistaken for your users themselves.
Nothing beats feedback from humans, and there's no way around the painstaking effort of customer development to understand how to satisfy their needs using software.
I agree. I do like the general idea as an exploration.
Perhaps the idea is to use an LLM to emulate users such that some user-based problems can be detected early.
It is very frustrating to ship a product and hit a show-stopper right out of the gate that everyone on the team missed. It is also sometimes difficult to get accurate feedback from an early user group.
That is a neat trick, and it's interesting to know that's how ssh git@github.com works, but it doesn't feel practical for a real use case. Aside from relying on a scrape of the GitHub users API (there's no "look up user by pubkey" API), what if I wasn't expecting to automatically log in with GitHub?
> estimates only an 80/20 chance of finding a suitable provider
I must be terribly fussy, but this genuinely tripped me up while reading. What does this phrasing even mean? Is it an 80% chance of success? It seems like someone has heard the phrase "80/20 rule" and applied it somewhere it makes no sense.
I have gotten code reviews from OpenAI's Codex integration that do point out meaningful issues, including across files and using significant context from the rest of the app.
Sometimes they are things I already know but was choosing to ignore for whatever reason. Sometimes it's like "I can see why you think this would be an issue, but actually it's not". But sometimes it's correct and I fix the issue.
I just looked through a couple of PRs to find a concrete example. I found a PR review comment from Codex pointing out a genuine bug where I was not handling a particular code path. I happened to know that no production data would trigger that code path, as we had migrated away from it. It acted as a prompt to remove some dead code.
I instinctually agree with nkrisc, but this is an interesting line of thought.
What's an example of something that nobody should be allowed to do, e.g. on a laptop? If I buy a system with the OS set up from the get-go, what abilities do you withdraw from the user?
If anyone is looking for a clean JS charting framework, I highly recommend Observable Plot.
It's from the creator of D3 and it's much easier than using raw D3. I've been using it outside the Observable platform for debug charts and notebooks, and I find its output crisp and its API very usable.
It doesn't try to have all the bells and whistles, and I'm not even sure if it has animations. But for the kind of charts you see in papers and notebooks I think it covers a lot.