Building VizPy, a prompt optimizer we've been working on for a while now.
The problem it's solving is one most people building with LLMs know well. Your prompt fails on some inputs, you don't really know why, and you end up just tweaking and re-running until something sticks. We kept hitting this ourselves and it felt like there had to be a better way than guessing.
What we figured out after a lot of research: prompt failures almost always follow a pattern. The model isn't failing randomly, it's consistently failing on a particular type of input or reasoning step. VizPy finds that pattern, distills it into a plain English rule you can actually read, and then rewrites your prompt around it. You also get the rule itself so you can review it, tweak it, or just drop it into your existing prompt directly. DSPy-compatible, no pipeline rewrite needed.
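For the DSPy users, here's a rough sketch of what "no pipeline rewrite" means in practice. To be clear, the optimizer class, import path, kwargs, and the failure_rules attribute below are illustrative placeholders rather than our actual API; only the dspy calls (dspy.Predict, dspy.Example, the optimizer-style compile) follow DSPy's real conventions.

    import dspy

    # Your existing DSPy program stays as-is.
    qa = dspy.Predict("question -> answer")

    def exact_match(example, pred, trace=None):
        # Standard DSPy-style metric signature.
        return example.answer.lower() == pred.answer.lower()

    train = [
        dspy.Example(question="...", answer="...").with_inputs("question"),
        # ... more labeled examples
    ]

    # Hypothetical VizPy entry point, mirroring DSPy's usual
    # optimizer.compile(program, trainset=...) convention.
    from vizpy import VizPyOptimizer  # illustrative import path

    opt = VizPyOptimizer(metric=exact_match)
    compiled = opt.compile(qa, trainset=train)

    # The distilled plain-English failure rules are inspectable, so you
    # can review them, edit them, or paste them into an existing prompt.
    for rule in compiled.failure_rules:  # illustrative attribute
        print(rule)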
We've compared it extensively against baselines such as GEPA on benchmarks including BBH, HotPotQA, GPQA Diamond, and GDPR-Bench, and VizPy wins on all of them. We'll have more benchmarks on cybersecurity and chip design coming out soon.

Free to try, 10 runs, no card: https://vizpy.vizops.ai/
Anthropic released a C compiler this week that was built autonomously. We mined the entire repo for clues to reproduce the scaffolding they used, and open-sourced the result.
+1 to this. My mom died of COVID in India a month after leaving the US, where she'd been visiting me, and I still feel guilty that I didn't insist on getting her the vaccine before she left for India. At the time of her death India was locked down, so there were no flights and I wasn't even next to her. It's been 4 years, but every so often I think about this. I blame myself less now after some therapy, but if you didn't try all that you could, you'd probably feel guilty like me.
Yeah I don't understand why the whole thread has been so hostile against a very reasonable/useful observation that you made. If there was a way to prompt commenters to be less snarky on HN that'd be a vast improvement.
I don't know about Shorts, but Instagram has "solved" the addiction problem by ignoring signals like the user tapping "not interested" or scrolling past videos quickly. They just show junk.
I don't know if you are just ignorant about history and unwilling to Google, or if you are making the point that of course the British did not force-feed opium to the people.
What is very well established is that the British fought a war, literally called the Opium War by Western historians themselves, with the main objective of keeping their opium distribution into China open after the emperor banned it.
Their action was akin to the majority owner of Purdue Pharma invading the US and forcing the US government to "keep the oxy market open" while letting "people make their own decisions".
Tbh, what you describe sounds nothing like forcing opium on a people.
If Mexico invaded and started making meth in the US, or started sending even more meth into the US than they do now by totally taking over the border, I would not begin taking meth.
The Brits were also running the opium shops. So if you think country A selling opioids in country B, and then going to war so that country B cannot stop the sale of those opioids, is totally okay, then that is truly very different from my model of good behavior.
The hidden variable is the Previa marketing budget. They have budget right now for a billboard and for online ads at the same time, and they are focusing on your geography.
A few months ago I read about a lady who can smell Alzheimer's, and at first she too was not taken seriously. I can't think of how RF could affect the human body, but I wouldn't dismiss their reports completely.