Hacker News | tcoff91's comments

The perverse incentives created by these AI leaderboards are crazy.

The leaderboards are dumb, but I understand the point of telling people not to worry about tokens and just use it. They are trying to get people to try it, to discover new uses without asking “is this worth testing”. It’s basically early R&D budget. Eventually these companies will decide it’s time to transition into efficient usage.

Yes I love that my employer says go wild with it. But I feel like the leaderboard is dumb.

But we need OKRs rocks and METRICS! Everyone must have their own one numberrrrr!

It seems really dumb for the models to not do security-related things. What if I want it to do a security audit of my own software that I'm building?

Codex will actually help you look, but it will refuse to actually try to exploit it.

It won't, for example, create a PoC Python script that you would normally use to prove the issue.


Mailing patches is the same as squashing commits. The Linux kernel would be much harder to maintain without messy history being carefully distilled down to well crafted patches.

But mailing patches is a pain in the ass. VCSes should support squashing and rebasing.


These calorie counting picture apps should be sued for false advertising.

Are you giving the LLM the weights of the ingredients as you go? Sounds like a great system.

The data entry is a pain in the ass with those apps when cooking food from scratch. It’s much much easier with LLMs and natural language and voice mode and pictures of a food scale and things like that.

I’ve found that the best way to deal with this is to add an entry to /etc/hosts for my local machine that fits the hostname pattern for the QA environment. Then I run a local reverse proxy with a self-signed certificate.

So I do local dev on https://local.qa.yourappnamehere.com
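
Here's a rough sketch of that setup in Python, just to make the moving parts concrete. It's not what I actually run: the upstream address, the port, and the cert file name are all assumptions, and it only forwards GET requests. It assumes you've already generated a self-signed cert and key into one PEM file and told your browser or OS to trust it.

```python
import http.server
import ssl
import urllib.error
import urllib.request

# All values below are illustrative assumptions, not a real configuration.
LOCAL_HOST = "local.qa.yourappnamehere.com"  # hostname that fits the QA pattern
UPSTREAM = "http://127.0.0.1:3000"           # assumed address of the local dev server
CERT_FILE = "local-qa.pem"                   # assumed PEM file holding cert + private key


def hosts_entry(hostname: str, ip: str = "127.0.0.1") -> str:
    """Return the line to append to /etc/hosts so hostname resolves to ip."""
    return f"{ip}\t{hostname}"


class ProxyHandler(http.server.BaseHTTPRequestHandler):
    """Forward GET requests to the upstream dev server (GET only, for brevity)."""

    def do_GET(self) -> None:
        try:
            with urllib.request.urlopen(UPSTREAM + self.path) as resp:
                status, body = resp.status, resp.read()
                ctype = resp.headers.get("Content-Type", "application/octet-stream")
        except urllib.error.HTTPError as err:
            # Pass upstream 4xx/5xx responses through instead of crashing.
            status, body = err.code, err.read()
            ctype = err.headers.get("Content-Type", "text/plain")
        self.send_response(status)
        self.send_header("Content-Type", ctype)
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port: int = 8443) -> None:
    """Terminate TLS with the self-signed cert and proxy to UPSTREAM."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(CERT_FILE)
    server = http.server.HTTPServer(("127.0.0.1", port), ProxyHandler)
    server.socket = context.wrap_socket(server.socket, server_side=True)
    server.serve_forever()


print(hosts_entry(LOCAL_HOST))
```

Append the printed line to /etc/hosts, then call serve(). To get the URL without a port you'd need to bind 443, which usually requires elevated privileges; in practice a real reverse proxy like nginx or Caddy is the saner choice.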


That's Anthropic. Codex is OpenAI.


For what it's worth, Codex doesn't yet seem to be aggressively terminating accounts or invalidating auth tokens if it detects usage in a non-first-party tool. Whether that will continue to be the case is a gamble, though.



I hope none of your accounts are associated with that email address that can be read by an LLM that has access to untrusted input.

OpenClaw lives right in the prompt injection lethal trifecta.

The idea of an OpenClaw instance having the ability to reset passwords on your accounts sounds sketchy as shit to me.


Of course, you need to be careful about what access you give to your agent. I gave my agent its own email, and I can forward it emails if I need it to read anything in my inbox.

Everyone will have their own threshold for what type of access they want to give their agent. Some people will give it access to their personal email, bank account, etc., but I wouldn't recommend it yet! But I bet in a couple of years this will be standard practice.


There’s a lot of humans I wouldn’t trust to be an assistant with access to my bank account. It’s bold to assume that within 2 years these things are going to be scam resistant.

It’s going to be bleak when there are articles about how “my agent fell for a scam and now my life savings are gone”.


Yeah if this can truly just autonomously make great software, then where is all the new SaaS that is able to undercut incumbents by charging 10-20% of what they are charging?

