tcoff91 · 2026-05-01T16:47:06 1777654026

The perverse incentives created by these AI leaderboards are crazy.

dpark · 2026-05-01T16:50:09 1777654209

The leaderboards are dumb, but I understand the point of telling people not to worry about tokens and just use it. They are trying to get people to try it, to discover new uses without asking “is this worth testing”. It’s basically early R&D budget. Eventually these companies will decide it’s time to transition into efficient usage.

tcoff91 · 2026-05-01T17:31:57 1777656717

Yes I love that my employer says go wild with it. But I feel like the leaderboard is dumb.

swader999 · 2026-05-01T17:58:17 1777658297

But we need OKRs rocks and METRICS! Everyone must have their own one numberrrrr!

tcoff91 · 2026-05-01T16:05:19 1777651519

It seems really dumb for the models to not do security-related things. What if I want it to do a security audit of my own software that I'm building?

vorticalbox · 2026-05-01T16:15:21 1777652121

Codex will help you look, but it will refuse to actually try to exploit it.

It won't, for example, create a PoC Python script that you would normally use to prove the issue.

tcoff91 · 2026-04-29T13:31:31 1777469491

Mailing patches is the same as squashing commits. The Linux kernel would be much harder to maintain without messy history being carefully distilled down to well crafted patches.

But mailing patches is a pain in the ass. VCSes should support squashing and rebasing.

tcoff91 · 2026-04-29T13:16:28 1777468588

These calorie counting picture apps should be sued for false advertising.

tcoff91 · 2026-04-29T13:15:06 1777468506

Are you giving the LLM the weights of the ingredients as you go? Sounds like a great system.

tcoff91 · 2026-04-29T13:13:57 1777468437

The data entry is a pain in the ass with those apps when cooking food from scratch. It’s much much easier with LLMs and natural language and voice mode and pictures of a food scale and things like that.

tcoff91 · 2026-04-24T06:55:25 1777013725

I’ve found that the best way to deal with this is to add an entry to /etc/hosts for my local machine that fits the pattern for the QA environment. Then I run a local reverse proxy with a self-signed certificate.

So I do local dev on https://local.qa.yourappnamehere.com
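
A minimal sketch of that setup in Python, for illustration only. It assumes the dev server speaks plain HTTP on 127.0.0.1:3000 and that you have already generated a self-signed certificate and key for the hostname. The port numbers, `UPSTREAM`, and the helper names `hosts_line` and `serve` are hypothetical, not part of any particular tool, and only GET requests are forwarded.

```python
import http.server
import ssl
import urllib.error
import urllib.request

# Hypothetical values: swap in your own QA-style hostname and dev server address.
LOCAL_HOST = "local.qa.yourappnamehere.com"
UPSTREAM = "http://127.0.0.1:3000"


def hosts_line(hostname: str, ip: str = "127.0.0.1") -> str:
    """Return the line to append to /etc/hosts so hostname resolves locally."""
    return f"{ip}\t{hostname}"


class ProxyHandler(http.server.BaseHTTPRequestHandler):
    """Forward GET requests to the plain-HTTP dev server and relay the reply."""

    def do_GET(self):
        try:
            with urllib.request.urlopen(UPSTREAM + self.path) as upstream:
                status = upstream.status
                headers = upstream.headers
                body = upstream.read()
        except urllib.error.HTTPError as err:
            # Relay upstream error responses (404, 500, ...) instead of crashing.
            status = err.code
            headers = err.headers
            body = err.read()

        self.send_response(status)
        self.send_header(
            "Content-Type",
            headers.get("Content-Type", "application/octet-stream"),
        )
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(certfile: str, keyfile: str, port: int = 8443) -> None:
    """Terminate TLS with the self-signed certificate and proxy to UPSTREAM."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_SERVER)
    context.load_cert_chain(certfile=certfile, keyfile=keyfile)
    server = http.server.HTTPServer(("127.0.0.1", port), ProxyHandler)
    server.socket = context.wrap_socket(server.socket, server_side=True)
    server.serve_forever()
```

Append the output of `hosts_line(LOCAL_HOST)` to /etc/hosts, then call `serve()` with the paths to your certificate and key. With the default port the site is reachable at https://local.qa.yourappnamehere.com:8443. Binding to 443 drops the port from the URL but usually needs elevated privileges. In practice a dedicated reverse proxy is probably the better choice; this only shows the moving parts.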

tcoff91 · 2026-04-15T23:12:03 1776294723

That's Anthropic. Codex is OpenAI.

baby_souffle · 2026-04-15T23:21:25 1776295285

For what it's worth, Codex doesn't yet seem to be aggressively terminating accounts or invalidating auth tokens when it detects usage in a non-first-party tool. Whether that will continue to be the case is a gamble, though.

FergusArgyll · 2026-04-15T23:37:35 1776296255

https://github.com/openai/codex/discussions/8338

tcoff91 · 2026-04-15T23:10:15 1776294615

I hope none of your accounts are associated with that email address that can be read by an LLM that has access to untrusted input.

OpenClaw lives right in the prompt injection lethal trifecta.

The idea of an OpenClaw instance having the ability to reset passwords on your accounts sounds sketchy as shit to me.

bryan0 · 2026-04-15T23:18:37 1776295117

Of course, you need to be careful about what access you give to your agent. I gave my agent its own email, and I can forward it emails if I need it to read anything in my inbox.

Everyone will have their own threshold for what type of access they want to give their agent. Some people will give it access to their personal email, bank account, etc., but I wouldn't recommend it yet! But I bet in a couple of years this will be standard practice.

tcoff91 · 2026-04-16T13:56:49 1776347809

There are a lot of humans I wouldn’t trust to be an assistant with access to my bank account. It’s bold to assume that within 2 years these things are going to be scam resistant.

It’s going to be bleak when there are articles about how “my agent fell for a scam and now my life savings are gone”.

tcoff91 · 2026-04-14T21:33:14 1776202394

Yeah if this can truly just autonomously make great software, then where is all the new SaaS that is able to undercut incumbents by charging 10-20% of what they are charging?