To me it sounds like one way to do this would be to have LLMs write Cucumber test cases. Those are high level, natural language tests which could be run in a browser.
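For illustration, Cucumber scenarios are written in Gherkin, so an LLM-generated test might look something like the sketch below. The feature, page, and step names here are hypothetical, not from any real project:

```gherkin
Feature: Guest checkout
  Scenario: Guest user completes a purchase
    Given I am on the product page for "Blue Widget"
    When I add the item to my cart
    And I check out as a guest with valid payment details
    Then I should see an order confirmation
```

Each step would still need a human-written (or reviewed) step definition binding it to browser actions, e.g. via Selenium or Playwright, so the natural-language layer is only half the work.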
This is interesting, and I think worth trying. However:

> The process is iterative:
> Vibe code users <--> Vibe code software
> Step by step, you get closer to truly understanding your users
Do not fool yourself. This is not "truly" "understanding" your "users". This is a model which may be very useful, but should not be mistaken for your users themselves.
Nothing beats feedback from humans, and there's no way around the painstaking effort of customer development to understand how to satisfy their needs using software.
I agree. I do like the general idea as an exploration.
Perhaps the idea is to use an LLM to emulate users such that some user-based problems can be detected early.
It is very frustrating to ship a product and hit a show-stopper right out of the gate that everyone on the team missed. It is also sometimes difficult to get accurate feedback from an early user group.
That is a neat trick, and it's interesting to know that's how ssh git@github.com works, but it doesn't feel practical for a real use case. Aside from relying on a scrape of the GitHub users API (there's no "look up user by pubkey" API), what if I wasn't expecting to automatically log in with GitHub?
> estimates only an 80/20 chance of finding a suitable provider
I must be terribly fussy, but this genuinely tripped me up while reading. What does this phrasing even mean? Is it an 80% chance of success? It seems like someone has heard the phrase "80/20 rule" and applied it somewhere it makes no sense.
I have gotten code reviews from OpenAI's Codex integration that do point out meaningful issues, including across files and using significant context from the rest of the app.
Sometimes they are things I already know but was choosing to ignore for whatever reason. Sometimes it's like "I can see why you think this would be an issue, but actually it's not". But sometimes it's correct and I fix the issue.
I just looked through a couple of PRs to find a concrete example. I found a PR review comment from Codex pointing out a genuine bug where I was not handling a particular code path. I happened to know that no production data would trigger that code path, as we had migrated away from it. It acted as a prompt to remove some dead code.
I instinctually agree with nkrisc, but this is an interesting line of thought.
What's an example of something that nobody should be allowed to do, e.g. on a laptop? If I buy a system with the OS set up from the get-go, what abilities do you withdraw from the user?
If anyone is looking for a clean JS charting framework, I highly recommend Observable Plot.
It's from the creator of D3 and it's much easier than using raw D3. I've been using it outside the Observable platform for debug charts and notebooks, and I find its output crisp and its API very usable.
It doesn't try to have all the bells and whistles, and I'm not even sure if it has animations. But for the kind of charts you see in papers and notebooks I think it covers a lot.