
nathanasmith · 2025-11-13T13:36:06 1763040966

The thing that bothers me about "warmer, more conversational" is that it isn't just a cosmetic choice. The same feedback loop that rewards "I hear you, that must be frustrating" also shapes when the model is willing to say "I don’t know" or "you’re wrong". If your reward signal is mostly "did the user feel good and keep talking?", you’re implicitly telling the model that avoiding friction is more valuable than being bluntly correct.

I'd much rather see these pulled apart into two explicit dials: one for social temperature (how much empathy / small talk you want) and one for epistemic temperature (how aggressively it flags uncertainty, cites sources, and pushes back on you). Right now we get a single, engagement-optimized blend, which is great if you want a friendly companion, and pretty bad if you’re trying to use this as a power tool for thinking.
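To make the two-dial idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical: the `ResponseDials` class, the `style_instructions` helper, the 0-to-1 scale, and the 0.5 cutoff are illustration only, not any vendor's actual settings or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ResponseDials:
    """Two independent settings, each in [0.0, 1.0]. Hypothetical names."""

    social: float     # how much empathy / small talk
    epistemic: float  # how hard it flags uncertainty, cites, pushes back

    def __post_init__(self) -> None:
        # Reject values outside the assumed 0-to-1 scale.
        for name in ("social", "epistemic"):
            value = getattr(self, name)
            if not 0.0 <= value <= 1.0:
                raise ValueError(f"{name} must be between 0 and 1, got {value}")


def style_instructions(dials: ResponseDials) -> str:
    """Turn the two dials into plain-language instructions, one line per dial."""
    tone = "warm and conversational" if dials.social >= 0.5 else "terse and neutral"
    rigor = (
        "flag uncertainty, cite sources, and push back on mistakes"
        if dials.epistemic >= 0.5
        else "answer directly without caveats"
    )
    return f"Tone: {tone}.\nRigor: {rigor}."


# A "power tool" setting: low warmth, high rigor.
power_tool = ResponseDials(social=0.1, epistemic=0.9)
print(style_instructions(power_tool))
```

The point of the sketch is only that the two values are set separately, so turning warmth down does not touch how readily the model disagrees with you.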

nathanasmith · 2025-10-11T02:06:44 1760148404

I had an old Galaxy Tab S7 collecting dust on the shelf. Since iOS 26 came out I find myself reaching for the Android tablet more and more. First time that ever happened. (Sent from my Galaxy Tab)

nathanasmith · 2025-06-03T16:09:47 1748966987

Readers would have been better served with the prompts you wrote than the AI generated output.

simonw · 2025-06-03T16:11:30 1748967090

I don't think that's true. What matters to me is the human editorial touch: I don't want to wade through 50 prompts and responses, I want a human author to have resolved that process into a final output that they think is worth sharing with me.

rfoo · 2025-06-03T17:34:59 1748972099

I think the correct benchmark is `len()`. Give me your prompts or your output, whichever is shorter.
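Taken literally, that rule is a couple of lines of Python. The function name and the choice to join prompts with blank lines are assumptions made for illustration.

```python
def whichever_is_shorter(prompts: list[str], output: str) -> str:
    """Return the joined prompts or the output, whichever has fewer characters."""
    joined = "\n\n".join(prompts)  # assumption: prompts separated by blank lines
    # Ties go to the output, since it is the version the author chose to publish.
    return joined if len(joined) < len(output) else output
```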

tasuki · 2025-06-04T06:35:46 1749018946

thrwthsnw · 2025-06-03T16:32:09 1748968329

Try reading a manuscript copy of a book before it’s been edited. Yes, I know some people do this out of interest, but for most people it’s not the kind of writing they want to read or would get the most out of.

steipete · 2025-06-03T16:11:27 1748967087

All ~50 prompts would take you half an hour to read and wouldn’t bring across my point nearly as well.

layer8 · 2025-06-03T16:49:24 1748969364

But it would provide a better illustration of how you’re actually working.

simonw · 2025-06-03T16:58:16 1748969896

If you're interested in seeing the process behind this piece of writing you can read through a lot of the details in the 71 commits that went into creating the story in the PR: https://github.com/steipete/steipete.me/pull/106/commits

Applejinx · 2025-06-04T10:57:50 1749034670

Well…

nathanasmith · 2025-06-01T21:59:16 1748815156

The person you replied to is in Pakistan.

nathanasmith · 2025-05-04T17:37:20 1746380240

>You're right - I don't really care if the track playing in my favourite cafe is AI-generated or not. You're not supposed to be emotionally invested into background music

Different strokes, I guess, but some of the best music I've ever been turned on to just happened to be playing in some random cafe or coffee shop. Conversely, if the music is bland and uninspired, I'm much less likely to go back.

nathanasmith · 2025-03-31T18:42:27 1743446547

Is there an API for Grok yet? If not that could be the issue.

nathanasmith · 2025-03-12T13:39:05 1741786745

Unfortunately that wouldn't help as much as you think since talented AI labs can just watch the public leaderboard and note what models move up and down to deduce and target whatever the hidden benchmark is testing.

nathanasmith · 2025-03-03T12:59:24 1741006764

I had been sleeping on Claude's ability to write books until a couple of days ago, when I had it write a novel set in the Accelerando universe. It whipped up a very convincing, complete, multi-act, 13-chapter side plot about humans learning to interact with Economics 2.0. It was quite good, though I'm sure cstross would be horrified.

nathanasmith · on Jan 13, 2025

I have a T420 I've been using for years. Upgraded to 16GB of RAM, SSD, swapped the dual core i5 for a 4 core/8 thread i7 (yes, the CPU is in a socket!), and swapped the 1600x900 crappy display for a newer 1080p panel that looks much better. I absolutely love this laptop and am not looking forward to the day when it's too old for the modern web.

bikenaga · on Jan 13, 2025

My t420 has an i5-2520M. Do you remember what model i7 you swapped for your i5?

nathanasmith · on Jan 4, 2025

For the lmarena leaderboard to be really useful, you need to click the "Style Control" button so that it normalizes for LLMs that generate longer answers and the like. Humans may find those answers more stylistically pleasing and upvote them, but they often end up being worse. With Style Control on, o1 comes out on top, followed by o1-preview, then Sonnet 3.5, with Gemini Preview 1206 in fourth place.