Hacker News | nathanasmith's comments

The thing that bothers me about "warmer, more conversational" is that it isn't just a cosmetic choice. The same feedback loop that rewards "I hear you, that must be frustrating" also shapes when the model is willing to say "I don’t know" or "you’re wrong". If your reward signal is mostly "did the user feel good and keep talking?", you’re implicitly telling the model that avoiding friction is more valuable than being bluntly correct.

I'd much rather see these pulled apart into two explicit dials: one for social temperature (how much empathy / small talk you want) and one for epistemic temperature (how aggressively it flags uncertainty, cites sources, and pushes back on you). Right now we get a single, engagement-optimized blend, which is great if you want a friendly companion, and pretty bad if you’re trying to use this as a power tool for thinking.


I had an old Galaxy Tab S7 collecting dust on the shelf. Since iOS 26 came out I find myself reaching for the Android tablet more and more. First time that ever happened. (Sent from my Galaxy Tab)


Readers would have been better served by the prompts you wrote than by the AI-generated output.


I don't think that's true. What matters to me is the human editorial touch: I don't want to wade through 50 prompts and responses, I want a human author to have resolved that process into a final output that they think is worth sharing with me.


I think the correct benchmark is `len()`. Give me your prompts or your output, whichever is shorter.
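
A minimal Python sketch of that rule, purely for illustration. `shorter_to_share` and its arguments are hypothetical names, and joining the prompts with blank lines before measuring them is an assumption about how their length would be counted.

```python
def shorter_to_share(prompts: list[str], output: str) -> str:
    """Return whichever is shorter by len(): the joined prompts or the output."""
    # Assumption: the prompts are measured as one text, separated by blank lines.
    joined_prompts = "\n\n".join(prompts)
    # On a tie the output is returned.
    return joined_prompts if len(joined_prompts) < len(output) else output
```

With ~50 prompts the joined text would likely be the longer of the two, in which case this returns the output.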


No


Try reading a manuscript copy of a book before it’s been edited. Yes, I know some people do this out of interest, but for most people it’s not the type of writing they are interested in reading or would get the most out of.


All ~50 prompts would take you half an hour to read and wouldn’t get my point across nearly as well.


But it would provide a better illustration of how you’re actually working.


If you're interested in seeing the process behind this piece of writing you can read through a lot of the details in the 71 commits that went into creating the story in the PR: https://github.com/steipete/steipete.me/pull/106/commits


Well…


The person you replied to is in Pakistan.


>You're right - I don't really care if the track playing in my favourite cafe is AI-generated or not. You're not supposed to be emotionally invested into background music

I guess different strokes but some of the best music I've ever been turned on to just happened to be playing in some random cafe or coffee shop. Conversely if the music is bland and uninspired I'm much less likely to go back.


Is there an API for Grok yet? If not that could be the issue.


Unfortunately that wouldn't help as much as you think since talented AI labs can just watch the public leaderboard and note what models move up and down to deduce and target whatever the hidden benchmark is testing.


I had been sleeping on Claude's ability to write books until a couple of days ago, when I had it write a novel set in the Accelerando universe. It whipped up a very convincing, complete, multi-act, 13-chapter side plot about humans learning to interact with Economics 2.0. It was quite good, though I'm sure cstross would be horrified.


I have a T420 I've been using for years. Upgraded to 16GB of RAM, SSD, swapped the dual core i5 for a 4 core/8 thread i7 (yes, the CPU is in a socket!), and swapped the 1600x900 crappy display for a newer 1080p panel that looks much better. I absolutely love this laptop and am not looking forward to the day when it's too old for the modern web.


My T420 has an i5-2520M. Do you remember which i7 model you swapped in for your i5?


For the lmarena leaderboard to be really useful you need to click the "Style Control" button so that it normalizes for LLMs that generate longer answers and the like. Humans may find those answers more stylistically pleasing and upvote them, but the answers often end up being worse. When you do that, o1 comes out on top followed by o1-preview, then Sonnet 3.5, and in fourth place Gemini Preview 1206.

