This post is the latest in a series about Townie, our AI assistant. Our first ha...

wbhart · on Jan 4, 2025

This blog article is written in a very engaging way. It seems to be more or less a masterclass on how to keep someone's attention, although there is no meta-story making you wait for the big fulfillment at the end.

I think it is the short, punchy sections with plenty of visuals, and the fact that you are telling a story the whole way through with a natural flow, each experiment you describe leading to the next.

stevekrouse · on Jan 6, 2025

Thank you! I was so proud of this comment that I read it aloud to my fiance and my dad :)

deadmutex · on Jan 4, 2025

Interesting. On lmsys, Gemini is #1 for coding tasks. How does that compare?

https://lmarena.ai/?leaderboard

nathanasmith · on Jan 4, 2025

For the lmarena leaderboard to be really useful, you need to click the "Style Control" button so that it normalizes for LLMs that generate longer answers and the like. Humans may find those answers more stylistically pleasing and upvote them, but they often end up being worse. When you do that, o1 comes out on top, followed by o1-preview, then Sonnet 3.5, and in fourth place Gemini Preview 1206.

MacsHeadroom · on Jan 4, 2025

lmsys is a poor judge of coding quality since it is based on ratings from a single generation rather than agentic coding over multiple steps.