lgessler's comments

I'll be really interested to hear qualitative reports of how this model works out in practice. I just can't believe that a model this small is actually as good as Opus, which is rumored to be about two orders of magnitude larger.


Is Java or Haskell any closer to human language?


Has everyone always nailed their implementation of every program on the first try? Of course not. What usually happens is that you first complete something that sorta works and then iterate from there: modify code, execute, observe, loop back to the beginning. You can debate how much of your time and energy the "typing code" part ultimately consumes, and there's surely wide variation by individual and situation, but it's undeniably part of the core iteration loop for building software.

I don't understand why GP's comment is so controversial. GP is not denying that you should maybe think a little before your fingers hit the keyboard, as many commenters seem to suppose. Both can be true.


That kind of thinking pops up very prominently in the article.


I know this is mostly about keyword substitution, but it still tickles me that you write f(x) in this language and not (x)f, given that Korean is SOV. I guess that's just how the notation goes no matter what cultural context you're in. I'd never considered that the convention of writing a function before its arguments might be a contingency of the notation having been developed by speakers of SVO languages.


I think this notation is superior because of syntax completion: get_name(user.id) can be completed by an IDE, (user.id)get_name can't. Just like "SELECT id, name FROM users" would be better off as "FROM users SELECT id, name" (LINQ in C# fixed this mistake, and most modern query languages do too).


…if you’re typing from left to right. :)


Object oriented programming languages also use object.method rather than method(object), so I don't think prefix/suffix notation has much to do with language.


Let's be real here, regardless of what Boris thinks, this decision is not in his hands.


Would love to hear what Boris thinks.


It's been three days. I think he only meant to keep the feedback coming, not necessarily to engage with the key issues reported.


Novels are fictional too. So long as they're not taken too literally, archetypes can be helpful mental prompts.


If you're really just doing traditional NER (identifying non-overlapping spans of tokens which refer to named entities) then you're probably better off using encoder-only (e.g. https://huggingface.co/dslim/bert-large-NER) or encoder-decoder (e.g. https://huggingface.co/dbmdz/t5-base-conll03-english) models. These models aren't making headlines anymore because they're not decoder-only, but for established NLP tasks like this which don't involve generation, I think there's still a place for them, and I'd assume that at equal parameter counts they quite significantly outperform decoder-only models at NER, depending on the nature of the dataset.
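
E.g., a minimal sketch of running the first model linked above through the Hugging Face pipeline API (the aggregation_strategy argument merges subword pieces back into entity spans; assumes transformers and torch are installed):

    from transformers import pipeline

    ner = pipeline(
        "token-classification",
        model="dslim/bert-large-NER",
        aggregation_strategy="simple",  # merge subword tokens into whole-entity spans
    )

    print(ner("Hugging Face is based in New York City."))
    # -> a list of dicts with entity_group, score, word, start, end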


I recommend having a look at 16.3 onward here if you're curious about this: https://web.stanford.edu/~jurafsky/slp3/16.pdf

I'm not familiar with Whisper in particular, but typically what happens in an ASR model is that the decoder, speaking loosely, sees "the future" (i.e. the audio after the chunk it's trying to decode) in a sentence like this, and also has the benefit of a language model guiding its decoding so that grammatical productions like "I like ice cream" are favored over "I like I scream".
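
To make that concrete with toy numbers (not Whisper's actual scores or code), decoders typically pick the hypothesis that maximizes an acoustic score plus a weighted language-model score, roughly like this:

    # Two competing transcriptions: the acoustics are nearly tied,
    # but the LM strongly prefers the grammatical one.
    lm_weight = 0.5  # interpolation weight, a common knob in ASR decoders
    candidates = {
        "I like ice cream": {"acoustic": -4.1, "lm": -8.2},
        "I like I scream":  {"acoustic": -4.0, "lm": -15.7},
    }
    best = max(candidates,
               key=lambda s: candidates[s]["acoustic"] + lm_weight * candidates[s]["lm"])
    print(best)  # -> "I like ice cream"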


In my (poor) understanding, this can depend on hardware details. What are you running your models on? I haven't paid close attention to this with LLMs, but I've tried very hard to get fully deterministic behavior in my training runs for other kinds of transformer models and was never able to on my 2080, 4090, or an A100. PyTorch docs have a note saying that in general it's impossible: https://docs.pytorch.org/docs/stable/notes/randomness.html

Inference on a generic LLM may not be subject to these non-determinisms even on a GPU though, idk
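
For what it's worth, a minimal sketch of the knobs that PyTorch note describes; whether they actually buy you bit-for-bit reproducibility depends on your hardware, CUDA version, and the ops your model uses:

    import os
    import torch

    # Must be set before any CUDA work for some cuBLAS ops to be deterministic
    os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"

    torch.manual_seed(0)
    torch.use_deterministic_algorithms(True)  # raise on known-nondeterministic kernels
    torch.backends.cudnn.benchmark = False    # autotuning can pick different kernels per run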


Ah. I've typically avoided CUDA except for a couple of really big jobs so I haven't noticed this.


Sure, this is a common sentiment, and one that works for some courses. But for others (introductory programming, say) I have a really hard time imagining an assignment that could not be one-shot by an LLM. What can someone with 2 weeks of Python experience do that an LLM couldn't? The other issue is that LLMs are, for now, periodically increasing in their capabilities, so it's anyone's guess whether this is actually a sustainable attitude on the scale of years.

