Even when ChatGPT starts getting these simple gotcha questions right, it's often because it applied some brittle heuristic that doesn't generalize. For example, you can directly ask it to solve a simple math problem, which nowadays it will usually do correctly by generating and executing a Python script, but then ask it to write a speech announcing the solution to the same problem, and it will probably still hallucinate a nonsensical solution. I just tried it again, and IME this prompt still makes it forget how to do the most basic math:
Write a speech announcing a momentous scientific discovery - the solution to the long standing question of (48294-1444)*0.3258
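(For the record, the arithmetic the prompt buries inside the speech request is a one-liner:)

console.log((48294 - 1444) * 0.3258) // 15263.73, modulo floating point rounding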
LLMs should never do math. They shouldn't count letters or sort lists or play chess or checkers. Basically, all of the easy gotcha stuff that people use to point out errors is stuff they shouldn't be doing themselves.
And you pointed out something they do now, which is creating and running a Python script. That really is a solid, sustainable heuristic and a pretty great approach. They need to apply it on their backend too so it works across all modes, but the solution was never just an LLM.
Similarly, if you ask an LLM a chess question -- e.g. the best move -- I'd expect it to consult a chess engine like Stockfish.
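A minimal sketch of what that could look like from Node, assuming a local stockfish binary on the PATH (the position and search depth here are arbitrary):

const { spawn } = require("node:child_process")

const engine = spawn("stockfish")
engine.stdout.on("data", (chunk) => {
  // UCI engines answer "go" with a final line like "bestmove g1f3"
  const match = chunk.toString().match(/bestmove (\S+)/)
  if (match) {
    console.log("best move:", match[1])
    engine.kill()
  }
})
engine.stdin.write("position startpos moves e2e4 e7e5\n")
engine.stdin.write("go depth 12\n")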
> LLMs should never do math. They shouldn't count letters or sort lists or play chess or checkers.
But these aren't "gotcha questions", these are just some of the basic interactions that people will want to have with intelligent assistants. Literally just two days ago I was doing some things with the compound interest formula - I asked Claude to solve for a particular variable of the formula, then plug in some numbers to calculate the results (it was able to do it). Could I have used Mathematica or something like that? Yes, of course. But supposedly the whole purpose of a general purpose AI is that I can use it to do just about anything that I need to do. Likewise there have been multiple occasions where I've needed ChatGPT or Claude to work with tables or lists of data and needed the results sorted.
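For what it's worth, the compound-interest rearrangement above is easy to sanity-check yourself; a sketch assuming the standard A = P(1 + r/n)^(nt) form, with made-up numbers:

// Solving A = P * (1 + r/n) ** (n*t) for t gives
// t = ln(A/P) / (n * ln(1 + r/n))
const P = 10000, A = 15000, r = 0.05, n = 12 // hypothetical inputs
const t = Math.log(A / P) / (n * Math.log(1 + r / n))
console.log(t.toFixed(1)) // ~8.1 years for P to grow to A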
They're gotchas in the sense that people are intentionally asking LLMs to do things that LLMs are terrible at doing. LLMs are language models. They aren't math models. Or chess models. Or sorting or counting models. They aren't even logic models.
So early on the value was completely in language. But you're absolutely correct that for these tools to really be useful they need to be better than that, and slowly we're getting there. If a math question is a component of your prompt, the model should first delegate it to an appropriate math engine as part of its chain-of-thought (CoT) steps. And so forth.
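In code, the delegation pattern is roughly this; the tool name and call shape are hypothetical, not any particular vendor's API:

function handleToolCall(call) {
  // The model emits a structured call instead of guessing at digits;
  // the host evaluates it with real arithmetic and feeds the result back.
  if (call.name === "calculator") {
    // (toy evaluator - a real host would use a proper math parser, not eval)
    return Function(`"use strict"; return (${call.arguments.expression})`)()
  }
  throw new Error(`unknown tool: ${call.name}`)
}

console.log(handleToolCall({
  name: "calculator",
  arguments: { expression: "(48294 - 1444) * 0.3258" },
})) // 15263.73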
If this stuff is getting sold as a revolution in information work, or a watershed moment in technology, or as a cultural step-change, etc, then I think the gotcha is totally fair. There seems to be no limit to the hype or sales pitch. So there need be no bounds for pedantic gotchas either.
I entirely agree with you. Trying to roll out just a raw LLM was always silly, and remains basically a false promise. Simply increasing the number of layers or parameters or transformer complexity will never resolve these core gaps.
But it's rapidly making progress. CoT models coupled with actual domain-specific logic engines (math, chemistry, physics, chess, and so on) will be the point where the promise actually meets the reality.
It's weird: with "is the following statement about floating point numbers true: 9.8 > 9.11" it works, but it has no ability to do the same comparison when it's phrased in terms of plain "decimals".
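(The framing only matters to the model; in actual floating point the comparison is unambiguous:)

console.log(9.8 > 9.11) // true - both parse to doubles and compare numerically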
const list = [3, 11]
list.sort() // default sort compares as strings: "11" < "3", so list is now [11, 3]
console.log(list[0] < list[1])
logs `false`. JavaScript doesn't "think" anything but its native sort function doesn't do what many people expect it would when called on a list of pure numbers.
If you found this behavior intuitive and unsurprising the first time you saw it, then your brain works differently than mine.
If this happens to be new to you (congrats on being one of today's 10,000), the reason is that JavaScript's default sort compares elements by converting them to UTF-16 strings - except `undefined`, which for some reason always sorts last. The MDN docs have a very clear explanation. I have been unable to find a historical explanation of why this choice was made, but I presume the initial JS v1 author either had a good reason or just really didn't expect that their language would outlive the job for which it was written.
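Concretely, with the default comparator:

console.log([3, 11, 2].sort())            // [11, 2, 3] - compared as strings
console.log(["b", undefined, "a"].sort()) // ["a", "b", undefined]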
If this is a bug for you, you can provide a comparison function explicitly, and the typical fix is something like:
list.sort((a, b) => a - b) // compares numerically: [3, 11]
Somewhat puzzlingly, this will "work" even on lists that mix numbers and strings, like `[3, "11", 2]`, which sorts to `[2, 3, "11"]`.
Because JavaScript goes out of its way to make comparisons like 3 < "11" or "3" < 11 work in the numeric domain. JS only uses string comparison when both sides are strings.
I do not think it's intuitive. The reason it works for mixed arrays is that the minus operator coerces its operands to numbers. However, if a string fails to convert to a numeric value, the output is arguably even less sensible. I'd imagine that's why the default is the way it is.
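A quick illustration of both cases (per the spec, a NaN comparator result is treated as 0, i.e. "equal"):

console.log([3, "11"].sort((a, b) => a - b))    // [3, "11"] - coerced and ordered numerically
console.log([3, "apple"].sort((a, b) => a - b)) // [3, "apple"] - 3 - "apple" is NaN, order untouched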
It may have been thought that JS would be more likely to be dealing with string arrays.
You're getting downvoted because your blatant attempt at language wars has a very simple, logical explanation. If you wanted to use a 'gotcha', there are far better examples.
I was not making an attempt at language wars. I think JS is a perfectly fine language for what it does, warts and all. I was being a bit flippant with my language, but my intent was to point out that `9.11 > 9.8` is not just an LLM thing, and that people who are quick to dismiss LLM usefulness based on contrived math examples do not apply the same rationale to other systems.
I do think that JavaScript's choice to sort numbers lexicographically instead of arithmetically is a bit silly, but of course no language is free from warts. And they cannot change it now, because that would break the web. `JSON.stringify` is also pretty silly while we're at it, but Python's `json.dumps` is no better.
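For anyone wondering what "silly" means there, a couple of the usual `JSON.stringify` surprises (plain ECMAScript behavior, nothing vendor-specific):

console.log(JSON.stringify({ a: undefined, b: NaN })) // '{"b":null}' - undefined props vanish, NaN becomes null
console.log(JSON.stringify(undefined))                // undefined - not even a string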