
I don't want to get into a pedantic argument over the definition of "intelligence", but ChatGPT got a 1020 on the SAT

https://twitter.com/davidtsong/status/1598767389390573569



ChatGPT got a 1020 on SAT Practice Test #10 (https://satsuite.collegeboard.org/media/pdf/sat-practice-tes...), a sample exam whose questions have been extensively discussed online since at least 2019 (eg https://youtube.com/watch?v=18oahUy6cGA). The exact text of the questions, with the corresponding correct answers, almost certainly appeared in ChatGPT's training set.


Prompt: Would a chatbot scoring 1020 on an SAT practice test whose questions and answers were part of its training data be evidence of general understanding?

ChatGPT Response: A chatbot scoring 1020 on an SAT practice test whose questions and answers were part of its training data would not necessarily be evidence of general understanding. Scoring well on a standardized test like the SAT requires a specific set of skills and knowledge, but it does not necessarily indicate that the chatbot has a general understanding of the world or the ability to think and reason like a human.

Standardized tests like the SAT are designed to measure specific abilities and knowledge, and a chatbot that has been trained on the questions and answers from the test would be able to perform well on the test by simply recognizing and reproducing the correct answers. This type of performance does not necessarily indicate that the chatbot has a deep understanding of the material or the ability to apply it in a broader context.


I do wonder how output like this would fare on the essay section. All the spelling and grammar is correct, and ChatGPT can presumably generate as many paragraphs of it as you want. (And I think that's what SAT graders mostly look for in practice.) But it also repeats the same generic sentence about "a standardized test like the SAT requiring specific skills and knowledge" twice, with superficial avoiding-a-plagiarism-checker alterations; it takes until the second paragraph to figure out what point it's making, and it uses the wrong conjunction when it finally makes that point.


It's like it knows it's not conscious. Maybe it's wrong about that, though.


Maybe it's trying not to think too hard about it.

https://toldby.ai/HdnuUiTuME2


This is like listening to that distant cousin who’s done too many drugs.


That cousin is just a ChatGPT imagining itself as stuck in a loop.


> The exact text of the questions, with the corresponding correct answers, almost certainly appeared in ChatGPT's training set.

This seems like it would be easy to check for, so I’m sure it will come to light fairly quickly if so?
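(For what it's worth, the standard check is verbatim word-n-gram overlap between the test text and candidate training documents, roughly the 13-gram contamination analysis described in the GPT-3 paper. Here's a minimal sketch; the function names are made up for illustration, and of course only OpenAI can run anything like this against the actual training set.)

    import re

    def ngrams(text, n=13):
        # Lowercase, strip punctuation, return the set of word n-grams.
        words = re.findall(r"[a-z0-9]+", text.lower())
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    def contaminated(test_text, training_doc, n=13):
        # True if any n-gram from the test appears verbatim in the document.
        return bool(ngrams(test_text, n) & ngrams(training_doc, n))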


OpenAI doesn't release its training datasets, AFAIK, but we know they're based on sources like Common Crawl that scrape websites the same way search engines do. So here's an experiment you can try at home: type "juan purchased an" into Google Search and look at the auto-complete suggestions. If it suggests the word "antique", that's thanks to question 18 on the no-calculator math section of this exam. (Similarly if you type "jake buys" and it jumps in with "a bag of popcorn", that's question 3 on the calculator section.)
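(If you'd rather script that experiment than type it into the search box, here's a rough sketch against Google's unofficial suggest endpoint. The endpoint is undocumented and could change or get rate-limited at any time, so the browser search box remains the authoritative check.)

    import json
    import urllib.parse
    import urllib.request

    def suggestions(prefix):
        # Google's unofficial autocomplete endpoint; the JSON response
        # looks like [prefix, [suggestion, suggestion, ...]].
        url = ("https://suggestqueries.google.com/complete/search"
               "?client=firefox&q=" + urllib.parse.quote(prefix))
        with urllib.request.urlopen(url) as resp:
            return json.loads(resp.read().decode("utf-8", errors="replace"))[1]

    for prefix in ("juan purchased an", "jake buys"):
        print(prefix, "->", suggestions(prefix))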


And yet it can't play an extremely simple game: https://news.ycombinator.com/item?id=33850306


What’s that mean relative to other scores?


52nd percentile, according to a reply.



