Prompt: Would a chatbot scoring 1020 on an SAT practice test whose questions and answers were part of its training data be evidence of general understanding?
ChatGPT Response: A chatbot scoring 1020 on an SAT practice test whose questions and answers were part of its training data would not necessarily be evidence of general understanding. Scoring well on a standardized test like the SAT requires a specific set of skills and knowledge, but it does not necessarily indicate that the chatbot has a general understanding of the world or the ability to think and reason like a human.
Standardized tests like the SAT are designed to measure specific abilities and knowledge, and a chatbot that has been trained on the questions and answers from the test would be able to perform well on the test by simply recognizing and reproducing the correct answers. This type of performance does not necessarily indicate that the chatbot has a deep understanding of the material or the ability to apply it in a broader context.
I do wonder how output like this would fare on the essay section. All the spelling and grammar are correct, and ChatGPT can presumably generate as many paragraphs of it as you want. (And I think that's mostly what SAT graders look for in practice.) But it also repeats the same generic sentence about "a standardized test like the SAT requiring specific skills and knowledge" twice, with superficial alterations that look like they're meant to dodge a plagiarism checker; it takes until the second paragraph to figure out what point it's making; and it uses the wrong conjunction when it does make that point.
OpenAI doesn't release its training datasets, AFAIK, but we know they're based on sources like Common Crawl that scrape websites the same way search engines do. So here's an experiment you can try at home: type "juan purchased an" into Google Search and look at the autocomplete suggestions. If it suggests the word "antique", that's thanks to question 18 on the no-calculator math section of this exam. (Similarly, if you type "jake buys" and it jumps in with "a bag of popcorn", that's question 3 on the calculator section.)
https://twitter.com/davidtsong/status/1598767389390573569