Hacker News

Is there reason to believe ChatGPT is unusually good at AP Computer Science, or would it do as well on other AP subject tests?


Anecdotally, my friends and I asked it to prove some simple math theorems earlier today, and it was mostly garbage. It seems like it's tuned to explain stuff, so it took great pains to pedantically explain every step, but it sometimes made unwarranted leaps of logic. For instance, we asked it to prove there were an infinite number of primes, and it argued there were an infinite number of natural numbers (correctly) but then just asserted that, since primes were a subset of the natural numbers, they must be an infinite set too.
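(The subset claim proves nothing: {2} is a finite subset of the naturals. For comparison, Euclid's classical argument fills the gap the model skipped; here's a sketch in LaTeX:)

```latex
% Euclid's proof of the infinitude of primes (sketch).
% Suppose, for contradiction, that there are only finitely many primes
% $p_1, p_2, \dots, p_n$. Consider
\[
  N = p_1 p_2 \cdots p_n + 1 .
\]
% Each $p_i$ divides the product $p_1 \cdots p_n$, so dividing $N$ by
% any $p_i$ leaves remainder $1$. Hence no $p_i$ divides $N$, yet
% $N > 1$ must have some prime factor. That prime is not on the list,
% contradicting the assumption that the list was complete.
```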

It does seem pretty good at producing code though. And it's able to write stories that have a real structure to them, but they're pretty bland.


And of course it's quite happy to "demonstrate" utter nonsense as well.

> Explain in terms a high school student will understand why there are a finite number of primes.

Sure, a prime number is a whole number greater than one that is only divisible by one and itself. For example, the first few prime numbers are 2, 3, 5, 7, 11, and 13. One way to think about why there are a finite number of primes is to imagine a list of all the whole numbers greater than one. Since there are an infinite number of whole numbers, the list of prime numbers would have to be a subset of the list of whole numbers. But since the prime numbers are only divisible by one and themselves, there must be a point where the list of primes stops, because at some point, every number on the list would be divisible by a prime number that came before it. For example, the number 20 can be divided by 2, and the number 15 can be divided by 3, so these numbers would not be considered prime. Does that help? Let me know if you have any other questions.


I also asked it to prove some elementary theorems. It didn't really have a proper understanding of definitions. But it's actually fairly good at solving simple word problems. Something like: assume you have 3 bottles, each with a capacity of two liters, and each filled to 20% of its capacity. Say you have a fourth bottle with the same capacity and you pour the contents of each of the first three bottles into the fourth. To what capacity is the fourth bottle now filled? And it comes up with the correct answer, which is absolutely astounding.
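(The arithmetic is easy to check by hand; here's a quick Python sketch of the same word problem:)

```python
# Three bottles, each 2 L capacity, each filled to 20%.
capacity_l = 2.0
fill_fraction = 0.20
num_bottles = 3

# Total poured into the fourth bottle, and its resulting fill level.
total_poured_l = num_bottles * capacity_l * fill_fraction  # 1.2 L
fourth_fill = total_poured_l / capacity_l

print(f"{fourth_fill:.0%}")  # 60%
```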


A bat and a ball together cost $1.10, the bat costs $1 more than the ball. How much does the ball cost?

> The ball costs $0.10. Let's prove this using an equation. {uses equation}. The ball costs $0.05.
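(For the record, the intended answer is $0.05. A quick check of the algebra x + (x + 1) = 1.10, using exact fractions to avoid float noise:)

```python
from fractions import Fraction

total = Fraction(110, 100)   # bat + ball = $1.10
diff = Fraction(1)           # bat = ball + $1.00

# x + (x + diff) = total  =>  x = (total - diff) / 2
ball = (total - diff) / 2
bat = ball + diff

print(ball)  # 1/20, i.e. $0.05
```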


I guess at least yours can solve for equations, here's what mine got:

> The ball costs $0.10. Let's prove this using an equation. {uses equation}. The ball costs $0.10.


For me the ball turns out to cost nothing:

> The bat costs $1 more than the ball, so the bat costs $1 + x, where x is the cost of the ball. The bat and the ball together cost $1.10, so the cost of the ball is $1.10 - $1 - x = $0.10 - x. Since the bat costs $1 + x, we know that $1.10 = $1 + x + $0.10 - x = $1 + $0.10. Therefore, the cost of the ball is $0.10 - x = $0.10 - ($1.10 - $1) = $1.10 - $1 - $0.10 = $0. The ball costs $0.


A major part of its training set is Wikipedia. Given the impenetrability of maths Wikipedia, I am not at all surprised that it's bad at maths. Usually when I read maths articles on Wikipedia, I get worse at maths.

On a serious note, I wonder how well the training works when a lot of maths is represented symbolically. For example, when they feed it books, do they convert it to LaTeX? MathML? Just leave it out?


There are more specialized models that would do better, e.g. https://ai.facebook.com/blog/ai-math-theorem-proving/

"neural theorem prover that has solved 10 International Math Olympiad (IMO) problems — 5x more than any previous AI system"


Similar experience: I asked it to prove the square root of 2 is irrational. It started to go through the motions of the standard proof using even numbers, but then made a random leap and declared victory:

"we get 2k^2 = q^2. Since q^2 is even, this means that q must be even as well. But this contradicts the original assumption that q is not equal to 0, so we must conclude that √2 is irrational"

The language of these fake proofs tends to resemble real math well enough that I hope it doesn't fool people.
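(For comparison, the standard proof reaches a different contradiction: that p and q cannot both be even if the fraction is in lowest terms, not anything about q being nonzero. A sketch:)

```latex
% Standard proof that $\sqrt{2}$ is irrational (sketch).
% Suppose $\sqrt{2} = p/q$ with $p, q$ integers, $q \neq 0$, and the
% fraction in lowest terms. Squaring gives
\[
  p^2 = 2q^2 ,
\]
% so $p^2$ is even, hence $p$ is even: write $p = 2k$. Substituting,
\[
  4k^2 = 2q^2 \quad\Longrightarrow\quad q^2 = 2k^2 ,
\]
% so $q$ is even too. Then $p$ and $q$ share the factor $2$,
% contradicting the lowest-terms assumption.
```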

In areas other than math, however, I've been very impressed with it overall. While human experts are obviously still better in terms of depth, accuracy, and creativity, it's already pretty good, and in terms of breadth no human can compare. Given the rate of improvement, these things seem destined to launch a new era.


It’s going to test Wittgenstein’s thesis. The months ahead are the most exciting months of my lifetime.


Which thesis, specifically?


“If a lion could talk, we would not understand him.”


What is Wittgenstein's thesis?


Wittgenstein's thesis is a philosophical idea that is central to his work, known as the picture theory of meaning. This theory proposes that the meaning of a word or concept can be understood by comparing it to a mental image or picture of the thing it represents. Wittgenstein believed that words and concepts are essentially symbolic representations of objects or ideas in the world, and that our understanding of them is determined by the way in which they are related to these objects or ideas. This means that the meaning of a word or concept is determined by its relationship to the thing it represents, rather than by any inherent meaning that it may possess. This idea is known as Wittgenstein's thesis because it forms the basis of his philosophical theories on language, meaning, and representation.

(ChatGPT output)


Probably because there's a lot more material and detail about computer science online that it would be regurgitating. You can be sure that someone has already answered whatever CS-related question you are asking it, verbatim; all it does is connect the dots and glue the concepts together.


It definitely has had very deep training on computer science topics: I can ask it detailed documentation questions about obscure programming APIs and it can usually answer them.


I'm not sure details about APIs should be considered computer science, at least not in the academic sense emphasized by the AP exam.


Maybe it's a function of the Q&A style of programming questions, where correct snippets of code build the base know-how and raw source code enhances it?


"Is there reason to believe ChatGPT is unusually good at AP Computer Science"

It's a robot...



