So this shows you have a basic understanding of how CNNs represent data for classification tasks. This is excellent!
However, you might not be aware of the current research into "Emergent Phenomena in Large Language Models"[1].
Work such as PaLM[2] shows that language models develop the ability to "reason" over their internal data representations. Minerva[3] explicitly breaks that ability down, "comb[ining] several techniques, including few-shot prompting, chain of thought or scratchpad prompting, and majority voting, to achieve state-of-the-art performance on STEM reasoning tasks."
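Of those techniques, majority voting (also called self-consistency) is the simplest to sketch: sample several chain-of-thought completions for the same question, extract each one's final answer, and keep the most common one. A minimal illustration in Python; the sampled answers here are made up, not real model output:

```python
from collections import Counter

# Hypothetical final answers extracted from five sampled
# chain-of-thought completions for the same question.
sampled_answers = ["18", "18", "21", "18", "16"]

# Majority voting: the most common final answer wins.
answer, votes = Counter(sampled_answers).most_common(1)[0]
print(answer, votes)  # 18 3
```

Individual chains of reasoning can derail, but wrong chains tend to derail in different directions, so the mode of the final answers is much more reliable than any single sample.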
This lets it solve novel problems such as "Assume that the variance of the first n natural numbers is 10, and the variance of the first m even natural numbers is 16. Compute m + n".
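For what it's worth, that problem checks out. Using the identity that the population variance of 1..n is (n² − 1)/12, you get n = 11 and m = 7, so m + n = 18 (assuming population rather than sample variance, which is what this style of problem intends):

```python
import statistics

# Variance of the first n naturals: (n^2 - 1)/12 = 10  ->  n^2 = 121  ->  n = 11
n = 11
assert statistics.pvariance(range(1, n + 1)) == 10

# The first m even naturals are 2, 4, ..., 2m; their variance is
# 4 * (m^2 - 1)/12 = 16  ->  m^2 = 49  ->  m = 7
m = 7
assert statistics.pvariance(range(2, 2 * m + 1, 2)) == 16

print(m + n)  # 18
</imports>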
To preempt the "oh, it is in the training set" objection, I'd note this (on a different problem): "The model correctly takes the derivative of a position vector to compute the velocity. It then computes the norm of the resulting vector, and uses a trigonometric identity to obtain a final numerical answer. Minerva takes a completely different approach from the ground truth solution."
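The actual problem isn't quoted above, but the technique itself is standard calculus. A toy SymPy example of my own (not the problem from the paper) showing the same three steps: differentiate position, take the norm, and let a trig identity collapse the result:

```python
import sympy as sp

t = sp.symbols('t', real=True)

# A toy position vector tracing the unit circle
# (NOT the problem from the Minerva paper, which isn't quoted here).
r = sp.Matrix([sp.cos(t), sp.sin(t)])

# Velocity is the time derivative of position.
v = r.diff(t)  # Matrix([-sin(t), cos(t)])

# Speed is the norm of velocity; sin^2 + cos^2 = 1 collapses it.
speed = sp.simplify(sp.sqrt(v.dot(v)))
print(speed)  # 1
```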
I'd encourage you to read the Minerva paper[4] and update your understanding of what large neural networks are doing. These reasoning abilities are not as simple as "compressed data representation + reconstruction" - the chain-of-thought approaches partially use that, of course, but also appear to perform a form of lemma chaining, similar to what something like Prolog does.
As a specific demonstration of their ability to generate novel outputs that make sense:
Prompt: explain Algorithm X in the style of a Kanye rap
Response: `Alright, here's a rap about Algorithm X[5], the famous algorithm for solving the exact cover problem:
Verse 1:
I'm Algorithm X, the greatest of them all
I'll solve any exact cover problem, big or small
My steps are simple, my logic is sound
Just give me your matrix, and I'll give you the answer you found
Chorus:
Algorithm X, the master of the game
I'll solve your problem, and do it by name
Exact cover, no problem for me
I'm Algorithm X, the MVP
Verse 2:
First, I'll start by selecting a column
Then I'll look for a row with a one in that column
Once I find a match, I'll mark the row and column
And repeat the process until the matrix is gone
Chorus:
Algorithm X, the master of the game
I'll solve your problem, and do it by name
Exact cover, no problem for me
I'm Algorithm X, the MVP
Verse 3:
But sometimes the matrix is too big to solve
That's when I use my dancing moves
I'll shuffle and prune, until the solution is clear
And in the end, I'll have your answer, my dear
Chorus:
Algorithm X, the master of the game
I'll solve your problem, and do it by name
Exact cover, no problem for me
I'm Algorithm X, the MVP.`
Now I entirely concede that it has read things on how Algorithm X works, and that it seems to use a template for "rap" responses. But:
> But sometimes the matrix is too big to solve
> That's when I use my dancing moves
> I'll shuffle and prune, until the solution is clear
> And in the end, I'll have your answer, my dear
I refuse to believe that anywhere, at any point, someone has written an explanation of the use of dancing links[6] in Knuth's Algorithm X like that.
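For anyone who wants to compare the verses against the real thing: a minimal sketch of Algorithm X in Python, using the well-known dict-of-sets formulation rather than Knuth's dancing-links structure (dancing links is an implementation trick for undoing the cover/uncover steps cheaply; the control flow is the same one the rap paraphrases):

```python
def solve(X, Y, solution=None):
    """Knuth's Algorithm X over an exact-cover instance.

    X maps each column to the set of rows covering it;
    Y maps each row to the list of columns it covers.
    Yields each exact cover as a list of row labels.
    """
    if solution is None:
        solution = []
    if not X:                          # every column covered exactly once
        yield list(solution)
        return
    # "First, I'll start by selecting a column": pick the one
    # with the fewest candidate rows (Knuth's heuristic).
    c = min(X, key=lambda col: len(X[col]))
    for r in list(X[c]):               # "look for a row with a one in that column"
        solution.append(r)
        removed = cover(X, Y, r)       # "mark the row and column"
        yield from solve(X, Y, solution)   # "repeat the process"
        uncover(X, Y, r, removed)      # backtrack - what dancing links makes cheap
        solution.pop()

def cover(X, Y, r):
    # Remove every column row r covers, and every row that clashes with r.
    removed = []
    for j in Y[r]:
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].remove(i)
        removed.append(X.pop(j))
    return removed

def uncover(X, Y, r, removed):
    # Restore exactly what cover() removed, in reverse order.
    for j in reversed(Y[r]):
        X[j] = removed.pop()
        for i in X[j]:
            for k in Y[i]:
                if k != j:
                    X[k].add(i)

# Knuth's example instance: the unique exact cover is rows B, D, F.
Y = {'A': [1, 4, 7], 'B': [1, 4], 'C': [4, 5, 7],
     'D': [3, 5, 6], 'E': [2, 3, 6, 7], 'F': [2, 7]}
X = {j: set() for j in range(1, 8)}
for row, cols in Y.items():
    for j in cols:
        X[j].add(row)

solutions = [sorted(s) for s in solve(X, Y)]
print(solutions)  # [['B', 'D', 'F']]
```

Verse 2 is a recognizable paraphrase of the main loop; verse 3's "shuffle and prune" is a loose but apt gloss on the cover/uncover backtracking that dancing links accelerates.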
You seem to know a lot. So where are we in terms of what research exists versus what is public? Is the state of the art even further ahead? What's top of mind as most interesting, or what needs to happen next for the big wow? What's your favorite example of large language model output so far?
In an odd way, it kind of reminds me of the opening of the Gospel of John.
> In the beginning was the Word, and the Word was with God, and the Word was God.
Has a large language model feel to it, doesn't it? hah.
I haven't seen any evidence that leading edge research is anything but public.
The leading labs (Google Brain/DeepMind/NVIDIA/Meta/Microsoft/OpenAI) all publish in the open.
I'm excited by three things:
- The emergent-phenomena work: as we build bigger models, there is a step function where they suddenly develop new abilities. It's unclear where that ends.
- The work people are doing to move these abilities into smaller models.
- Multi-modal models. If you think this is impressive, just wait until you can do the same with images, text, video, sound, and code all in the same model.
Do you ever wonder what the military may have in terms of sophistication compared to enterprise? What are your thoughts on the emergent phenomena in the class of metaphysical, philosophical, and outlier conditions? Is it plausible that language is, in and of itself, what consciousness is? Is language a natural phenomenon of the universe (an analog of a pattern being a representation of a pattern, with all things that can sense a signal essentially being patterning entities)?
[1] https://ai.googleblog.com/2022/11/characterizing-emergent-ph...
[2] https://arxiv.org/abs/2204.02311
[3] https://ai.googleblog.com/2022/06/minerva-solving-quantitati...
[4] https://arxiv.org/pdf/2206.14858.pdf
[5] https://en.wikipedia.org/wiki/Knuth%27s_Algorithm_X
[6] https://en.wikipedia.org/wiki/Dancing_Links