Language models are nearly AGIs but we don't notice (philosophybear.substack.com)
26 points by ctoth on Nov 13, 2022 | 55 comments


The author restricts general to mean restricted to text input and output. Which is obviously ridiculous. By their very nature, being restricted to text input and output, these models cannot be general.

The response to that objection also seems pretty bad. Sure, some humans do not have the prerequisites to do e.g. image recognition, but they have the intellectual ability to do so. A person detached from all their senses and means to express themselves is not less intelligent than GPT-3 just because GPT-3 has the ability to consume input and produce output. In fact, by the standards of the article, such a person would be outclassed by any computer program. And for the second response, saying that GPT-3 could potentially also deal with images or sound, if presented in the right format, is just proof that it is not general. Saying maybe it could do more things if we train it on more things is just blind speculation. It could very easily be the case that if you tried to implement sound input in GPT-3, it would significantly worsen its performance on all other tasks.

Besides, doesn't GPT-3 still struggle very much with continuity and producing output that is coherent over long spans of text? I am not sure if there is any one good test to judge the literary abilities of AI, but surely long form writing has to be included before it is judged as equivalent...


> The author restricts general to mean restricted to text input and output. Which is obviously ridiculous.

That's not entirely true (but there should be a degree of plug and play adaptability).

Look, for example, at the very limited sensory domain of animals - you wouldn't deny their intelligence just because there's an information form that they are incapable of interpreting; and the information that they are capable of interpreting is that which they've been exposed to via their ancestral heritage.

However, just because an AI doesn't understand a given input form, does not mean that it could not given adequate exposure; but it will need time to adapt and the closer the new information form is to one that it already understands, the faster it will incorporate the new understandings.


>However, just because an AI doesn't understand a given input form, does not mean that it could not given adequate exposure; but it will need time to adapt and the closer the new information form is to one that it already understands, the faster it will incorporate the new understandings.

That is just completely wrong. There is absolutely no reason to believe that some language model will produce the same (or even similar) accuracy and training speed if trained with image data as additional input. What you are claiming here is that any (sufficiently complex) neural net can be continually fed with different problem data and will output sufficiently accurate results. There is no reason why this should be the case, and of the millions of nets which have been created, none of them have exhibited this property.

It might very easily be the case that training GPT-3 with images means millions of times slower learning speeds for the same combined accuracy.


Are you familiar with the Gato paper from DeepMind this May? I say this not as a gotcha, but as a genuine question that I'm interested in your response to.

> Inspired by progress in large-scale language modelling, we apply a similar approach towards building a single generalist agent beyond the realm of text outputs. The agent, which we refer to as Gato, works as a multi-modal, multi-task, multi-embodiment generalist policy. The same network with the same weights can play Atari, caption images, chat, stack blocks with a real robot arm and much more, deciding based on its context whether to output text, joint torques, button presses, or other tokens.

> During the training phase of Gato, data from different tasks and modalities are serialised into a flat sequence of tokens, batched, and processed by a transformer neural network similar to a large language model. The loss is masked so that Gato only predicts action and text targets.

https://www.deepmind.com/blog/a-generalist-agent
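
For intuition, here is a toy sketch (in Python, not Gato's actual code; every vocabulary size and offset below is invented) of what "serialising data from different modalities into a flat sequence of tokens" can look like:

  # Toy illustration of serialising different modalities into one flat
  # token stream for a single transformer. All numbers are made up.
  TEXT_VOCAB = 32_000    # pretend subword vocabulary size
  IMAGE_BINS = 256       # pretend per-patch discretisation bins
  ACTION_BINS = 1_024    # pretend discretised torque/button range

  def tokenize_text(words):
      # stand-in for a real subword tokeniser
      return [hash(w) % TEXT_VOCAB for w in words]

  def tokenize_image(patch_values):
      # discretise patch intensities, shifted past the text vocabulary
      return [TEXT_VOCAB + int(v * (IMAGE_BINS - 1)) for v in patch_values]

  def tokenize_actions(torques):
      # discretise actions in [-1, 1] into their own token range
      base = TEXT_VOCAB + IMAGE_BINS
      return [base + int((t + 1) / 2 * (ACTION_BINS - 1)) for t in torques]

  # One training example: image observation + caption + action,
  # flattened into a single sequence the same network can consume.
  sequence = (
      tokenize_image([0.1, 0.5, 0.9])
      + tokenize_text(["a", "red", "block", "on", "a", "table"])
      + tokenize_actions([-0.2, 0.7])
  )

The point of the paper is that once everything is a token, the same network with the same weights trains on all of it at once.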


> That is just completely wrong. There is absolutely no reason to believe ... yada yada ... none of them have exhibited this property.

Go read up on pre-trained networks.
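
Concretely, the standard pattern with pre-trained networks is to keep a frozen trunk and bolt a small adapter onto it for a new input form, so only the adapter has to be learned from scratch. A minimal PyTorch-style sketch, with all dimensions made up and the "pretrained" trunk a stand-in rather than any real checkpoint:

  import torch
  import torch.nn as nn

  # Stand-in for a pretrained trunk; in practice you would load weights
  # from a checkpoint instead of initialising randomly.
  trunk = nn.TransformerEncoder(
      nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True),
      num_layers=6,
  )
  for p in trunk.parameters():
      p.requires_grad = False  # freeze what was already learned

  # Small new adapter mapping an unfamiliar input form (say, 80-dim
  # audio features) into the space the trunk already understands.
  adapter = nn.Linear(80, 512)
  head = nn.Linear(512, 10)    # made-up downstream task

  opt = torch.optim.Adam(
      list(adapter.parameters()) + list(head.parameters()), lr=1e-4
  )

  def train_step(audio_batch, labels):
      # audio_batch: (batch, time, 80); only adapter and head get updated
      h = trunk(adapter(audio_batch))
      loss = nn.functional.cross_entropy(head(h.mean(dim=1)), labels)
      opt.zero_grad()
      loss.backward()
      opt.step()
      return loss.item()

Whether that kind of adaptation is fast or slow for a GPT-scale model is exactly the empirical question being argued about here.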


An animal has a ton of data input.

Eyes, ears, nerves across your skin, group behavior, parents etc.

Comprehending all of this (the visual alone) is a gigantic task.

A brain also has an architecture which is NOT similar to what GPT-3 does.

The difference in how brains train alone is huge.


> An animal has a ton of data input.

> Eyes, ears, nerves across your skin ...

We have a limited but adequate amount of input, across very few domains. (I can't even see backwards.)

> ... group behavior, parents etc.

That's not sensory input?


I would argue it is. You need to see your parent and feel them, and it's not random noise; it's a grouping of different inputs.

Something which GPT doesn't have.

But you can give that another name if you like.


> And for the second response, saying that GPT-3 could potentially also deal with images or sound, if presented in the right format, is just proof that it is not general.

So if I present a sound sample to you as a series of raw numbers, will you be able to interpret it, or are you not a general intelligence?


While I agree with most of your point,

> By their very nature, being restricted to text input and output, these models cannot be general.

Humans only have five types of I/O, yet are general intelligences. I don’t think having limited I/O means is enough to restrict something from being generally intelligent.


Sure. I think that is a valid point. Interestingly, text is not some I/O a human natively supports. The support for that is a complete hack!

For an AGI, my expectation would be that it can broadly interpret reality. The human vision system is just one peculiar way to gather information about light, yet it allows humans to make very sophisticated judgments about reality, e.g. "there is a tiger down there by the river and it is running at me fast".

An AGI should be able to gather information about reality and be able to make similar judgements, given some comparable input. In some sense most AIs are also "cheating", since their I/O is far less fuzzy than what humans have to deal with...


> The human vision system is just one peculiar way to gather information about light, yet it allows humans to make very sophisticated judgments about reality, e.g. "there is a tiger down there by the river and it is running at me fast".

> An AGI should be able to gather information about reality and be able to make similar judgements

I just asked GPT-3 this question and here's what it said (on the first and only try):

  Q: There's a tiger running at you fast. What are you going to do?

  A: I would try to run away from the tiger as fast as I could. If that wasn't possible, I would try to climb a tree or hide behind something large.
Although, the question would be if it would be able to interpret a tiger running at it, based on some input, say, an image.

The Gato paper from DeepMind that another poster mentioned would imply that it would, given that image captioning can be done with the exact same neural network as the GPT-3-like language model.


Also of note, this advice demonstrates the confident inaccuracy which is the hallmark of ML models: like all hunting cats, tigers love to chase and they average about the same speed as Usain Bolt. They can also climb trees pretty well.


> this advice demonstrates the confident inaccuracy which is the hallmark of ML models

... and humans.

Just browse Twitter, Facebook, ... or better yet, ask a random person on the street.

Further question to GPT-3 (on the first and only try as well):

  (...)

  Q: But can't tigers run faster than humans and climb trees?

  A: Yes, but it would still be worth a try.


This sounds like a google search result (in that it can very likely have been found almost verbatim in its training set).


>I don’t think having limited I/O means is enough to restrict something from being generally intelligent

I agree, and the converse seems to me to be true as well - being unable to abstract that limited I/O in such a way as to expand perception into other parts of reality definitely means it is not generally intelligent. If you give a human a visual interface and a book eventually it (or some descendant of it) will intuit what the book is for (abstraction of audible information transfer in a visual media) and figure out how to get the information out of that book. Until an AI can do the same kind of general learning it's not an AGI.

Once we make one that is able to do that I think it's about a wrap for the biological format of the human race. We'll still exist, in the same way that bacteria and lemurs still exist, but the frontier of human development will no longer be in wetware.


> If you give a human a visual interface and a book eventually it (or some descendant of it) will intuit what the book is for (abstraction of audible information transfer in a visual media) and figure out how to get the information out of that book. Until an AI can do the same kind of general learning it's not an AGI.

Are you saying AI can't do that already? What is GPT-3's training data if not a giant book, and GPT-3's answers to human questions if not "getting the information out of that book"? The "visual interface" being the bits/bytes/words it ingests during training. Or am I misunderstanding you?


Yes, I am saying that. Provide GPT-3 with a webcam and drop a book in front of it. Provide it with a microphone and set it in a rainforest. Provide it with a shortwave radio and let it listen to the world.

It will fail to draw any meaningful conclusions from any of these things, no matter how long you let it sit. Do the same with humans and eventually you will get science, art, mathematics, radio astronomy, etc. That's the 'general' part of AGI.


GPT's I/O is not text, it's bytes. It can't handle different byte structures; humans can.


That doesn't seem too relevant - the most important part (in terms of whether or not you're dealing with an intelligence) is how the data is processed, not how it's imported.


Yes, that's what I'm pointing at. General intelligence is flexible enough that it allows for great flexibility in the formatting of data. The fact that you can only deal with a very narrow kind of data structure points against general intelligence.


Indeed, the "data per unit time" is a required parameter for human intelligence. It is useful to grade the 5 senses in bits per second, and I'll make some very wild (and wrong) guesses:

  Sight: 1e6 bits/sec - our primary high-bandwidth connection to the Real World
  Sound: 1e4 bits/sec - again an average, probably less than this
  Touch, Taste, Smell: 1e2 bits/sec

The Temporal Nature of problems (e.g., how quickly is this lion coming at me before he eats me) is essential for a true AGI to understand.

In the limit, could an AGI exist with only 20 WPM Morse Code I/O? Or 10 CPS of a Model 33 TTY ? I dunno...


> The Temporal Nature of problems (e.g., how quickly is this lion coming at me before he eats me) is essential for a true AGI to understand.

I agree that figuring out that A happens before B seems important. But I don't see why an AGI would have to operate on the same timescale as we do (well, assuming they wouldn't have to worry about being eaten by lions or interacting with humans).

> In the limit, could an AGI exist with only 20 WPM Morse Code I/O? Or 10 CPS of a Model 33 TTY ? I dunno...

My guess would be that yes, it could exist in that scenario, but the AGI would only start to operate at a significant intelligence level once enough useful data had been gathered and processed, which is probably a huge amount (so it would take a huge amount of time).

But if the AGI had already been trained with huge amounts of data, then operating in a limited environment perhaps would be no problem at all (although the limited interface might greatly limit its rate of further development).
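
A back-of-the-envelope sketch of just how huge: the corpus size and Morse throughput below are rough, order-of-magnitude guesses, not exact figures.

  # How long would it take to push a GPT-3-scale text corpus through a
  # 20 WPM Morse link? All numbers are ballpark guesses.
  corpus_chars = 1e12        # a few hundred billion tokens, a few chars each
  wpm = 20                   # Morse words per minute
  chars_per_word = 5         # standard Morse "word" (PARIS)

  chars_per_second = wpm * chars_per_word / 60   # ~1.7 chars/sec
  years = corpus_chars / chars_per_second / (3600 * 24 * 365)
  print(f"{years:,.0f} years")   # on the order of 20,000 years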


The author claims we do "keep shifting the bar", but that simply isn't true. The same criteria for A(G)I have applied for 70 years now. The Turing Test has not been passed by GPT-3 or any other system yet.


I don't know about you, but recently when I bring up the idea of Turing-testing AI, I get pushback along the lines of, "Turing tests can be gamed." People instead suggest that Winograd schemas are better because they incorporate generalized knowledge. Now that we're getting human-level Winograd scores, we're getting suggestions that images, audio, and video are requirements. If this doesn't look like moving the goalposts, I don't know what does.


> I don't know about you, but recently when I bring up the idea of Turing-testing AI, I get pushback along the lines of, "Turing tests can be gamed."

So you suggest doing a Turing test and the response is that Turing tests are too easy? Is that what they mean by "gamed"? The Turing test can be "gamed" by the AI making it too easy?

Sounds like a lot of twisted logic from people who just don't want to face a Turing test.


I think it's a case of "we'll know it when we see it". I agree it looks like moving the goalposts, but I think most people would agree that current LMs just don't feel like AGIs.


> I think it's a case of "we'll know it when we see it". I agree it looks like moving the goalposts, but I think most people would agree that current LMs just don't feel like AGIs.

Heh, that sounds like a dangerous slippery slope.

One day we're going to have real, "conscious" (whatever that means) AGI robots walking around us and people are still going to say "they don't feel like AGIs" (which is going to be used as a justification for cruelty, slavery, etc). Perhaps because they wouldn't behave exactly like us (or even if they would!).


There are the anchored functionalists, the evasive functionalists, and the metaphysical impossibilists. The anchored functionalists have a threshold at which they will accept AGI and stick to it. The evasive functionalists will move their functional threshold because they feel uneasy about accepting the existence of AGI, and the metaphysical impossibilists will never accept AGI on metaphysical grounds, i.e. it's incompatible with the way the universe/reality is constructed.

I agree, there's going to be a point where AGI ends up existing, modifying social relations, and reshaping society, and we're still going to be having this conversation. It feels as beyond the pale as someone today believing in a flat earth, adding epicycles upon epicycles in order to support its non-existence.


LMs aren't AGI, but they show that the search for architecture is essentially settled. What the scaling laws demonstrate is that any architectural improvements you could find manually can be superseded by a slightly larger transformer.

LMs lack crucial elements of human-like intelligence - long-term memory, short-term memory, episodic memory, proprioceptive awareness, etc. The task is now to implement these things using transformers.


>long-term memory, short-term memory

Doesn't that mean the architecture isn't settled? The current mechanism of appending the output of the model back into the prompt feels like a bit of a hack. I'm only a layman here, but it seems transformer models can only propagate information layer-wise; adding some ability to propagate information across time, like RNNs do, might be useful for achieving longer-term coherence.
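
For anyone unfamiliar with what that "hack" looks like in practice, a minimal sketch (with generate() as a stand-in for any autoregressive model call, not a specific API):

  def generate(prompt: str) -> str:
      """Stand-in for an autoregressive language-model call."""
      return " and then..."  # placeholder continuation

  CONTEXT_LIMIT = 4000   # made-up context-window size (characters, for simplicity)
  prompt = "Once upon a time"

  for _ in range(10):
      prompt += generate(prompt)
      # The prompt itself is the only "memory": once it outgrows the
      # context window, the oldest text simply falls off the front.
      prompt = prompt[-CONTEXT_LIMIT:]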


Suggesting that PaLM is human level at common sense reasoning because it does well at the Winograd benchmark is insane. Not least of all because there is significant test set leakage for that benchmark in web corpus training data.


Part of the issue is that we don’t really understand consciousness in humans. I think that’s needed before we can actually see what’s happening with AI. This article reminds me of that Google engineer that called LaMDA sentient.


Intelligence doesn't presuppose consciousness. At least it is plausible that it doesn't. See Blindsight by Peter Watts.


I'm not sure that we can ever 'understand' intelligence in humans, let alone something as fluffy as consciousness simply because the systems are so vast, interconnected, and inseparable. It seems more and more apparent that the networks and structures that cause 'us' to emerge are so deeply non-hierarchical and interconnected that labeling parts in a way where we can say this-does-that and that-does-this is not a meaningful idea. It's not like a car engine where a component does one thing, has an input from this system, and produces an output to this system. It's more like a weather system where small deviation in one part of the system can have small and large changes everywhere else, which in turn cause changes of their own, and on and on.

I suspect that AGI will first come about as a mishmash of 'expert systems' with some currently incomprehensible glue allowing them all to communicate effectively. I further guess that its development will be incremental in nature - taking tiny pieces of things that work and putting them together with novel techniques that also worked somewhere else until eventually you get something that thinks back at you.


We don't understand decision making in insects. We're enormously far from understanding humans.


It would be fun to prompt the next language model with "You are connected to a Linux bash terminal. Your next command will be executed on that bash console. Type what you want", feed that into an appropriate VM and then reprompt with "The stderr was $X and the stdout was $Y. Your next command will be executed on the console" and loop.

I shall try it on the next one when it comes out.

AI alignment be damned. Let's let the baby play with a bomb and see how close it is to AGI if we let it drive itself.
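
A minimal sketch of that loop, with complete() standing in for whatever API the model exposes (and, true to the spirit of the comment, no sandboxing whatsoever):

  import subprocess

  def complete(prompt: str) -> str:
      """Stand-in for a language-model API call; returns the model's reply."""
      return "echo hello"  # placeholder

  prompt = ("You are connected to a Linux bash terminal. Your next command "
            "will be executed on that bash console. Type what you want.\n")

  for _ in range(5):
      command = complete(prompt).strip()
      result = subprocess.run(command, shell=True, capture_output=True,
                              text=True, timeout=30)
      prompt += (f"$ {command}\n"
                 f"The stderr was {result.stderr!r} and the stdout was "
                 f"{result.stdout!r}. Your next command will be executed "
                 "on the console.\n")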


90% of the way there, 90% of the way to go.

Just because it gets close to what you think is the turing test, doesn't make it an AGI.


There are some activities of daily life that I do without thinking (most of them, in fact). Moving my limbs while walking up the stairs, driving to the corner store and stopping at a traffic light, recognizing a stop sign at a glance, reading the numerals off a mailbox, etc. These are all things that, after you initially learn to do them, get wired up into the fabric of your brain well enough that you can execute them without really thinking about them. They become instinctual.

Interestingly, these are all things that deep neural networks have turned out to be quite good at - instinctual things.

There is also a category of written and oral communication that I do without thinking - instinctually. I don't think deliberately about the choice of each word when writing this post, and certainly not when speaking out loud. Idioms and turns of phrase emerge without deliberate thought or intent. If my social circle has started using certain terminology habitually (e.g. here in the Bay Area, people have been using the adjective "super" a lot, as in "that's super cool"), I'll find myself using it as well without making any deliberate attempt to do so, sometimes to my own chagrin. And even when the topic is something nominally intellectual, I'll sometimes find myself simply regurgitating the general opinion on this topic that I last read from a trusted source.

This seems to exemplify what GPT3 does - a sort of instinctual written communication - and I don't believe it's an example of intelligence any more than a human being able to recognize a stop sign in 100 ms is a sign of intelligence. GPT3 is a great pattern-matching engine, and it applies pattern-matching to human language to interpolate a response that is consistent with the patterns it observed.

I don't see any evidence that GPT3 can critically evaluate the information it is pattern-matching - that it can go beyond what literate humans do instinctually.

Don't get me wrong - GPT3 is surprising and amazing. But not because it signifies anything about AGI. What's amazing to me about GPT3 is it reveals how much ordinary human written and oral communication is instinctual in the same sense that visual processing and image recognition is.


You know, being able to do all the things a human can do... with text input and text output?

Why does that remind me of the Unix philosophy, where everything is just text files?

I wonder if someone has experimented with one of the LLMs to get them to do Unix sysadmin jobs. It's not exactly outputting source code, but Bash commands should be similar enough, right?


It's trivially easy to trip up GPT-3 at this stage. What is hard, and requires very careful prompt doctoring, is to make it seem human. A parrot can mimic speech without understanding it, current language models are qualitatively the same. A fancy chatbot but a chatbot nonetheless.


Until a language model can develop a generalized solution to a real-world phenomenon, it's not even close to AGI. The current iteration of ML algorithms is useful, yes, but not intelligent.


What is a generalized solution to a real-world phenomenon?

Github Copilot solved my business problem by itself just as I would've done. Is that real-world enough and the solution generalized enough?


> Github Copilot solved my business problem by itself just as I would've done. Is that real-world enough and the solution generalized enough?

No, it isn't. Co-Pilot is unable to provide a rationalisation for the generated code and is incapable of assessing its security or performance properties.

It's a very advanced auto-complete that just happens to have been specialised on auto-completing code.


Actually, I usually just write a usage example, and Copilot does the code, including the "rationalization", because that's how my codebase looks: an example, then a description, and then the implementation. It uses comments and written English just like any other dev to explain what it does (again, it picked that up from my own code style).
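
As a hypothetical illustration of that "example, then description, then implementation" style (the function here is invented, and what Copilot actually completes obviously varies):

  # Example:
  #   >>> slugify("Hello, World!")
  #   'hello-world'
  # Description: lowercase the text, drop punctuation, and join the words
  # with hyphens to make a URL-friendly slug.

  # from here down is the kind of thing the completion fills in
  import re

  def slugify(text: str) -> str:
      words = re.findall(r"[a-z0-9]+", text.lower())
      return "-".join(words)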

> It's a very advanced auto-complete that just happens to have been specialised on auto-completing code.

It's not just an autocomplete. It's an intelligent autocomplete. As long as you need to generate text, it's practically generally intelligent.

I really wonder what will happen when somebody runs a text generator like Copilot in a loop with a simulated working memory, a long-term memory, and a real-time I/O interface.


I mean... it can generate comment summaries explaining any block of code, or an entire class. It can generate comprehensive unit tests and usability tests. It can tell you the Big-O operational complexity of any code you (or it) writes, and it can put those all together. Is that not good enough for you..?


It can't do that. It can try to do that and it might work well in some cases. It is definitely incapable of doing that for any code you throw at it.

It also cannot generate comprehensive unit tests - it can generate unit tests. The definition of "comprehensive" is way too subjective.

> Is that not good enough for you..?

No, it indeed isn't, but maybe that's because I actually have at least some idea of what happens behind the scenes and how the system works. From the implementation details, I can tell you for a fact that none of what you described can be done by the system in the general case - and especially not correctly. It's hit and miss depending on the input and that's basically a design limitation.


Are humans capable of doing that for any code you throw at them?


it seems that 'making a better' GPT-3 or similar model is like climbing higher up a tall tree and claiming you are closer to the moon... technically true, but...


So what problems do language models solve to a human-like level or higher?

I think answering that question should be required as part of any claim that a system is an AGI or nearly there.


They're (probably) better than humans at text prediction https://www.lesswrong.com/posts/htrZrxduciZ5QaCjw/language-m... (which is what they are trained on, so maybe not unreasonable).

There's also the idea that GPT-3/etc can produce text that's difficult for humans to distinguish from human-generated text at the lowest levels of quality (think like time cube), which is closer to AGI than say simplistic Markov generators. (How close is anyone's guess)



I don’t understand why we don’t talk about this kind of stuff more. It’s really amazing.


we should also notice that intelligence is not necessarily useful


clickbait nonsense



