If babies learn much more from much less data, isn't that evidence that the LLM approach is less efficient than whatever learning process humans implement biologically, and therefore that LLM training is unlikely to "scale to AGI"?
For video data, that's not how LLMs work (or any NNs, for that matter). You have to train them on exactly the task you want them to perform: if you want a model to predict the next token of text given an input array, you have to train it on pairs of input arrays and output tokens.
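To make "train it on pairs of input arrays and output tokens" concrete, here's a minimal toy sketch in PyTorch. The tiny embedding-plus-linear model and the random token data are made up purely for illustration; a real LLM uses a transformer over real text, but the training signal has this same shape.

```python
# Toy next-token training sketch (hypothetical model and data, not a real LLM).
import torch
import torch.nn as nn

vocab_size, embed_dim, context_len = 100, 32, 8

model = nn.Sequential(
    nn.Embedding(vocab_size, embed_dim),              # token IDs -> vectors
    nn.Flatten(),                                     # concatenate the context
    nn.Linear(context_len * embed_dim, vocab_size),   # logits over next token
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Fabricated toy data: each pair is (input token array, correct next token).
inputs = torch.randint(0, vocab_size, (64, context_len))
targets = torch.randint(0, vocab_size, (64,))

for step in range(10):
    logits = model(inputs)            # predicted distribution over next token
    loss = loss_fn(logits, targets)   # penalty for wrong next-token guesses
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```

The point is that the supervision is entirely (input array, output token) pairs: the model only ever learns to do what that pairing rewards.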
You can extract the data you need in text form from video content, but presumably that has already been done for the most part, since video transcripts are likely already included in GPT's training data.
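As a rough sketch of what that extraction looks like in practice, assuming the third-party youtube-transcript-api package and its get_transcript call (the video ID here is just an arbitrary example):

```python
# Hypothetical sketch: recover training-ready text from a video's transcript.
from youtube_transcript_api import YouTubeTranscriptApi

# Returns a list of segments like {"text": ..., "start": ..., "duration": ...}.
segments = YouTubeTranscriptApi.get_transcript("dQw4w9WgXcQ")
transcript = " ".join(segment["text"] for segment in segments)
print(transcript[:200])  # first 200 characters of the recovered text
```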