Seems like a bad bet to me. It looks like the authors are going to lose this case, setting the precedent that not only do you not need to license training data, but obtaining it illegally (for free) is totally okay.
Didn't Anthropic's case already set the precedent that training itself is fine? It's not like copyrighted novels are a large portion of human-generated text data. It's just the stuff that's easier to get because it's preserved in bulk.
Video transcription has more or less been solved. Imagine how much data Google has in YouTube transcripts. And the longer these AI chatbots operate, the more data they manage to collect for training as well (I think Google making it so you can easily upvote or downvote a response from the bot is a good idea).
Is this maybe more about the quality of the documentation? I say this because my thinking is that reading is reading; it takes the same time to read the information either way.
I think it varies. Most enterprise software is good enough if it just works. In the consumer space quality and polish is way more important. Then there are things like modeling and video where performance is a much bigger deal.
Sure, no one really cares about the code but the quality of the code matters more for some products (and in different ways) than others.
The article isn't clear on this point, I believe because Meta isn't clear on this themselves. Other bits of this piece highlight third parties reviewing the responses of the AI assistant; it's possible that people are recording and some sound they make triggers the AI assistant which, in turn, leads to the video being reviewed.
OTOH, Meta could just be desperate for training content and is slurping up all recordings by people who've opted into the AI function. It would be great for them to clarify how this works.
My reading was that as soon as you enable the "AI" functionality you are opted into having your recordings labeled.
"But for the AI assistant to function, voice, text, image and sometimes video must be processed and may be shared onwards. This data processing is done automatically and cannot be turned off."
Right, that's the section I was confused by because it was in the context of an experiment trying to use the AI stuff without an Internet connection, which obviously won't work. The article is using the "shared onwards" terminology to refer to at least inference. But the inference part is uninteresting to me, and the data labeling is. The article doesn't really separate those out.
I would figure if there is AI labeling that some things will confuse the system and will be sent to a human. And some things will randomly be sent to a human for error checking. Same thing with Alexa, I figure there's always a low probability chance that anything I say to her will end up reaching a human. She's not always listening as some people fear (the data use would have been detected long ago if she were), but humans occasionally trigger her accidentally--and such errant triggers will be more likely to be sent to a human because they are not going to make sense.
This was one of the first hits on Kagi. 404 has a similar article (I think) but it's behind a paywall.
"The demand for this ‘Ray-Ban hack’ has been steadily increasing, with the hobbyist’s waiting list growing longer by the day. This demonstrates a clear desire among Ray-Ban owners to exercise more control over their privacy and mitigate concerns about unknowingly recording others."
I can take a verbal description from a meeting with five to ten people and put together something they can interact with in two weeks. That is a lot slower than Claude Code! Yet everywhere I’ve worked, this is more than fast enough.
Over two more weeks I can work with those same five to ten people (who often disagree or have different goals) and get a first draft of a feature or small, targeted product together. In those latter two weeks, writing code isn't what takes time; working through what people think they mean versus what they are actually saying, and mediating between one group and another when they disagree (or mostly agree), is the work. And then, after that, we introduce a customer. Along the way I become something of an expert in whatever the thing is and continue to grow the product, handing chunks of responsibility to other developers, at which point it turns into a real thing.
I work with AI tooling and leverage AI as part of products, where it makes sense. There are parts of this cycle where it is helpful and time saving, but it certainly can’t replace me. It can speed up coding in the first version but, today, I end up going back and rewriting chunks and, so far, that eats up the wins. The middle bit it clearly can’t do, and even at the end when changes are more directed it tends toward weirdly complicated solutions that aren’t really practical.