
On that note, shouldn’t we ask people to CONSENT to having their data trained on? And when I say consent, I mean asking them directly, not hiding it in some terms of service. That’s just slimy… like a used car salesman lol.

Why is it okay for companies to just vacuum up all user data when 90% of users don’t know it’s happening?

Or shall the “stealing” of knowledge and creative works without consent continue?



LLMs need too much data to ethically source their data sets. That's why they rely on aggressive scraping, user-provided prompts, and of course straight-up piracy to fill their datasets.

Outcry made Adobe and other such companies add (opt-out) user controls for gathering training data, but writers, especially writers on the internet, are usually ignored. I've seen even the angriest "AI is stealing my art, if you use Dall-E you're a bad person" people use ChatGPT, because they don't seem to consider writing to be art or expression as much as they do their own works.

Textual data just doesn't seem to be valued, and as a result data scrapers often don't care about annoyances such as "ethics" or "consent" when it comes to gathering training data.


There’s the rub. We pretend a change to the law will make LLM development stall, yet we acknowledge nobody is following the existing laws anyway.

Not sure how I feel about the whole thing, to be honest. (Legal gray area.)


> We pretend a change to the law will make LLM development stall, yet we acknowledge nobody is following the existing laws anyway

No. The development is a given. Where it happens is not. That’s the point. If you want to use European data to train, you’d better not have a European nexus.



