
I've thought about this issue a lot - I participated in some deep research into the training data that went into Stable Diffusion, for example: https://simonwillison.net/2022/Sep/5/laion-aesthetics-weekno...

I'm beginning to settle into the conclusion that the trade-offs here are worth it.

The positive value provided by these tools outweighs the harm done to the massive number of people who have each been hurt in a very small way.

I'm not happy about this. But I also don't think it's possible to build tools that are this powerful without massive amounts of training data that have been gathered without consent.

I've written a lot of material that has gone into training these models. Copilot for example seems to know how to write Datasette Plugins, and I've written over 100 of those over the past five years (way more than anyone else) and shared the code on GitHub. I'm delighted that my code has been used in this way, even though I don't get attributed for it.

But I also don't make a living in a way that's damaged by these models (I don't think they're going to replace me as a programmer: I just think they're going to let me be more productive and effective).

If I were an illustrator working on commissions who was losing work to generative image models that perfectly imitated my style, I would likely feel very differently about this.

So it's complicated. I still like my AI vegan analogy from a few months ago: https://simonwillison.net/2022/Aug/29/stable-diffusion/#ai-v...

I'm most definitely not a vegan, but I do think about how the AIs I am using have been created. And then (like eating meat) I decide to use them anyway!



I’ve also thought about this issue a lot.

> in a very small way

Financially it is certainly a small way for me, as I made a conscious choice not to depend on content creation or OSS dev as primary revenue streams, and (while I can’t say the same about future generations) I’m fairly sure that, in my line of work, I won’t be replaced by an OpenAI product within the years I have left.

However, I have gotten where I am in many respects thanks to countless people who shared information openly, and some of those people either made a living from ads, donation buttons, or upsells, or simply enjoyed knowing that somebody appreciates the work they do, maybe even writes to them about it. All of those incentives depend on people visiting their web property, and all of those incentives are being taken away.

(It’s adjacent to, but slightly different from, the plight of illustrators or writers, some of whom justifiably feel as if they dug their jobs’ graves by doing what they love in the open. Having a few close friends in the art space, I feel ashamed for the bait-and-switch that people in ‘my’ industry have served them.)

> I'm not happy about this. But I also don't think it's possible to build tools that are this powerful without massive amounts of training data that have been gathered without consent.

Attribution would solve part of the above-mentioned issues, and intuitively it seems trivial: some extra layers that let the network reflect on where it might’ve gotten the inspiration for this or that wouldn’t require any true “understanding” by ML (which, like you, I consider an oxymoron), and surely the investment they raised or the profits they make could cover it. Call me a cynic, but I suspect they intentionally don’t make this a priority.

> I’m most definitely not a vegan

So long as meat-eating is legal, I’m not a fan of this analogy. While I know for a fact that many people would find these tools useful, “whether to respect IP or not” is just not up to personal choice. IMO the only factor that makes ChatGPT’s case slightly vague (as in “not definitely illegal, but still ethically dubious”), like SD’s, is that IIRC the model is supposed to be open to all for private use.



