Hacker News

If a contributor comes to my project with a pull request that an AI wrote using copyrighted code of unknown license and attribution that it scraped off GitHub, that PR is rejected immediately, and I would strongly consider banning the "contributor" as well. A lot of OSS maintainers are waking up to this new reality, so I would advise caution to contributors looking to have tools write their code for them.


Thanks for sharing, that's an interesting social component of the equation. From your comment, I assume you're referring to something I've also encountered as a maintainer: we filter out contributions where no effort was put in. If I get the feeling that a PR is perhaps a bit useful but that the author has simply committed an LLM-generated piece of code, I'll be on the fence. If I'm asked to review a PR with bare-minimum added value, but the author has tried their best and is seeking help to get started with OSS contributions, I will help. Was that your experience as well?

In that regard, the proxy for "no effort" usually defaults to "it looks like the PR doesn't check any of the guidelines in the CONTRIBUTING.md or the PR template". Here we're trying to always bring that guideline context, make it requestable, and inject it into your coding workflow. In the process, we want to educate those developers about your specific engineering culture.
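As a rough sketch of what "making guideline context requestable" could mean, here is one way a tool might gather a project's contribution guidelines so they can be injected into a code-generation prompt. Everything here is hypothetical: the function name, the file paths probed, and the concatenation format are assumptions for illustration, not a description of any actual product.

```python
from pathlib import Path

# Hypothetical locations a project might keep its guidelines in.
GUIDELINE_FILES = ("CONTRIBUTING.md", ".github/PULL_REQUEST_TEMPLATE.md")

def build_prompt_context(repo_root: str) -> str:
    """Concatenate whichever guideline files exist in the repo,
    so the result can be prepended to an LLM prompt."""
    sections = []
    for name in GUIDELINE_FILES:
        path = Path(repo_root) / name
        if path.is_file():
            sections.append(f"## {name}\n{path.read_text()}")
    return "\n\n".join(sections)
```

A generation request would then prepend this context, so the model sees the project's expectations (commit style, test requirements, etc.) before producing code.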

Besides, code generation is inevitably going to become a growing part of software engineering. Here we're making sure this transition doesn't happen without proper alignment or context. It's already challenging to get everyone on the same page in code reviews, so team alignment isn't a trivial problem, and it's not going to improve with the extra thousands of LoC developers will be able to produce each day. Or do you foresee a significant proportion of OSS maintainers consistently rejecting automatically generated code?


Why?


I imagine it's because including open-but-license-incompatible code (such as GPLv3 code in a permissively licensed project) would change the project's licensing obligations and potentially open you up to litigation.

Strictly speaking, this is true for non-AI-generated code too, such as a copy-paste, but it's easier to tell when that happens. It's also true for closed-source code, but the fallout from that is going to take a few decades to manifest.


That last part feels very relatable to me: I've seen organizations that are mindful of the licenses of the tools they use in order to avoid problems down the line, and others assuming that because their code is closed source, the problem will never arise.

License-wise, we're getting more and more transparency on the permissions that apply to the training sets of each OSS model. But I would argue that once we're past that, developers are going to raise their expectations:

- control over dependency multiplicity ~= "rewrite this using only a single linear algebra library with an Apache 2.0 license" or even "rewrite this in pure Node.js"

- adding the corresponding reference/license notice: e.g. the model copies/adapts a section of a library whose license requires copyright-notice reproduction.

- transparency on the similarity with the source material if it was copied/adapted from somewhere else (even if the license allows this, this enters the realm of social courtesy/community codes)
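To make that last point concrete, a minimal sketch of how similarity to source material could be surfaced: Python's standard-library difflib gives a crude similarity ratio between a generated snippet and a known source. The two snippets below are invented for illustration; a real tool would compare against actual training-set material and use something far more robust than character-level diffing.

```python
import difflib

def similarity(generated: str, source: str) -> float:
    """Return a similarity ratio in [0, 1]; difflib's docs treat
    values above roughly 0.6 as 'close' sequences."""
    return difflib.SequenceMatcher(None, generated, source).ratio()

# Invented example: a generated snippet that closely tracks its source.
original = "def add(a, b):\n    return a + b\n"
adapted  = "def add(x, y):\n    return x + y\n"
score = similarity(adapted, original)
```

A tool reporting this kind of score alongside a suggestion would let the developer decide whether attribution, or a rewrite, is called for.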



