> If I ask you to reproduce a block of GPL code in my codebase and you do it, you violated the license. It does not matter that I primed you or led you to that outcome. What matters is that the legally protected code is somewhere it shouldn’t be.
This isn't accurate. If I reproduce GPL code in your codebase, that's perfectly acceptable as long as you obey the terms of the GPL when you go to distribute your code. In this hypothetical, my act of copying isn't restricted under the GPL license, it's your subsequent act of distribution that triggers the viral terms of the GPL.
The big question that is still untested in court is whether Copilot itself constitutes a derivative work of its training data. If Copilot is derivative then Microsoft is infringing already. If Copilot is transformative then it is the responsibility of downstream consumers to ensure that they comply with the license of any code that may get reproduced verbatim. This question has not been ruled on, and it's not clear which direction a court will go.
> The big question that is still untested in court is whether Copilot itself constitutes a derivative work of its training data.
Microsoft has a license to distribute the code used to train Copilot, and isn't distributing the Copilot model anyway, so it doesn't matter whether the model itself infringes copyright.
Whereas that same question probably does matter for Stable Diffusion.
As in "including improving the Service over time...parse it into a search index or otherwise analyze it on our servers" is the provision that grants them the ability to train Copilot.
(also, in case you're wondering what happens if you upload someone else's code: "If you're posting anything you did not create yourself or do not own the rights to, you agree that you are responsible for any Content you post; that you will only submit Content that you have the right to post; and that you will fully comply with any third party licenses relating to Content you post.")
But you may not have the rights to grant that extra license. If Copilot is determined to violate the GPL, they can yell at you all they want, but they will still have to remove the infringing code, as nobody can waive someone else's license on your behalf.
It'll have to be tested in court, but likely nobody actually gives a shit.
> But you may not have the rights to grant that extra license. If Copilot is determined to violate the GPL
Which is why that second provision is there to shift liability to you. You MUST have the ability to grant GitHub that license to any code you upload. If you don't, and MS is sued for infringing upon the GPL, presumably Microsoft can name you as the fraudster that claimed to be able to grant them a license to code that ended up in copilot.
How is that different from a consultant who indiscriminately copies from Stack Overflow?
Tangent to that is the "who gets sued and needs to fix it when a code audit is done?"
Ultimately, the question is then "who is responsible for verifying that the code submitted to production isn't copying from sources that have incompatible licensing?"
The consultants would have to knowingly copy from somewhere. One can hope they're educated on licensing, at least if they expect to get paid.
If Microsoft is so confident in Copilot doing sufficient remixing, then why not train it on their own internal code? And why put the burden of IP vetting on clients, who have less information than Copilot does?
> How is that different from a consultant who indiscriminately copies from Stack Overflow?
And how is that different from a student learning how to code off Stack Overflow (or anywhere else, for that matter), then reproducing some snippets or learned code structures in their employment?
Or a random employee copies some artwork that is then published ( https://arstechnica.com/tech-policy/2018/07/post-office-owes... ). You will note all the people who didn't get in trouble there - neither the photographer who created the image, nor Getty in making it available, nor the random employee who used it without checking its provenance.
In all of these cases, it is (or would be) the organization that published the copyrighted work without doing the appropriate diligence: checking what it is, whether it is usable, and how it should be licensed.
> The Post Office says it has new procedures in place to make sure that it doesn't make a mistake like this again.
... which is what companies who make use of AI models for generating content (be it art or code) should be doing to ensure that they're not accidentally infringing on existing copyrighted works.
Copilot is regurgitating snippets of code that are still under copyright and not in the public domain. Some may consider publicly available code fair use, but the fact that they're selling access for commercial use may undercut that argument.