So, playing devil's advocate: what if the courts just don't care, and rule that copying code verbatim is not a crime because you didn't copy it, and Copilot is not a human so it can't commit crimes? What's the net effect of a system that draws upon all public code repos? It sounds... net beneficial to society?
On the plus side, a large body of work effectively becomes public domain. On the negative side, copyleft licenses lose their teeth. You probably see more power shift to those with big budgets. You probably see fewer things made source available, because you either have the public license or the private license now. This feels like a bad path but I'm not convinced the end result isn't better still.
I can set up my drone to detect me and attempt to crash into me. The AI would be quite poor and would probably attempt to crash into any human. Would it be my fault if it crashed into someone else instead of me and they lost an eye?
Can I set up a torrent box that automatically downloads and seeds all detected links from public trackers? Would I be responsible for it?
Both of these examples involve you creating something and then using it. I don't know how Copilot works, but using the second example, if you wrote a script to download and seed from trackers, and someone else used it, I don't think you would be held liable, especially if you don't profit from it.
Wikipedia states: Generally, copying cannot be proven without some evidence of access; however, in the seminal case on striking similarity, Arnstein v. Porter, the Second Circuit stated that even absent a finding of access, copying can be established when the similarities between two works are "so striking as to preclude the possibility that the plaintiff and defendant independently arrived at the same result."
This is a different situation, in which exact replication can reasonably occur without access to the original.
Secondly, can you actually claim GitHub has violated copyright if it doesn't have any claims to the work in question?
I think it's totally plausible that they win this in the long run.
1) So you are saying that if I get a disc duplication machine, I can freely copy and distribute Blu-ray discs as long as I don't watch the movie on the disc?
2, 3) Seems pretty settled at this point; look at the cases around the VCR and the copy machine. In general, the one using the machine is liable. The creator of the machine can be held liable if there aren't substantial noninfringing uses.
> It's not a violation of copyright to train a model.
Many people on HN assert this based on the Authors Guild v. Google case, but it's important to keep in mind that that case was about Google creating a search algorithm, which does not generate "new" output.
We are talking about a very different kind of system here, and in many other cases. Claiming the Authors Guild case sets precedent for these very different systems seems unfounded to me.
> It's not a violation of copyright to train a model.
This is a very bold assumption, one that I assume will not hold in a court of law in all cases. I think the nuanced question is: to train a model that does what, exactly?
Let's say distributing meth recipes is illegal[1]. Can one legally sidestep that by training a model that spits out the meth recipe instead? No court will bother with the distinction; causation is well-trod ground.
1. As an example; not sure if it's illegal. You may replace it with classified nuclear weapon schematics if you like.
It's not illegal to train a model to spit out classified nuclear weapon schematics. Possessing the original data might be. Releasing software that does this might be illegal, but not for copyright reasons, which is the issue at hand.
Could be. But I could also see the courts ruling that an individual can't be liable for copyright violations if they never accessed the original work, since access is generally required to prove copying.
The really nice thing is that this basically creates a library of industry methods and practices. It'd be really nice to be able to destroy patent trolls because what their patent "covers" is already a known and established industry method, or prior art.
Would that mean I can start sampling songs if they get fed through a neural network? It'll be fine if I train it on whatever is playing on the radio, right? What about doing the same for poems?
I would expect the legal argument to get into the intentions of the user and their relationship to the tool. I would also expect perspectives of art and code to diverge.