This is pretty interesting for AI in general. Should you be able to train with material you don't own? Can your training benefit from material that has specific usage licenses attached to it? What about stuff like GameGAN?
> ...Should you be able to train with material you don't own?
If relating this to how humans learn, books and other sources are used to inform understanding and human knowledge. One can purchase or borrow a book without actually owning the copyright to it. Indeed, a given passage may later be quoted verbatim, provided it is accompanied by a reference to its source.
Otherwise, verbatim use without attribution in an authored work is considered plagiarism.
So sure, one can use a multitude of material for training. Yet once it comes to using the acquired "knowledge", proper attribution is due for any "authentic enough" pieces.
What is authentic enough in this case is not easy to define, however.
"If relating this to how humans learn" seems like a big IF though, right? Are we going to treat computer neural nets as human from a legal standpoint?
At some point neural nets like GameGAN might be good enough to duplicate (and optimize) a commercial game. Can you then release your version of the game? Do you just need to make a few tweaks? Are we going to get a double standard because commercial interests are opposed depending on the use case?
It would be pretty funny if Microsoft as a game publisher lobbies to prevent their IP being used w/ something like GameGAN, but then takes the opposite standpoint for something like their Copilot! Although I'm sure it'll be spun as "These things are completely different!".
This is the key question. In school I was taught to be careful to always cite even paraphrased works. If Copilot regurgitates copyrighted fragments without citation, or without informing the people who accept its suggestions of the licenses involved, then it's facilitating infringement.