
Disclaimer: I work on GenAI at Google, but views are my own.

The question is, how did the model create Mario & Luigi or Scrooge McDuck without training on copyrighted data? It can't just crawl Wikipedia because Fair Use in Wikipedia doesn't constitute Fair use for a commercial AI model.

One possible outcome is more transparency on what datasets were used to train the models.



Disclaimer: ibid

> It can't just crawl Wikipedia because Fair Use in Wikipedia doesn't constitute Fair use for a commercial AI model.

Why not? The lawyers I've discussed this with socially think that questions like this are unresolved. There are certainly competing legal theories, but we're in uncharted territory. No one knows what the outcome will be until rulings come down or Congress acts.

I find the NYT's argument a little hokey. Where are the damages? No one is using ChatGPT to read NYT articles, and the residual value of day-old news stories is close to zero.


> > It can't just crawl Wikipedia because Fair Use in Wikipedia doesn't constitute Fair use for a commercial AI model.

> Why not?

Because it’s tantamount to lying and deceptive conduct? It’s like asking for a licence to use something non-commercially, getting hold of it, and conveniently deciding 10 minutes later that you’re actually going to become a re-seller for all this stuff you have. Or going to the soup kitchen because you don’t want to pay your private chef tonight.


This analogy doesn't work. Fair use is an affirmative defense to copyright infringement claims. Entities that are training models largely claim that their uses are transformative and fall under fair use. Creative Commons, among others, agrees with this position. [0] If they're right, it simply doesn't matter what license a copyright holder is offering.

There are competing legal theories and no one can say how courts are going to rule on these issues. Smart lawyers who work on copyright and AI don't know. Technologists certainly don't know.

[0] https://creativecommons.org/2023/02/17/fair-use-training-gen...


Then there is the argument that the rules around fair use aren't even reached because the training of the model doesn't even do anything that requires a fair use exemption.


That's a good point. I agree it's not clear cut one way or another and we gotta let it play out.


Training is probably going to turn out to be fair use as the suits settle:

https://www.eff.org/deeplinks/2023/04/how-we-think-about-cop...

It's the usage, not the training, that needs to be policed. The answer there is going to be that Google or OpenAI or whoever makes bank by creating a fine-tuned model that can detect copyright infringement and providing access to it, so companies can double-check gen AI outputs for exact or "similar enough" infringements.



