
Whose code is it trained on? Do you respect their software licences?


Next you'll say I need a license to read your comment because I'm copying it in my mind. Crazy!


Asking what the training data was is valid. Knowing what these AIs are trained on is to everyone's benefit.


Your comment is correct but misdirected.

I didn't say anything about OP's first question. Your comment is about that.


As a contrived example: if you train exclusively on AGPL source code, the probability of generating something identical to AGPL-licensed code is likely non-zero.

This is a very important question.
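
To make the contrived case concrete, here is a toy sketch, not a claim about how any real LLM behaves. It trains a tiny n-gram model on a single snippet and decodes greedily. The corpus string and the `train` and `generate` helpers are made up for illustration. Because every context in such a small corpus is unique, the model can only continue the prompt with the text it was trained on.

```python
from collections import Counter, defaultdict

def train(tokens, order=3):
    """Count which token follows each `order`-token context."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - order):
        context = tuple(tokens[i:i + order])
        model[context][tokens[i + order]] += 1
    return model

def generate(model, prompt, max_new=50):
    """Greedy decoding: always pick the most frequent next token."""
    order = len(next(iter(model)))  # context length used at training time
    out = list(prompt)
    for _ in range(max_new):
        followers = model.get(tuple(out[-order:]))
        if not followers:  # unseen context: nothing left to continue with
            break
        out.append(followers.most_common(1)[0][0])
    return out

# Hypothetical stand-in for a licensed source file (an assumption, not real data).
corpus = "def add ( a , b ) : return a + b # licensed under AGPL".split()
model = train(corpus, order=3)
output = generate(model, corpus[:3])

# True here: the model reproduces its only training text verbatim.
print(output == corpus)
```

A real model trained on a large, varied corpus is far less deterministic than this, so the sketch only shows why the probability of verbatim output is not zero, not how large it is.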


An LLM is not the same thing as a human mind and does not automatically receive the same exception to copyrights.

Whatever you think makes sense, it would be wise to be careful, because the law might not turn out to be what you want it to be.


It will depend on the jurisdiction. It will be fun when a model is trained in Japan on material that would be copyrighted in the USA, then produces an image that cannot be copyrighted in the USA.

> Japan's government recently reaffirmed that it will not enforce copyrights on data used in AI training.

https://cacm.acm.org/news/273479-japan-goes-all-in-copyright...


What exception?


Perhaps it's overly technical or even pedantic, but in light of arguments like "AI is just doing the same thing humans do", I think we should admit it:

Yes, we are making a copy in our minds when we read something.

I suspect such copies are allowed (as an exception to copyright law) mostly because lawyers and judges don't think about it. Nonetheless, once we do think about it, the law isn't required to treat humans and LLMs in the same way, or allow LLMs to do something simply because humans are allowed to do something similar.


You are not making a copy, because you are not able to reproduce it identically.

And if you are able to and you do, that's likely an infringement. Most jurisdictions say that, for example, you can't perform an in-copyright creative work without compensating the owner. Look at the lawsuit around George Harrison's "My Sweet Lord" for an example.


Most people could memorize enough to count as a copyright violation if they did reproduce it; as you said, actors preparing for a role certainly do so. And people with excellent memories could remember much more.

And yes, there would be a violation if someone then made another copy by reproducing it from memory. But the copy in the mind is overlooked, or forgiven. That's a copy too, just as the copy stored somewhere in the weights of an LLM is a copy.

We may not understand exactly how it's stored in either case, but it's in there somewhere. And it's worth saying again: the copy in an LLM may not be overlooked by the law.



