Sure, but as has been discussed at length in this thread, that public/private distinction is quite messy. We don't have just private chats and old Twitter-style feeds where everyone can see everything. Facebook has posts which you can share with only friends, or with even more limited groups. This changes your perceived privacy levels and expectations. We have various group chats with widely different expectations of privacy.
But RTFA, because what it is discussing is not public Facebook/Twitter posts. I'm not sure why the discussion keeps moving there as if that's the only case. Besides, a public Facebook post is still only visible to those on Facebook. It is as public as a private country club, even if it's easy to get access. What about what you write in Copilot? What about your code on GitHub? In a private repo? The article mentions this is already protected under current law as personal information.
The fundamental problem here is about consent. What would a reasonable person expect? 10 years ago certainly no one expected this. Mass gathering of data was only for the biggest of tech and governments. Now it's in the hands of anyone that wants to train an LLM. Things changed very fast and I think a lot of people are forgetting that.
> The fundamental problem here is about consent. What would a reasonable person expect? 10 years ago certainly no one expected this.
I just don't think that's the right angle. I think if you're in the public sphere, the way that you anticipated your creations to be used just shouldn't matter at all. If you've given something to the public, the public should not be limited to whatever you thought they'd use it for.
And the conversation keeps coming back to that because people keep making that argument. I specifically agree that private Facebook posts are not covered under this argument! However, I also think that consent as a basis for analysis just doesn't apply to public posts.
Why does everything have to be binary these days? We aren't machines. The world is messy and so are we humans. Removing nuance from conversations and ideas does not help anything. The scenario you laid out, continued, leaves no person with privacy anywhere. I specifically don't want to explain because I specifically want more people thinking and thinking hard. To specifically attempt to break from their way of thinking and try to see things from a different point of view. This is important especially when there is disagreement. You don't have to be convinced by my arguments, but you certainly do have to understand them. There definitely isn't enough of that happening these days. Our interconnectedness has only made the world more complicated, not more simple. It has made communication more difficult too, as we talk with many who have far larger differences in priors than we would were we constrained to in person communication.
As for consent, you may disagree, but that's how our entire legal framework works, and I for one am happy about this. Consent is the foundation of any democracy. A world without consent isn't figuratively tyranny, it is literally tyranny. Agree or not, the law does not give you the right to use any information (words spoken, pictures taken, bits, or whatever) that is publicly available/viewable in whatever way you want. There are rules. It is why Taylor Swift can't even post a picture of herself on Instagram and why you can't follow someone around all day taking pictures of them (a.k.a. stalking). There are limitations because we live in a society and I for one think this is a good thing.
Yes, you are right that when someone posts something publicly, its use shouldn't be limited to only what the poster explicitly intended. But that freedom isn't unlimited either. That's the point I'm making. The line between public and private is getting blurrier by the day. These laws are always built on the standard of a reasonable expectation. Certainly no one that posted on Facebook 10 years ago had a reasonable expectation that their post would be used to train an LLM.
Personally speaking: all the art I consume is remixes. Almost all the books I read are fanfics. If the repurposing of cultural content without the consent of the author becomes illegal, my cultural universe will vanish overnight. As such, I think I'm just on the other side of that fight.
If we have to lose privacy and control in order to keep our participatory culture, I'm sorry, I'll take culture. If it's a multidimensional topic, I'm for tightening privacy in private spaces and loosening privacy in public spaces. If there's only one axis, I'll push the lever towards loose.
> Personally speaking: all the art I consume is remixes. Almost all the books I read are fanfics. If the repurposing of cultural content without the consent of the author becomes illegal, my cultural universe will vanish overnight.
You'd be surprised, because these industries have figured these problems out: what is allowed without a license, what needs royalties, what needs permission, and what can't be done. This was a huge conversation in the '90s and you can find many discussions about hip hop (around sampling) and copyright law.
Your culture is not in danger, even if you don't know its history. I'd suggest learning it though, because that's the best way to ensure your culture stays out of danger (a culture I am also part of fwiw).
To the best of my knowledge, all claims that fanfiction is in the clear, that it's okay to do derivative works if you aren't charging money, and so on, are all just fandom folklore. If any author wanted to legally shut down fanfic, to my knowledge they could. And I'm against that - hell, in my opinion fanfiction writers should be able to sell their work even without authorial consent. Look at the Touhou scene, look how much cultural production was achieved just by one popular creator being chill about copyright. The idea that you can own exclusive rights to a story or a setting is a historical aberration.
Yeah, that's to the best of my knowledge true but IANAL. I'm with you about the fanfic scenario too. I'm also very open about sampling. But I think these issues are different than the data that we're talking about in this thread. If data can be recovered (and in some ways it can be, in others it can't) then that's not really derivative. A derivative work also needs some distance from the original and can't be too close to it. And importantly, these data are being used to create a product that is being sold, where the processing of the data is the thing of value.
My point is that the environment has changed and there's a lot of gray area here. Turning this into a binary distinction is unhelpful. There are new nuances here, just as there were new nuances when sampling became popular in hip hop. We need to have open and honest discussions about these things, and I think a lot of discussions we have or observe are rather dismissive of these nuances (coming from both sides of the debate fwiw). I'm obviously very open to using data but we must also be aware of our data privacy, how it can be used, our social contracts, and what a reasonable level of a priori consent is. If we overly simplify these conversations then they aren't actually discussions. My points come down to this: there is gray, and there are very clear cases where you do not have unlimited access to and usage of works that are publicly available. Private ownership is the root of capitalism after all, and so it should be rather unsurprising that we have many laws and social contracts over ownership and the extent of what one may do with things they did not create. There certainly is a lot of anger and frustration in these conversations and I don't expect an artist making their living off of their art to understand all these nuances, nor am I surprised that they are upset and possibly afraid. This is new territory and pretending it isn't is just as obtuse as calling generators fuzzy copy-paste machines.
But I want to be clear that we can have both goals. We can protect data rights, privacy, fairness AND have this sampling and creativity culture, for lack of better words. We just need to be careful, nuanced, and thoughtful in determining how to do this. We won't be perfect and won't make everyone happy, but we can maximize social agreement conditioned under fairness and privacy. I just want to ensure that we are not approaching this conversation as if there are clear answers about what can and can't be done with data. Hell, we don't even have that answer for music, sampling, or fan fiction. We have answers as to what laws say, but even as you point out, that's ambiguous in many cases without even considering that the environment is not only changing, but changing rapidly. I think we all understand that there is a difference between using the Akira slide and the "Ice Ice Baby v Under Pressure" scenario. No one has the answers, and that's why we need to talk. And unfortunately "edge cases" are the norm in topics like these.
Note: I am an ML researcher. I use publicly available data to train models, data that includes images of people, their art, their animals, their property, and such, which I'm sure many do not know exist in these datasets. Similarly, I do not even know all the data within some of these datasets. But I can still recognize that there is a gray area that exists here and personally I see it as my ethical duty to ensure we have these discussions in an open and honest way to determine the limits of what I should and shouldn't be able to do. It isn't up to me, it is up to our society to create a social contract.
Yeah well, I'm part of society and I've said my opinion on the matter. :P
Sure, it's a continuum. I'd say it's more that, viewed as a continuum, my opinion about the ends of the spectrum is binary, in that I want one end to go up and the other to go down. That doesn't mean I want to split it into two cases so much as that I want to consider it among other things on an axis between two points, "private" and "public", and my opinion is a function of placement between those two points.
Artists are frustrated about fanfic too! They feel genuine ownership of these characters. I've seen people have borderline breakdowns at fanfics. The idea that people are doing things that they didn't intend to "their" characters can be genuinely traumatic to a writer. If you make that a question about consent then fanfic becomes comparable to abuse, and I just don't agree that those should be the terms. Any work placed before the public is inherently participatory. Copyright splits the world into "creators" and "consumers" and I think that's problematic, because there's no such thing as passive consumption of a work. The listener or reader recreates the experience of the work in their head, based on their own assumptions, preconceptions and personality, and if they have an experience or viewpoint that diverges from that of the author, they must be able to share that in turn. So I have a genuine values disagreement with some of what we currently call "rightsholders" in the matter.
What's that got to do with deep learning? Very little: I just see where it leads if you phrase the topic about consent, and it's not a world where unwashed readers can get their dirty mitts on their characters.
(My personal favorite is... closed-source Minecraft (Java) mods. "Do not decompile against my wishes!", the description said. Minecraft mods, meanwhile, are only possible in the first place by decompiling Minecraft itself against Mojang/Microsoft's wishes. You were this close to learning something!)