It is obviously a technical challenge. Imagine yourself facing 3 pictures: one of CSAM, another of terrorist material, and a random photo of some random guy eating ice cream. What steps would you take to determine whether this last photo is "legal" or not?
YouTube has nearly two decades of attempts at determining whether uploaded content violates copyright, normally by fingerprinting content submitted by rights holders. YouTube still fails to catch all copyright infringement. That's why they are generally protected from prosecution as long as they demonstrate a reasonable attempt at preventing their services from being misused.
Now imagine ALL IMAGES IN THE WORLD being submitted for fingerprinting. And what about malicious or erroneous submissions that taint the dataset?
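Content ID's internals are proprietary, but the general idea behind this kind of fingerprinting is perceptual hashing: reduce the content to a short signature that survives re-encoding and mild edits, then compare signatures by distance. A toy average-hash sketch in Python (using Pillow; the file names and the 5-bit threshold are my own illustrative assumptions, not anything YouTube actually uses):

    from PIL import Image

    def average_hash(path, hash_size=8):
        # Shrink to an 8x8 greyscale thumbnail, then set one bit per
        # pixel depending on whether it is brighter than the mean.
        img = Image.open(path).convert("L").resize((hash_size, hash_size))
        pixels = list(img.getdata())
        mean = sum(pixels) / len(pixels)
        bits = 0
        for p in pixels:
            bits = (bits << 1) | (p > mean)
        return bits

    def hamming(a, b):
        # Count of differing bits; near-duplicates land within a few bits.
        return bin(a ^ b).count("1")

    # Hypothetical usage: compare an upload against a reference
    # fingerprint submitted by a rights holder.
    if hamming(average_hash("upload.jpg"), average_hash("reference.jpg")) <= 5:
        print("probable match - queue for review")

The tainting problem falls straight out of this design: whoever controls the reference set controls what gets flagged, so a bogus or mistaken fingerprint silently matches innocent uploads.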
Also: "right to be forgotten" is a horribly misleading name. At most you have the right to request that your data be removed from datasets, and only if you have some legal basis demonstrating that your data is in that dataset. It is not a right to be "forgotten". Imagine being told yourself to "forget an image, sound, etc." - how would that be enforced? You might even ask for proof that you know the content you are being told to forget. How do you do that?
If you have the image and the person reaches out saying it's them and they want it removed, you have a pretty good idea of the legality. As stated, this isn't a technical problem, it's a bureaucratic one. Most companies didn't think they had to remove it, so they didn't have steps in place to do so. Now a lot of them do, with GenAI companies just being the latest group who feel they have an argument for not having to.
They only get rid of those materials that they can seize. I doubt they will be able to seize a properly hosted onion domain. It's just that most of the actors aren't good enough with technology.
That's the reason you mostly see the pretty dumb guys getting caught. As long as you are smarter and more tech-savvy than 80% of the criminals, you are pretty much out of reach for the feds.
The government has enormous trouble getting rid of CSAM. The FBI has databases of content that is decades old - basically images recognised as "classics" that are still shared amongst the consumers of such things.
It is illegal to possess, but despite the most aggressive enforcement on the planet, it is still out there.
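In practice those known-image databases are used for hash matching: a scanned file is fingerprinted and checked against the catalogued set. A deliberately naive sketch (file names hypothetical; real deployments use perceptual hashes such as PhotoDNA rather than an exact digest like SHA-256):

    import hashlib

    def load_known_hashes(path):
        # Toy known-image database: one hex digest per line.
        with open(path) as f:
            return {line.strip() for line in f}

    def is_known(image_path, known):
        # Exact-match check; re-encoding or resizing the file
        # produces a new digest and defeats this version entirely.
        with open(image_path, "rb") as f:
            return hashlib.sha256(f.read()).hexdigest() in known

Which is part of why enforcement never finishes: trivially re-encoding a "classic" evades exact matching, and even perceptual hashes only catch copies of what has already been catalogued.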
> One artist even found that her private medical records containing photos of her facial condition were in LAION, a foundational dataset for many GenAI products.
Based on the article, I'm extremely skeptical that people have actually identified anything, since LAION does not store content. LAION is a list of links to content hosted publicly by others.
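To make that concrete: the LAION metadata is distributed as parquet files of URL/caption rows. A quick sketch with pandas (shard filename hypothetical; column names as I recall them from the LAION-400M release):

    import pandas as pd

    # One metadata shard (filename made up for illustration).
    df = pd.read_parquet("laion400m-meta-part-00000.parquet")

    # Rows are links plus captions and scores - there is no
    # image-bytes column at all.
    print(df.columns.tolist())        # e.g. ['SAMPLE_ID', 'URL', 'TEXT', ...]
    print(df[["URL", "TEXT"]].head())

    # "Removal" from LAION means dropping a row of metadata; the
    # image itself stays wherever it is actually hosted.
    df = df[df["URL"] != "https://example.com/private-photo.jpg"]

So someone who "found" their photo in LAION really found a link into someone else's hosting, and the takedown properly belongs with that host.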