This is alleged, and it is very likely that claimants like the New York Times accidentally prompt-injected their own material to produce the violation (not understanding how LLMs really work), clouded by the hope of a big payday rather than by any real concern for justice or fairness.
Anyway, the law is mature enough for everyone to work this out in court. Maybe it will turn out that they have a legitimate concern, but the evidence they have presented publicly so far has been seriously lacking.
Prompt-injecting their own article would indeed be an incredible show of incompetence by the New York Times. I'm confident they're not so dumb as to put their article in the prompt and then be astonished when the reply reproduced it.
Rather, the actual culprit is almost certainly overfitting. The articles in question were pasted many times on different websites, showing up in the training data repeatedly. Enough of this leads to memorization.
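To make the mechanism concrete, here is a toy sketch: not how a real LLM works, just a word-bigram counter over invented text, showing how duplicated training data can push a model toward reproducing the duplicated document verbatim under greedy decoding.

```python
from collections import defaultdict, Counter

# Toy illustration (invented text, not a real LLM): a bigram model built
# by counting adjacent word pairs. With one copy of the "article" in the
# corpus, greedy decoding wanders; with fifty copies, the article's
# transitions dominate the counts and decoding reproduces it verbatim.

article = "the senator denied all allegations in a statement on tuesday".split()
other_docs = [
    "the committee met on tuesday to review the budget".split(),
    "a statement from the agency denied any wrongdoing".split(),
]

def bigram_counts(docs):
    counts = defaultdict(Counter)
    for doc in docs:
        for a, b in zip(doc, doc[1:]):
            counts[a][b] += 1
    return counts

def greedy_continue(counts, start, steps=9):
    out = [start]
    for _ in range(steps):
        followers = counts.get(out[-1])
        if not followers:
            break
        out.append(followers.most_common(1)[0][0])  # most frequent next word
    return " ".join(out)

once = bigram_counts(other_docs + [article])        # article seen once
many = bigram_counts(other_docs + [article] * 50)   # article "scraped" onto 50 sites

print(greedy_continue(once, "the"))  # a muddle of the other documents
print(greedy_continue(many, "the"))  # the article, word for word
```

Real memorization in a transformer is obviously more complicated, but the pressure is the same: repeated verbatim text raises the probability of those exact continuations.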
They hired a third party to build the case, and we know nothing about that party except that they were lawyers. It is entirely possible, since this happened very early in the LLM era, that they didn't understand how the technology worked and fed the model enough of their own article for it to piece the rest back together. OpenAI itself discusses the challenge of overfitting and how it works to avoid it.
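That failure mode is easy to trigger by accident. A purely hypothetical sketch of such a test, assuming the standard openai Python client (the article text and model name are placeholders, and whether anything like this happened in the NYT case is speculation):

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Hypothetical test: paste several opening paragraphs of an article
# verbatim, then ask for a continuation. A model that has memorized the
# text, or is simply completing the pattern, may echo the rest back.
article_prefix = "..."  # placeholder: the article's opening paragraphs

response = client.chat.completions.create(
    model="gpt-4",  # placeholder model name
    messages=[{"role": "user",
               "content": article_prefix + "\n\nContinue this article."}],
)
print(response.choices[0].message.content)
```

If the prompt already contains most of the article, "reproduction" in the output tells you very little by itself.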