Right, but it sounds like the bigger issue here is that the model might spit out copyrighted material, not just that it scrapes it. The former seems like a technology problem that Microsoft can solve.
The issue is not only that the model might spit out copyrighted material verbatim (which it does), but also that it might spit out non-obvious derivative works that will get you in legal hot water years down the road.