> LLMs don’t “learn” from the information they operate on, contrary to what a lot of people assume.
Nothing is really preventing this though. AI companies have already proven they will ignore copyright and any other legal nuisance so they can train models.
They're already using synthetic data generated by LLMs to further train LLMs. Of course they will not hesitate to feed in "anonymized" data generated by user interactions. Who's going to stop them, or even prove that it's happening? These companies have already been allowed to violate copyright and privacy on a historic, global scale.
I have no doubt that Microsoft has already classified the nature of my work and the quality of my code. Of course it's probably "anonymized". But make no mistake: they are watching everything you give them access to.