The idea is that any user-supplied data has to be separated into two groups. First, privacy-relevant information, such as communications, private messages, and email, which has to be stored encrypted and cannot be used for advertising, for training models, etc.
Second, user-supplied data that is not privacy-relevant: the results of users solving captchas, classifying information for training neural networks, the data from Google’s MapMaker, etc.
Where these two sets of data overlap, privacy takes precedence. Either way, on request, the company has to hand over all the data you ever gave them in a machine-readable format. So if you decide to leave Facebook, they have to give you a .zip with your photos, your posts as XML, etc.
The problem is that there is a lot of gray area. In fact, I think there is probably more utility in spelling out the cases where privacy is not a concern. Captcha is a good, specific example. However, releasing that training data would break the system for everyone. What are some others? Is Google Takeout up to par with what you're suggesting?
Google Takeout is not nearly good enough, but it is an acceptable first step.
As for captchas, that system has been broken for a long time. Specifically, you can outsource captcha solving to people in third-world countries for tens of thousands of captchas per dollar.