We know that HN is visited by a fair share of Facebook employees.
Can some of you weigh in (anonymously?) on this topic? Do you guys do hard deletes of user data instead of just soft deletes? If so, are logs or backups kept? For how long?
In other words: if I'm a user of $POPULAR_SERVICE and I delete my account at time t0, is there a t1 > t0 after which every trace of my data is gone from the platform?
Google takes deletes seriously. Extreme efforts go into deleting stuff within 30 days of the user requesting deletion.
Imagine how hard that is when a datacenter is switched off for 14 days for maintenance, and then a fire breaks out and takes it offline for a further 20 days... When something is powered off, it's very hard to do those deletions... Yet misses of the deadline are exceedingly rare, even in cases like the above.
Sometimes disks are crushed in a crusher to meet the deadline if software approaches to deletion can't be done in time.
Excellent. Thank you for posting this. If anybody, I would expect Google to take this seriously: they have the eye of Sauron on them continuously and cannot afford to trip up.
As does Microsoft. I'm always baffled at these "soft deletes are normal" threads. Maybe at a bootstrapping startup, but most real tech companies implement hard deletes within the first couple years.
> But clicking delete or unsend on a photo is not that.
In Google, a user clicking delete is treated exactly the same as a written deletion request.
In fact, the law requires that they be the same: "Therefore, an individual can make a request for erasure verbally or in writing. It can also be made to any part of your organisation and does not have to be to a specific person or contact point." (https://ico.org.uk/)
Sorry, but no. That is a deletion request. The GDPR tells you exactly what to do once such a request is made. There is no such thing as a 'specific GDPR deletion request'.
That's true, but regulators operate outside of that and will take the intent rather than the letter of the law to heart, and the GDPR is quite specific in its language.
A person who may decide to bring suit, however, should always cite chapter and verse to lay down the line and to indicate that they are very serious about it. Just the fact that you are citing that article will likely give you a better chance of seeing your request honored. But if your request is refused and you decide to tip off a regulator, it won't make all that much of a difference: they will do their own investigation outside of the particular case and may broaden or narrow the scope of that investigation as they see fit.
This has already surprised more than one company, by the way. They decided to play fast and loose with a single individual and as a result found their whole infra and processes under review, with plenty of things found out of order. Fines were handed out that were higher than what it would have cost to arrange things properly in the first place.
Soft deletes are generally a good idea. Being able to recover data for some time is very useful. Ideally this is also exposed to users. However, respectable companies will keep this time-limited. For example, Google has very strict deadlines on all of their deleted data wiping. Their privacy policy is quite vague but gives some examples: https://policies.google.com/technologies/retention
I work at Facebook, and I even work in storage, but even that might not make my experience as complete or relevant as you might think. You see, I work on one storage system. There are others that get primary data before us and still others that get it after us (for longer-term backup). There are systems above us that do their own replication on top of the service that we provide. There's a system off to the side to do all sorts of analytics on that data, which often involves copying some pieces of it. In fact, that system is our biggest internal customer, even bigger than the one that sits in the "normal" I/O path. There are systems whose whole purpose is to move data around between these others, which naturally requires some buffering.
You're probably starting to see the problem here. It's that the data actually exists in many systems, big and small, all of them with different processes and staffed by different teams. So deletion is really not a single operation but a coordination of many actions, relying heavily on a complex system of attribution and provenance to find all the places that each piece of data (among literally trillions) went.
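To make the coordination problem concrete, here is a toy sketch of the idea: a provenance index records which systems received a copy of each item, and a coordinator keeps retrying until every one of them has confirmed the delete. This is purely illustrative and not how any real system at Facebook works; `Store`, `DeletionCoordinator`, and every method name are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Store:
    """Hypothetical stand-in for one storage system (primary, backup, analytics...)."""
    name: str
    items: dict = field(default_factory=dict)
    online: bool = True

    def delete(self, item_id) -> bool:
        # A store that is offline cannot confirm the delete.
        if not self.online:
            return False
        self.items.pop(item_id, None)
        return True


class DeletionCoordinator:
    """Tracks where each item was copied and fans deletes out to those stores."""

    def __init__(self):
        self.stores = {}       # store name -> Store
        self.provenance = {}   # item id -> set of store names holding a copy

    def register(self, store):
        self.stores[store.name] = store

    def record_copy(self, item_id, store_name, payload):
        # Every copy must be recorded, or the delete will never find it.
        self.stores[store_name].items[item_id] = payload
        self.provenance.setdefault(item_id, set()).add(store_name)

    def delete_everywhere(self, item_id):
        """Attempt the delete in every known store; return the names still pending."""
        pending = set()
        for name in self.provenance.get(item_id, set()):
            if not self.stores[name].delete(item_id):
                pending.add(name)
        if pending:
            self.provenance[item_id] = pending   # retry these later
        else:
            self.provenance.pop(item_id, None)   # fully gone
        return pending
```

Even in this toy version the two failure modes are visible: a copy that was never recorded in the provenance index is never deleted, and a store that is offline leaves the request pending until someone retries it.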
All of this infrastructure is huge and it's active. I've been pinged many times while oncall to provide information or take actions in support of it. Every log stream, every database table, has to be carefully scrutinized to see if it could possibly contain user data, no matter how remote that possibility might be. It really is something we work hard at, and I know we're not perfect but anyone who says it's because we don't care is talking out of their ass. We're merely human.
That said, I'm hard pressed to explain the particular scenario in the OP. It seems to me that, no matter what other mechanisms are in place, there should be an egress filter to provide that One Last Check on data leaving our custody, and that should have kicked in here. But I have almost no interaction with Instagram from where I sit, so I can't speak for them any more than anyone else here can. Nor should I try. Probably said too much already.
I have actually worked at Facebook, but not on deletions. So what I say is probably better informed than the random speculation in the other comments replying to you, but still, take it with a grain of salt.
My understanding is that deletion is a hard problem with entire teams working on it (imagine how many different random systems data flows to...) but that yes, the intended behavior is for deleted data to really be gone after 30 days. This is necessary to comply with GDPR and various other laws.
Of course, if two people each own a copy of a piece of data (for example, messages person A sent to person B), then person A deleting their copy won’t affect person B’s copy (just like how emails work).
Contrary to popular belief, Facebook doesn’t actually have anything to gain from nefariously storing data you delete. Ad targeting has plenty of non-deleted data to train on; Facebook has no incentive to break the law to keep tiny amounts of dubiously useful extra data on the margins. I’m almost certain the issue described here was genuinely a bug.
If you weren't working on deletions it would seem to me that you are not better informed at all.
The rest of what you write is an open book to anybody in tech. And whether Facebook has anything to gain or not from nefariously storing data you delete is a much lighter shade of gray than building up shadow profiles, profiles on people without an account.
For non-cynical reasons, no. It's basically 90 days for FB data, which is mostly because the majority of logs get deleted after 3 months.
However, there's usually some large slice of user data under legal hold, which legally can't be dropped as it's pertinent to some random long running court case, so not every trace of your data is gone.
This is how most services work at scale. It's much cheaper to set a flag than actually delete an entry in a database. The data can then be scrubbed by some periodic maintenance process.
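A minimal sketch of that pattern, assuming a single SQLite table and a 30-day grace period (the table name, column names, and the retention value are all made up for illustration): the delete just stamps a `deleted_at` time, and a separate maintenance job hard-deletes rows once the window has passed.

```python
import sqlite3

RETENTION_SECONDS = 30 * 24 * 3600  # assumed grace period before the hard delete


def make_db():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE posts ("
        "id INTEGER PRIMARY KEY, body TEXT NOT NULL, deleted_at INTEGER)"
    )
    return conn


def soft_delete(conn, post_id, now):
    """Cheap path: only stamp the row; `now` is a Unix timestamp in seconds."""
    conn.execute(
        "UPDATE posts SET deleted_at = ? WHERE id = ? AND deleted_at IS NULL",
        (now, post_id),
    )


def visible_posts(conn):
    """Every normal read has to filter out the flagged rows."""
    rows = conn.execute("SELECT id FROM posts WHERE deleted_at IS NULL ORDER BY id")
    return [r[0] for r in rows]


def purge_expired(conn, now):
    """Periodic maintenance job: hard-delete rows whose grace period is over."""
    cur = conn.execute(
        "DELETE FROM posts WHERE deleted_at IS NOT NULL AND deleted_at <= ?",
        (now - RETENTION_SECONDS,),
    )
    return cur.rowcount  # number of rows actually removed
```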
Only that doesn't typically happen. It just sits there, for years or until the company goes bust.
The typical reasoning is that marketing wants to hold on to the data. They will never ever say 'ok, enough, you may delete it' because there is this infinitesimally small chance that they can re-activate an account, market to it for some other product (no matter that that is against the GDPR), or sell the data to some third party if there ever is a cash crunch or panic. They see data as having positive value no matter what, whereas data that you shouldn't be holding on to is actually a liability.
Did I claim that Facebook never really deletes data?
I just answered the GP, maybe you have your threads mixed up?
But if I were to speculate, I would say that if Instagram does what the article title says, you could already make that claim about Facebook, since they are a part of it.
We’re in a thread that specifically asks whether Facebook really does hard deletes, so I took your comment as claiming they don’t.
> We know that HN is visited by a fair share of Facebook employees.
Can some of you weigh in (anonymously?) on this topic? Do you guys do hard deletes of user data instead of just soft deletes?
The problem with just a flag is it slows down queries. You can scrub it later, but that’s just kicking the can down the road and now you have to deal with the consequences of an actual hard delete.
I wonder when popular databases will have some first class support for soft deletion built in.
Can some of you weigh in (anonymously?) on this topic? Do you guys do hard deletes of user data instead of just soft deletes? If so, are logs or backups kept? For how long?
In other words: if I'm a user of $POPULAR_SERVICE and I delete my account at time t0, is there a t1 > t0 after which every trace of my data is gone from the platform?
My (cynical) guess is no, but I hope I'm wrong :)