
Was anyone really thinking of those as private?


Sure, if you pay for the product, the expectation is that the data is not used for training, because that's what the contract says. And if you have a temporary chat, the data will be deleted after a day or two.


Paying for it doesn't change that the data is on their server.


Unfortunately yes, by a lot of non-technical people.


Why does it have to be about being technical or not? You're signed into an account with no obvious social networking capabilities; what about ChatGPT screams "this will be a public chat between me and an LLM"?


It's not your computer so of course it isn't private. Apparently you do need to be technical to understand that.


Yes, you do indeed need to be technical to understand that. The tech industry, and that includes most of us here (especially all the FAANG people that curiously always stay silent in threads like this one), has worked very hard to make everyone believe that online privacy is a thing, while working even harder to undermine that at every possible step.

Ordinary people expect stuff that they don't actively share with others to stay private. Rightly so! It's the ad industry that got it wrong, not the People.


I never talked about it because it seemed obvious to me until today. I kind of see why people are confused now that I think about it a bit and read other people's replies.


The FAANG people also have a lot of direct personal experience contradicting a lot of mainstream FUD titled "All your data is being sold to the lowest bidder".

Having worked at one of those companies (and having quit that job disillusioned by a lot of things), I can say there is still so much mainstream misinformation about this. Yes, data is often used for tracking and training, in aggregate form. Sensitive data is anonymized/de-identified. The leading research on these techniques is also coming out of these companies, by the way.
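For the curious: one of the techniques that research covers is differential privacy, i.e. publishing noisy aggregates instead of raw rows. A minimal sketch of the Laplace mechanism; the epsilon, data, and function name here are invented for illustration:

  import random

  def dp_count(values, epsilon=1.0):
      # The difference of two Exp(epsilon) draws is Laplace(0, 1/epsilon),
      # which masks any single user's contribution to the count.
      noise = random.expovariate(epsilon) - random.expovariate(epsilon)
      return sum(values) + noise

  # Hypothetical per-user signals (1 = clicked the ad).
  clicks = [1, 0, 1, 1, 0, 1, 0, 0]
  print(dp_count(clicks))  # the published number, never anyone's raw row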

There are layers and layers of policy and permission safeguards before you're allowed to access user data directly as an engineer. And if/when someone tries to exploit the legitimate pathways to touch user data (say customer support), they get promptly fired.

But it's much easier to believe that FAANG is some great monolithic evil, out to surveil you personally for some vague benefit that never gets specified. All the legitimate, concrete monetary benefits (e.g. ad targeting and training ML models) can be had just as well with aggregate data, but privacy FUD doesn't want to hear that.

Meanwhile, stupid legislation and the ability of courts and law enforcement to subpoena any data they want, whenever they want, keep data on their servers longer than the companies would like. Yet people prefer to blame the "Evil Tech Cartel" instead of the multiple branches of their government that want to read their texts and GPS logs.


> All your data is being sold to the lowest bidder

There aren't that many possibilities for how geolocation data vendors get access to high-precision location data on millions of people. A publicly traded company that generates revenue from targeted ads can never be fully trusted to behave. A social network that optimizes for time spent looking at ads will never really care about its users' well-being. Algorithmic feeds are responsible for a widening social divide and loneliness. Highly detailed behavioral analysis can hurt people even when aggregated, for example when they get less favorable insurance terms based on their spending habits. Data that can be used to increase revenue will not be left untouched just to keep the moral high ground. Sensitive information shared with an LLM that ends up in training data today might have dangerous consequences tomorrow; there is no way to know yet.

This isn't even about proper handling of individual pieces of data, but the higher-order effects of handing control over both the world's information and the attention of its inhabitants to shareholder-backed mega-corporations. There are perverse incentives at play here, and anyone engaging in this game carries responsibility for the outcome.


> There aren't that many possibilities for how geolocation data vendors get access to high-precision location data on millions of people.

In a world where cellphones have all sorts of radio antennas on at all times, there are more ways than you'd think.

> A publicly traded company that generates revenue from targeted ads can never be fully trusted to behave. A social network that optimizes for time spent looking at ads will never really care about its users' well-being. Algorithmic feeds are responsible for a widening social divide and loneliness.

I'm really not interested in debating dogmatic philosophy about how cynical one should be in the world. The entire point of my comment was that cynicism induces FUD that's not necessarily backed by direct evidence. One can come up with all sorts of different theories to explain what's happening in the world. Just because they sound somewhat consistent on the surface, doesn't mean they're true. That's just inverted inference.

I do agree with you that there are bad incentives in play here, but if we don't want them to be exploited and actually care about privacy, we should convince our effing legislators to plug the loopholes and enshrine online privacy in actual law, instead of letting companies write whatever they want in their Terms of Service, and then create mechanisms to enforce that legislation, instead of moralizing about the actions of a company as some sort of monolithic (un)ethical entity.

I think humanizing and moralizing the actions of large companies is a gigantic waste of time. Not only does it accomplish nothing, it gives us (the affected party) a distraction from focusing our efforts on the representatives we elected who aren't doing their job. Maybe it's representative of where we feel we can make change.


I think you're confusing cynicism with reality and logic.

It's not cynical to say ad-driven social networks are adversarial to their users; it's logical and unavoidable, because they're optimizing for different things.

Networks want the best, most targeted ads, so they need the most data. They want the highest watch times and retention, so they MUST develop addictive algorithms.

It's like selling a cigarette. Is there any non-adversarial way to sell a cig? No. You're optimizing for the most smoking. Okay, great, let's concentrate the tobacco then so we have more nicotine. Let's use butane rings so the cig burns faster.

I do agree 100% with your points about legislation (this is the only path forward) and about not humanizing corporations. Corporations are more akin to machines or algorithms.

But, because they're more akin to machines or algorithms, we can prove when, and why, they are working against our interests, and it's not cynicism.


The cynicism I'm referring to is not simply about recognizing the conflicting objectives at play. Those are abundantly evident. The cynic takes that to an extreme of "Since this entity is adversarial to me, it, and everyone participating in it, makes the worst, most evil choices possible with blatant disregard for any consequences to others or themselves."

There's an illogical extension of conflict that's sometimes applied in this context, with heavily implicative language that's often misleading. No, Google isn't interested in reading your personal email (as if Google as an entity could have any interest in the first place); they will definitely serve you targeted ads and sell product integrations based on it, though.


I would agree, but I will say the waters get murky when we factor in data breaches and things like this subpoena. Keeping data, even if it's just used for predictable use cases, isn't free. There's a liability there, a risk, that most users do not understand.


Absolutely. I'm not arguing there aren't many exposure vectors to having your data out of your direct influence. It's the quickness to jump to malice (on the part of the companies) instead of a combination of many factors (incompetence, murky/weak legislation, myopic greed, and sometimes actual malice), without concrete evidence for those judgments, that bugs me.


> In a world where cellphones have all sorts of radio antennas on at all times, there are more ways than you'd think.

That doesn't explain why soldiers can be identified by their location traces at known military sites; the data must be sent from the device.

> The entire point of my comment was that cynicism induces FUD that's not necessarily backed by direct evidence.

That is exactly the kind of deflective attitude common in big tech that I was referring to: there is concrete evidence for these effects (e.g. [0][1][2][3]). Google, Netflix, Amazon et al. would falter if it weren't for the violation of their users' privacy. Even if we leave dogma out of this, lots of negative effects would simply not be possible without their data collection practices.

You cannot participate in—and profit off of—something bad and then distance yourself by claiming your specific part in it was not inherently evil.

[0] https://www.science.org/doi/abs/10.1126/science.ade7138

[1] https://arxiv.org/abs/2305.16941

[2] https://arxiv.org/abs/1607.01032

[3] https://guilfordjournals.com/doi/10.1521/jscp.2018.37.10.751


Re-identifying data is really, really easy. Anonymised data is largely... Not anonymous for long. [0] The leading research has been saying that for decades.
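To make that concrete, here's a toy linkage attack; every record is invented, but this is the shape of the classic result that ZIP code, birth date, and sex alone identify most Americans:

  # "Anonymized" records: names stripped, quasi-identifiers kept.
  medical = [
      {"zip": "02138", "dob": "1945-07-31", "sex": "F", "dx": "condition X"},
  ]
  # Public voter roll: names included.
  voters = [
      {"name": "Jane Doe", "zip": "02138", "dob": "1945-07-31", "sex": "F"},
  ]
  # Join on the quasi-identifiers to re-attach names.
  for m in medical:
      for v in voters:
          if (m["zip"], m["dob"], m["sex"]) == (v["zip"], v["dob"], v["sex"]):
              print(v["name"], "->", m["dx"])  # re-identified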

And whilst you say there's so much protection... We have countless examples of where it's been done. [1]

The only real way to be safe with data is... To not have it in the first place. (Which, bonus, often means governments can't compel you to keep it.)

[0] https://digitalcommons.law.uw.edu/wjlta/vol5/iss1/3/

[1] https://www.forbes.com/sites/simonchandler/2019/09/04/resear...


I will not dispute what you claim here. But it doesn't address the main thrust of my comment.

The point I was making wasn't that de-identification is a solved problem, or that your data is "safe" with FAANG companies. The point was more about the malice that's attributed to them as a blanket measure, in comments such as this:

> (especially all the FAANG people that curiously always stay silent in threads like this one), has worked very hard to make everyone believe that online privacy is a thing, while working even harder to undermine that at every possible step.

There are many people and execs at these companies who are unscrupulous. But there are also many parts of them that are trying to work on doing things the "somewhat right" way when handling user data.

De-identification and anonymization are hard problems. But there's a lot of concrete evidence, to me, that many people in the FAANG world are at least trying to make progress on them (sinking billions of dollars of engineering and research resources into them), instead of blatantly making bank, which they totally could.


Well, when you get scandals like Facebook trying to get patient data [0], Cambridge Analytica [1], TikTok spying on reporters [2], and so very many more [3], it is rather hard to see incompetence over malice.

I absolutely believe that there are people at those companies trying to rein in the corporate behemoth so it doesn't squash its own legs. However, the evidence suggests they're... Losing that particular battle.

The corporations still haven't learnt to respect individuals - they're just resources. [4]

Until a corporation acknowledges that safety comes with... Simply not spying on everyone... The risk in trusting them isn't going to be one that people want to take. Yes. These are hard problems. So don't make them a problem you have to face.

[0] https://www.cnbc.com/2018/04/05/facebook-building-8-explored...

[1] https://www.cnbc.com/2018/03/21/facebook-cambridge-analytica...

[2] https://firewalltimes.com/tiktok-data-breach-timeline/

[3] https://www.drive.com.au/news/tesla-shared-private-camera-re...

[4] https://www.theverge.com/meta/694685/meta-ai-camera-roll


I am inclined to align with you on Meta; I didn't work there. But again, this goes back to my point about treating "FAANG" like a monolith. Meta's handling of these things doesn't tell me anything about Apple's handling of them, but most people do extrapolate it.


My doctor’s notes aren’t on my computer. Does that mean I should expect them to pop up on the internet?


So basically, everything that happens on an iPhone is not private?


Lol yes we've been warning people about that for years.


Imagine treating the third-party doctrine as legitimate instead of a misruling.


Because technical people are aware that anything you type over the internet isn't private unless it's end-to-end encrypted.


This is silly. The second "e" in e2e is the one compelled to provide the info here; e2e doesn't help when the other end is the party being subpoenaed. Even if the data is encrypted at rest, they'll likely be forced to decrypt it.


You can't decrypt if you don't have the private keys. I mean, these people aren't Zoom, who keep the same private keys on the same server as your data. We can't handhold these tech giants and baby them. They know better. The data should have never been stored in plaintext. And, if it was or is encrypted, they should never have access to the private keys. Why did they do it? I'm assuming because they got greedy, and they wanted those prompts for their own internal training.


What do you think SSL does?


Not that. SSL is just transport-layer encryption; once the information is on the server, it's plaintext.

Everything uses SSL or, more accurately, TLS. Very few things are E2EE. Consider Signal: is Signal equivalent to WhatsApp just because WhatsApp uses TLS? Of course not.
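A toy sketch of the distinction, using Python's `cryptography` package (the secret and key handling here are invented for illustration, not any vendor's actual design). With TLS alone, the server decrypts at its end of the pipe and stores plaintext; with E2EE, the client encrypts first and the key never leaves it:

  from cryptography.fernet import Fernet

  key = Fernet.generate_key()                 # stays on the client
  ciphertext = Fernet(key).encrypt(b"my secret prompt")

  # The server stores only `ciphertext`; a subpoena yields bytes it
  # cannot read, because the server never held the key.
  print(Fernet(key).decrypt(ciphertext))      # only the key holder can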

If you have data in plaintext on a server, you should always assume that data lives forever. You might be wrong sometimes, rarely. But usually it does live forever. Most delete buttons don't even actually delete anything.
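The delete-button point as a toy sketch (schema and values invented):

  import sqlite3, time

  db = sqlite3.connect(":memory:")
  db.execute("CREATE TABLE messages (id INTEGER, body TEXT, deleted_at REAL)")
  db.execute("INSERT INTO messages VALUES (42, 'my secret', NULL)")

  # What many "delete" buttons actually do: flag the row, keep the bytes.
  db.execute("UPDATE messages SET deleted_at = ? WHERE id = 42", (time.time(),))

  # The content is still there, and still answerable to a subpoena:
  print(db.execute("SELECT body FROM messages WHERE id = 42").fetchone())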


The email analogy someone else used here is pretty good: imagine the NYT had gotten access to all emails stored and processed by Gmail. That's a pretty invasive court order!


If I want privacy, I run the LLM on my own machine. Everything else should basically be considered public.


There are different levels of privacy. I can expect data I share with a company for a specific use case to not be public knowledge, yes.


You can't expect that when major data breaches happen all the time.

Even assuming a perfectly benevolent company, that means nothing. Just them having the data is a liability. Which is why Rule Number 1 of data security is: have the least data.


Absolutely not. If they get subpoenaed (as is the case here) they have no choice but to share it.


Isn’t that the point of this thread - people are questioning whether the scope of this subpoena is excessive?


As much as our emails are. So, I don’t know.


We don't expect any newspaper in the world to get access to all of our emails; the same goes for these chat logs: we should expect them to be private in this context.



