
It's extremely difficult to tell good analytics companies from bad ones. Every analytics company is tracking data for multiple websites and therefore can track people across the web. How is the user supposed to know what you are doing with their data? Even if you aren't doing anything now, how can they be sure that won't change, especially if the company is sold?


This is sort of true, but it depends. We set a first-party, domain-specific cookie. We can't track a user across different domains or customers. Technically we could correlate based on IP address and activity times, but that's not the same as setting a super cookie that is shared between sites.
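A minimal sketch of what a first-party, domain-specific cookie means in practice, assuming a Python backend. The cookie name `_vid`, the helper `first_party_visitor_cookie`, and the `ToyCookieJar` class are all hypothetical. The key detail is that no `Domain` attribute is set, so the browser keeps it as a host-only cookie and sends it back only to the host that set it. Note that this alone does nothing about the IP/timing correlation mentioned above.

```python
import secrets
from http.cookies import SimpleCookie

def first_party_visitor_cookie(name: str = "_vid", max_age_days: int = 365) -> str:
    """Return a Set-Cookie header value for a hypothetical visitor-ID cookie.

    No Domain attribute is set, so the browser stores it as a host-only
    cookie and sends it back only to the exact host that set it.
    """
    cookie = SimpleCookie()
    cookie[name] = secrets.token_hex(16)        # random ID, meaningful only on this site
    cookie[name]["path"] = "/"
    cookie[name]["max-age"] = max_age_days * 24 * 60 * 60
    cookie[name]["samesite"] = "Lax"            # not sent on cross-site subrequests
    cookie[name]["secure"] = True               # HTTPS only
    return cookie[name].OutputString()


class ToyCookieJar:
    """Grossly simplified browser jar: host-only cookies are keyed by exact host."""

    def __init__(self) -> None:
        self._by_host = {}

    def store(self, host: str, name: str, value: str) -> None:
        self._by_host.setdefault(host, {})[name] = value

    def cookies_for(self, host: str) -> dict:
        return dict(self._by_host.get(host, {}))


print(first_party_visitor_cookie())
jar = ToyCookieJar()
jar.store("shop-a.example", "_vid", secrets.token_hex(16))
print(jar.cookies_for("shop-a.example"))  # the ID comes back to the site that set it
print(jar.cookies_for("blog-b.example"))  # {}: another customer's site never sees it
```

A "super cookie" shared between sites would instead need a third-party context (one tracking domain embedded on every site) or a storage mechanism that leaks across origins; the host-only cookie above gives each customer's site an unrelated ID.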

You are still right: how is the user supposed to know whether one tool is reputable and another isn't? Worse, one may be fine today, but then it gets acquired, someone starts putting the pieces together, and they use your historical data in good_tool to link you in bad_tool.

I don't have a good answer. As the other commenter mentioned, regulation may help.

Beyond that, some kind of standardized policy that can be checked and tested would be nice.


    We can't track a user across different domains, or customers.
To be super clear: yes, you can. You just don't. That's very different. With full JS access on a site you can collect a lot of information. As another poster mentioned, it only takes roughly 32 bits of entropy (2^32 is about 4.3 billion) to uniquely identify each of the 3 billion internet users.
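To see where a figure like this comes from: singling out one person among 3 billion takes log2(3 × 10^9) ≈ 31.5 bits, and every attribute a script can read contributes −log2(p) bits, where p is the share of users with the same value. The sketch below adds up a few such attributes. The attribute shares are made up for illustration (not measured data), and summing them assumes the attributes are independent, which real ones are not.

```python
import math

def bits_needed(population: float) -> float:
    """Bits of identifying information needed to single out one member of a population."""
    return math.log2(population)

def surprisal_bits(share: float) -> float:
    """Bits revealed by an attribute value that this share of users also have."""
    return -math.log2(share)

# Hypothetical attribute shares, for illustration only (not measured data).
# Summing them assumes independence, which overstates real-world totals.
observed = {
    "timezone": 1 / 24,
    "screen_resolution": 1 / 64,
    "browser_build": 1 / 256,
    "installed_fonts": 1 / 4096,
}

total = sum(surprisal_bits(s) for s in observed.values())
print(f"needed:   {bits_needed(3e9):.1f} bits")  # needed:   31.5 bits
print(f"observed: {total:.1f} bits")             # observed: 30.6 bits
```

Four ordinary-looking attributes already get close to the threshold, which is why "we don't" and "we can't" are different claims once a script runs on the page.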


You're right, my bad: we don't, not we can't.

Technically, though, we can't, since we'd have to dedicate engineering time to making the changes necessary for that kind of tracking, and we're not going to :P


There are technical approaches to the problem, it's just that they're not ready yet.

However (and without revealing too much about specific IP), there are various mathematical techniques for reasoning about how much private information queries against a data set disclose, and ways to force the data to be both fuzzy and self-destructing.

The reality is that customers (rightly) won't see a difference between analytics companies until one of them actually puts its money where its mouth is on privacy and stops personal data from being an asset that can be sold or combined later (both by limiting the number of queries to prevent reconstruction and by applying fuzzing to prevent over-targeting).
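The two safeguards named above, limiting the number of queries and fuzzing the answers, are what differential privacy formalizes. Below is a minimal sketch using the standard Laplace mechanism with a fixed privacy budget. The class name, its parameters, and the budget values are illustrative assumptions, not the poster's system.

```python
import random

class PrivateCounter:
    """Answers count queries with Laplace noise; refuses once the budget is spent.

    Each answer costs `epsilon_per_query` out of a fixed `total_epsilon`.
    """

    def __init__(self, records, total_epsilon=1.0, epsilon_per_query=0.25, rng=None):
        self._records = list(records)
        self._remaining = total_epsilon
        self._eps = epsilon_per_query
        self._rng = rng or random.Random()

    @property
    def remaining(self) -> float:
        return self._remaining

    def _laplace(self, scale: float) -> float:
        # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
        return self._rng.expovariate(1 / scale) - self._rng.expovariate(1 / scale)

    def count(self, predicate) -> float:
        if self._remaining < self._eps - 1e-12:
            raise PermissionError("privacy budget exhausted")
        self._remaining -= self._eps
        true_count = sum(1 for r in self._records if predicate(r))
        # A count changes by at most 1 per person (sensitivity 1), so scale = 1 / epsilon.
        return true_count + self._laplace(1 / self._eps)


counter = PrivateCounter(range(100), rng=random.Random(0))
for _ in range(4):
    print(round(counter.count(lambda r: r % 2 == 0), 1))  # noisy values around 50
```

Under sequential composition the epsilons of successive answers add up, which is why the counter stops answering once `total_epsilon` is spent. A smaller `epsilon_per_query` buys more questions at the price of noisier answers.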

For those looking for some technical details: I work at a marketing company (we're an early-stage start-up, so I don't want to name names, as we're not really ready for attention), and we're currently developing a technology that fuses research into what questions you can ask of a database (and how many) before its privacy is depleted with non-fully-homomorphic encryption that destabilizes after a period of time. This approach will let us build containers that process data homomorphically and decay after fewer questions than it would take to deplete the privacy of the data inside them. (For various reasons, operating in homomorphic space gives you increasingly fuzzy answers the more questions you ask, so your answers become useless with high probability at around the point where the privacy would be depleted.)
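The poster doesn't name their scheme, so as a generic illustration of what processing data homomorphically with a partially (not fully) homomorphic scheme looks like, here is textbook Paillier encryption, which is additively homomorphic: a server can total encrypted values without ever decrypting them. Assumptions: the function names are hypothetical, the primes are tiny and hard-coded (completely insecure), it needs Python 3.8+ for `pow(x, -1, n)`, and Paillier has no built-in noise growth or decay, so this does not model the query-limited, self-destructing containers described above.

```python
import math
import random

def keygen(p: int = 1009, q: int = 1013):
    """Toy Paillier key pair from tiny hard-coded primes. INSECURE: illustration only."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # valid shortcut because we fix the generator g = n + 1
    return n, (lam, mu)

def encrypt(n: int, m: int, rng: random.Random) -> int:
    """Encrypt 0 <= m < n as (n+1)^m * r^n mod n^2, with random r coprime to n."""
    n2 = n * n
    while True:
        r = rng.randrange(1, n)
        if math.gcd(r, n) == 1:
            break
    return pow(n + 1, m, n2) * pow(r, n, n2) % n2

def decrypt(n: int, priv, c: int) -> int:
    lam, mu = priv
    x = pow(c, lam, n * n)
    return (x - 1) // n * mu % n  # L(x) = (x - 1) / n, then multiply by mu mod n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return c1 * c2 % (n * n)


rng = random.Random(42)
n, priv = keygen()
# e.g. per-site visit counts, each encrypted separately by the data owner
ciphertexts = [encrypt(n, v, rng) for v in (12, 30, 7)]
total = ciphertexts[0]
for c in ciphertexts[1:]:
    total = add_encrypted(n, total, c)  # the aggregator never sees 12, 30 or 7
print(decrypt(n, priv, total))  # 49
```

Only the key holder can read the total, which is why the trust question in the next paragraph comes down to who keeps the base keys.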

While this isn't perfect, since you still need to trust that we a) aren't storing the base keys and b) did our math and implementation correctly, it's a substantial improvement over the trust asked for by marketing companies now. (Especially since contracts could require us to do those things fairly easily, exposing us to a breach of contract if we failed to. Once those actions are properly completed, there's no taking them back, so our technology mainly protects against the case of "We're the good guys now, but tomorrow is a different day".)


Interesting. I'd love to chat more about this or hear when you're further along and can share more!
