I'm reviving my blog, and currently plan to explicitly ask:
1. May we retrieve common libraries from third party CDNs? Doing so helps support this site by saving on our bandwidth costs, but may expose information about you to those third parties.
2. This site allows commenting through Disqus. We have no control over what Disqus does with your data, and so your information may be exposed to Disqus and any third parties they communicate with. Would you like to enable comments?
3. (Similar for tracking, if I decide to do something other than log parsing.)
Default 'no' to all, and I still need to find a way to ask the questions in a way that doesn't disrupt simply viewing a blog post that someone linked. Perhaps if someone returns, I'll prompt then.
Just code your site to do something sane with 3rd-party content blocked. E.g. handle load errors with fallbacks.
That way people with µMatrix or similar blockers can use the control tools they have instead of needing to do something site-specific.
Also, such decisions can't be remembered if cookies/localStorage are disabled, so prompting over and over again could also be annoying.
> 1. May we retrieve common libraries from third party CDNs? Doing so helps support this site by saving on our bandwidth costs, but may expose information about you to those third parties.
In an ideal world browsers would never send a cache-refresh request for resources tagged with SRI[0] because the hashes would guarantee that the content is 100% stable. Alas, it has non-trivial privacy implications, so they don't do that.
Maybe it could be implemented as a privacy addon with a whitelist for CDN domains, but then sites would still have to adopt SRI for that addon to do its work. Or maybe an addon that injects Cache-Control: immutable[1] into CDN responses could work too, but that's limited to HTTPS.
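For anyone who hasn't used SRI: the integrity value is just a base64-encoded digest of the file, prefixed with the algorithm name. A rough sketch of generating one at build time follows; the function names and the cdn.example.com URL are made up for illustration, not taken from any particular tool.

```python
import base64
import hashlib


def sri_hash(content: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI integrity value of the form '<algorithm>-<base64 digest>'."""
    if algorithm not in ("sha256", "sha384", "sha512"):
        raise ValueError(f"unsupported SRI algorithm: {algorithm}")
    digest = hashlib.new(algorithm, content).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def script_tag(url: str, content: bytes) -> str:
    """Build a <script> tag that pins a CDN-hosted file to known content."""
    return (
        f'<script src="{url}" integrity="{sri_hash(content)}" '
        'crossorigin="anonymous"></script>'
    )


if __name__ == "__main__":
    # Hypothetical library bytes and URL, for illustration only.
    library = b"console.log('hello');\n"
    print(script_tag("https://cdn.example.com/lib-1.2.3.min.js", library))
```

The browser refuses to run the script if the fetched bytes don't match the hash, which is what would make an "immutable" treatment safe in principle.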
1. All local. Unless you don't want that, and a 100-200 KB JS file is too much of a strain on your server bandwidth. Or are you serving 15 MB of JS files?
2. Screw Disqus. Screw Facebook Comments. Start thinking about your visitors, as someone said on another related thread, you are responsible for the tracking of your visitors by 3rd-party sites. Local comments or turn them off if you don't care about what others are saying. Don't save any information about the commenters except what they enter in the boxes. One-way hash the IPs if you need to compare for spam reasons.
3. If you need your ego stroked when you see you had xx visitors on your site, go ahead, use Google Analytics and screw us all. We're gonna block it anyway.
[1] This is a privacy policy I use and respect very much when interacting with the visitors/commenters on my personal blog.
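On hashing IPs for spam comparison, here is a minimal sketch, assuming a keyed hash (HMAC) rather than a bare hash. That choice is my assumption, not part of the suggestion above: the IPv4 space is small enough that an unkeyed SHA-256 of an address can be reversed by trying every address. SITE_SECRET and the function names are hypothetical.

```python
import hashlib
import hmac
import secrets

# Hypothetical per-site secret. Generated at startup here for the demo;
# a real site would load a stable value from configuration instead.
SITE_SECRET = secrets.token_bytes(32)


def ip_token(ip: str, secret: bytes = SITE_SECRET) -> str:
    """Return a keyed one-way token for an IP address (hex string)."""
    return hmac.new(secret, ip.encode("utf-8"), hashlib.sha256).hexdigest()


def same_sender(token_a: str, token_b: str) -> bool:
    """Compare two stored tokens in constant time."""
    return hmac.compare_digest(token_a, token_b)


if __name__ == "__main__":
    first = ip_token("203.0.113.7")
    second = ip_token("203.0.113.7")
    other = ip_token("198.51.100.23")
    print(same_sender(first, second), same_sender(first, other))
```

Only the token gets stored with the comment. Rotating the secret makes old tokens unlinkable, which may or may not be what you want for spam tracking.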
> 3. If you need your ego stroked when you see you had xx visitors on your site
That's a bit harsh - wanting to know whether you get 0 visitors to your blog post or 2,000 yesterday isn't just about ego; it helps you understand the value of your posts (and whether you should bother). Knowing how many people visited isn't the same as bragging about it.
Maybe it was harsh. The idea is: you might write better when you don't know how many people read your articles. On the other hand, you might write better when you know. Plan accordingly :)
2. I need to look into self-hosted comments; but I was hoping to make the blog portion of the site static to keep it simpler. Project pages may have demos/etc that pull in JS libraries. But you raise a good point in 1 that also applies to 2 - given that I'm probably going to be reaching only a handful of people initially (and perhaps longer), worrying about bandwidth is a premature optimization.
3. I've just about talked myself into going with log-based analytics here. I find GA's omnipresence too worrisome to contribute to it, even with consent.
Thanks for that link, that's essentially the policy in my head before I started thinking about things like comment support. It's way better written than I would've come up with.
Can't say I would disagree. A lot of folks these days just seem to append an HN/Reddit link to posts that get discussions on those sites. There's the blog as an expression of author's personality, and then there's the discussion space as an area with a life of its own.
1. If you're only interested in saving bandwidth and don't care about cache hits from overlapping with other sites, maybe you can host static content somewhere free (GitHub Pages?) or even just set a long cache header (ensure version numbers in filenames, cache for > 1 month) since presumably you're going to serve them the first time before the user has answered anyway?
2. I'm thinking of putting a "Click to load comments" box in place of Disqus on my blog so nothing gets loaded unless the user clicks. Seems better than bothering the user up-front.
3. I use Google Analytics - I figure it's common enough that if people don't like that, they'll already have it blocked, so there isn't really any additional tracking they won't want (unless the twitter timeline widget is tracking; which it might be, but I suspect I'll remove it soon anyway).
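To make the long-cache idea in point 1 concrete, here is a rough sketch of picking a Cache-Control value from the file path. The naming scheme (a "1.2.3" style version or a hex content hash before the extension) and the one-year lifetime are assumptions for illustration; adjust the pattern to whatever your build actually emits.

```python
import re

ONE_YEAR = 365 * 24 * 60 * 60  # seconds

# Assumed naming scheme: "-1.2.3" / ".1.2.3" or an 8+ character hex content
# hash right before an optional ".min" and the .js/.css extension.
VERSIONED = re.compile(r"[-.](\d+\.\d+\.\d+|[0-9a-f]{8,})(\.min)?\.(js|css)$")


def cache_control(path: str) -> str:
    """Return a Cache-Control header value for a static asset path."""
    if VERSIONED.search(path):
        # The content behind a versioned name never changes, so cache it hard.
        return f"public, max-age={ONE_YEAR}, immutable"
    # Unversioned files may change in place: make the browser revalidate.
    return "no-cache"


if __name__ == "__main__":
    for p in ("/static/jquery-3.1.1.min.js", "/static/app.3f9a1c2b.css", "/static/site.css"):
        print(p, "->", cache_control(p))
```

The point of the version in the filename is that a new release gets a new URL, so the long lifetime never serves stale code.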
1. That's a possibility - though any time you're sending off to a third party for content, there's no way of knowing what they're doing around cookies and browser fingerprints across their properties. A step up from running scripts loaded from those sites though. And yeah, the default will be serving the content locally until explicit consent is received.
2. I like that idea. I also kind of like the idea of just not using comments - when I used Disqus years ago it was mostly spam - but I think I want to try again and see if it's worth it.
3. Also a good point, but that only accounts for those people who are aware of the tracking as a point of concern. Given that the blog will be technical with a personal bent and vice-versa, one or two of my ten readers may not be aware of tracking as a thing :)
On this front, though, I'm probably just going to start with log analytic tools. It's really the only way to get a fully accurate picture across visitors (server side logging can't be blocked, but GA and even self-hosted data gathering can), and I don't really care too much about the additional info that analytics can provide.
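For what it's worth, basic log analytics can be very small. A minimal sketch, assuming the server writes the common/combined log format (the Apache and nginx default); it counts every successful GET, so static assets and bots are included unless you filter them. The function name and sample lines are made up.

```python
import re
from collections import defaultdict

# Matches the start of a common/combined log format line.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) '
)


def daily_stats(lines):
    """Return {day: (views, unique_ips)} for GET requests answered with 200."""
    views = defaultdict(int)
    visitors = defaultdict(set)
    for line in lines:
        m = LOG_LINE.match(line)
        if not m or m["method"] != "GET" or m["status"] != "200":
            continue  # skip unparseable lines, non-GETs, and errors
        views[m["day"]] += 1
        visitors[m["day"]].add(m["ip"])
    return {day: (views[day], len(visitors[day])) for day in views}


if __name__ == "__main__":
    sample = [
        '203.0.113.7 - - [10/Oct/2016:13:55:36 +0000] "GET /posts/a.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
        '198.51.100.23 - - [10/Oct/2016:14:01:02 +0000] "GET /posts/a.html HTTP/1.1" 200 2326 "-" "Mozilla/5.0"',
    ]
    print(daily_stats(sample))
```

Unique IPs are only a rough proxy for visitors (NAT, mobile networks), but that's usually enough to tell 0 readers from 2,000.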
2. I use Disqus and don't actually get much spam (maybe 1 spam post every 6 months, and it always gets flagged by Disqus) but there is often useful stuff in the comments. I think my blog would be much worse without the comments (and I wouldn't get the occasional "Thanks!" comments, which help me know that my posts aren't useless) :-)
Given the option, I would probably also just parse logs - I don't think Analytics is adding much on top of that; I just don't have that option using GH Pages. The reason I moved from AppEngine to GitHub was to stop messing with the code for my blog in an attempt to make me write more posts instead! =D
1. Serving from GitHub still shares the tracking information. It can be argued that GitHub is better than CloudFlare/Facebook; however, bear in mind GitHub has politically motivated staff. Long cache is a great idea. Alternatively, cut out unnecessary JS.
2. Nice idea, it does hamper the ease of use of your blog though - I would never click to view, though I did read some that were visible when I finished the article.
3. Do you find the information from this useful? In a way that isn't trivially parsable from server logs? I ask because we are reviewing the quality of our user analytics, and our GA seems rather pointless atm.
1. Good point; I'm not really sure where I was going with this now; GitHub and another CDN are basically the same. I must've been distracted while replying!
2. Yeah, it's not ideal. In this case, it looks like Disqus are gonna fix stuff though (they've commented on my post; there's a link right at the top of the article now).
3. I don't have access to the server logs as I'm running on GitHub Pages, so something like Analytics is all I have. I do find it useful (given no server logs), it's nice to see the traffic to my blog; there's no point posting if nobody is reading! :-)
3. That is very interesting. Now knowing your stack (pages + disqus + adverts), I see one side of the 'problem' is that bloggers don't have much choice in terms of revenue, so the infrastructure charges with user data. The other side is likely the complexity, incompatibility, and time-wasting of home-rolled solutions.
The really nice part of a CDN deployed blog is handling the traffic spikes though.
They receive a large amount of internet traffic and have the potential ability to fingerprint users and subvert privacy protections. AFAIK they don't do anything malicious, but I don't know they don't.
In fact I would say CloudFlare are better than both GitHub and Facebook, and I am only wary of them because of their position of power and the potential they have (ie. they are a victim of their own success). Both Facebook and GitHub have shown themselves to make political decisions at the expense of their users.
Depends on the definition of wrong! CloudFlare is a bit of an HN darling thanks to their employees' active contributions and submitting every technical post on their blog. Free distributed DNS and potential DDoS protection is also a tempting offer.
To privacy-conscious users: CloudFlare is the man-in-the-middle for more and more of the Internet, potentially tracking at Google-like levels.
CloudFlare may: ... Add script to your pages to, for example, add services, Apps, or perform additional performance tracking. (Unfortunately this is opt-out rather than opt-in.)
To Tor users: CloudFlare implements a captcha to protect servers from malicious traffic; the implementation has caused tremendous annoyance in the past and the company may have been slow to address this problem.
To CloudFlare customers: CloudFlare has a "target on its back" and has faltered against DDoS in the past, causing outages for all of its customers. AFAIK: It's been a while.
To CloudFlare freeloaders like me: CloudFlare doesn't have much incentive to protect its free-tier users from DDoS.
Ah, thank you for the detailed reply. I started using CF more extensively yesterday, due to their free CDN (which is working great), but I agree that their MITMing the internet is worrisome. Maybe I should switch to MaxMind, if it's cheaper than CloudFront.
Like Ghostery, it is important to be aware of the cons but I'm still using CloudFlare.
In my book CloudFront easily ranks ahead of had-been "do no evil" Google's irrevocably merging its entire history on me ex post facto. https://news.ycombinator.com/item?id=12760003
2. This sounds off to me. Imagine if a restaurant's menu said they don't know where their ingredients come from or what they may actually consist of - that's probably true a lot of the time, but it makes the customer wonder why the restaurant brings it up but doesn't do anything about it...
Anyone have thoughts on if this sounds sane?