GitHub's website is very easy to cache: it's read-heavy and doesn't need to be real-time, especially when you consider that they can go into a cache-friendly mode where they disable push notifications ("you just pushed to this branch", etc).
The pages look real-time now, but that is best effort: they don't have to be. Nobody will complain if GitHub suddenly says "we're under heavy load, changes will take a minute to propagate and real time notifications are turned off".
Note that this attack is read-only; there's no creating new issues or PRs or any other write operation (that would be different).
It's comparable to Wikipedia, which has close to 100% cache hits on popular pages. (sorry can't find the source for this right now)
The git repositories are another story, but that's not so easily attacked through JS.
Still, with an attack of this magnitude, no matter how cacheable, you're going to feel it.
PS: I forgot about one thing: their HTTP interface to diffs. That's a huge surface of fresh data to request, all of which has to go to the backend, much like Wikipedia's history diffs. Perhaps they would have to cut that off for users who do not have a cookie set from a project or user home page... Okay, I spoke too soon. GitHub has a huge amount of fresh data to request, and a targeted attack on things like git diffs (have every user request a different diff) can't be solved by HTTP caches alone.
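To make the cache argument concrete, here's a rough sketch of a toy LRU cache fed two kinds of traffic: everyone hitting the same few popular pages, versus every request asking for a different diff. The URL shapes, the cache size, and the `hit_rate` helper are all made up for illustration; this says nothing about how GitHub's caching layer actually works.

```python
from collections import OrderedDict


class LRUCache:
    """Tiny LRU cache that counts hits and misses."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, key):
        if key in self.store:
            # Cache hit: mark as most recently used.
            self.store.move_to_end(key)
            self.hits += 1
            return self.store[key]
        # Cache miss: stand-in for an expensive backend render.
        self.misses += 1
        value = f"rendered:{key}"
        self.store[key] = value
        if len(self.store) > self.capacity:
            # Evict the least recently used entry.
            self.store.popitem(last=False)
        return value


def hit_rate(urls, capacity=100):
    """Fraction of requests served from the cache (hypothetical helper)."""
    cache = LRUCache(capacity)
    for url in urls:
        cache.get(url)
    return cache.hits / (cache.hits + cache.misses)


# Hypothetical traffic: 10,000 requests spread over 10 popular pages.
popular = [f"/project/{i % 10}" for i in range(10_000)]

# Hypothetical traffic: 10,000 requests, each for a different diff.
unique_diffs = [f"/project/compare/{i}...{i + 1}" for i in range(10_000)]

print(hit_rate(popular))       # almost every request is a hit
print(hit_rate(unique_diffs))  # every request is a miss
```

With the popular pages, only the first request for each page reaches the backend. With unique diffs, no request is ever repeated, so the cache never gets a chance to help.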
> I forgot about one thing; their HTTP interface to diffs.
Right -- pretty much any page that uses pjax / turbolinks to load segments of the page: each segment is an expensive query going to the backend.
GH recently added a timeout for the diff page if the diff is too large, which probably also caches the "too large" status. The sweet spot for an attack would be the pages that don't time out but still produce a 95th-percentile (i.e. slow) request.
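A rough model of why the "just under the timeout" pages are the sweet spot, assuming (and this is a guess, like the above) that the "too large" result gets cached per diff. `DiffPage`, `budget_ms`, and the cost numbers are all hypothetical; costs are simulated rather than measured.

```python
TOO_LARGE = "diff too large to display"


class DiffPage:
    """Toy diff endpoint with a time budget and a result cache."""

    def __init__(self, budget_ms):
        self.budget_ms = budget_ms
        self.cache = {}
        self.backend_ms = 0  # total simulated backend time spent

    def render(self, diff_id, cost_ms):
        if diff_id in self.cache:
            # Cached result (including a cached "too large") costs nothing.
            return self.cache[diff_id]
        # The backend works until it finishes or the budget runs out.
        self.backend_ms += min(cost_ms, self.budget_ms)
        if cost_ms > self.budget_ms:
            result = TOO_LARGE
        else:
            result = f"rendered {diff_id}"
        self.cache[diff_id] = result
        return result


def repeated_huge_diff(requests=100):
    """Same oversized diff requested over and over."""
    page = DiffPage(budget_ms=1000)
    for _ in range(requests):
        page.render("huge", cost_ms=60_000)
    return page.backend_ms


def distinct_near_limit_diffs(requests=100):
    """A different diff each time, each just under the budget."""
    page = DiffPage(budget_ms=1000)
    for i in range(requests):
        page.render(f"diff-{i}", cost_ms=950)
    return page.backend_ms


print(repeated_huge_diff())         # 1000
print(distinct_near_limit_diffs())  # 95000
```

The oversized diff burns the budget once and is then served from the cache, while the distinct near-limit diffs each cost almost the full budget and never repeat.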