> Google can choose today which site (original or AMP cache) to show in their se...

DevKoala · on May 27, 2020

Because now every asset you download from the web is a Google tracking resource. Is it really unclear what’s going on? When you perform a GET request for these assets you are being monitored. These requests end up being part of the profile built for you which is used for advertisement targeting and content recommendation.

PS: You work for Google. Do you work on this project?

joshuamorton · on May 27, 2020

> When you perform a GET request for these assets you are being monitored.

You'll only ever retrieve Google AMP cache results from the Google search page, where they were already able to track if you made such a request, since the link you clicked has trackers in it.

So from that perspective, nothing changes.

> PS: You work for Google. Do you work on this project?

No, I work on mostly internal infrastructure. My interest in AMP is simply that I don't dislike the AMP "experience", it's fine. But more importantly, I legitimately don't get the HN hysteria around AMP. Returning to your concern, literally nothing changes with AMP vs non-AMP.

I don't get it. The most compelling concern I've heard is that it's annoying to have to couple parts of your infra to AMP-standard stuff. And I sort of understand that. But even that isn't different than previous SEO/ranking changes that required changes to the page.

zwaps · on May 27, 2020

Assume every website used AMP, like on mobile. How often do you get redirected to AMP already when you just click a random link? I sure do!

From then on, every asset is loaded via Google servers. Google now controls the entire internet. Google does this so it can serve its ads and track all users. It's as if I used only Google for my internet surfing.

I don't use Google because I strongly believe it is an evil company, but if websites use AMP, then I am forced to hand over my data to Google even though I don't want to. Right now, I can block Google servers entirely. But if the entire web is served via AMP, I can't do it. And that's the whole reason AMP exists. So everything I do (or at least as much as possible) goes through Google servers.

You really do not see our concern? Really?

DevKoala · on May 27, 2020

> You'll only ever retrieve Google AMP cache results from the Google search page, where they were already able to track if you made such a request, since the link you clicked has trackers in it. So from that perspective, nothing changes.

I am not affected; I don’t use Google search. The problem is for individuals who use Google search and now don’t have an option to avoid in-depth tracking. The difference between regular pixel trackers and multiple data points associated with every resource a site serves is immense. I work in ad tech, not particularly on the identification side, but I started multiple projects on that end. From experience, a regular tracker can be fooled, but you cannot fool every resource request. One of the things I did to identify ad fraud bots was to drive them to a site in which I controlled every resource. The resource request fingerprint of bots was easily distinguishable from that of real people. Moreover, some humans exhibited navigation patterns that were distinguishable from those of other humans. I remember I once caught a QA person doing a shoddy job of testing the front end because of it. That is the kind of power that Google is acquiring as more and more sites choose to use AMP. It is scary to think that a single entity has that power.

joshuamorton · on May 27, 2020

Nothing beyond the initial page load is served by the AMP cache. If you request additional dynamic resources or navigate to other pages, you'll go to the originating site.

I'm confused.

DevKoala · on May 27, 2020

In the example I saw, you could go through a mini site experience; you can “visit” the site without leaving the AMP. I don’t believe this one change, serving images from the AMP cache, is the issue. My concern is with the proliferation of AMPs. A lot of individuals will browse AMPs thinking their browsing is between them and the site publisher without realizing Google is in the middle. Honestly, Google has an okay track record of respecting people’s data. In their AdExchange, they are one of the few that obfuscates the IP address. However it is still concerning that a single entity continues amassing all those browsing patterns from billions of individuals. It can be abused easily, with intent or not.

gregable · on May 27, 2020

@DevKoala, do you have an example where you encountered the "mini-site" experience? I haven't seen it, but it could be a bug that would be worth fixing.

monadic2 · on May 27, 2020

AMP also offers traffic after the load, so this is in no way comparable to a click tracker (which is also none of their business, btw). This will also offer in-band ad obfuscation eventually.

I’m curious, what do you value about the services you consume? I like transactions where I know what I am giving to the service provider. Do you really want to push away from this reality for the benefit of a few ms of load time when leaving a search page? That’s essentially what you’re arguing for.

0x0 · on May 27, 2020

> You'll only ever retrieve Google AMP cache results from the Google search page

Uh, I keep seeing Google AMP URLs shared on social media, emails, etc etc. Which is quite annoying :(

gregable · on May 27, 2020

The AMP team doesn't prefer these URLs shared either:

If you click the browser share icon, or trigger the browser native share intent, the origin URL will be shared, not the AMP Cache URL. Only if you explicitly copy the URL bar will the AMP Cache URL be shared.

The Signed Exchange spec that AMP has offered sites for a year now allows them to have their own URLs displayed in browsers that support it. In that case, the google.com URL will never be displayed and thus can't be accidentally shared.

All AMP documents on the AMP Cache contain `<link rel=canonical href={origin url}>`, and Google recommends that social media prefer the canonical URL. This is useful outside of AMP too, as there are often multiple URL variants for any article, and the sharer and the recipient shouldn't necessarily get the same version: for example, a mobile vs. a desktop article.
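A minimal sketch of what "prefer the canonical URL" could look like on the consuming side (say, a link-preview service), using only Python's standard-library `html.parser`. The names `CanonicalFinder` and `canonical_url` are hypothetical, and the sketch assumes the page HTML has already been fetched; a real service would also have to decide whether to trust the declared canonical.

```python
from html.parser import HTMLParser
from typing import Optional
from urllib.parse import urljoin


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.href: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.href is not None:
            return
        attr_map = dict(attrs)
        # rel can hold several space-separated tokens; match case-insensitively.
        rel_tokens = (attr_map.get("rel") or "").lower().split()
        if "canonical" in rel_tokens and attr_map.get("href"):
            self.href = attr_map["href"]


def canonical_url(html: str, shared_url: str) -> str:
    """Return the page's declared canonical URL, or shared_url if none exists.

    Relative hrefs are resolved against shared_url.
    """
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    if finder.href is None:
        return shared_url
    return urljoin(shared_url, finder.href)


if __name__ == "__main__":
    page = '<html amp><head><link rel="canonical" href="https://example.com/article"></head></html>'
    # Hypothetical cache-style URL that a user copied from the address bar.
    print(canonical_url(page, "https://cache.example.net/example.com/article"))
    # prints https://example.com/article
```

Falling back to the shared URL keeps the link working when a page declares no canonical at all.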

0x0 · on May 28, 2020

That's really quite useless. I rarely share links by clicking weird "share link" buttons. I usually have half a message already composed in mail/messages/slack, and I just want to cmd-L cmd-C in the browser and cmd-V in the message I'm writing.

Also, "the google.com URL will never be displayed" is a world with an internet I don't want to be a part of.

gregable · on May 28, 2020

Google never displays AMP documents on desktop (sans mobile emulation), so this won't be an issue.

0x0 · on May 28, 2020

The workflow I described goes for mobile and tablets just as much as desktop; for the cases where a keyboard is not connected please mentally replace "cmd-l cmd-c" with "tap in address bar to select, tap copy".

Also, others might share an amp link from their mobile devices, which I then end up clicking in a desktop slack/mail/messages app, and there we go again with the amp virus even on desktops.

taveras · on May 27, 2020

Exactly this. As a feature, social media platforms should automatically look up the canonical URL for a linked page.

labster · on May 28, 2020

I’m sure IRC v3 will get right on that. /s

_tw9j · on May 28, 2020

If someone wants to detect whether a website is using amp, what would be a good way to do it?

gregable · on May 28, 2020

It's more a question of whether a specific document is using AMP; the site can be a mix, just like a site using jQuery, for example.

An AMP page can be identified by examining only the first few bytes of the HTML: the `<html>` tag will contain either the `amp` or the lightning-bolt emoji (`⚡`) attribute, i.e. `<html amp>` or `<html ⚡>`.

Technically an AMP document must pass AMP Validation to be truly AMP, so there are documents that match the above condition which aren't valid AMP. There are multiple ways to validate. A starting place is https://validator.amp.dev/
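A rough sketch of the quick check described above, in Python with only the standard-library `html.parser`: it looks at the first `<html>` start tag for an `amp` or `⚡` attribute and nothing else. The names `AmpHtmlTagCheck` and `looks_like_amp` are hypothetical, and the `max_chars=2048` cutoff is an arbitrary assumption standing in for "the first few bytes". As noted, a positive result is only a hint, not validation.

```python
from html.parser import HTMLParser


class AmpHtmlTagCheck(HTMLParser):
    """Looks only at the first <html> start tag for an AMP marker attribute."""

    # Attribute names that mark a document as AMP: "amp" or the lightning bolt.
    AMP_MARKERS = {"amp", "\u26a1"}

    def __init__(self) -> None:
        super().__init__()
        self.seen_html = False
        self.is_amp = False

    def handle_starttag(self, tag, attrs):
        if tag == "html" and not self.seen_html:
            self.seen_html = True
            # attrs is a list of (name, value) pairs; names arrive lower-cased.
            self.is_amp = any(name in self.AMP_MARKERS for name, _ in attrs)


def looks_like_amp(html: str, max_chars: int = 2048) -> bool:
    """Heuristic pre-check: does the <html> tag carry `amp` or the emoji?

    Only the first max_chars characters are parsed. This does NOT validate
    the document; a True result can still be invalid AMP.
    """
    checker = AmpHtmlTagCheck()
    checker.feed(html[:max_chars])
    return checker.is_amp
```

Anything that passes this check would still need to go through a real validator, such as the one linked above, before being treated as AMP.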

robin_reala · on May 29, 2020

If you turn off Javascript the site won’t render for 8 seconds if it’s an AMP.

yyyk · on May 27, 2020

>You'll only ever retrieve Google AMP cache results from the Google search page

I see quite a few AMP cache results being shared on twitter (for example).

>where they were already able to track if you made such a request..

It's still possible to get raw Google results (with the right extensions/browsers), though I wonder for how long.

kohtatsu · on May 27, 2020

Would you bat an eye at Google acquiring CloudFlare?

What they're pushing with Amp and the related technologies grants them a near-unavoidable man-in-the-middle position.

(FWIW I am careful to avoid Google properties at a pretty high cost.)

joshuamorton · on May 27, 2020

> Would you bat an eye at Google acquiring CloudFlare?

Not any more than I already bat an eye at Cloudflare.

(It's probably worth noting here specifically that I do work at Google, so my risk profile is probably different from yours. For me personally, and speaking solely from a trust perspective, I'd probably prefer it if Google acquired Cloudflare, since I would get a net increase in transparency. But I can understand why that isn't a general position, and there are other reasons I don't think Google acquiring Cloudflare would be good.)

kohtatsu · on May 30, 2020

Thank you for taking the time to engage. Google is a scary beast at the end of the day, and I firmly believe it's an organism that should not remotely resemble what it is right now. Thinking about it, splitting it up could go a long way.

I share these fears to a lesser degree with Microsoft and of course Facebook. Apple seems to do a great job of safeguarding, but they could turn sour if they don't remain careful. Stuff like Clearview crosses the line into directly dangerous. CloudFlare is currently innocent in my eyes, but they've managed to centralize a lot more channels than I'd like to think about.

thoraway1010 · on May 27, 2020

DevKoala,

I don't think you realize how much of your online activity is already tracked by Google / Facebook / Instagram.

Google's javascript is everywhere, including explicit tracking with analytics, and lots of CDN loads for endless lists of things (js libraries, fonts etc).

Their properties also track you: Google search, YouTube, email. They also make software you might use (Chrome / Android / Google Maps / Google Play Store).

If you think something about signed exchanges lets Google track you when they can't now... please examine these assumptions.

Folks who come up with these super complex schemes (google will use javascript loaded into AMP to take over and track you) ignore that google ALREADY tracks them.

And folks who say they don't use any google products (no android / google maps/ play services / chrome etc etc) are often either lying or don't understand how many third parties load google analytics into websites, or load recaptcha bot protection etc.

DevKoala · on May 27, 2020

I work in ad tech, with years of experience doing identification and targeting. I understand exactly what is happening.

I wrote a reply to joshuamorton where I expressed my concerns.

thoraway1010 · on May 28, 2020

Just realize that AWS / GCP / Azure have already gobbled up vast swaths of website hosting in all forms and are growing along with some free CDN and DNS providers.

If Google said "we want to track people" and brought Android, Chrome, DNS resolvers, network infrastructure, Google Cloud compute, AI systems, Google Analytics (which these media sites load voluntarily), Google Play services etc. to bear to target and track you, they probably could.

EVERY single person (including you) who claims not to use Google often turns out, if you dig down, to be lying and actually using it. And even if you don't, some of the people you email or interact with do, so indirect profiles can be built.

AMP solved a need for a lot of users: the janky, slow, ad-filled websites that media sites in particular had become. So there is an actual end-user reason people like AMP; it's a better user experience in many cases. This is where "AMP is ruining the web" gets hard to support. Most folks don't perceive that they are giving up much more in terms of privacy, and they are getting a lot.

DevKoala · on June 11, 2020

> If Google said "we want to track people" and brought Android, Chrome, DNS resolvers, network infrastructure, Google Cloud compute, AI systems, Google Analytics (which these media sites load voluntarily), Google Play services etc. to bear to target and track you, they probably could.

Their data governance team wouldn’t allow it. You are basically describing a system they could only introduce with the permission of the government. I don’t care if the government is tracking me honestly, I can’t fight that. I just don’t want Google tracking me for the purpose of influencing my spending habits, emotional state, or perception of the world. That is my main beef with their advertising capabilities.

wizzwizz4 · on May 27, 2020

Centralised collection of metadata, without the users having any way of knowing it's going on.

tyingq · on May 27, 2020

Combined with that second thing I mentioned (required Google hosted JS), it is total control by Google with no straightforward way for me to detect it, block it or go around it, as I can today.

joshuamorton · on May 27, 2020

So if I understand correctly, your threat model is "Google will inject unwanted JS into a JS blob they host (like the amp.js from Google's CDN) and this will do nefarious (for some definition of nefarious) things to me without me knowing."

How is this different than today, where many sites use js from google, either as a cdn or part of the ads infrastructure? I guess you can block some of those, but blocking the jquery provided by google's CDN isn't going to work too well.

(And further, what kind of nefarious thing do you fear Google will do? How likely is it that they will do so, in your opinion?)

yyyk · on May 27, 2020

Today, many (most) sites do not use js from Google or hosted by Google. Google is pushing them to use Google infrastructure by way of AMP and that's the wrong direction.

kbenson · on May 27, 2020

> How is this different than today, where many sites use js from google, either as a cdn or part of the ads infrastructure?

It's different because it's a requirement. nytimes.com is moving to phase out all third-party advertising data [1], so presumably they could design their page such that it only accesses their own resources.

With a signed exchange, that would allow them to nicely compartmentalize and contain privacy to their site, if they aren't required to load and run some Google-supplied JavaScript. The argument that Google already knows that someone visited the page, so it's no big deal, is not compelling, since there is a big difference between knowing someone clicked to visit a page and having carte blanche to load your own code on the page in question.

Can you include the JS that AMP requires inline, such that it implements an AMP spec version, or do you need to load it externally? If you can supply it inline, that's great, and it's what people would want (as long as it doesn't load additional third-party resources). If you can't, then you're providing Google with an extra level of control that's not really needed, and that's what people are against.

> (And further, what kind of nefarious thing do you fear Google will do? How likely is it that they will do so, in your opinion?)

If we go forth only considering what we think people will do, and not limiting what they can do, we're destined to be upset with the outcome. If not from Google itself, then in twenty years when someone buys Google, or Google sells off a division that houses information, or there's a breach and it's exposed, or some other company rides on Google's coattails and uses the same precedent to get data but is less trustworthy.

The point is that some people don't want to share this information, and would choose not to do so if there was an easy way to tell when it was being gathered. Fighting against new methods that seek to make it implicit instead of explicit is the only real way to do that.

1: https://www.axios.com/new-york-times-advertising-792b3cd6-4b...

joshuamorton · on May 27, 2020

> With a signed exchange, that would allow them to nicely compartmentalize and contain privacy to their site, if they aren't required to load and run some Google supplied JavaScript.

Then I'd direct you to Gregable's comment (who is a person who actually works on AMP) that

> the AMP project is actively working to move the origin (control/host) of the AMP Javascript to the publisher's own domain, as well as allow a version served on an origin owned by the OpenJS Foundation, rather than Google.

So while this isn't supported yet, the people working on it do want that.

> If we go forth only considering what we think people will do, and not limiting what they can do, we're destined to be upset with the outcome. If not from Google itself, then in twenty years when someone buys Google, or Google sells off a division that houses information, or there's a breach and it's exposed, or some other company rides on Google's coattails and uses the same precedence to get data but is less trustworthy.

I'm unconvinced by such slippery slope arguments, given that the pushback were Google to do something like inject nefarious js would be swift. They've had the ability to do so for, well, 20 years now. They haven't yet.

kbenson · on May 27, 2020

> So while this isn't supported yet, the people working on it do want that.

Good! For what it's worth, I'm slightly pro AMP based on the idea, I'm just not entirely happy with the current implementation. Fixing it to be less dependent on a Google resource is a good change, IMO.

I use copious Google services, such as Gmail and Drive, and Hangouts (or whatever it's called this week), and Android, but I'm leery of becoming more dependent on Google. It's to everyone's benefit if there's healthy competition between all parties, and to my personal benefit if I don't find that someone's gotten access to my google account and literally everything is open to them (which is why I always use a username/password combination for sites I create accounts for instead of linking my Google account... even if I know my email is @gmail.com so it's of limited use, for now. Baby steps).

> I'm unconvinced by such slippery slope arguments, given that the pushback were Google to do something like inject nefarious js would be swift. They've had the ability to do so for, well, 20 years now. They haven't yet.

First, it doesn't have to be nefarious. The bar for Google deciding they deserve analytics for content they "serve" is much lower than the bar for actually doing something illegal. I prefer not to place options to do what I consider the wrong thing for business gain in front of companies when it can be helped. Hope for the best, plan for the worst, and all that.

Second, that was a single one of the scenarios I listed. The others notably did not rely on Google doing or not doing the right thing, because the decision is no longer in their hands. If Google is no longer the authority deciding (because they are gone, or have a new parent, or the data was taken), what Google would choose to do is irrelevant. That's why it's important to some people to reduce the information being collected. It's impossible to know what it will eventually be used for in the long term, so the prudent thing is to limit it and/or compartmentalize it (that is, maybe I'm happy with nytimes.com knowing where else I clicked in their article, but I would prefer Google only know I loaded that first article).

biddlesby · on May 27, 2020

I'm not convinced by your argument that since nothing evil has happened in the last 20 years, we are safe in the future.

It's about power dynamics. If you get a consolidation of power, that's going to be open to abuse. Maybe not now, maybe in the future, who knows. Democratic systems have checks and balances in the public domain. Google doesn't have this.

akoncius · on May 27, 2020

> They've had the ability to do so for, well, 20 years now. They haven't yet.

As we don’t have proof that Google did not do nefarious things, we don’t have proof that they haven’t. With such a monopoly and so much power, distrust is a useful thing.

joshuamorton · on May 27, 2020

> as we don’t have proof that google did not do nefarious things

We do have proof that they don't do the specific nefarious things being discussed here: injecting nefarious js into otherwise useful things. That's easy to determine.

tyingq · on May 27, 2020

>>what kind of nefarious thing do you fear Google will do

Well, the headline is one good example. That google controlled JS is EXACTLY how they removed access to the original URL...on somebody else's page that isn't theirs. "Signed exchanges" doesn't fix that either. It's also how they hijack the back button and swipe events for carousel navigated pages.

joshuamorton · on May 27, 2020

> That google controlled JS is EXACTLY how they removed access to the original URL...on somebody else's page that isn't theirs.

No, the Google AMP cache adds the header bar. That isn't added by the Google-controlled AMP js. Let me repeat this: the AMP js didn't change. Google's AMP cache implementation changed. (If you disagree with this, please post the diff of the AMP js that removed the URL bar; the js is open source at [0].)

> "Signed exchanges" doesn't fix that either.

Yes it does, in two ways:

1. It would prevent Google from mucking with the embedded page at all, like they do now.

2. It would remove the need for me to have the url redirect, since the url bar would point to the original site.

[0]: https://github.com/ampproject/amphtml/commits/master

tyingq · on May 27, 2020

Apologies, you're right in that they aren't using it that way. They could, and the examples of the top bar, swipes, and back button hijacking seem to indicate they aren't averse to it. Those things caused me to lose trust in AMP. That they chose to do it via a "proxy" instead of their hosted javascript doesn't make me trust them more.

kovac · on May 28, 2020

You asked how and he answered. Whether it's an issue or not is a separate matter that none of us are going to agree on.