I've noticed this before; every time I copy-paste a search URL, I manually delete the parameters, since even the non-readable ones probably contain some metadata about me.
Yeah, it'd be nice if Google search pages came with a handy "Link to this" button as Google Maps and Google Books do...but in reality, that would raise even more issues. If I understand Google's search mechanism...the results are tailored to each user...my search for "burger place" is going to be much different than someone doing it in New York. Yet if I see a "Link to this" button -- especially if it's a link shortener -- then I would expect it to direct all users to results that I've seen. OK, I don't expect that...but I bet 99% of other users do.
So it's not a clear-cut decision about whether such a feature should exist, given the ephemerality and personalization of SERPs. So given that it's not an intended common use case to pass around URLs to results...what is it that Google should do to fix this? Re-engineer their system so that every URL is a non-human-readable URL that obfuscates all possible metadata? Remove the ability to tailor search results based on past results of that same session? I can see why they might choose: "If a user wants to send around a URL by manually copying and pasting it, we should trust them to read it before they send it out, as the relevant parameters are human-readable."
Fact-checking is good, but it takes time and it's not always worth the trouble. I'd rather see someone use an honest qualifier than either claim certainty they don't have, or not post at all.
1) I'm not a web dev, I'm a backend and networking guy.
2) Given that -AIUI- any non-host, non-scheme, or non-path part of the URL can be programmatically altered in a newer browser without triggering a page reload, I don't see how one can call the behavior a "basic fact about how web browsing works". I would call that sort of thing an "implementation detail". :)
> [What should] Google ... do to fix this? Re-engineer their system so that every URL is a non-human-readable URL that obfuscates all possible metadata?
Remove the non-search-string metadata from the URL?
No, it's not a "no-brainer." Copying and pasting the URL is, AFAIK, not a common use case for the majority of browser users. I'm reminded of that right now as I type this in Safari, which some time ago made the decision to hide virtually all of the current URL except for the domain name. If you think back a few years farther, you'll remember when URL bars and search boxes were different inputs, rather than the omnibox that is now a standard feature across the major browsers...which means that conscious recognition of the URL is even less likely among most users.
Furthermore, the design trend is such that in most consumer-friendly websites, users have been trained/encouraged to share via buttons -- this is something that Google itself explicitly encourages in Google Maps -- again, deemphasizing the fact that URLs are (well, generally) available in the omnibox.
So I think it's a reasonable assumption that users who copy-paste from the omnibox are relatively rare. And those that do do it have something that button-pushers don't: the ability to examine what they've just copied-pasted.
Again, I don't think this is a case of "Oh, but everyone should know better than to blindly do such-and-such", a sentiment that many of us agree is wrongly applied to the hitting "OK" on agreeing to indecipherable TOS screens. But I can see why it's not an obvious design decision given the feature set that Google wants in its search functionality. The more I think about it, the more I think it should be more incumbent upon the user to notice what they of their own volition have decided to send out as text.
How is that different from the expectation required of users who send out screenshots? When I want to screenshot a webpage on my desktop -- even though I intend just to show the webpage -- I know not to use the fullscreen-capture option, because it will capture lots of metadata even if the browser page itself is in full screen. Same deal on an iPhone, where a screen capture will leak metadata about you as a user (when the photo was taken, and thus possibly your time zone -- and also your carrier) unless you intentionally crop it out before sending. And cropping photos is very unintuitive on iPhones (i.e. it is not easy to do without jumping into a third-party application).
Weird...when I was at Google, very significant engineering effort was expended (we're talking man-years) to prevent exactly this case, and it was a major design constraint on the features we could launch. Something must've changed in the thinking of the higher-ups.
I also wonder why they don't just use pushState, which was proposed as a solution way back in 2010 but didn't have the browser support necessary. Now it's got the browser support:
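To make the pushState suggestion concrete, here is a minimal sketch. The `q` parameter name matches what Google uses, but the helper function and the exact call pattern are illustrative, not Google's actual implementation:

```javascript
// Build a clean URL for the current search, carrying only the query itself.
// This is a pure function so it can be tested outside a browser; in an actual
// page you would pass the result to history.pushState().
function cleanSearchUrl(currentHref, query) {
  const url = new URL(currentHref);
  url.search = "";                  // drop all stale parameters (oq, etc.)
  url.hash = "";                    // drop any hash-based state too
  url.searchParams.set("q", query); // keep only the new query
  return url.toString();
}

// In the browser, after rendering new results via AJAX, something like:
//   history.pushState({ q: query }, "", cleanSearchUrl(location.href, query));
// updates the address bar without a reload, so a copy-pasted URL carries
// only the current query.
```

The point is that pushState lets the page rewrite the entire query string, not just the fragment, so the stale `oq` parameter never has to survive into the shareable URL.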
There was a thread a few weeks back about something Google-related and an ex-Googler was saying how many hours they spent making the home page and search results pages as light as possible, but that recently they stopped caring about that.
I'm not actually sure whether they don't care about it, BTW - the SRP is about 1/3 as heavy as it was when I left, so it looks like someone's been cleaning it up. These things tend to move in cycles - I was hired at the very end of a "latency & performance" cycle, then spent most of my career there during a "moar featurez!" cycle, and it wouldn't surprise me if the focus is again latency and performance.
Doesn't that bring into question all the research that says the faster your page loads, the more money you make?
If Google can swing between adding a load to the page load and then whittling it down making it faster, the money difference involved must be fairly trivial.
Not exactly. It's more that making the page faster is fairly predictably going to get you $X in additional revenue. Adding a new feature is going to get you anywhere from $0 to $Y in revenue, but you can't know what $Y is until you've launched it and given users some time to learn about it. So the only way to avoid getting stuck in a local maximum - your current feature set, as optimized as possible - is to periodically try to shake up the page, add some new features, and measure their effects. After a couple years or so, the features for which $Y < their cost in latency & developer maintenance are killed, and a new round of optimization & code cleanup starts based on the current feature set.
Similar behavior when you search for something in Amazon, then click on something. The original search is in the URL of the page that you're on. Be careful sharing that last link.
Not Safe for Work:
Here I searched for "blow up doll" (yup, they sell them), then clicked on a cute alien inflatable doll. The original search is at the end of the martian doll URL, which I so thoughtfully shared with my kid, or grandkid, or spouse, or employer's decorating committee...
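One way to defuse this before sharing is to strip the URL down to the canonical product path. The `/dp/<ASIN>` convention is real on Amazon, but treat this as a sketch: other path shapes (e.g. `/gp/product/`) exist and aren't handled here.

```javascript
// Reduce an Amazon-style product URL to just "origin/dp/ASIN", dropping the
// search keywords, ref= segments, and session parameters that trail it.
function cleanProductUrl(href) {
  const url = new URL(href);
  const m = url.pathname.match(/\/dp\/([A-Z0-9]{10})/i);
  if (!m) return href;               // not a recognizable product URL; leave it
  return `${url.origin}/dp/${m[1]}`; // keep only the product identifier
}
```

The shortened form still resolves to the same product page, but no longer tells the recipient what you originally searched for.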
In another time and place, and if Google wasn't involved, I doubt this would even hit the front page of HN. People have been embedding tons of state into URLs for years without obfuscating/encrypting them. Web apps should be idempotent with respect to URLs, and bookmarking any screen you're looking at should allow you, the user, to return to that state.
Maybe there should be a fix for this, at least to encrypt the old query, but realistically, this behavior is widespread in web apps, and it's only newsworthy because Google.
Unnecessary state should not be present in the URL. Yeah, people do it all the time, but it's sloppy coding and bad for many reasons, including those described in the article.
How does this contradict not keeping state unnecessary to return to a place in the app? (as in this case, past state). I'd think most users expect that a URL leads them to the same view again, but not that it preserves their history to get there.
While I understand why people are irked (to say the least) by this, I can understand the technical reasoning behind it.
The hash part of the URL (#andmorestuff) is not sent to the server by the browser; however, the query string (?var=stuff&another=otherstuff) is.
Using this to send the search to the server is smart, since the server can respond with data without an additional round-trip AJAX request. However, everything in the URL other than the hash is read-only to client-side scripts. So, if you want the URL to reflect the search, you can either a) reload the page using the query string, or b) follow the single-page-app methodology: update the hash with the data and fetch new results with an AJAX request.
Google's interactivity, with the smoother feel of not seeing an entire page refresh (plus potentially less data being sent over the wire after the initial load) could be simply stored in an internal state. However, should the user refresh, the search would be lost. If you try going to a Google search with a search query hash, you'll notice a brief delay before the search results are displayed.
The page loads, but there's an AJAX request to the /search route with the hash-specified query, with query string parameters to specify how the data should be served. Still, the client-side script cannot strip the query string of the previous search without a full page reload.
While the info leakage is annoying, it's unavoidable in this strategy.
Note: The browser history, and subsequently the URL can be manipulated through the History API's pushState, which could be a direction Google could go in, but IMHO, the hash approach is viable as well.
Sorry, I'm not a web dev. Couldn't the previous search string be stored in a cookie instead of the query string? That way it's still sent with the first request but it isn't visible in the URL.
Yeah, quite viable as a strategy for now -- thanks for your insight on this. But aside from the technical side, do you think this method holds up when you think about privacy?
Because that would make the search results load slower, since it would first have to load google.com and then do a second request for the actual search.
Mozilla Addons to the rescue once again. I am always surprised at how under appreciated plugins like these are. If I ever stumbled into money, I would pay it forward to all the developers who feverishly code these plugins for the betterment of humanity and the web.
I don't recall needing to copy and paste a search URL to someone in a long time, but maybe other people use it in their workflow. (maybe in jest I'll use http://lmgtfy.com)
Interestingly enough, in Safari on 10.11, the URL I see in the address bar when searching for "$x" is just "$x", but when I copy and paste "$x" into a text box I see my previous search in the URL. I've noticed Safari shortening URLs to just the base domain recently, but this feels a bit off.
Not quite search-related but in my most common usage, I run across similar practices with copy/pasting URLs in general.
Some friends and I have a running multi-person Hangout that we use as a sort of simplified chat room. It's obviously got none of the control of something like IRC but it's easier and more convenient for most of them. About half of us already have Hangouts on our phones by default (Android) and the iOS folks can install it and alternately, many of us keep Gmail open in a browser tab at work anyway.
Getting to the point, we use the chat for just general BS throughout the day and we post a lot of links. I have a habit of seeing what I can strip off an article or other link before it doesn't resolve. But some chat participants are primarily phone users and don't often sit down with a laptop or desktop. Almost all of the links they post are these elongated strings that take up 4+ lines in the chat box and contain all sorts of Facebook references and other bits and pieces of the "path" they took to get to the actual article.
I've mentioned it before but usually just get groans about how I'm paranoid/anal retentive. It's probably no big deal but on principle I don't like clicking on anything that essentially lets Facebook or some other network track the spread of links even when I'm not on Facebook. If I'm actively on that site or something similar, sure, they can go right ahead and track what I do but when I'm just sitting at work and want to click some news story a friend forwarded, I don't need all the referral links and tracking.
The problem is (as has been mentioned above) a lot of people just don't think about what's in a URL and sorta just look at it like "that mess of gibberish text that points to the article or funny picture". The problem has only gotten worse as more people are primarily phone/mobile app users.
I don't think Google search URLs were ever designed to be shared or linked directly, so I don't really see an issue here.
The original query is stored in case you want to track back on the auto-suggest. Would you rather they store those in cookies or non-volatile browser storage?
And while it's true that if you send someone an entire Google search URL you might leak the original query, given all the other dark magic that is in the URL -- which can include your client type, version, location, and other things (e.g. inter-service referrals if you go there from another Google service like Google Maps) -- I think the original query is the least of your worries.
> I don't think google search URL's were ever designed to be shared nor linked directly...
This is incredibly silly. URLs are how you get to a page. It's how The Web works!
> The original query is stored in case you want to track back...
That's what the History API is for. When you add an anchor-tag query, store the original state in the tab's history.
> i think the original query is the least of your worries.
Infoleaks are BAD. The least Google could do is use their URL rewriting powers to remove the original query when they add a query stored in the anchor-tag.
> This is incredibly silly. URLs are how you get to a page. It's how The Web works!
That's a very general and simplistic view of things, and we both know it's not true.
While URLs were originally designed only to provide an address for a specific resource, we all know they are also used for storage, user-state management, and many other things -- again, not ideal, but in some cases unavoidable.
Not to mention that URLs are not implicitly intended to be shared outside the scope of your application -- you wouldn't link the 3-mile-long URL that you get every time you log into your bank account, would you?
> That's what the History API is for. When you add an anchor-tag query, store the original state in the tab's history.
Technically (the best kind of not true) not true for the Google History API, and even if it were applicable here, it doesn't work in cases where you, for example, weren't signed in or were explicitly blocking Google tracking APIs and services.
Not to mention that making an API call every time you need to do something as basic as backtracking on your search is just pure insanity when it comes to resourcing.
And if you are talking about the Chrome Page History API, then why in god's name would I want a site to access it? Not to mention it's not applicable in this case either.
> Infoleaks are BAD. The least Google could do is use their URL rewriting powers to remove the original query when they add a query stored in the anchor-tag.
Life is bad. If this were in a link which was automatically shared (although all those pin URL generators leak 10-fold more data than Google), I would agree with you; otherwise, not really -- this isn't a use case for the application.
I know no such thing. When I want to share a search result or anything else -that doesn't require authentication- with a friend, I copy the URL in the address bar and paste it to them. This is how The Web works.
> ...if this was in a link which was automatically shared... I would agree with you [that this is a bad thing.]
So, it would be acceptable to you if Google put your Google Account login and password in URI-encoded plain text in the URL of your search results? Why or why not?
> Not to mention that making an API call every time you need to do something as basic as backtracking on your search...
1) Google already does exactly what I described when you make a search on *.google.com with a Javascript-enabled browser. It's what lets your back button work with their JS-based page updates. ;-)
2) A person driving his User Agent doesn't use the History API to go to a previous search result. He uses his back button or equivalent key combo.
Putting your password in plaintext is unacceptable because it exposes it to anyone who may be sniffing your traffic. Your search history is already visible to anyone sniffing your traffic, so its presence in the URI is moot.
I guess my point is that there's a continuum here. There's already other ways for this data to leak (it's already plaintext on the wire, and it's already there in your browser history) and this particular attack vector is easy to mitigate if you put just a little effort into it.
> Putting your password in plaintext is unacceptable because...
I know that. :)
dogma1138 said:
> ...given all the other dark magic that is in the URL which can include your client type, version, location and other things (e.g. inter-service referrals if you go there from another google service like Google Maps) i think the original query is the least of your worries.
There are people for whom and situations in which having a search query exposed is utterly disastrous. I asked the question I asked in order to determine if dogma1138 held the opinion that
* "URL's are not implicitly intended to be shared outside of the scope of your application" [0], and therefore any sensitive information in them is 100% okay, because -in his world- noone ever shares hard-to-read URLs anyway
or
* URLs can have potentially disastrously sensitive information in them, as long as it's not username and password
or
some other opinion.
In short, this was an information-gathering question designed to test the bounds of a hypothesis that I was forming, but did not yet have enough information to put any faith into. :)
> Putting your password in plaintext is unacceptable because exposes it to anyone who may be sniffing your traffic.
I thought query strings were encrypted in an HTTPS request.
>This is how The Web works.
This is one of many ways the web works. URLs were designed way before there was an easy way to share them outside the scope of the website itself.
Looking at my history, 50-60% of the URLs in it aren't humanly readable and are 10 miles long; many of them do not require authentication. It would be nice if people adhered to the idea that URLs are meant to be human-readable and easy to share, but you know...
...life.
>Google already does exactly what I described when you make a search on *.google.com with a Javascript-enabled browser. It's what lets your back button work with their JS-based page updates. ;-)
No, it doesn't. My back button works just fine, and there isn't a single request made when I click it; I just tried it now, both with the network panel in Chrome dev tools and with Fiddler.
Yes, info leaks are bad, but this isn't even an info leak: if you start a completely new query, it resets the original-query variable.
e.g.:
You search for "query": you'll get q set to "query" and oq set to "query".
You search for "query history": it will set q to "query history" and keep oq set to "query".
You search for "i got a rash on my butt": it will reset both the q and oq URL parameters to the new query, because it's unrelated.
So you won't leak the fact that you searched for manliness-enhancing spa treatments to your friends if you send them a link to a query for a pole-dancing cat. Your honor is safe, my friend.
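Whether or not the reset behavior holds in every case, it's easy to check a URL for the leak before sharing it. The `q` and `oq` parameter names match what Google emits; the helper itself is a sketch:

```javascript
// Detect the leak under discussion: an "oq" (original query) parameter
// that differs from the visible "q" parameter in a search URL.
function leaksOriginalQuery(href) {
  const params = new URL(href).searchParams;
  const q = params.get("q");
  const oq = params.get("oq");
  return oq !== null && oq !== q; // oq present and different => prior search exposed
}
```

A URL where `oq` matches `q` (or is absent) reveals nothing beyond the visible query; a mismatch means your earlier search is riding along.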
"So, it would be acceptable to you if Google put your Google Account login and password in URI-encoded plain text in the URL of your search results? Why or why not?"
This is actually a pretty important question.
> Looking at my history 50-60% of the URL's in it aren't humanly readable and are 10 miles long...
It doesn't matter how long the URL is. It doesn't matter if you can easily read it. The Web works through URLs. URLs (and URIs) are how you access resources.
* Data in the href attribute of the a tag? A URI.
* Data in the src attribute of the img tag? A URI.
* Data in the src attribute of the script tag? A URI.
* Data in the 200 response to an HTTP GET request? A URI.
* Data in the Location header of a 301/302 HTTP response? A URI.
* Data in the address bar of your User Agent? A URI.
> [T]he history API... has been deprecated and turned into the App Activities API...
No. It has not.
> ...my back button works just fine and there isn't a single request made when i click it...
Right. That's what the History API does. It's a strictly client-side thing. I guess you're dreadfully confused. Here's [0] the first result for "History API". It also happens to describe exactly what I'm talking about.
> this isn't even an info leak if you start a completely new query it resets the original query variable ... you'll get ?q set to query and ?oq set to query ...
> you search for query history it will set ?q to query history and keep ?oq set to query
Cannot repro. Here's what I see:
* Use omnibox to search for "thing": q=thing&oq=thing
* Use google search page that loads with results for "thing" to search for "things": q=thing&oq=thing#q=things
* Use that same page to search for "dingus dongus": q=thing&oq=thing#q=dingus+dongus
It's a pity it redirects to add "&ia=definition", but it's nearly pretty. And if this didn't blow your mind yet: if you right-click a search result to copy the URL, you get the actual URL with DDG, not an abomination of a URL that redirects you via Google to the URL you thought you copied (and which, to add insult to injury, gets masked when hovering over the link in the status bar, like a malware page might do).
Sure, that's "unavoidable" if you want to track more stuff, but wanting to track more stuff is not in itself unavoidable. It's a choice.
> Not to mention that making an API call every time you need to do something as basic as backtracking on your search is just pure insanity when it comes to resourcing.
That's exactly what Google does, though. It would be perfectly possible to let stuff be cached in the browser at least for a few minutes, but there are always requests being made. They throw away those user and server resources to track hits. It's a bit like how YouTube used to buffer and allow skipping back and forth without having to load one more byte, and now doesn't. There are ads to display and usage statistics to gather. You might argue they need that to "improve their product", but I think looking at, thinking about, and using a product can get you a long way, too. If you need to see how real people use it, pay them to use it while you record them. Yeah, I know, it would ruin everything and we'd be left with Pong, right? Baby food manufacturers could also improve their products if they could experiment on thousands of babies without any restrictions, and I bet if that was the norm, any reform would be met with visions of doom about toxic baby food.
Right now, nothing companies like Facebook, Google, Apple and Microsoft could invent would be nearly as cool as a simple, efficient web with users educating other users how to use it responsibly. If the web was plumbing, we'd be looking at people making all sorts of weirdly shaped pipes out of fancy materials, and people would sell their plumbing skills by talking about their religion and family, instead of time and materials needed, and capacity and stability of the result. We're still at the level of people burying slaves at the foot of a new aqueduct, and I can't wait until this hysterical gold rush is a mere footnote in the history of an actual information age. Right now we live in the marketing age, and the mediocrity is a direct result IMO.
"...curious to hear HN's thoughts on if this should be fixed or is acceptable."
I consider it unacceptable. Why does Google do this? Is it for the purpose of tracking users?
This feels like a common problem with URLs. More and more sites send you on to other links while appending some identifier to the URL. I frequently come across links shared by friends and colleagues where I can immediately see the referral site that was the source of the link (even though the destination of the link is different). Presumably this is yet another layer of the endless (but rarely questioned) tracking that has become the norm on the web.
In my opinion this is acceptable. It's probably even functional: the content of the first search might very well affect the results of the second search.
For me, it's mainly an issue when trying to copy and paste a link to a file like a Word document that downloads instead of being viewable in the browser.
>Note: This information has been disclosed to
>Google appropriately, they have chosen to not
>fix this behavior.
>The other day, my friend sent me a link
So did Google respond with a "No" or is 2-3 days with no fix considered too long?
I'm not diminishing the seriousness of this problem but it just got a whole lot worse being on HN after only 2-3 days head start on a fix to a major part of the Google product line.
Can you post this dialog? I have trouble understanding how they gave a solid "No" on something which warrants much consideration.
Most companies won't even respond if they don't intend to fix something, that way they can claim they're working on a fix when it all blows up in their face.
A safe(r) way to search Google is using startpage.com, which searches Google anonymously on your behalf. I just confirmed this bug doesn't translate over to Startpage; this is the URL you get when you follow the same procedure as the author:
Slightly OT, but it took me 4 hours to realize that it was Google who was crawling the URLs of pages (mostly cron jobs or admin sites) that nobody should ever stumble upon.
I'm not sure how this works (is the URL I paste and use in Chrome sent to Google?), but it came back a few hours later from an agent called "Google favicon". The IPs in the 66.249.* range strongly suggest it's a legitimate Google bot.
Why they do it - no idea, but just to be safe I blocked all 66.249.* and 66.102.* traffic across my systems.
EDIT: yes, there is a robots.txt that disallows all traffic. Google ignores it.
This is a very internal site - no need to have everyone from 66.249.* accessing it.
You should look into robots.txt for telling Googlebot (and other crawlers) about where they should and shouldn't crawl on your sites.
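For example, a robots.txt served at the site root that tells all well-behaved crawlers to stay out entirely looks like this (note that robots.txt is advisory only; a genuinely private admin site should sit behind authentication rather than rely on it):

```
User-agent: *
Disallow: /
```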
Blocking by IP isn't the way to go, especially since Google's actual IP space in 66.249 and 66.102 is much smaller than the entire block (a /19 and /20, respectively) so you're causing a bunch of collateral damage to other networks in that address space.
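To see how much collateral damage a /16 block causes compared to a /19, here is a small CIDR-membership check. The 66.249.64.0/19 range used below is the block commonly attributed to Googlebot, but verify against Google's published crawler data before basing firewall rules on it:

```javascript
// Convert a dotted-quad IPv4 address to an unsigned 32-bit integer.
function ipToInt(ip) {
  return ip.split(".").reduce((acc, oct) => (acc << 8) | parseInt(oct, 10), 0) >>> 0;
}

// True if the address falls inside the given CIDR block.
function inCidr(ip, cidr) {
  const [base, bits] = cidr.split("/");
  // Build a mask with the top `bits` bits set (guard the /0 edge case,
  // since JS shift counts are taken mod 32).
  const mask = bits === "0" ? 0 : (~0 << (32 - Number(bits))) >>> 0;
  return (ipToInt(ip) & mask) === (ipToInt(base) & mask);
}
```

A /19 covers 8,192 addresses, while a /16 covers 65,536 -- so blocking all of 66.249.*.* shuts out roughly eight times as much address space as the crawler actually occupies, most of it belonging to other networks.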
So if people send URLs with queries (they should just send the actual target URLs), then it's quite relevant whether the search results are the same for both links sent.
>Google’s automatic inclusion of prior search terms is a similar violation of a user’s privacy expectations, and they should fix it.
How? That's information you are voluntarily sharing. Besides, you can easily see what you are sharing, since it's not obfuscated or encrypted in any way.
The lack of ability to share search results is a conscious decision by Google to make their search engine feel private and personal.
You have your own Google that knows a great deal about you. It's tailored to you, your location, your life. Google has become an extension of self.
Making search more social by adding simple sharing capability means that you would be less likely to use Google for personal and private search queries.
Google is the most successful advertising company on the planet.