Google featured snippets are worse than fake news

ronack · on March 6, 2017

I've also wondered why Google isn't held responsible for publishing libelous claims and hoaxes as facts. Examples:

Is Hillary Clinton a pedophile? https://www.google.com/search?q=is+hillary+clinton+a+pedophi...

Is John Travolta gay? https://www.google.com/search?q=is+john+travolta+gay

Does Lil Wayne have HIV? https://www.google.com/search?q=does+lil+wayne+have+hiv

Perhaps worse, this is what drives Google Home's question answering. Yes, they say "according to so-and-so" first, but if Google is responsible for "organizing the world's information", they are essentially endorsing that answer as the best response. They've gone too far in favor of recall over precision/reliability and need to dial it back. Otherwise you end up with crap like this:

Is Earth flat? https://www.google.com/search?q=is+earth+flat

dmboyd · on March 6, 2017

Wow, I've mostly converted to DuckDuckGo for the past two or so years, so are these "google assistant" style placements now part of their core product? or is this a setting that you've enabled?

If it's part of the core... Wow, what a useless cess pool google has become.

ronack · on March 6, 2017

Yes, those are default search results and the same content that drives Google Home's voice responses when you ask it a question. Originally Google stuck to Wikipedia/Knowledge Graph types of answers but have expanded this in recent months to try to answer just about any question you search, however dubious the answer's source.

TylerH · on March 6, 2017

I think it's part of the core, but I haven't tried to disable it before; that may be possible.

However, it is dependent upon the search query: https://www.google.com/search?q=flat+earth doesn't show it, for example.

Perhaps only queries in the form of a question show it.

But yes, it is a piece of shit and they should get rid of it if they aren't going to curate the kind of content that can get put in those boxes.

voxic11 · on March 6, 2017

Because they can't be under US law.

> No provider or user of an interactive computer service shall be treated as the publisher or speaker of any information provided by another information content provider.

https://en.m.wikipedia.org/wiki/Section_230_of_the_Communica...

benmcnelly · on March 6, 2017

Are you kidding me?! We need to get this law changed ASAP.

Look at this search result. This fake news stuff cannot be allowed to continue.

https://www.google.com/search?q=best+taco+place+in+San+Fansi...

voxic11 · on March 6, 2017

It's a pretty essential law for maintaining the internet as we know it. If providers could be sued for the content, provided by others, that passes through their services then we can't really have an open internet. ISPs would be afraid to allow anyone to connect because they could be sued for what their costumers do. Search engines would be afraid to index sites because they could be sued for any misinformation on them. Github would be afraid to host projects because they could be sued if any of them violates copyright. It would be a very different internet.

benmcnelly · on March 6, 2017

You are right, and I was agreeing with it.

One downside to being sarcastic on a comment thread to make a point is I can't tell if people are down-voting me because they clicked the link, or because they didn't.

pavel_lishin · on March 6, 2017

Another downside is that it doesn't add much to the discussion.

benmcnelly · on March 6, 2017

Sure it does, one of the most effective forms of provoking thought is satire.

People just don't like it when you don't think the way everyone else is thinking.

segmondy · on March 6, 2017

Did Google publish that? It didn't, it "answered with what it found" You asked Google, and it replied with data. There was no person at Google who decide that any of the following is true and hit a publish button.

For Google to be held liable, Google must know what is true and false for all it's knowledge base.

tyingq · on March 6, 2017

Try searching: Which presidents are rapists?

On mobile at least, it really does look like Google is presenting a fact: http://imgur.com/a/DZP1H

Attribution at the top, and maybe more context would help.

Note that the article Google scraped this from is NOT calling those presidents rapists. It mentions one of them was accused, but not prosecuted.

ronack · on March 6, 2017

It is being represented as the answer to my question - visually on google.com and explicitly from Google Home. This is different than a forum, Reddit, etc. hosting a variety of content for which it is not responsible. Google is effectively endorsing a specific piece of content in response to my question. This will either need to be refined or I suspect it will be challenged in court eventually.

patmcguire · on March 6, 2017

Wow, those are bad.

DanielBMarkham · on March 6, 2017

Yikes. More of this "People only want to be told what to do" stuff.

I can easily think of a dozen political questions for which no simple answer would be correct -- the language is simply too fuzzy. For many, many things, this is a good idea. The date of Easter is the date of Easter. But the glaring danger here is that for many things this is a freaking evil dystopian nightmare. Why? Because Google will keep tweaking these impossible questions until they look real enough to most people that nobody complains At that point, millions, perhaps billions of people are asking complex questions to a little box that has designed itself to give an plausible but incomplete or wrong answer.

Epistemology, Google. There are things you can know and things you cannot. Please do not treat them all the same.

deong · on March 6, 2017

It's not even limited to this kind of question for which there's no simple answer to give. Last week, my wife was joking around with our 9-year-old, leading to him trying to convince her that mermaids aren't real. Cue, "OK Google, do mermaids exist?" To which our Google Home announces, "According to Weekly World News...<yes basically>".

If "do mermaids exist" can't be classified as a thing you know, I'm not even sure you're at the point where epistemology is your most glaring problem. You know so little that you should stop volunteering to answer questions.

DanielBMarkham · on March 6, 2017

A common complaint that pg and many of us have about popular forums is that you always end up in some kind of semantics game.

That's true, and that's frustrating, but that's also the basis of shared human understanding, the dialectic. I wish I could just say "There are no mermaids", but the definition of "is" and "being" here is complex. In many other cases this definition is not complex.

We programmers get dissed a lot because we can tend to be on the arrogant side and think any problem can be fixed with more CPUs or algorithms. For many situations, like HN perhaps, if it works for 99% of the folks it's good enough.

But if you're the world dominant leader in answering questions? And you want to give the immediate, correct answer to anything somebody types in? Those little 1% cases pile up. There are a crap ton of situations where the minority opinion is the only thing that advances the species. Existence is much more than simply making the correct or best-guess decision based on a bunch of facts. Most times the facts and basis for definition of terms in the facts are in flux. Gad, I wish it were as easy as I thought it was when I was a kid.

deong · on March 6, 2017

Context is important. For a general question-answering bot, you have to synthesize information and make a decision. The answer to a question posed by my uncle on Facebook isn't necessarily the same as the answer to that question posed by an audience member at a talk I'm giving at an academic conference. In the latter case, there's an enormous shared understanding that lets us skip the basic-agreed-upon facts and get right to the nuance. My uncle doesn't need nuance. He needs to know whether climate change is a Chinese conspiracy or vaccines cause autism.

Minority opinions are sometimes the thing that advances the species, but that's a job for a time and place that isn't "answering random questions for arbitrary users".

The answer to "do mermaids exist" is "no". It just is. That doesn't preclude a group of 200 experts devoting their research careers to investigating the question, and if they happen to prove otherwise, then we alter the textbooks and move on. Ideally, Google would then start answering "yes" instead. But if you want to be that world's dominant authority at answering questions, you can't just always say "well, it's complicated. People disagree. The world is a complex place and the definitions of terms and the facts are in flux", because you're no longer answering a question.

rdlecler1 · on March 6, 2017

There are ways to address this. If you ask for the diameter of the earth there will be an abundance of authoritative and information information. If you ask for something like is Obama imposing martial law and you return a passage from some conspiracy blog then maybe you shouldn't be highlighting this as a OTA. At the very least they could put some kind of confidence bar and a link to a directory of competing sources.

maxerickson · on March 6, 2017

The date of Easter depends on which calendar you use. The second largest christian church uses a different calendar than the largest one.

DanielBMarkham · on March 6, 2017

It's a great point. It shows how easy it is for somebody to think wow, that question obviously has a clear answer when the opposite is true.

Take the lead example: Was President Warren Harding a member of the KKK?

We know the KKK said he was. We know he vigorously denied it. We know that biographers of him and experts in his history do not think he was.

But how could you ever prove that somebody was not the member of a secret society? The entire purpose of secret societies is that the outside world will never know if you're a member or not.

Look, I'm happy to assume he wasn't, and at some point only kooks continue to chase things down past all reason, but just assume for a second that he was, and that somehow that fact could be determined but the research has not been done yet. So history student X starts wondering about it one day and types the question into Google. Never fear! Google has the answer: he wasn't. And this is Google's answer because this answer has the benefit of looking the most plausible to the greatest number of people. Student X has an answer, gives it no more thought, and wanders off to think about other matters.

Student X is now more stupid. The world as a whole is now less productive. And some preponderance of data that academics have assembled somewhere around the year 2015 has become truth for all time. That's amazingly fucked up, Google. Please stop.

theseatoms · on March 6, 2017

Google should report a probability distribution for each truth claim.

jerf · on March 6, 2017

While an interesting idea in theory... and I mean that!... in practice that's saying they just shouldn't provide answers. Even saying for the sake of argument that Google could assemble a meaningful and correct probability distribution, itself a rather bold statement, the average Google user won't know how to correctly interpret such a thing.

Bartweiss · on March 6, 2017

And realistically, asking for probability distributions over unbounded sample spaces is trouble even for human experts. I suppose the space for "are mermaids real?" is workable (there are caveats like "yes but extinct", but yes/no is basically sufficient), but for anything subjective or political it's almost hopeless.

That said, this is a really interesting idea for behind-the-scenes work. The public might not want percentages, but I suspect experts could get some interesting results by diving into confidence levels and secondary answers.

forgottenpass · on March 6, 2017

But google isn't even making a truth claim. Google fundamentally doesn't understand the claim, or how to grade it. All the google engine knows is the statistical properties of words and phrases involved and associated with the claim.

maxerickson · on March 6, 2017

Yeah, that was my intended meaning.

There probably is an argument to be made that in most English speaking countries "Easter" refers to the date calculated using the Gregorian calender, with some qualifier being used to specify the other date (Google answers searches for "eastern easter 2018" and "orthodox easter 2018" with the correct information).

Bartweiss · on March 6, 2017

This is a really good example of the problem. We aren't just struggling with subjective questions or machine learning getting tricked by fantasy (e.g. mermaids). We're also up against the fact that even very simple questions can't necessarily be answered cleanly.

Does Google show different Easter results based on location? I'll bet it does.

scandox · on March 6, 2017

Yes he stumbled into a particularly thorny theological matter there. The Quartodecimans are going to hunt him down.

Sharlin · on March 6, 2017

Plus, of course, it depends on which year you're talking about, in a very nontrivial fashion. Googling "when is easter" gives the date of the current year's Easter, which is reasonable, but might lead to an incorrect assumption that the date of Easter is fixed.

deong · on March 6, 2017

That's true, but I don't think it's a problem. That's a question that just has two separate intentions, and even the world's foremost expert on Easter could give you the wrong one.

If Google were as good as the best human experts at answering any question, then I think we'd have to declare victory and live with the occasional miscommunication, just as we do today.

DanielBMarkham · on March 6, 2017

I think the crux of the issue is that real experts in any area, when presented with a question in their field, will answer "That depends" and then engage you in a conversation that ends with you being more knowledgeable about the specific question you should have asked, and why.

This is because most areas of human endeavor aren't solid sciences like physics, although for most day-to-day activity we treat them as if they were. I do not care about the nuances of Easter as long as I can get the flowers delivered to my grandma on the correct day.

But I do care about the nuances in a ton of other situations, and the dividing line isn't so clear. Simply BSing an answer that's good enough for most cases actually ends up hurting the rest of us more than it helps us. For lack of a better term, it's fake facts.

thwarted · on March 6, 2017

I think the crux of the issue is that real experts in any area, when presented with a question in their field, will answer "That depends" and then engage you in a conversation that ends with you being more knowledgeable about the specific question you should have asked, and why.

While I actively encourage this method of inquiry and information acquisition, I've found that people often don't care to know why their question isn't even wrong, and challenging it leads to a defensive posture. This conversation is only keeping them from getting an answer and the expert risks being labeled as "uncooperative" because they refuse to provide an answer. This is a social problem, and it won't have a technical solution.

rdlecler1 · on March 6, 2017

Yes, but Wikipedia recognizes this fact.

forgottenpass · on March 6, 2017

Epistemology, Google. There are things you can know and things you cannot. Please do not treat them all the same.

There does not exist a category of things the google can know when a question with semantic meaning is answered by trawling the web and looking for statically connections to the words and phrases in the question.

I don't even think we have to take a peek into philosophy of what "knowledge" is to categorically deny that any output of that process could ever be called knowledge.

My words fail me, the best explanation I've seen on machine learning 'facts' is: "The output of the algorithm is not a human-understandable fact. It is the result of complicated and unintuitive statistical properties of the input material. However, that data looks like facts we can easily understand" [0]

"Don't be evil"? Cleaning up the result of such an algorithm so that it looks like a fact to a layperson is in a very real sense evil.

[0] https://www.reddit.com/r/programming/comments/5tovk1/on_the_...

TheGRS · on March 6, 2017

While hiring a ton of people to fact-check would be one solution, that obviously wouldn't be very scalable. I think the problem lies in google's algorithm. They seem to be pulling answers from well-visited sites that may purport to have an answer to these questions. Quantity of visits does not equate to truthfulness. What Google should at the very least do is whitelist certain publications for answers to pretty simple stuff like encyclopedia britannica or wikipedia. More complex stuff maybe they could source academic journals and certain newspapers. But throwing caution to the wind and hoping that the web crawler knows best really will hinder their ability to be a source for gaining knowledge.

What's weirder to me is that it seemed like they were going with my proposed route for a pretty long time and only recently starting providing dopey answers. Maybe its part of a grander experiment they're doing to vet question answering AI?

abandonliberty · on March 6, 2017

Metrics are destroying the web.

Once upon a time it was possible to identify "good" pages by how often they were linked - effectively crowd-sourcing the problem.

Natural selection: sites have learned to manipulate the game and the crowd.

Under pressure to optimize metrics that lead to SEO and better valuations, the internet is getting less useful from a user perspective.

I don't want to watch a video/slideshow, download an app, register for a forum, or read through a 2000 word fluff piece with interspersed ads and links to more information in which site has buried the one-sentence answer to a 2-second question in order to maximize my time on the site.

This garbage is what makes it to the front page of google these days. I guess the poor sites that aren't user hostile just aren't SEO'd enough.

65827 · on March 6, 2017

Metrics are destroying lots of departments in lots of companies. Everyone seems to fixate on the analytics for this week or maybe this quarter, there is little or no attempt to understand the why or think long term. So many companies have cut back severely on the sorts of risky investments they used to make regularly, because the data is being conflated with reality

gregmac · on March 6, 2017

What's strange is this is not at all a new problem, and Google has historically been quite good at fighting it.

The original search engines ranked pages by their content. Naturally this led to gaming by including keywords (remember huge invisible sections of pages that just repeated keywords thousands of times?).

Google's original PageRank algorithm was a complete breakthrough for this, almost completely disregarding content and instead ranking results based on the text other pages had used to link to the page. This was so good, in fact, that most of the other search engines from the time didn't survive.

This once again led to large-scale gaming with techniques like link farms, and a small arms race as Google came up with ways to squash these new techniques. The quasi-legitimate "SEO" industry spun out of this.

I think now we're in the same cycle again, and at this moment scam sites are winning. What's to be seen is if there's a looming breakthrough, or the arms race will continue.

joshuamorton · on March 6, 2017

I think a big problem now is that the sites are capable of tricking people. That is, if you have a "turing test" for reliable content website, blogspam.forums.net with 2000 backlinks and such obviously doesn't pass. On the other hand, Reuters.com does. But so does theonion, and Breitbart and random_russian_conservative_propoganda_site.us, and so it becomes a lot harder to differentiate the good from the bad without explicity curating by experts, because the sites can trick non-domain-expert humans, so what hope does a bot that hasn't tried to learn domain expertise of the topic have?

throwaway2048 · on March 6, 2017

A good example of this is mailing list archives. Pages and pages of google links searching for mailing list posts are filled with maximum ad-cram (often dressed up like a fake forum) garbage, meanwhile the clean, no bullshit mailing list achive everyone links to (marc.info) is nowhere to be found at all.

This is likely because marc.info isn't playing the SEO game, and spamsite #43254 is.

hobofan · on March 6, 2017

> will hinder their ability to be a source for gaining knowledge

CAUTION: subjective experience ahead

I feel like in the last few years Google's utility for knowledge discovery has decreased. Instead it seems to be geard more and more to drive you towards purchase and shallow content sites. I think the growing popularity of "awesome" lists and other forms of curated content discovery is also driven in part by Google's lack of "good content" discovery.

bigbugbag · on March 6, 2017

Google stopped providing relevant search results more than 10 years ago. It used to be several pages of relevant search results, now you are lucky if you have more than 2 or 3 somewhat related results in the first page and that's about it.

You can tell there was a paradigm shift at some point and the search engine became a second (third? fourth?) class citizen.

ccozan · on March 6, 2017

Actually google has become ( for me, at least) lately some kind of proxy to stackoverflow and wikipedia, since both this two services have atrocious own search solutions and I have to rely on google to get interpret my keywords the right way.

cdubzzz · on March 6, 2017

This really does seem like a big problem. What are some sites that do site search well? Is there currently a good way to implement something even half as good as Google locally without a major lift?

ccozan · on March 6, 2017

I was senior engineer with Autonomy ( before being bought by HP ) and we did really good projects for some big companies, it worked well. Lucene comes a long way, with enough time, some cool stuff would emerge.

But if you need to replicate google, you need bootloads of user data and in generally data to operate on ( i.e. to be able to extract the equivalence of certain terms and expressions, and so on ). And generally some kind of feedback system ( like user clicking a link on a SERP, is adding that question/answer combo a certain boost for the future ).

In short: it's complicated and after 10 years of experience I couldn't tell if a solution is so unversal as google.

cdubzzz · on March 6, 2017

Pardon my ignorance, but would such a service need to be specific to a site? I'm wondering if there is a business (or perhaps better yet, non-profit) opportunity to provide data to drive something that can be tacked on to individual sites. Would an organization like Mozilla have an opportunity here?

fixermark · on March 6, 2017

You've just described Google search.

If your business pitch is "Like Google search, but Google doesn't control it," good luck getting investors.

ccozan · on March 6, 2017

It depends. Organizations have different data, so you need a lot of abstractions in order to fit into a certain search pattern (meaning, what are the users looking for).

For example, current solutions implies there is a "document" with some metadata. But what is a document in wikipedia or SO context? So is quite hard to get a an answer which satisfies both cases and you end up with cramming all kind of data together just to make a fit. So in the end, you still have to use the google model, of using the URL as a key for (data, metadata). So in this case, just let google/bing/yandex do it's job.

On the other side, I saw Google Search Appliance fail spectacularly in enterprise, because the google model couldn't fit and you end using real custom search solutions. There are quite a lot of companies filling this niche at this moment and so I think Mozilla wouldn't make a dent, if decides to enter this market.

cdubzzz · on March 6, 2017

Interesting. Thanks for your insight.

My line of thinking was on working with "what is the user asking" vs. "what is the user looking for" in a way that could be applied regardless of the information being searched (by, for example, a non-profit with some existing insight - hence the Mozilla thought). But I know almost nil about the space so I suspect that is somewhat naive.

fixermark · on March 6, 2017

Why bother when Google will index your site for you?

It's one fewer engineering task for a busy content provider to take on.

cdubzzz · on March 6, 2017

I can think of many reasons to bother. In the context of the article being discussed, it can be argued that Google is presenting sites' content without clear attribution. Preventing the need for the user to click-through also limits ad revenue (where applicable). Either would be a perfectly good reason to consider alternatives. It's a massive trade off, of course, given Google's position but I don't think decentralized search is a bad goal.

testudovictoria · on March 6, 2017

I think there are two things at play here.

This first is that you were probably a power user back when good search results required you to be good at search syntax. Back when Google wasn't the only search engine on the web, you could ask Google questions. Your results would be limited in relevance. If you tailored your searches by ensuring specific words or phrases were included, excluded irrelevant phrases, and added wildcards where necessary, you'd get pages and pages of great results. Google takes your search history and tries to predict what you're actually looking for.

The second is how Google filters your results to try to tailor what is relevant to you. Most people who hang out here on HN or StackOverflow would probably get results for the software framework filtered to the top when searching for "electron". Chemists, physicists, and other scientists of the like are probably going to get results for the particle filtered to the top. Adding this type of customization probably has ramifications beyond simple terms like electron. I don't have any evidence whatsoever, but I imagine that this customization would give biased results when searching political, economic, or social topics based on the sites that are visited.

hobofan · on March 6, 2017

Your first point definately sounds plausible.

Another theory I have is that my perception of good results changed over time. It could be that for the first time in my life I know enough about the domains I'm looking up that I can differentiate the good from the bad. On the other hand even with that, when I enter a new domain, e.g. AI it still feels "less discoverable", even though there are definately good introductions out there that you can find when going beyond Google.

mythrwy · on March 7, 2017

Maybe there is a lot more junk then there was 10 years ago.

radicsge · on March 6, 2017

I feel referring to (or hiding behind) scalability is a bit salt in the wound. These huge companies are destroying so many jobs (by scalability) - ok unavoidable - but they should do sometimes things that are not scalable and give jobs back to people in trouble. Im not sure how long they should be allowed to go away with it. [just dreaming...]

blowski · on March 6, 2017

I agree with your sentiment. Google is abusing its position of market leader to dominate other markets which it's not very good at, which results in providers that _are_ good at those markets going out of business.

If Google wants to make money by being the endpoint for every question ever asked, they need to accept the consequences when they get the answers wrong.

NoGravitas · on March 6, 2017

> What's weirder to me is that it seemed like they were going with my proposed route for a pretty long time and only recently starting providing dopey answers.

I feel like the article addresses this. Google has basically two different products that produce those top of the page, set-aside answers. One is "Knowledge Graph", which basically does what you are suggesting - grabs straightforward answers to simple questions from Wikipedia and similar. The second is "featured snippets", which is the one causing the outrages that the article highlights.

tbrowbdidnso · on March 6, 2017

I'm no wizard, but in writing blog articles I've found ways to fool Google into believing me.

The problem with their algorithms.... is that all that statistics in the world can't help you when you're listening to a guy telling the truth vs an equally good liar.

I can tell you what Google doesn't have, a strong AI. It thinks it knows "facts" but these are merely patterns, and these can be gamed.

Because Google still lacks a strong, truly thinking AI, they rely extremely heavily on statistical models to rate content.

So how do you cheat google search?

Google's systems attempt to figure out the topic of your writing, the style, and quality. Is it scientific? An opinion piece? News? Is it a technical topic? A playful one? Fiction or nonfiction ?

The quality classifiers are much easier to game than topic and style analysis. They determine things like reading level of text but also things like the number of rare nouns, number of technical words, number of typos. Readability as far as font and formatting. Trustworthy signal of your domain and possibly the company and people they determine to be linked with it.

I also have a feeling google uses sneakier signals as well. These include your DNS registrar, phone number, email, and address listed on the site. Who you host with and what technologies you're using. Your mail servers and how trustworthy they are. Geo location, and visitor traffic info as soon as you put analytics on the site (or use amp)

Basically when Google says they have tons of signals, they do. They have a dataset that amounts to every site on the internet for the past 15 years, and they regularly run automated and manual "theory provers" much like quants do with historical stock market data. They find new signals constantly, and run tests to see if their new algorithms are better.

You know how sometimes google randomly takes a bit longer to load search results? My tin foily theory? They'll occasionally guinea pig you on prototype search results to see if they're better. I noticed their response time getting really bad a couple months before the public rollout of new AI powered search for example.

So gaming google? Do exactly everything that a large, legitimate, no-nonsense company would do. From where you host, what you host with, to who you link to. Bonus points if you have significant real looking mail and other traffic from your domain. Extra bonus points if you actually sell something real as cover and do it for at least a few years.

Once you've done enough convince google you're a big important thing IRL...write a ton of really subversive bullshit. Make it sound a real as possible, hell make 90% of it real, just with a single unverifiable fact. Keep pumping this shit out and make sure your garbage is never fake enough to get called out on. Or just make the fake part so hard to verify that nobody will waste the time, kinda like half the science world does when publishing papers.

jwl · on March 6, 2017

Gaming Googles image search is a popular activity on some subreddits. Basically telling people to upvote a picture of something with a misleading title. Like if you search for images of "gaming console", a picture of a potato will appear. It is very clear that their "AI" is not that clever yet.

malisper · on March 6, 2017

> The problem with their algorithms.... is that all that statistics in the world can't help you when you're listening to a guy telling the truth vs an equally good liar.

While that's true for sites in isolation, Google published a paper a few years ago[0] that describes how you could estimate the trustworthiness of a website. The basic idea is you assign a trustworthiness score to each website. Then, you determine how likely a fact is to be true based on the trustworthiness of the sites that state that fact. You can then recalculate the trustworthiness of each site based on whether it agreed with the fact or not.

[0] https://arxiv.org/pdf/1502.03519v1.pdf

tbrowbdidnso · on March 6, 2017

What we seem to be running into, is that any strategy based on trusting some sites and not others breaks when very large groups of sites have opposite opinions. Depending on your starting set you will end up with wildly diverging trust scores

danmaz74 · on March 6, 2017

> I can tell you what Google doesn't have, a strong AI

Did anybody think the contrary?

tbrowbdidnso · on March 6, 2017

If anyone had one it would be Google :)

danmaz74 · on March 6, 2017

Fact is, nobody is even near having one ;)

bigbugbag · on March 6, 2017

Fact is no one wants a strong AI, skynet leading to matrix future is not that desireable. On the other hand soft AIs are everywhere and replacing humans in many jobs.

ceejayoz · on March 6, 2017

> Fact is no one wants a strong AI, skynet leading to matrix future is not that desireable.

I want a strong AI, because I'm more a fan of the depictions along the lines of Iain Banks' post-scarcity Culture Minds.

Vinnl · on March 6, 2017

A whitelist is sure to get them anti-competitiveness suits, and rightly so.

marcosdumay · on March 6, 2017

What is the difference between a whitelist and an algorithm with unknown but discoverable biases?

Vinnl · on March 6, 2017

They're both undesirable, but a whitelist is easier to fight.

TuringTest · on March 6, 2017

Why is it that hard for engineers to rely on good old attribution?

If every Google's featured snippet started its reply with "Breitbart says..." or "Trent Online, the leading Internet Newspaper in Nigeria said...", it wouldn't matter so much for those inevitable cases when the reply is taken straight from a white-supremacist or radical anarchist forum. The problem comes when the same reply is provided as "Google's true answer to the question" without further caveats.

interpol_p · on March 6, 2017

Prepending the source won't stop people from giving weight to the answer. Because it's the first and only one.

You would probably risk giving more credibility to sites like Breitbart by crediting them with the top result, as people will come to associate those site names with answers given "by Google."

If Google gives you a top search result, it means Google trusts that result to be a good one. There's no getting around the fact that people trust Google to give them good information, and Google's algorithms are failing to provide trustworthy information.

DougWebb · on March 6, 2017

If Google gives you a top search result, it means Google trusts that result to be a good one.

Google doesn't "trust". That kind of thinking is the problem. Google has computed a ranking of the search results based on a bunch of factors, none of which involve any kind of cognitive analysis or judgement.

The rest of your statement is absolutely right; people trust Google's ranking to imply an authority that it simply can't provide, because it's magic to them.

deong · on March 6, 2017

Google isn't just an algorithm. It's a corporate entity with user-facing software, and while the algorithm has no notion of "trust", I would argue the company is demonstrating trust in any source that it presents as an answer in one of these bubbles. We're all agreeing that they shouldn't in many cases, but I do think they're doing so today.

I don't care what your internal reasoning is. If I ask you a question and you present your answer as a fact, you're either trusting your information is correct or you're willfully lying to me.

I think Google's defense here would be that they intend these answers to be interpreted by an intelligent adult. If it tells me that mermaids are real, I'll have a laugh about it and move on with my day. If it tells huge groups of apparently functioning adults that President Obama is instituting Sharia law, and those people believe it unquestioningly, well, I'm not sure Google is equipped to operate in that world.

TuringTest · on March 6, 2017

> people trust Google's ranking to imply an authority that it simply can't provide, because it's magic to them.

More the reason to expose the relevant parts of the "magic" as much as possible, rather than working as a black box and presenting the result as appearing out from nowhere.

interpol_p · on March 8, 2017

When I say Google "trusts" the result to be a good one I am saying that I believe their engineers and managers think their top answers algorithm is good enough, and trustworthy enough, to roll-out on a large scale like they have done.

pjc50 · on March 6, 2017

http://www.weeklystandard.com/trumps-wiretap-claims-what-we-...

"White House sources acknowledge that Trump had no idea whether the claims he was making were true when he made them."

Here we have great attribution and a high-profile source (Trump, the source of claims about being wiretapped by Obama). It's just that the source doesn't care whether the claims are true or not.

tannhaeuser · on March 6, 2017

Fully agree that Google could present cues related to from whom a snippet was received (for example, by putting the snippet in a speech bubble), along with "what others say" complementary/discourse links.

In fact, they should change their entire search result layout so as to add an "epistemological" (ie. related to how the presented result was obtained) knowledge dimension in the visual presentation, and enrich the search result with eg. links to employees, parent companies, and other known affiliations.

That would be a strong move and help fight fake reviews, astroturfing, and fake news campaigning. And it would be much easier to implement compared to strong AI, if the latter is possible at all. In any case, it would be much more useful than the "searched 3.123e8 pages in 0.3 ms" non-information we get now.

Not sure whether posts by verified/accredited people (real names) should be ranked higher in search results. Traditionally, hackers have preferred pseudonyms, and there are very good reasons to stay anonymous in oppressive regimes. OTOH, using pseudonyms/fake accounts at scale got us into the kind of political and other mass manipulation we have now.

BearGoesChirp · on March 6, 2017

Why does the source where the reply is taken from matters?

I only see a few ways this works.

The answer is correct. In this case, the source doesn't matter.

The answer is incorrect. In this case, the source doesn't matter, there is a flaw in the entire concept.

The answer's correctness depends upon the source. In this case, we have a deeper problem with the entire meaning of what a fact is.

Personally, I actually go with option three, but I think the deeper discussion needs to happen before we begin to even consider if something like featured snippets is worth doing. The best way for me to sum up on feelings on this is that while facts are absolute, any piece of information is either true or false (at least within the realm of normal human interactions), it is impossible to separate any fact from a human's perception of the fact which will automatically introduce biases.

An example: A few weeks ago I read online that <CNN anchor> did <very bad crime>. (I'll leave out the details because it only detracts from the point.) This is a factual statement, I did read it online, as did many others. But if many people who read it online states they read about it online, it creates a perception of truth to it, when the original post and the original replies were all showing how easy it is to create a false character assassination story. Regardless of how factual the statement is (it is true that I did read it online), my stating of such would still be an attempt to mislead even though I was using 100% factual statements.

I do not see any automatic truth determiner being possible until our society realizes the biased nature of human known facts.

TuringTest · on March 6, 2017

> I only see a few ways this works.
> The answer is correct. In this case, the source doesn't matter.
> The answer is incorrect. In this case, the source doesn't matter, there is a flaw in the entire concept.

> The answer's correctness depends upon the source. In this case, we have a deeper problem with the entire meaning of what a fact is.

Again, why is it so hard for engineers to grasp the concept and importance of graceful degradation?

A service doesn't need to be perfect to be useful, but if it's not perfect, it shouldn't burn your house when it fails.

> The best way for me to sum up on feelings on this is that while facts are absolute, any piece of information is either true or false (at least within the realm of normal human interactions)

People from Humanities would like to have a talk with you.

askmike · on March 6, 2017

Because prepending that doesn't solve the problem Google wants to solve: once Google figures out that you typed in a question, they want to give you the answer to your question. Not tell you what other people think the answer is. (I agree that it is broken in a lot of cases now, but prepending that is a very bad workaround).

TuringTest · on March 6, 2017

> once Google figures out that you typed in a question, they want to give you the answer to your question

And that's the problem that the article rightfully denounces. Having the answer to every question is not something that you can automate, not without strong AI.

> Prepending that is a very bad workaround

I strongly disagree; for me it's a very sensible thing to do - and in fact is how we do it in a social context ("I'm totally not making this shit up, I've read it in a scientific article / this morning newspaper / a tweet message"). We rely on the context where the knowledge came from as a proxy for how accurate its content might be.

The default posture of engineers to a problem of inaccuracy is "make it more accurate". That's the logical method when you're building bridges, where we can describe the physics to an extremely accurate degree and where it would fall out or quickly degrade if you make a mistake.

But for problems that rely on inaccurate or uncertain information, and where no exact models exists, the only reasonable way to build them is to degrade gracefully. If you don't handle failures, it will fail in spectacular ways - because it will fail sometimes.

The system should provide an escape hatch, and allow the user to be in control. The user of an automated system should always be able to inspect how the system came to the provided result, and be able to override the given results in those cases where it will get it wrong (at least by some users in a supervising role, which other users could notify to).

Unless a revolution happens, Artificial Intelligence will only ever work in the large if we put humans in the decision loop.

amarant · on March 6, 2017

They do always append a link below their answer to the site where they got it. Personally I find this to be the exact same solution as yours (not literally). The problem is some people don't clicks links anymore. It's a problem people are stupid, but its not googles problem, even if idiots uses google (like everyone else).

TuringTest · on March 6, 2017

> Personally I find this to be the exact same solution as yours (not literally).

It may be the same for you (if you remember to follow the link for every single result that you get from Google, ever), but it's not enough for a mass-targeted product; the way in which the information is laid out and its relative prominence is essential to the way it will be perceived by the majority of the people using the service.

This is not people being stupid, is how people's brain is hardwired to be efficient and only care about information that looks important from the way it's presented - rather than having to make a rational analysis of the relative importance of every information snippet that you find in your way.

An upfront warning "this fact is brought to you by..." will have much more impact than a small, semi-hidden link that you have to remember to press, even if both convey the same information. (Moreover, the link is not available in the voice interface.)

askmike · on March 6, 2017

> not without strong AI.

You might be right if Google wants to answer all questions ever, but they can just limit this feature to situations where they do know what the answer is (you already see this when you type in math, conversions between different units, etc).

> and in fact is how we do it in a social context ("I'm totally not making this shit up, I've read it in a scientific article / this morning newspaper / a tweet message").

That's not how I converse with people around me at all, I only talk about the source if there is some disbelieve about whether it is true.

----

When someone asks me for some information, I will answer and that is the context the person places me in. I won't tell that person "I heard this from Craig, he heard if from Judy and she was told by Bob, etc..". Google wants to become this person / context.

The whole idea is that you don't have to click / verify / think about the potential webpage, because Google has already ranked all webpages that mention your query, and it will only snippet the top one (which is the result Google thinks will most likely answer your question).

Adding the source is:

- extra noise (the user now needs to look at the text & look at who said it and determine whether it might be true or not). - easily spoofable: you either rely on some meta tag for an Author, which websites can spoof. Or on a URL, which will lead to people doing stuff like "nytimes.com.quotefromgoogle.com".

> The system should provide an escape hatch, and allow the user to be in control.

Below the snippet are the 10 links that you can still click, functionality available since the mid 90s.

> The user of an automated system should always be able to inspect how the system came to the provided result

Just click the top link?

TuringTest · on March 6, 2017

> You might be right if Google wants to answer all questions ever, but they can just limit this feature to situations where they do know what the answer is (you already see this when you type in math, conversions between different units, etc).

That argument was valid when they limited answers to simple calculations, coins exhange and the current weather. They are now trying to answer why fire cars are red and whether Obama is part of a conspiracy. For that kind of content, it's essential to be aware where the answer came from.

> That's not how I converse with people around me at all, I only talk about the source if there is some disbelieve about whether it is true. When someone asks me for some information, I will answer and that is the context the person places me in.

When you converse with people, there's a context of what you were talking about and your previous relations. Plus, you trust that the person will filter the sources that might interest you, and that they will apply common sense when selecting the sources and will warn you if they have little confidence in the answer. And you could further inquire them about the sources if the information looks fishy.

Nothing of this applies to Google. The question and answer are devoid of any context in a conversation. Plus it doesn't have common sense, it takes sources from the whole web, and it won't tell you how it selected the sources used to find the reply. The answer being provided by a secret algorithm should make you always disbelieve whether it is true.

> The whole idea is that you don't have to click / verify / think about the potential webpage, because Google has already ranked all webpages that mention your query, and it will only snippet the top one (which is the result Google thinks will most likely answer your question).

Right. And it should be clear that doing it this way is a sure recipe for spectacular failures.

> (the user now needs to look at the text & look at who said it and determine whether it might be true or not)

The user always needs to (be able to) do that. Pretending that they don't is deluding themselves, both the company and the users. Just ask Wikipedia what happens when you build a large scale knowledge repository and you don't follow this principle.

> Below the snippet are the 10 links that you can still click, functionality available since the mid 90s.

> Just click the top link?

You don't have any of those in the conversational interface, which is the default mode and primary interface for which this functionality is being created.

conjectures · on March 6, 2017

However prepending is possible. Figuring out 'the true answer' is in many cases impossible (what is the meaning of life, is x beautiful etc) and in many others merely several orders of magnitude harder than current tech can cope with (is there a polynomial time solution to TSP, why is public discourse so vacuous these days). So if that's what Google is gunning for it's going to be broken for a long time.

return0 · on March 6, 2017

> Not tell you what other people think the answer is.

Actually google and pagerank started like that.

tyingq · on March 6, 2017

I suspect they tested and tweaked what sort of format kept the most number of clicks and page views on Google owned sites.

Prominent attribution wouldn't help in that regard. These kg entries are great for Google. They present the most likely sought out info, so little reason to click through. As a bonus, the vertical space they use pushes down any organic results that might be better answers.

coffeefirst · on March 6, 2017

Well, for example, "According to The Vaccine Research Institution..."

Now I just made up that organization. But how can an algorithm tell if it's a legitimate medical institute that develops vaccines or anti-vaxxer propaganda?

If the algorithm actually had a way to distinguish the two, it could blacklist bullshit just as easily as it could disclose it.

Karunamon · on March 6, 2017

The problem there is that, at the end of the day, it's a fallible human putting stuff in the "bullshit blacklist". It also means that you're evaluating a statement based on its source rather than its merits, which is.. bad.

When you're the #1 source of information on stuff in the world, that's an awful lot of power that can be trivially abused (e.g. "let's just blacklist unfavorable coverage of this lawsuit against us").

TuringTest · on March 6, 2017

> you're evaluating a statement based on its source rather than its merits, which is.. bad.

If you don't know the source, you have no way of knowing its merits, as you won't be aware of how the people came up with that knowledge.

Karunamon · on March 6, 2017

By all means, show the source - just don't blacklist things based on the source. Truth is truth regardless of who speaks it.

findateamfirst · on March 6, 2017

The attribution comes at the end

Ajedi32 · on March 6, 2017

And when you ask Google Assistant, it audibly credits the source up-front, just like the parent comment was suggesting: "According to <site here>... <snippet text>"

tyingq · on March 7, 2017

It uses the site title versus a URL/domain. Which tends to favor shady sources.

And, it's often below the fold.

jacquesm · on March 6, 2017

This is all mostly because google went from search engine indexing other people's stuff to site that you go to to get answers (whether those answers are based on copyright infringement or not is another matter).

A similar thing happens in Google news where stuff from sites like breitbart.com are mixed in with reputable news sources making it look as though they are of a similar degree of quality.

varjag · on March 6, 2017

On Google Now, "news" sites like infowars, globalresearch.ca and on reign supreme, unless you manually filter them out. This hoax about "staged" MH17 crash is on my newsfeed why.. oh right "because you displayed interest in Ukrainian conflict". Fair enough then!

It is scary how much the default app on Android serves to legitimize the fringe.

ino · on March 6, 2017

It's also on Google Home, and having a voice say the lies makes it even scarier:

https://twitter.com/ruskin147/status/838445095410106368/vide...

edit: I saw now this video lower on the article, I thought it ended on "SIGN UP FOR OUR NEWSLETTER"

varjag · on March 6, 2017

Yikes.

Google held onto releasing their Chaffeur because of the "uncanny valley" problem, where they thought the tech would do more harm than good in its "sort of working" stage. Perhaps they should have done the same with this.

bigbugbag · on March 6, 2017

Welcome to your very own filter bubble.

varjag · on March 6, 2017

I'm not sure what to make of your comment. It it some new clothes of middle ground fallacy? Should I give equal consideration to every opinion, no matter how outlandish? Do I have to read from Moon hoax people just because I watched a SpaceX launch or flatearthers since I clicked on NatGeo piece last week?

benmcnelly · on March 6, 2017

Oh look another "the moon landing was real" guy.

^ See what I did there? I made a comment on the Internet, and called into question the legitimacy of your comments.

Anyone can do it! So what we really need is HN to moderate our comments to filter out comments like yours so that we can just read the truth without having to filter out the garbage. I don't want to hear about how you think the the earth is round just because I'm on a NASA thread. One thing for sure is this is HN's fault.

varjag · on March 6, 2017

Edgy a.f., but what any of that wonderful rant has to do with my original point?

benmcnelly · on March 6, 2017

Sorry, I assumed a lot of things there I think. I gathered from what you were saying that you were falling with the popular opinion that Google should filter out "fake news" from its search results (or at least the focused results). So from that perspective, I was trying to draw parallels on how silly it is to demand such a thing.

driverdan · on March 6, 2017

I don't see any of that garbage.

danmaz74 · on March 6, 2017

Problem is, can you algorithmically determine that breitbart.com and the likes can't be trusted, if they get tons of links, mentions, etc. just like the reputable news sources?

jacquesm · on March 6, 2017

Indeed, and that's the heart of the problem. Something like Google news needs more than just algorithms.

danmaz74 · on March 6, 2017

OTOH, that's also a problem, because any editorial choice is politically debatable... I'm afraid there is no easy solution.

pjc50 · on March 6, 2017

I'm afraid there is ultimately no way to not be political.

Having these kind of anti-news sites appear as "news" on Google is not just poisoning their reputation but making them complicit in whatever disaster finally results from it. The worst case scenario is genocide e.g.: https://en.wikipedia.org/wiki/Radio_T%C3%A9l%C3%A9vision_Lib...

raverbashing · on March 6, 2017

Before we get to political leaning there's a small issue of journalistic integrity and the difference between editorials and news articles

Tempest1981 · on March 6, 2017

Maybe all linkers shouldn't be treated as equal. Random FB/Twitter posts reveal interest in the topic, but not accuracy of the article. But I assumed Google was accounting for this already.

CM30 · on March 6, 2017

That's exactly what page rank was about to begin with.

However...

1. Various 'credible' news sources also link to dodgy sites and articles, either when criticising it, posting a story said sites first ran or simply because they've fallen for 'fake news'.

2. A lot of trust signals are easily gamed, like .edu links going for a fortune on SEO forums.

3. At some point, quantity will probably win out over quality.

mythrwy · on March 7, 2017

"breitbart.com are mixed in with reputable news sources"

"reputable" news sources have generally devolved into being the ideological counterpart of breitbart.com so this makes sense.

lloydde · on March 6, 2017

This is brutal. I'm often thankful to the quick answer to measuring or history questions, but even for these easy questions I've seen Google share somewhat incorrect or confusing information as authoritative.

jjeaff · on March 6, 2017

Ya, I really don't see how they can justify displaying other people's content. They did the work, but they don't get the clicks or ad revenue or even a chance to keep the user on the site.

I wonder how Google would like it if someone launched a site that when you search, it doesn't display results from its own database, but perhaps just a selection of the best results from Google.

megablast · on March 6, 2017

It is funny you say that.

I remember when you would search for an answer, and a small answer would be included in the description under the link, so you wouldn't have to click on it.

Then they started to go away, because no one was clicking on them because they didn't need to.

Now you rarely get the answer without clicking on it.

So Google has made the internet worse.

pixl97 · on March 6, 2017

>So Google has made the internet worse.

By not including the answer?

Or, by including the answer and putting the sites that gave reliable answers out of business in the first place?

jazoom · on March 6, 2017

There are sites like that. Startpage.com is an example.

yabatopia · on March 6, 2017

I always thought that services like Startpage or DDG paid for access to search databases, it maybe some form of revenue sharing. It's not that they're stealing the search results from Bing or Google and repackage them.

jazoom · on March 6, 2017

Perhaps you're right.

killwhitey · on March 6, 2017

I suspect most of those history questions are answered via Wikipedia. From what I've seen if the snippet isn't using Wikipedia it's not useful and/or accurate.

lloydde · on March 6, 2017

Now, that you point it out, I think that is the situation. When the answer were not easily verifiable it has been websites I wasn't familiar with.

benmcnelly · on March 6, 2017

Google is a search engine for finding webpages, not facts. Thats where this "fake" news story should end. This is not a political problem, its a societal one, and you are barking up the wrong tree.

Wikipedia is a free online encyclopedia, created and edited by volunteers around the world. It is not authoritative, and anyone who treats it as such, should be educated to the facts of what it is, and is not. Just because its moderated to be supported by links to facts, doesn't mean that every bit of content is free of bias and the whole truth. Its generally accepted that this is the case. Same thing goes for Snopes, and some other fact checking sites. They are generally looked to for a reasonable amount of truthfulness based on reputation. Same could be said for various media outlets, based on your preference and bias.

Social media sites and search engines are not responsible to tailor their content to fit your expectations of what truth or reality are. Stop being a child.

"but, people expect to be able to Google something and results and snippets be the truth!"

Well, thats a problem for sure.

How about we try and fix that expectation instead? Feel free to teach your children and anyone you know who uses the internet, that (surprisingly) anyone from anywhere can still get online and post anything at any time, and you should fact check multiple places and resources instead of trusting the first result from your favorite search engine.

"but by offering snippets of results from programmatically generated search results (which is super handy 99% of the time) Google is publishing false truths!"

Right, and you can probably still Google bomb the image search to show an image you want based on a certain search term, its the internet not your personal fact butler, no matter how its advertised. Its 2017 and common sense and intelligence are still recommended for most tasks.

peeters · on March 6, 2017

> Google is a search engine for finding webpages, not facts.

How can you justify that claim? It seems pretty obvious to me that it's trying to be both. Search for "current time utc" and tell me Google is just a search engine for finding webpages.

ygaf · on March 6, 2017

I admittedly do that - every now and then typing 'time' into google, knowing it goes beyond being a search engine. I also sometimes put in e.g. 'starbucks' and I don't have to qualify the search any further, google knows I want open-close times.

However these snippet boxes, to the computer-savvy at least, are clearly just another automated search result.

benmcnelly · on March 6, 2017

And that is the real issue. I am as to blame as anyone. Whenever a friend or family member needs a computer fixed, or computing thing explained, I have been there to do it for them. We live in a more connected world, full of non-savvy users.

imh · on March 6, 2017

it's not just "current time" you can ask "(12+431)/42" or "how tall is a poodle" or "when does san francisco city hall close today" or "directions to nearest gas station." These are each totally different kinds of facts (compute something for me; answer a general question; tell me something specific and context/time dependent; give me directions to something context dependent). Google is way beyond just providing a list of web pages and can't have it both ways. The general question-answer factual stuff is leaking in to user expectations in the rest of their search business.

benmcnelly · on March 6, 2017

You don't have to convince me that Google has intentionally led people to become more reliant on them, that is their business model, where user is the product and Google becoming as integral to your day to day life as possible is important; as their business model is literally "people use us the most and we know them better so advertise to them through us".

But I think we should have a clear divide between responsibility of the user and the content aggregates. Just like with editorial news and Journalism we of course can differentiate between the two, and there are ethics and integrity issues we should hold news outlets to, but we need to be careful of confusing a search engine with a news outlet just because people use it to find news and facts from various places. That does not make them arbitrators of the truth. The responsibility lies on the end user to use their judgment and wisdom and research to come to informed decisions. And the responsibility lies on the news agency to report factual things. Google just needs to help me find relevant information, and let me decide what is good ethical journalism and what is fictional drivel.

You might think "But doesn't that mean that anyone can scream as loudly as they want and people will hear them and might listen to them?" Well yes it does. And it seems like everyone here has a problem with that, because they think people have terrible common sense. People have been trying to sway public opinion through media since there was written word. But you should criticize the work of the book author, or maybe even the publisher who is constantly publishing crappy books, but not the book store or library who is just helping you find whatever you are looking for no matter who you are or what you like.

I don't think you people know what you are asking for when you want to put it in peoples heads that because someone or something is popular or used by many people it somehow has a responsibility to curate the truth.

fudged71 · on March 6, 2017

Right, google created what they call the "Knowledge Graph", which is explicitly meant for facts.

benmcnelly · on March 6, 2017

They have been pretty clear about their goal to have search respond to you as humanly as possible and use AI not just algorithms to find things for you, and I think they have been clear on where they are getting the data they dig up on things.

ygaf · on March 6, 2017

Thank you. Like me you probably read the article and wondered why blame for the incident instantly bypassed the kid with the phone.

(before someone replies, I know that educating the whole media-consuming public is easier said than done.)

pm24601 · on March 6, 2017

Google is presenting the fake content as factual. Google is asserting the truth and putting its own brand behind the "truthiness" of the fake page.

That is the line that has been crossed.

yummyfajitas · on March 6, 2017

No, they aren't. They are extracting a quote from a source that seems directly related to what you search for. The only difference between this and a regular search result is that it comes in a grey box, and the text is extracted via deep learning rather than pulling the first sentence or some html metatags.

Interpreting a grey box as "truth" is entirely on you.

peeters · on March 6, 2017

The problem is that it uses largely the same treatment as it does for actual facts.

In other words, I don't think there's anything wrong with trusting Google when you search for "25 divided by 5", "33rd president of the US", etc. But when Google extends that response structure to things that, on further research, are not established fact, it shares some blame in spreading false knowledge.

benmcnelly · on March 6, 2017

Thats a stretch and you know it. Your comment shares some blame in spreading false knowledge yet you don't see me clamoring for HN to moderate their comments for truthfulness.

And no its not different, maybe I am not Internet savvy and believe the top rated comment on HN as the most informed opinion and take it as fact instead of researching something for myself.

To be clear, just because Google has some helpful widgets and search helpers built in does not mean it asserts its search results as facts or even the best result. Google is trying to get you the best answers it can, and makes no claim otherwise.

edit: I stand corrected, I had no idea about the Google TOS here that admits they are committed to only returning search results that are factual, and to that end factual to you specifically: https://google.com/things-that-are-not-there/

dellsystem · on March 6, 2017

I came across this issue just now when searching for "ec2 pricing" - the featured snippet links to https://aws.amazon.com/emr/pricing/ instead of https://aws.amazon.com/ec2/pricing/, which looks close enough to the correct URL that I didn't notice it was the wrong page until I realised I couldn't find costs for i3 (which just came out). I'm surprised that no one at Google has fixed it yet; surely there are at least some Google engineers that use AWS for personal projects.

scandox · on March 6, 2017

A fantasy:

The idea takes hold in the collective Google-consciousness that a fact is whatever the majority of their users believe to be a fact. There is a precedent for this, after all. They defined Spam as whatever their users said was Spam.

At first there is some friction between the knowledge of internal Google personnel (especially down in that pesky engineering department) and the new shared reality developing out in Userland. However, once an appropriate terminology is developed (in-facts and ex-facts) even the engineers are satisfied - after all their main concern was being able to draw nice clear lines between things.

Meanwhile in Userland while the boiling point of water is largely unchanged, the death toll in the Biafran blockade has shrunk to approximately 17 persons (with some arguments over whether to count people who were over 80 when it started).

Then one day an Engineer has a neatO idea which will eliminate the need for storing two sets of facts and thus save valuable Petabytes of storage and, more importantly, significant code complexity. He calls this idea Authority. Pitching it to his superiors in the marketing department he explains: "You see opinions are like assholes" - (there's a collective wrinkling of marketing noses) - "everybody's got one!". He goes on to explain that some people have more knowledge and are more scrupulous about accepting new facts than other people. Authority would identify such users and give greater weight to their activity and feedback. A profound silence falls over the room. A voice comes over the meeting room Intercom:

"Are you saying...are you suggesting...that some of our users are better than others?"

"Not better, " says the cowed Engineer. "Just more...um...authoritative. Purely in an informational sense."

"Authoritative." The voice draws the word out as if profoundly contemplating its meaning. "OK. We're going to submit this idea to our Hard AI (Larrey) - the secret one we're afraid to network."

Everyone waits. After a minute the voice says:

"Larrey has a question for you, Engineer. It is this: If you're so smart, how come you ain't rich?"

jim-jim-jim · on March 6, 2017

When my city was hit by a nasty earthquake last year and I was using Google to keep track of aftershocks, announcements, etc. I remember seeing some random weirdo's blog entry about the earthquake being caused by a government superweapon interspersed with all this otherwise valuable information.

This was in the special element at the top, not the normal search results.

saurik · on March 6, 2017

The "why are firetrucks red?" one is really interesting as the page result itself is great, but Google is pulling the wrong snippet of the page to feature as the answer.

stctgion · on March 6, 2017

Or the recent "how long do snails sleep" https://www.reddit.com/r/funny/comments/5woiu0/units_of_meas...

lucb1e · on March 6, 2017

For the lazy: https://snag.gy/b9Zvlw.jpg

And for those outside of the US (like me) who don't get this result even if they try to go to google.com: https://www.google.com/ncr (No Country Redirect)

tbrowbdidnso · on March 6, 2017

You can see from this that google considers text earlier in the page more important.

This page is odd because the false answer is presented first, so google fails to see it. It could tell that the form of the sentence was an answer to the question, just not that it wasn't the right one

raphman · on March 6, 2017

for another example of this, try searching for "speed of USB 3.0"

rtpg · on March 6, 2017

This is a great set of examples of this problem.

Of course, the offered solution is "hire a bunch of people to check the facts", which seems to be underestimating the scale of this issue for humans, and perhaps overestimating the difficulty of classifying credibility.

Considering that Google's bread and butter is general site credibility, having some "truthiness vector" added into the mix doesn't seem impossible?

Someone might say "why does Google decide who is credible", but they already do this through the search results anyways. They just don't seem to be able to differentiate between something matching a search, and something being factually accurate.

tbrowbdidnso · on March 6, 2017

Fake news is just a new version of spam sites. They mostly exist to make money or enforce an agenda.

People figured out that Google's algorithms could tell when things were generally factually false, unless those things were recent news.

This leads me to believe their algorithms largely relied on news as a source of truth. They're going to have to do something about that

tyingq · on March 6, 2017

Some more fun ones.

What foods cure cancer? https://www.google.com/search?q=What+foods+cure+cancer%3F

What spice cures diabetes? https://www.google.com/search?q=What+spice+cures+diabetes%3F

Which presidents are rapists? https://www.google.com/search?q=Which+presidents+are+rapists...

Can carrots cure cancer? https://www.google.com/search?q=Can+carrots+cure+cancer%3F (apparently this one is fixed, try the search below)

Can carrot juice cure cancer? https://www.google.com/search?q=Can%20carrot%20juice%20cure%...

raverbashing · on March 6, 2017

Whenever someone says how humanities are "useless" point them at this example

Whenever there's a measure of something, people will optimize for that measure. But trust is not directly measurable.

But even then Google should have been better than weighing fringe sites the same as Wikipedia as an example

sonthonax · on March 6, 2017

Is Breitbart a fringe site anymore?

raverbashing · on March 6, 2017

They still keep putting out made-up stories so I don't see why not

valuearb · on March 6, 2017

I'd like to read the article but it's so irritating to have it only load a page at a time on my iPad that it's unreadable.

awqrre · on March 6, 2017

Very bad page on Firefox too, if you use the scrollbar (and probably in other ways)...

jim-jim-jim · on March 6, 2017

I have javascript off on my phone, but scrolling through the article feels like it's "on ice" somehow.

Definitely a distracting design.

navs · on March 6, 2017

I have javascript turned off on my iPad and the article is refreshingly quick to load and easy to read. The rest of the site however is another story. They're really abusing those CSS transitions.

chatmasta · on March 6, 2017

Perhaps we need a meta tag for "citations" in news articles. That would give the bots a way to determine how "factual" a news piece is (vs an editorial), but would probably just end up getting gamed like every other meta tag.

And then there's the problem that CNN would be "fake news" because you can't exactly provide a citation for "anonymous high ranking intelligence community officials." But then... maybe that's a good thing.

DaUR · on March 6, 2017

Google wants to avoid editorializing, and be able to throw their hands up and say "not our fault!" due to the plausible deniability of "algorithms".

They can't pull that off anymore.

There needs to be a shift towards a Web-of-Trust-like system, where some sources are recognized as authoritative. Government websites, like the CDC, for example. Big media outlets. Scientific associations (IPCC, APA, etc). Not all information is equal. Even if, for example, government dietary recommendations are outdated in some ways, that should be the authoritative answer Google provides, unless there is an equally-reputable but more accurate source (here, prioritize medical sources over governmental ones).

A high school dropout doesn't know more about academic subjects than a PhD grad. A blog isn't as authoritative as a reputable source. This is easy stuff.

Then the question becomes: "some people don't see reputable sources as reputable". That isn't Google's problem, and it can't be fixed at their level.

tomjen3 · on March 6, 2017

That wouldn't work when somebody leaks something that is true but the cia denies (as an example).

DaUR · on March 6, 2017

There's a difference between "government sources" for hard facts (CDC), facts about the world (e.g. DoJ travel warnings, CIA factbook, DoE energy savings tips), for facts about the government (e.g. for Trump's EO, the suggested snippet would be to the text of the EO), and claims from the government. These are easily distinguished.

tomjen3 · on March 6, 2017

Most, if not all of those, are payed for by lobbyists.

skocznymroczny · on March 6, 2017

In other words, censorship. Would CNN be considered more authoritative than Fox?

ridiculous_fish · on March 6, 2017

Calorie counts are another that google often gets wrong. Try "Calories in corn" or "calories in black beans."

elcapitan · on March 6, 2017

Strange. Would imagine that's something that could be obtained from the wikidata api or a similar resource. Is it usually to high or low? I regularly use those info boxes for quick calories information.

username223 · on March 6, 2017

Calories should be easy: https://ndb.nal.usda.gov/ndb/

nommm-nommm · on March 6, 2017

Calorie information varies wildly between different sites, I have no idea who to believe.

losteverything · on March 6, 2017

<Was President Warren Harding a member of the KKK?

We ask questions now because we can get "an answer" so quickly.

I would have liked the prof to push back on the student and say Why do you want to know?

The answer is yes. Now what?

The answer is no. Now what?

Are you asking for what reason? What will you do with a yes answer and a no answer?

Then I would break into a short lesson in decision making. That decisions are just based on future probabilities. Future probabilities are based partially in the questions you answer.

If the question is academic?purposes only then good... But I believe we ask dumb questions just because we can and get an answer.

Start by not asking questions where the answer is useless.

Aissen · on March 6, 2017

Another one, courtesy of marcan: https://www.google.com/search?q=what%27s+my+user+agent

JorgeGT · on March 6, 2017

Well, sometimes that's my user agent so I guess broken clocks...

jccalhoun · on March 6, 2017

I was going to share this story on facebook but when I pasted in the URL, the headline that came up was "Why does google think Obama is planning a coup d'etat?" (screenshot: http://imgur.com/a/keXTz ) I'm assuming that it is theoutline.com that is feeding facebook this headline. I think twice about sharing a story with such a linkbait headline - especially when the actual site doesn't have that headline.

lostphilosopher · on March 6, 2017

Google offers sources with its claims. I'd like the ability to Pandora style thumbs up / thumbs down these sources so that Google won't keep giving me answers from them. I'm tempted to think that if enough people did that Google could start making generalizations, but that has its own set of pitfalls... (Even if it only generalized for me.) Still, as an individual user of Google I would like the ability to tell it not to give me answers sourced from Weekly World News.

askvictor · on March 6, 2017

Surely part of the problem is that these sorts of stories only exist on fake news/ult-right sites, so that's the only information point that Google has?

matthewmacleod · on March 6, 2017

Google's 'featured snippets' are universally fucking shit.

It's a bit of a rant I guess, but I'm annoyed but the extent to which Google has become less useful as a search engine while trying to do all of this 'extracting knowledge from the web' stuff. Because it's really not doing a great job of the latter.

k_sze · on March 6, 2017

This would be a non-problem if everybody studied ToK and consistently used, you know, their brain.

But, unfortunately …

superasn · on March 6, 2017

Google has developed advanced NLP algorithms, so it should be easy for them to corroborate such data before it gets pushed to featured snippets. On second thought, maybe they're already doing it and sites including authority are just copying off each other.

rspeer · on March 6, 2017

What makes you think that "advanced NLP algorithms" can distinguish true statements from false ones? They're not even close.

You'd have to dramatically simplify the scope of the problem to make it not require artificial general intelligence.

The "Fake News Challenge" [1] is the first attempt I know of to seriously evaluate NLP that could eventually be able to fact-check real-world statements, and the current task is about stance detection, because the goal of fact-checking is not considered attainable yet.

[1] http://www.fakenewschallenge.org/

elcapitan · on March 6, 2017

Not false or true, but it would already help to check if the text is an opinion piece. Lingustic hints for that should be relatively clear, as opposed to for example encyclopaedic style. That way they could both exclude most Breitbart stuff as well as editorials from actual newspapers, which shouldn't make it into quick-info fact boxes either.

superasn · on March 6, 2017

They were building a knowledge graph for this purpose[1]

[1] http://searchengineland.com/demystifying-knowledge-graph-201...

rspeer · on March 6, 2017

I'm pretty sure these horrible snippets come from the Google Knowledge Graph.

davidgerard · on March 8, 2017

Cuil crashed and burned on pretty much this ten years ago. Anyone else remember Cpedia? They eventually had to start taking stuff down due to threats of defamation. (And ran out of money shortly after.)

I mean, at least Google's search is any good, that's something.

return0 · on March 6, 2017

This is the revenge of the news sites i guess. Google can only be trusted by people as long as it keeps giving valid responses. If this continues there will be growing mistrust from its users leading to lower revenue etc etc.

bakhy · on March 6, 2017

people trust Google, which IMO makes them responsible here. IANAL, but if they would for example be caught giving Holocaust-denying answers in Germany, where that's illegal, could they not be sued?

bigbugbag · on March 6, 2017

Google PR is good, because there are not many reason to trust google with anything.

Usually when in that kind of situation google plays the "it's an algorithm that did that not us so we're not liable" also known as we don't read your gmail emails it's an automated algorithmic process so there's no privacy issue here.

transfire · on March 6, 2017

> (It should be noted that Wilson was still a notorious racist.)

See that's the funny thing about fake news, not even this article which seeks to set us all straight on the matter could avoid it -- just because you get your facts from a source you consider reputable doesn't make it so. In fact, Wilson was actually no more "racist" than Abraham Lincoln, but no one would ever call Lincoln a racists.

thomasjudge · on March 6, 2017

The point of Google is supposed to be quality search results. If we're not getting that...

themark · on March 6, 2017

More often than not, Google "Alerts" are from very questionable sites.

wry_discontent · on March 6, 2017

I don't think that's a Monty Python joke.