I wonder what things would look like if the 'semantic web' (the actual web3?) had taken off and we had regular, rich, machine-readable metadata for just about everything, rather than having to rely on what is largely subpar scraping and 'AI' systems.
People have pointed out recently how Google search seems to struggle as sites on the internet turn more and more into apps rather than standardized documents, and that people just go and search on Reddit instead. Having a standard way to encode semantics seems honestly necessary at this point if you want to keep things interoperable.
Google "won" in the first place by being less semantic than their competitors (who used meta keyword tags etc. which made them very vulnerable to spam) and just reading the text of the page (and especially of links to the page) instead.
This is not how I remember Google winning. In those days of search it wasn't spam clogging up search results, it was just plain hard to find things. Google won because you could find things on Google you couldn't find elsewhere.
You could probably recreate exactly that phenomenon today, though it would be complicated by the fact that legitimate, well-intentioned creators now have to behave in much the same way as spammers.
At that time Google had zero profit motive. They were proving product-market fit by returning, as the top result, the page you were most likely looking for. You would have to go back to original PageRank and ignore SEO and ad-sales optimization to deliver the same value today.
I highly, highly doubt it, primarily because Google already does this[1]. They support formats defined on schema.org.
Google correctly realized early on that this idea that content creators would correctly tag and structure everything is a software developer's pipe dream. The "semantic web" failed precisely because the real world is much messier than that.
The semantic web came out of a very academic tradition where scammers and spammers weren't lurking around every corner. Yeah, in addition to its other problems, it wasn't especially attuned to the Internet as it actually came to exist.
Fwiw, this is at least partially not Google's fault. Just about everybody uses one of a few companies for restaurant menu data, and the biggest one is far and away SinglePlatform. SinglePlatform is still powered largely by scanning menus and having contractors enter them by hand, which is where many of these errors happen.
More forward-looking restaurants manage this all themselves as part of their digital strategy, but it’s still a small percentage and disproportionately located in the US.
I’m not saying Google doesn’t also scrape or use other sources, I’m sure they do, but this is one of those situations where the whole system is broken. Tbh one of the bank shot benefits of having all of these digital delivery services is that some restaurants are using aggregators that can also publish menu data.
As for the author's idea about markup for menus, that's great, but highly improbable for a bunch of reasons: most restaurants don't update their menus frequently, dishes are often difficult to represent structurally, POS systems are often modeled differently than the printed menus, etc.
The most notable restaurant near me (Garden Restaurant) can't even spell its own name correctly on the expensive, professionally produced menus in the restaurant itself, so your chance of scraping good-enough menu data from their web site is negligible.
I actually went to the web site just to see, and it's worse than I thought. Even their Western menu, the stuff random Westerners think is "Chinese food", is presented as JPEGs of photographs (sometimes out of focus) of the physical menu, which is itself strewn with typographical errors and mysterious annotations.
So, to get even the bad text an actual patron has in the physical restaurant, you need to scrape the site, download the images, and successfully OCR low-resolution, out-of-focus photographs. It's not impossible, but good luck to you, and at the end the results will still be pretty unsatisfactory. "Frind pok" is actually what they wrote; they meant Fried Pork, but that's not an OCR error, it's really what they paid to have printed.
My friend Chris spends a lot more time in their restaurant and is at least as pedantic about this sort of thing as I am, also he knows the people who own it fairly well, so I'm confident he's mentioned it.
However, while the restaurant's manager might care, as I understand it her husband is the hard-core chef who ensured its success, so why should he give a shit? Presumably the errors are just in the text for stupid barbarians like me; many of them don't even order from his real menu anyway. His taste is what matters. Nobody comes to the restaurant because of the typography or web site design; they come to eat his food.
If my family is any indication, the menu won’t get corrected until the restaurant owner sends their child to American schools and the kid gets old enough to fix it. Just give it 11 or 12 years.
Why? I can understand cutting Chinese restaurants some slack because of a possible language barrier with owners, but what's wrong with ones who are fluent in English? Or have the resources to hire a good agency to put together their menu and proofread it?
Not GP, but I have the same red-flag system. If it's perfect English (or whatever your native language is), it's more likely than not that the restaurant is run by a native, not a real Chinese family. So warning flags: most likely not the best the Chinese kitchen can deliver :-)
It's a way to find family-owned restaurants. The other ones are "good Asian restaurants have bad service" (though I don't think that's true anymore) and "good restaurants will never be expensive".
The last one is a problem because real family restaurants do want to raise prices/be more upscale too, but none of their customers will let them because they expect banh mi to be $3.
Additionally, I have it in my head that not bothering to fix menus shows a certain admirable pragmatism. "Frind pok" is not correct but it is correct enough.
This is 100% Google's responsibility. If you claim to have a feature but it is broken, it's your fault; you just shouldn't claim you can do it. The restaurant folks provide exactly what they want to: a PDF menu. If Google can't parse it correctly, it should show the raw information instead of trying to do something fancy.
Except that Google isn't doing the parsing, it's a third-party, who is then providing the data to Google (and a bunch of others). Sadly, mixed in with the parsed/PDF-scanned data is accurate data that was hand-edited or auto-uploaded from a restaurant POS or kitchen management system.
So for Google and these other companies, the options are: build it yourself and try to do better, buy data from the companies that do this at varying degrees of quality, or have no menu data at all. Except the last one isn't really an option: people _want_ menu data; it's one of the most common things people want to know about a restaurant.
I think it's a strike against our industry that we failed to define a standard, platform-independent menu and price-list format that any application could parse and present.
Instead everyone is busy building their own little feudal kingdom and calling it a platform, even when it's actually just a toll booth.
The issue isn’t one of format, it’s distribution. Menu data is usually in hard copy form, and the utility to a restaurant in going to the trouble to duplicate it online and keep it up to date is almost certainly not worth it (in their eyes).
So I'm sure you could sign up Yelp and Google and Uber Eats and everyone else for a common data standard, but you'd then still have to go chasing the restaurants to put that information into a system somewhere.
We haven't even really been able to convince businesses to put their opening hours online. It's still such a problem that one of my interview questions at Google in 2018 was "name as many ways as possible you might be able to discover a business's opening hours online".
Menus are about an order of magnitude more complex than that - it’s a tough thing to get restaurants to do.
My local pharmacy has a website, their hours are available on that website, and the automatic answering machine tells you their hours when you call them.
Yet their hours are wrong, and different, on Google and Bing.
If I own a business, there is no single place I know of where I can publish my opening hours. My laundrette here can't publish the fact that they will be closed due to sickness except by taping a sheet of A4 to the door.
This reminds me of a story my father used to tell us. Once he travelled to Finland (in the '70s) with my grandfather and they ended up in some restaurant. Having no clue what was on the menu, as neither of them spoke Finnish, they decided to go for the cheapest thing on the menu first.
The waiter brought them heated plates.
Turns out that the cheapest thing on the menu was a heated plate service.
While travelling around China and speaking none of the local languages, I resorted to either pointing at other people's food or picking random items on the menu. I had learned how to ask for rice and beer, so that worked well with what was usually a bunch of random but very interesting and tasty dishes.
One day, in a border town in the south near Laos, my wife and I were in a suitably weird and humid restaurant with a slow ceiling fan keeping us a bit cool. At the only other occupied table sat a bunch of police, with what looked like the local police chief judging by the hat on his head, but an otherwise naked torso. They were just getting drunk, so we couldn't point at food and order. We asked for a menu. I pointed at 6 random things. They gave me a funny look. I then asked for 2 beers in Chinese. They confirmed "two beers?" with an inquisitive look. I confirmed. Then I asked for rice and remembered how to ask for spicy cucumber, a delicious side in China I had come to love. Eyebrows were raised.
Shortly thereafter out came two beers, spicy cucumber, and 6 mocktails in tall sundae glasses with umbrellas and curly straws.
The police table almost died laughing. Good times :)
The biggest problem with Google and restaurant menus, IMHO, is that the actual restaurant website (if they have one) is often several links down, or off screen entirely, below a bunch of review sites and other sites like Google Maps with unreliable information.
Google likes to push you into a textual/parsed format of the menu, which is often incomplete and difficult to navigate. Just show me the pictures of the menu, please.
And make it obvious when the menu is from. Too often menus are years old, and with prices going up overnight everywhere, pricing has gotten really hard to figure out.
I've found that these sorts of annotations are a lot better in the US than in other industrialized countries. In foreign countries, points of interest in Google Maps are often missing basic info like opening hours or even telephone numbers. Often the actual location is misplaced on the map.
When it comes specifically to restaurant menus in the US, most seem to be manually transcribed by the restaurant staff. The items and prices are correct (but often out of date), and food descriptions faithfully reproduce non-native spelling/grammar mistakes. In addition, I almost always see user-uploaded photos of the menus.
This does not point to a difference in Google’s automatic parsers or in the level of Google-generated content; it seems that US users contribute to map and PoI content a lot more.
I wonder why this is. My guess is that there are far fewer staff members at Google curating crowdsourced content outside of the US, which makes non-American users much less likely to contribute, since their contributions will appear much more slowly, if at all. I’ve contributed my own corrections to PoI data in the US (e.g. opening hours updates) and seen it reflected on the map in a few days. This probably wouldn’t happen elsewhere in the world.
> points of interest in Google Maps are often missing basic info like opening hours or even telephone numbers.
To be fair, sometimes neither of these exist. A place in some parts of the EU may be open whenever the owner feels like it. You can't even approximate opening hours and holidays unless you actually ask the person over the counter when they're actually open.
I imagine this often gets treated with the same disdain I treat Google's emails asking me if my family's structural engineering business is open on Christmas Day. (Funnily enough, no human has ever asked this!)
> it seems that US users contribute to map and PoI content a lot more.
I can't point to an example or cite a source (this is just a guess), but maybe US users unknowingly contribute via Google Photos doing OCR (and other analysis) that gets combined into Maps data, while Google is more careful about running AI against every photo taken by EU users (and using it to help in ways that go beyond the UX of the photographer alone) for data-privacy compliance reasons?
Outside of curation, I think Maps lacks polish from non-US devs, and that results in weirdly inconvenient maps for a lot of cities, leading to people using it less and contributing less to it.
For instance, train station mapping (where the entries/exits are) is a feature available in some local map services and is a big quality-of-life improvement in European or SEA cities, but it never made it to Google Maps.
Same for the lack of multi-story building mapping, where there’s only a single shop for a single address, which can still work out for shopping malls (they have their own site), but is crazy for densely packed neighborhoods. Looking for restaurants in Paris or Tokyo through Google Maps is just frustrating.
If I get public transport directions in Tokyo using Google it advises on exits, best train boarding position, and fares. In London none of these are available. Admittedly, I think the first two are deliberate, but surely London can provide fare data.
Another bias I've noticed is US sites "collapsing" opening hours as if there were no siesta, i.e. opening hours are just displayed as 10am-11pm.
Even in Germany pretty much any pharmacy, doctor’s offices, and even phone repair shops will close for lunch. It’s not like this only affects a few people.
It's not just Google Maps. Google gives absolutely zero shits about any country that is not the US. Actions for Google doesn't let you make custom intents if it's not in en-US. The Pixel for the longest time released US-only/US-first. Features are always locked and only available for the US.
I worked at Google helping news publishers add metadata so we could extract live/evolving news coverage for the real-time/breaking-coverage carousel above the search results. Google will never blindly believe what authors put in their metadata. It is just a hint, and that will always be the case. There is too much opportunity for deception.
I've worked in this field. In every sector there will be someone trying to game the system one way or another. Someone will publish a menu including items they don't actually have in order to try to rank higher in a search ("oh, we don't have that any more, but since you're here, why don't you try X..."). Someone will publish a menu with lower prices to get someone in the door ("oops, Google must have an old menu of ours"). Someone will publish a menu including something with their competitor's name in hopes of hijacking their searches. A steakhouse will mark a steak as vegetarian in hopes of tricking someone into thinking they have a vegetarian entree.
You'd be lucky if 50% of the restaurant-supplied data were accurate, 40% out of date, and 10% actively incorrect. Personally I'd guess that the ratio is more like 20/60/20.
I don't know, but maybe the metadata prices are inaccurate so that the query "burgers under $8" surfaces that menu. There are all kinds of reasons to game the system to get your results on the page. Trusting metadata over user-visible information just opens a portal to these kinds of things, where you tell the search engine one thing for SEO and show the user something else. You could police this, of course, but it's much easier to just extract the info from the page, since there is more incentive for that to be accurate for readers.
I'm not really convinced by the "intentionally inaccurate" arguments (if they want to deceive, then surely they could also just serve a fake PDF to Googlebot). But I suppose it's reasonable to assume that restaurants would be less likely to keep their metadata up-to-date than they would their PDF menu. Unintentional inaccuracy, in other words.
The whole web runs on the principle of taking crazy tag soup and extracting as much as you can out of it. I wish XHTML had succeeded but the market has spoken.
Yes, that is a problem. And the worst part is that because the heuristics are a black box, there are no developer controls to fix it yourself. You are at the whim of Google's ability to correctly interpret what it sees. Sometimes it's wrong.
Even if the restaurant has bothered to create the appropriate machine-readable descriptions, Google doesn't bother doing anything with them. Even if the descriptions literally mirror the visual display on the page. I see it all the time, like on this page (https://www.anthonyspizzabelmar.com/menus/menu), which is easily parsed as valid menu schema through the schema.org validator.
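For reference, a minimal sketch of what that kind of markup looks like, built in Python here just to keep it readable. The restaurant items and prices are invented, but Menu, MenuSection, and MenuItem are real schema.org types:

```python
import json

# Hypothetical menu data; schema.org defines Menu, MenuSection and
# MenuItem types (plus Offer for pricing) for exactly this use case.
menu = {
    "@context": "https://schema.org",
    "@type": "Menu",
    "hasMenuSection": [{
        "@type": "MenuSection",
        "name": "Mains",
        "hasMenuItem": [{
            "@type": "MenuItem",
            "name": "Fried Pork",  # not "Frind pok"
            "offers": {
                "@type": "Offer",
                "price": "12.50",
                "priceCurrency": "USD",
            },
        }],
    }],
}

# Embedded in the page as <script type="application/ld+json">...</script>,
# this is unambiguous for any crawler that chooses to read it.
print(json.dumps(menu, indent=2))
```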
If restaurants were rewarded with actual updated menus on Google, you can bet they would care about creating the microdata, but as it stands it's a waste of time.
The restaurant owners don't need to go through the effort. I'm in a consortium of companies that use ML in their business. One of the companies is a competitor to GrubHub. They use ML to read scanned menus, understand items, and look at pictures... and then to predict what ingredients items have, classify each item as an entree, meal, dessert, or snack, and flag whether it is gluten-free, vegetarian, or similar.
All of this without the mom-and-pop restaurant owners lifting a finger. It gives them a competitive edge over their competitors. All of this to say: Google doesn't care, but GrubHub, UberEats, and their ilk do care.
Warning: potentially biased opinion. Speaking only for myself, but informed by my job.
There are lots of problems with scraping-based approaches.
One, yes, you need some really good tech to scrape data from menus. Even though menus are "structured", next time you're at a sit-down restaurant, pay attention to all the subtle discrepancies in formatting between different sections/categories on the menu.
Two, if the menu isn’t html, but is an image or a pdf upload, now you need some strong OCR on top.
Three, the website is generally not likely to be current with what’s actually on offer in the establishment itself. Specials, seasonal dishes, or items that are out of ingredients (“86’d”) will still appear on the menu. That’s going to lead to complaints, refunds, or generally bad customer experience from whoever’s consuming your data / using it to buy food.
Four, you're going to want to be paid for all this tech and customer support you're electing to intermediate between the end purchaser and the restaurant, as a service, so you're going to tack on some fees and either jack the price up on the consumer or try getting the restaurant to pay you a finder's fee, cutting into their already narrow margins.
Five, if you’re trying to provide ordering service and not just menu data, you still need to submit the order into the store itself, somehow. Which either means calling it in, robo-submitting an online order (if you’re lucky), or sending a courier to place the order and wait. And then, on the other side, whoever’s taking orders for the restaurant has to punch in the request to the register to actually complete the transaction. Which means the system you really want to talk to isn’t the website, it’s the point-of-sale.
Good luck with all that.
Source of bias: I work for a company that helps restaurants enable online ordering and POS integration so they can pay much less in fees and focus on making exceptional food.
I would be happy if Google and Yelp could even get the opening hours of restaurants correct. With Covid, and even now that things have been re-opening and getting busier, so many places have incorrect hours or are outright closed for good, despite active listings. I've defaulted to at least trying to call first if I'm set on going to a particular place, because inevitably if I don't, the restaurant isn't open or is about to close. Of course then the issue is half the time no one answers the phone anymore at restaurants and they don't bother with voice mail. If they do have voice mail, it's still probably got an announcement about the special meal that they offered for Valentines day or Christmas. Sometimes the only solution is just to go see if they are open.
To some degree the real issue is that each restaurant can change hours (or menus) at a moment's notice, and at many places the staff and management are not super computer-savvy. So no one thinks to update these sources of info, and/or they don't know how. Google has the added data (from tracking phones) of how busy the restaurant is at a given time, but that is presumably some sort of moving average over time, and not necessarily current or accurate.
When I first saw the headline for this post, I thought it was going to be about a related issue: even if AI is really good at understanding general spoken/written languages, the names and wording of menus is its own weird thing. If you're then trying to auto-translate that to another language it can be next to impossible. Ethnic restaurants in different places which supposedly speak the same language can have all kinds of spellings and ways of describing the same dish. In the US we call a long sandwich a sub, grinder, hero, or hoagie (to name a few), depending on where you live etc. Or the same name can mean wildly different things.
I see a CAPTCHA opportunity here. Show the line item and ask people to type both the dish name and price into two boxes. Or maybe give a couple of options that the end user must confirm before they can proceed.
Yep, anyone remember having to key in house numbers from Street View photos? Similar idea. But I think human/bot differentiation no longer demands so much effort from users lately, so this labor pool could be a relic of the past.
It seems to be unable to parse word order, like a "bag of words" model from 2000.
For example, if I search "similar pairs of sentences", it replies with "similarity between two sentences". No, I want a pair of sentences (A1, B1) that is similar to another pair (A2, B2). I am not talking about similarity between sentences, but between pairs. The distinction is lost on Google. And they claim to be using transformer neural nets for search. Pfft!
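To make "bag of words" concrete: such a model discards word order and composition, so both queries reduce to nearly the same token set. A toy sketch (the stoplist and stemmer are made up, obviously not what Google runs):

```python
STOPWORDS = {"of", "between", "two", "a", "the"}

def stem(word):
    # Toy stemmer: folds "similarity" -> "similar", "pairs" -> "pair".
    for suffix in ("ity", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

def bag_of_words(query):
    return {stem(w) for w in query.lower().split() if w not in STOPWORDS}

q1 = bag_of_words("similar pairs of sentences")        # {'similar', 'pair', 'sentence'}
q2 = bag_of_words("similarity between two sentences")  # {'similar', 'sentence'}

# High overlap: a bag model scores these as near-duplicates, even though
# "pairs of similar sentences" and "similarity between sentences" ask
# for different things.
print(len(q1 & q2) / len(q1 | q2))  # ~0.67
```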
>In the glorious future, every website will be chock-full of semantic metadata. Restaurants won't have a 50MB PDF explaining the chef's vision for organic cuisine
That is a good idea, especially for visually impaired users - but why can't we have both? I sometimes like seeing the fancy fonts, and images if I have never eaten that dish before, etc.
So actually, either it's been fixed or it's a case of a broken clock being right twice a day. I google it regularly to get to the page at the beginning of each month, and it doesn't extract anything based on the time of year; it just picks one. But maybe it's been fixed! :D
> Google has discovered that it takes 90% of the effort to get 90% of the way there - but the last 10% takes the next 90% of the effort.
I can confirm; I'm working on information extraction, and 90% is the glass ceiling for current models. With a lot of labelled data you can get higher scores, but the models don't generalise from one vendor to another.
Highly unlikely, as Google would say it was a mistake and point out that they correct details when venues notify them.
On a slight tangent: bear in mind also that Google is not party to whatever agreement there is between venue and customer, which is why it's so foolish that one frequently sees people asking questions on Google Maps as if they were directly communicating with the venue. E.g. people ask "Can I get a child's cot in my room?"; all it takes is for some joker to say yes, the naive asker to proceed to the venue and then find out that they offer no cots, at which point the naive person has no leg to stand on (the venue quite rightly says they've no idea what you're talking about).
The article's not strictly right that there's no way for users (i.e. the public) to contact Google in this regard; Google Maps Local Guides certainly can contact them, and anyone can become one of those.
However, putting that aside: venues/business owners can contact Google, and I suspect they would get a slightly better response rate (not saying it'll be perfect, or even super easy for an owner to get through, but they can, as there are often links indicating "Are you the owner of this business?").
They could possibly register a trademark, but few restaurants seem to.
I can't imagine this creating much of a barrier - Google would stop if told to by the restaurant but then they might not list the restaurant, and what restaurant would see that as worth it?
Even with a trademark, you can't prevent people from talking about the facts related to it. Despite the crazy reach of IP law, they still can't silence you from uttering "A Big Mac(tm) costs $4.99 here."
Regarding the price discrepancies, why can't they program some kind of outlier check into this type of system? If an average burger costs $12.00 in an area, but the system interprets the price as either $1,200.00 or $0.12, it should be relatively easy to flag that as a likely error.
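A check like that is cheap to sketch. In the snippet below the multiplicative bounds and the sample prices are made up, and a real system would presumably segment by area and dish type:

```python
from statistics import median

def flag_price_outliers(prices, low=0.25, high=4.0):
    """Flag parsed prices implausibly far from the local median.

    `low`/`high` are invented multiplicative bounds: in a neighborhood
    where the median burger is $12.00, a price parsed as $1,200.00 or
    $0.12 (a likely misplaced decimal point) gets flagged for review
    instead of being published as-is.
    """
    m = median(prices)
    return [(p, not (low * m <= p <= high * m)) for p in prices]

parsed = [11.50, 12.00, 13.25, 1200.00, 0.12]
for price, suspicious in flag_price_outliers(parsed):
    print(f"${price:,.2f}", "FLAG" if suspicious else "ok")
```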
The Big Mac Index simplifies McDonald's into a perfectly efficient burger producing machine, in all possible countries, and then evaluates purchasing power parity between currencies by normalizing against a Big Mac.
An average burger price in a given city would be different. There the intent would be to assume profit margins are consistent (albeit perhaps not hyper-thin) from city to city, and thus get a sense of the relative costs of burger-related inputs (chiefly: labor, real estate, utilities, taxes, and transportation).
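(The index's arithmetic itself is just two ratios; the numbers below are invented for illustration, not actual published figures.)

```python
# Made-up prices and exchange rate, purely to show the arithmetic.
price_us = 5.00        # Big Mac price in USD
price_jp = 450.00      # Big Mac price in JPY
market_rate = 150.0    # assumed market rate, JPY per USD

implied_ppp = price_jp / price_us            # 90 JPY per USD
over_under = implied_ppp / market_rate - 1   # -0.40

print(f"Implied PPP rate: {implied_ppp:.0f} JPY/USD")
print(f"Yen valuation vs USD per the index: {over_under:+.0%}")  # -40%
```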
New problem: invert this menu and sort it by the third letter converted to lower-case, filtering out seafood. Present all item descriptions with infix notation.
I'm sorry, no, you must be mistaken about the sheer scale at which google operates. Sanity checks simply don't scale, you see.
By necessity, it'll be "pour it into this big stew of linear algebra" or nothing. Any kind of sensibility is simply impossible when you're as big as google!
That last bit is a bit too real. We deserve better than a monopolistic force controlling the web but shipping the poor products of a rookie shop, simply because they're all hand-me-downs from engineers climbing the internal Google ladder.
I've never bothered looking at Google's interpretations of menus on Google Maps. When available, I'm just always going to go to the actual restaurant website and look there instead.
What I have found occasionally useful is the (presumably automatic) categorization of photos into a menu category. Those at least generally let me know whether something's up to date (especially useful the last couple years as restaurant concepts have fluctuated wildly), and are very helpful for places that don't have much online presence - street food stands, pubs, local fast food places, that kind of thing.
I worked at a place that was trying to do this for thousands of restaurants, only they wanted to configure the POS system (all the items, variations, modifications, combos, etc.) using just the PDF menu. We never got very far with automating it, but part-time high school employees were usually pretty decent at it.
Menu layouts are all over the place; the OCR itself was the trivial part, but trying to figure out what went together and what modifications were valid on which items was a complete mess. Shrimp cocktail w/ cheese, anyone?
The best part of these automations is you can end up with hilarious orders - like the McDonald’s app will let you order a cheeseburger with no cheese, no bun, no meat, no onion, no pickle, and no sauces. I suppose they should give you an empty wrapper.
Actually, "plain paper" could easily be a "plain poppadum", since the Hindi word is pronounced papaD, with a deep D that is close to an R (https://en.wikipedia.org/wiki/Papadam#Spelling_and_pronuncia...). I've seen this mistake before on restaurant menus, so it may not be Google AI at fault.
FWIW there is a nice solution to this: schema.org. I don't know if there's a restaurant menu schema yet; likely there is. In the great jungle of the Internet you often end up with both: a) attempts to process/parse the content as the user sees it, in HTML or some other format, which get b) overridden when the site owner starts to publish the schematized data.
The above applies to all web walking robots with non trivial capabilities.
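As a sketch of what path (b) looks like from the robot's side, here's a minimal example of pulling schema.org JSON-LD out of a page (BeautifulSoup is a common choice for this; the helper name is mine):

```python
import json
from urllib.request import urlopen

from bs4 import BeautifulSoup  # pip install beautifulsoup4

def extract_json_ld(url):
    """Collect whatever schema.org JSON-LD blocks a page publishes."""
    soup = BeautifulSoup(urlopen(url).read(), "html.parser")
    blocks = []
    for tag in soup.find_all("script", type="application/ld+json"):
        try:
            blocks.append(json.loads(tag.string or ""))
        except json.JSONDecodeError:
            pass  # malformed markup is common in the wild; skip it
    return blocks
```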
Seems people rediscover every few years that despite how awesome the AI is now, it's still just the same statistics under the hood.
This is not at all to detract from the hard work and accomplishments made... it's just still fairly easy to confuse even the most advanced AI. I guess it's impressive that it's only easy if you know how they work, though; it might take the average person a while to find a way to trip them up.
They can't because grokking restaurant menus is a moving target. Restaurants are constantly looking for new ways to describe new things and new ways to describe slightly updated things.
To me it points out that they could add additional filters (something to flag entries like the £2000 meal, or USD being used in the UK), or more manual review, maybe triggered when an algorithm detects such absurdities.
Fortunately, misreading a menu won't drive you into a tree. A friend of mine told me that he can't use his unnamed, supposedly intelligent car on the side roads near his home, because it reads the house numbers and thinks they're speed limits. It's okay going up the long hill to his place, when the numbers go down, but going downhill the last number before a sharp curve is house number 65!!! The many joys of our modern AS (as in AI, but not so much) world.