Barrin92 · 4 years ago
I wonder what things would look like if the 'semantic web' (the actual web3?) had taken off and we had regular, rich, machine-readable metadata for just about everything, rather than having to rely on what is largely subpar scraping and 'AI' systems.

People have pointed out recently that Google search seems to struggle as sites on the internet turn more and more into apps rather than standardized documents, and that users increasingly just go and search on Reddit instead. Having a standard to encode semantics honestly seems necessary at this point if you want to keep things interoperable.

lmm · 4 years ago
Google "won" in the first place by being less semantic than their competitors (who used meta keyword tags etc. which made them very vulnerable to spam) and just reading the text of the page (and especially of links to the page) instead.
colechristensen · 4 years ago
This is not how I remember Google winning. In those days of search it wasn't spam clogging up search results; it was just plain hard to find things. Google won because you could find things on Google you couldn't find elsewhere.

You could probably recreate exactly that phenomenon today, though it would be complicated by the fact that legitimate, well-intentioned creators now have to behave in much the same way as spammers.

hn_throwaway_99 · 4 years ago
I highly, highly doubt it, primarily because Google already does this[1]. They support formats defined on schema.org.

Google correctly realized early on that this idea that content creators would correctly tag and structure everything is a software developer's pipe dream. The "semantic web" failed precisely because the real world is much messier than that.

1. https://developers.google.com/search/docs/advanced/structure...

warkdarrior · 4 years ago
I think Google Search struggles because of SEO/spam, not webapps. Not sure why SEO would go away in the semantic web.
warning26 · 4 years ago
If anything, the semantic web would probably have been easier to do SEO spamming with.
IanCal · 4 years ago
Realistically the metadata would regularly be at odds with the visible data.

> Having a standard to encode semantics seems honestly necessary at this point if you want to keep things interoperable.

The tricky part is the incentive structure.

Metadata is very common on academic paper pages, because Google Scholar doesn't index them if they don't have it.

However, it's also commonly just wrong.

mattzito · 4 years ago
Fwiw, this is at least partially not Google's fault. Just about everybody uses one of a few companies for restaurant menu data, and the biggest by far is SinglePlatform. SinglePlatform is still powered largely by scanning menus and having contractors enter them by hand, which is where many of these errors happen.

More forward-looking restaurants manage this all themselves as part of their digital strategy, but it’s still a small percentage and disproportionately located in the US.

I’m not saying Google doesn’t also scrape or use other sources, I’m sure they do, but this is one of those situations where the whole system is broken. Tbh, one of the bank-shot benefits of having all of these digital delivery services is that some restaurants are using aggregators that can also publish menu data.

As far as the author's idea about markup for menus goes, that's great, but it's highly improbable for a bunch of reasons: most restaurants don't update their menus frequently, dishes are often difficult to represent structurally, POS systems are often modeled differently than the printed menus, etc.

tialaramex · 4 years ago
The most notable restaurant near me (Garden Restaurant) can't even spell its own name correctly on its expensive, professionally produced menus in the restaurant; your chance of scraping good-enough menu data from their web site is negligible.

I actually went to the web site just to see, and it's worse than I thought. Even their Western menu, the stuff random Westerners think of as "Chinese food", is presented as JPEGs of photographs (sometimes out of focus) of the physical menu, which is itself strewn with typographical errors and mysterious annotations.

So, to get even the bad text an actual patron has in the physical restaurant, you need to scrape the site, download the images, and successfully OCR low-resolution, out-of-focus photographs. It's not impossible, but good luck to you, and at the end the results will still be pretty unsatisfactory. "Frind pok" is actually what they wrote; they meant "Fried Pork", but that's not an OCR error, it's really what they paid to have printed.

passivate · 4 years ago
Since it's near you, have you tried going there and explaining it to the manager?
onurcel · 4 years ago
This is 100% Google's responsibility. If you claim to have a feature but it is broken, it's your fault; you should just not claim that you can do this. The restaurant provides exactly what it wants to provide: a PDF menu. If Google can't parse it correctly, it should show the raw information instead of trying to do something fancy.
mattzito · 4 years ago
Except that Google isn't doing the parsing; it's a third party, which then provides the data to Google (and a bunch of others). Sadly, mixed in with the parsed/PDF-scanned data is accurate data that was hand-edited or auto-uploaded from a restaurant POS or kitchen management system.

So, for Google and these other companies, the options are: build it yourself and try to do better, buy data from the companies that do this at varying degrees of quality, or don't have menu data at all. Except the last isn't really an option: people _want_ menu data, it's one of the most common things people want to know about a restaurant.

ClumsyPilot · 4 years ago
I think it's a strike against our industry that we failed to define a standard, platform-independent menu and price-list file format that any application could parse and present.

Instead, everyone is busy building their own little feudal kingdom and calling it a platform, even if it's actually just a toll booth.

mattzito · 4 years ago
The issue isn’t one of format, it’s distribution. Menu data is usually in hard-copy form, and the effort of duplicating it online and keeping it up to date is almost certainly not worth it to a restaurant (in their eyes).

So I’m sure you could sign up Yelp and Google and Uber Eats and everyone else for a common data standard, but you’d then still have to go chasing the restaurants to put that information in a system somewhere.

We haven’t even really been able to convince businesses to put their opening hours online; it’s still such a problem that one of my interview questions at Google in 2018 was “name as many ways as possible you might be able to discover a business’s opening hours online”.

Menus are about an order of magnitude more complex than that; it’s a tough thing to get restaurants to do.

sgustard · 4 years ago
There is a standard: https://schema.org/Menu

"A structured representation of food or drink items available from a FoodEstablishment."

Clearly the issue is that no one has convinced FoodEstablishments of the business case to spend tech dollars publishing their menus.
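
To make the standard concrete, here's a minimal sketch of what that markup could look like. The section, dish, description, and price are invented for illustration; the types and properties (Menu, MenuSection, MenuItem, Offer) come from the schema.org vocabulary. Built as a Python dict and serialized to the JSON-LD a crawler would read:

```python
import json

# A toy schema.org Menu built as a plain dict. The restaurant data is
# made up; the @type names and properties come from schema.org/Menu
# and its related types.
menu = {
    "@context": "https://schema.org",
    "@type": "Menu",
    "name": "Dinner Menu",
    "hasMenuSection": [{
        "@type": "MenuSection",
        "name": "Mains",
        "hasMenuItem": [{
            "@type": "MenuItem",
            "name": "Fried Pork",
            "description": "Crispy pork with house sauce",
            "offers": {
                "@type": "Offer",
                "price": "12.50",
                "priceCurrency": "USD",
            },
        }],
    }],
}

# Serialize into the <script> tag a restaurant site would embed in its
# HTML so crawlers can read the menu without scraping the layout.
print('<script type="application/ld+json">')
print(json.dumps(menu, indent=2))
print("</script>")
```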

mellavora · 4 years ago
maybe menus and price lists have edge cases?
lemursage · 4 years ago
This reminds me of a story my father used to tell us. Once he travelled to Finland (in the '70s) with my grandfather and they ended up in some restaurant. Having no clue what was on the menu, as neither of them spoke Finnish, they decided to go for the cheapest thing on the menu first.

The waiter brought them heated plates.

Turns out that the cheapest thing on the menu was a heated plate service.

jahnu · 4 years ago
While travelling around China and speaking none of the local languages I resorted to either pointing at other peoples food or picking random items on the menu. I had learned how to ask for rice and beer so that worked well with what was usually a bunch of random but very interesting and tasty dishes.

One day, while in a border town in the south near Laos, my wife and I were in a suitably weird and humid restaurant with a slow ceiling fan keeping us a bit cool. At the only other occupied table sat a bunch of police with what looked like the local police chief, judging by the hat on his head and otherwise naked torso. They were just getting drunk, so we couldn’t point at food and order. We asked for a menu. I pointed at 6 random things. They gave me a funny look. I then asked for 2 beers in Chinese. They confirmed “two beers?” with an inquisitive look. I confirmed. Then I asked for rice and remembered how to ask for spicy cucumber, a delicious side in China I had come to love. Eyebrows were raised.

Shortly thereafter out came two beers, spicy cucumber, and 6 mocktails in tall sundae glasses with umbrellas and curly straws.

The police table almost died laughing. Good times :)

moss2 · 4 years ago
Thank you for sharing this, I had a good laugh
idk1 · 4 years ago
I did a similar thing: I assumed a cheaper item was the side and ordered four dosas as a side in India. Needless to say, breakfast was taken care of.
faddypaddy34 · 4 years ago
The biggest problem with Google and restaurant menus, IMHO, is the fact that the actual restaurant website (if they have one) is often several links down, or even off screen, below a bunch of review sites and other sites like Google Maps with unreliable information.
jthrowsitaway · 4 years ago
Google likes to push you into a textual/parsed format of the menu, which is often incomplete and difficult to navigate. Just show me the pictures of the menu, please.
dawnerd · 4 years ago
And make it obvious when the menu is from. Too often menus are years old. And with prices going up overnight everywhere, pricing has gotten really hard to figure out.
MontyCarloHall · 4 years ago
I’ve found that these sorts of annotations are a lot better in the US than in other industrialized countries. In foreign countries, points of interest in Google Maps are often missing basic info like opening hours or even telephone numbers. Oftentimes the actual location is misplaced on the map.

When it comes specifically to restaurant menus in the US, most seem to be manually transcribed by the restaurant staff. The items and prices are correct (but often out of date), and food descriptions faithfully reproduce non-native spelling/grammar mistakes. In addition, I almost always see user-uploaded photos of the menus.

This does not point to a difference in Google’s automatic parsers or in the level of Google-generated content; it seems that US users contribute to map and PoI content a lot more.

I wonder why this is. My guess is that there are far fewer staff members at Google curating crowdsourced content outside of the US, which makes non-American users much less likely to contribute, since their contributions will appear much more slowly, if at all. I’ve contributed my own corrections to PoI data in the US (e.g. opening hours updates) and seen it reflected on the map in a few days. This probably wouldn’t happen elsewhere in the world.

yeputons · 4 years ago
> points of interest in Google Maps are often missing basic info like opening hours or even telephone numbers.

To be fair, sometimes neither of these exists. A place in some parts of the EU may be open whenever the owner feels like it. You can't even approximate opening hours and holidays unless you ask the person behind the counter when they're actually open.

elijaht · 4 years ago
Tangential, but Google will call restaurants with its automated assistant to verify hours.
hunter2_ · 4 years ago
> it seems that US users contribute to map and PoI content a lot more.

I can't point to an example or cite a source (as this is just a guess), but maybe US users unknowingly contribute via Google Photos doing OCR (and other analysis) and combining it into Maps data, while Google is more careful about running AI against every photo taken by EU users (and using it to help in ways that go beyond exclusively the UX of the photographer) for data privacy compliance reasons?

ramchip · 4 years ago
In Japan Tabelog is a lot more popular than Google Maps for restaurant reviews, photos, etc. I think it caters better to the local market.
makeitdouble · 4 years ago
Your guess is probably right.

Outside of curation, I think Maps lacks polish from non-US devs, and that results in weirdly inconvenient maps for a lot of cities, leading to people using it less and contributing less to it.

For instance, train station mapping (where the entrances/exits are) is a feature available in some local map services and is a big quality-of-life improvement in European or SEA cities, but it never made it to Google Maps.

Same for the lack of multi-story building mapping: there’s only a single shop for a single address, which can still work out for shopping malls (they have their own sites), but is crazy for densely packed neighborhoods. Looking for restaurants in Paris or Tokyo through Google Maps is just frustrating.

lozenge · 4 years ago
If I get public transport directions in Tokyo using Google, it advises on exits, best train boarding position, and fares. In London, none of these are available. Admittedly, I think the first two are deliberate, but surely London can provide fare data.

Another bias I've noticed is US sites "collapsing" opening hours as if there were no siesta, i.e. opening hours are just displayed as 10am-11pm.

ohgodplsno · 4 years ago
It's not just Google Maps. Google gives absolutely zero shits about any country that is not the US. Actions on Google doesn't let you make custom intents if it's not in en-US. The Pixel, for the longest time, released US-only/US-first. Features are always locked and only available in the US.
easrng · 4 years ago
There are also features they release India-first.
enos_feedler · 4 years ago
I worked at Google helping news publishers add metadata so we could extract live/evolving news coverage for the real-time/breaking coverage carousel above the search results. Google will never just believe what authors put in metadata. It is only a hint, and that will always be the case. There is too much opportunity for deception.
edent · 4 years ago
I can understand that problem for some sectors. But what's the advantage to a restaurant in publishing an inaccurate menu (fake foods!)?
splonk · 4 years ago
I've worked in this field. In every sector there will be someone trying to game the system one way or another. Someone will publish a menu including items they don't actually have in order to try to rank higher in a search ("oh, we don't have that any more, but since you're here, why don't you try X..."). Someone will publish a menu with lower prices to get someone in the door ("oops, Google must have an old menu of ours"). Someone will publish a menu including something with their competitor's name in hopes of hijacking their searches. A steakhouse will mark a steak as vegetarian in hopes of tricking someone into thinking they have a vegetarian entree.

You'd be lucky if 50% of the restaurant-supplied data were accurate, 40% out of date, and 10% actively incorrect. Personally I'd guess that the ratio is more like 20/60/20.

enos_feedler · 4 years ago
I don’t know, but maybe the metadata prices are inaccurate so that the query “burgers under $8” surfaces that menu. There are just all kinds of reasons to game the system to get your results on the page. Trusting metadata over user-visible information just opens a portal to these kinds of things, where you tell the search engine one thing for SEO and show the user something else. You could police this, of course, but it’s much easier to just extract the info from the page, since there is more incentive to be accurate for readers.
wzdd · 4 years ago
I'm not really convinced by the "intentionally inaccurate" arguments (if they want to deceive, then surely they could also just serve a fake PDF to Googlebot). But I suppose it's reasonable to assume that restaurants would be less likely to keep their metadata up-to-date than they would their PDF menu. Unintentional inaccuracy, in other words.
madisp · 4 years ago
Perhaps instead of nefarious use cases, it could just be out of date.
vander_elst · 4 years ago
Maybe fake prices... It could help lure customers in.
rightbyte · 4 years ago
> Google will never believe what authors use for meta data.

Then why trust the site at all if it fakes metadata?

These fuzzy we-know-better algorithms have wrecked Google search.

lmm · 4 years ago
The whole web runs on the principle of taking crazy tag soup and extracting as much as you can out of it. I wish XHTML had succeeded but the market has spoken.
mitchdoogle · 4 years ago
Chances are the restaurant doesn't want to mislead people looking at their website. After all, that will just lead to disappointment and lost revenue.
marban · 4 years ago
Google News is a strict whitelist AFAIK, so what's the problem?
RicoElectrico · 4 years ago
Sorry, but Google is so stupid at times that it mistakes date-like strings in URLs for publication dates. Heuristics my ass.
enos_feedler · 4 years ago
Yes, that is a problem. And the worst part is that because the heuristics are a black box, there are no developer controls to fix it yourself. You are at the whim of Google’s ability to correctly interpret what it sees. Sometimes it’s wrong.
bigwheeler · 4 years ago
Even if the restaurant has bothered to create the appropriate machine-readable descriptions, Google doesn't bother doing anything with them, even when the descriptions literally mirror the visual display on the page. I see it all the time, like on this page (https://www.anthonyspizzabelmar.com/menus/menu), which is easily parsed as a valid menu schema by the schema.org validator.

If restaurants were rewarded with actually updated menus on Google, you can bet they would care about creating the microdata, but right now it's a waste of time.

geoduck14 · 4 years ago
The restaurant owners don't need to go through the effort. I'm in a consortium of companies that use ML in their business. One of the companies is a competitor to GrubHub. They use ML to read scanned menus, understand items, and look at pictures... then to predict what ingredients items have; classify whether the item is an entree, meal, dessert, or snack; and classify whether it is gluten-free, vegetarian, or some other similar thing.

All of this without the mom-and-pop restaurant owners lifting a finger. It gives them a competitive edge over their competitors. All of this to say: Google doesn't care, but GrubHub, UberEats, and their ilk do care.
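
As a rough illustration of that tagging step, here's a toy keyword-based sketch (not their actual ML; every keyword list here is invented, and a real pipeline would use models trained on OCR'd menus and photos):

```python
# Toy sketch of menu-item tagging. A real system would use trained
# models; this keyword version only shows the shape of the output.
COURSE_KEYWORDS = {
    "dessert": ["cake", "ice cream", "pie", "sundae"],
    "entree": ["steak", "burger", "curry", "pasta"],
    "snack": ["fries", "wings", "chips"],
}

DIET_KEYWORDS = {
    "vegetarian": ["tofu", "paneer", "falafel", "vegetable"],
    "gluten-free": ["rice noodle", "flourless"],
}

def tag_item(name: str) -> dict:
    """Guess the course and dietary tags for a menu item by keyword."""
    text = name.lower()
    course = "meal"  # fallback when no course keyword matches
    for label, words in COURSE_KEYWORDS.items():
        if any(w in text for w in words):
            course = label
            break
    diets = [label for label, words in DIET_KEYWORDS.items()
             if any(w in text for w in words)]
    return {"name": name, "course": course, "diets": diets}

print(tag_item("Flourless chocolate cake"))
# -> {'name': 'Flourless chocolate cake', 'course': 'dessert',
#     'diets': ['gluten-free']}
```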

mattcwilson · 4 years ago
Warning: potentially biased opinion. Speaking only for myself, but informed by my job.

There are lots of problems with scraping-based approaches.

One, yes, you need some really good tech to scrape data from menus, which are only loosely "structured": next time you're at a sit-down restaurant, pay attention to all the subtle discrepancies in formatting between different sections/categories on the menu.

Two, if the menu isn't HTML but is an image or a PDF upload, now you need some strong OCR on top (see the sketch after this list).

Three, the website is generally not likely to be current with what's actually on offer in the establishment itself. Specials, seasonal dishes, or items that are out of ingredients ("86'd") will still appear on the menu. That's going to lead to complaints, refunds, or a generally bad customer experience for whoever's consuming your data / using it to buy food.

Four, you're going to want to be paid for all this tech and customer support you're electing to provide as an intermediary between the end purchaser and the restaurant, as a service, and so you're going to tack on some fees and either jack the price up for the consumer or try getting the restaurant to pay you a finder's fee, cutting into their already narrow margins.

Five, if you're trying to provide an ordering service and not just menu data, you still need to submit the order to the store itself, somehow. That either means calling it in, robo-submitting an online order (if you're lucky), or sending a courier to place the order and wait. And then, on the other side, whoever's taking orders for the restaurant has to punch the request into the register to actually complete the transaction. Which means the system you really want to talk to isn't the website, it's the point-of-sale.

Good luck with all that.
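
To make point two concrete, here's a minimal sketch of just the OCR step, assuming the pytesseract and Pillow libraries and a placeholder "menu.jpg". Real menu photos need far more preprocessing, and the output is free text that still has to be mapped to items and prices:

```python
from PIL import Image  # Pillow
import pytesseract     # also requires the Tesseract binary installed

def ocr_menu_image(path: str) -> str:
    """Best-effort text extraction from a photographed menu page."""
    # Grayscale plus a naive 2x upscale helps Tesseract a little with
    # low-resolution photos; an out-of-focus JPEG will still produce
    # error-riddled text that needs human review.
    img = Image.open(path).convert("L")
    img = img.resize((img.width * 2, img.height * 2))
    return pytesseract.image_to_string(img)

if __name__ == "__main__":
    print(ocr_menu_image("menu.jpg"))  # placeholder filename
```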

Source of bias: I work for a company that helps restaurants enable online ordering and POS integration so they can pay much less in fees and focus on making exceptional food.

s1mon · 4 years ago
I would be happy if Google and Yelp could even get the opening hours of restaurants correct. With Covid, and even now that things have been re-opening and getting busier, so many places have incorrect hours or are outright closed for good, despite active listings. I've defaulted to at least trying to call first if I'm set on going to a particular place, because inevitably if I don't, the restaurant isn't open or is about to close. Of course, the issue then is that half the time no one answers the phone at restaurants anymore, and they don't bother with voice mail. If they do have voice mail, it probably still has an announcement about the special meal they offered for Valentine's Day or Christmas. Sometimes the only solution is just to go and see if they are open.

To some degree the real issue is that each restaurant can change hours (or menus) at a moment's notice, and at many places the staff and management are not super computer-savvy. So no one thinks to update these sources of info, and/or they don't know how. Google has the added data (from tracking phones) of how busy the restaurant is at a given time, but that is presumably some sort of moving average over time, and not necessarily current or accurate.

When I first saw the headline for this post, I thought it was going to be about a related issue: even if AI is really good at understanding general spoken/written language, the names and wording of menus are their own weird thing. If you're then trying to auto-translate that to another language, it can be next to impossible. Ethnic restaurants in different places that supposedly speak the same language can have all kinds of spellings and ways of describing the same dish. In the US we call a long sandwich a sub, grinder, hero, or hoagie (to name a few), depending on where you live, etc. Or the same name can mean wildly different things.