Court dismisses Genius lawsuit over lyrics-scraping by Google

This is a bizarre take. The most substantial point is buried near the end of the article: Genius does not own the copyright to the lyrics. Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it. At best, you could claim your copy is a derivative work, but that only grants you protection for your additional creative contributions on top of the original work, which for a straight transcription is... well, nothing.

Genius knows this, which is why they didn't file a copyright suit. Instead, they claimed other things like unfair competition and breach of contract. However, Title 17 Section 301 of the US Code says that "all legal or equitable rights that are equivalent to any of the exclusive rights within the general scope of copyright [...] are governed exclusively by this title". To avoid this, Genius needed to prove that their claims weren't "equivalent" – ie weren't just copyright claims dressed up as something else. They failed to do this, and so their case was thrown out.

laughinghan · 6 years ago

You seem focused on whether this case was the legally correct decision, which it sure seems to be. This article, like many readers, is more focused on whether this was a fair result. Nothing bizarre about that.

The judge may have done the correct thing, but readers may feel that Congress didn't. This case will doubtless be used in the future to argue for sui generis database rights like the EU has.

(My view is that in principle, some form of sui generis database rights makes sense, but for the things that US copyright law already covers it is currently far, FAR too restrictive and lasts too long, so I would vehemently oppose expansion of existing US copyright law to cover sui generis database rights.

However, if US copyright law were reformed such that it mandated blanket licensing (see [the EFF proposal]), strengthened fair use protections, and shortened copyright duration, then I would totally support similar rights for sui generis databases.)

[EFF proposal for blanket licensing]: https://www.eff.org/deeplinks/2020/05/plan-pay-artists-encou...

mav3rick · 6 years ago

The law decides fair. It's not subjective.

dastx · 6 years ago

> Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it.

Not really all true though. Genius started out by stealing lyrics from other sites. In the early days many of the lyrics had the exact same errors as other more established sites. That may have changed since.

icebraining · 6 years ago

Yes, they did, because they got a takedown from the actual copyright holders: https://www.billboard.com/articles//5785701/nmpa-targets-unl...

ratww · 6 years ago

You are correct about Genius also scraping in the beginning.

It's much better now, but that's only because of unpaid volunteer editors who do most of the corrections and annotations on their site.

Polylactic_acid · 6 years ago

Aren't collections of facts copyrightable? So Google has copyright over Google Maps and I cannot copy that, but I can go out and record exactly the same data since I collected it myself.

laughinghan · 6 years ago

No, in the US you cannot copyright facts, only expression. So you have control over word-for-word copies of your article about a bird; but you have no control over dissemination of the facts you discovered about the bird. SCOTUS decided this in 1991, Feist v. Rural: https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....

You might be thinking of sui generis database rights, which DO cover collections of facts. The EU, Russia, and Brazil recognize this right, but the US doesn't: https://en.wikipedia.org/wiki/Database_right#United_States

nl · 6 years ago

Compilations of facts are copyrightable in the US, but they can't be just raw collections - there has to be a choice made what to include.

> The Act also provides copyright protection to compilations, but only to the extent that there has been a contribution of originality in assembling that compilation.

Map copyright is based on the idea there are decisions made around what to include and how to display it.

You can't photocopy a map and claim copyright. However, a human can trace the same map and claim copyright.

See City of New York v. GeoData Plus, and the discussion in https://wiki.openstreetmap.org/w/images/6/6f/Protection_of_C...

PeterisP · 6 years ago

Regarding compilations of facts, the general doctrine is that copyright would protect the semi-arbitrary choices of what to include in that compilation (e.g. judgement of relevance - which words to put in a dictionary, what detail to include/exclude in a map) and disallows copying that compilation; but it explicitly does not protect the "work and sweat" required to gather that data, and allows people to copy particular facts out of that compilation, for example, if they are making their own selection with different criteria, as the underlying facts are not protected no matter how much effort it took to obtain them.

In this regard, copying lyrics of some particular song does not violate the rights of Genius - they don't have copyright to that particular song and the compilation-of-facts rights don't apply for that particular single item.

Semaphor · 6 years ago

I would guess maps are a different case because those are their own works. There are decisions on design being made, how to show overlays. But that’s just my assumption.

gnicholas · 6 years ago

> Genius does not own the copyright to the lyrics. Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it.

Apparently they license the lyrics now:

> Genius isn’t the copyright holder for these lyrics, it just licenses them itself.

It's not a case of someone copying without permission and then suing another person who copied them. It's a valid licensee suing someone who is copying them.

Imagine if a McDonald's franchisee sued someone running a rogue/unlicensed McDonald's around the corner. Would we have no sympathy for them also?

Legally speaking, it appears the right to sue requires at least some exclusive copyright rights, [1] which Genius surely didn't have (and a McDonald's franchisee also would not have). This is presumably why they didn't bring a copyright suit.

[1] https://www.lexology.com/library/detail.aspx?g=7d4ea127-0fb0...

icebraining · 6 years ago

Google says they also license: https://blog.google/products/search/how-we-help-you-find-lyr...

learnstats2 · 6 years ago

Where I live, there is a database right.

The fact that Genius collated those works is meaningful work in its own right.

Wandfarbe · 6 years ago

I'm not sure what your background is, but your analysis sounds quite strongly opinionated.

There are examples in law where 'work' can be protected; just because you don't have the copyright doesn't mean that someone else is allowed to use the results of your work.

Apparently in this specific case it's not protected.

Regardless of what you think about the lawsuit, you have to give them credit for their watermarking method:

https://imgur.com/IGs0sg7

JMTQp8lwXL · 6 years ago

If I was going to scrape this data and re-purpose it, I would've absolutely cleaned up those apostrophes. The pivoting between straight and curly would certainly be a pet peeve. Unless there's a semantic difference between the two I'm unaware of.

jdub · 6 years ago

The semantic difference would be important in a song like Baby Got Back by Sir Mix-a-lot, which includes both speech quotes and imperial measurements.

a-nikolaev · 6 years ago

Yeah, makes sense, but this is still a pretty good approach. Inserting invisible or unusual Unicode symbols would prompt the scraper to carefully clean up the read files (maybe even fixing these apostrophes as a result). Unusual whitespace is also likely to be removed and cleaned up. On the other hand, these alternating apostrophes have a chance to stay unnoticed (or neglected), falling through the cracks.
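
For anyone curious, here is a minimal sketch of how an apostrophe fingerprint like this could work. It is not Genius's actual code. The mapping (straight apostrophe for a dot, curly for a dash) and the function names `embed` and `extract` are assumptions made up for illustration.

```python
STRAIGHT = "'"       # U+0027, assumed here to stand for a dot
CURLY = "\u2019"     # U+2019, assumed here to stand for a dash


def embed(text: str, pattern: str) -> str:
    """Rewrite the first len(pattern) apostrophes so they spell out pattern.

    pattern is a string of '.' and '-' characters. Apostrophes that come
    after the pattern is used up are left exactly as they were.
    """
    marks = iter(pattern)
    out = []
    for ch in text:
        if ch in (STRAIGHT, CURLY):
            mark = next(marks, None)
            if mark is not None:
                ch = STRAIGHT if mark == "." else CURLY
        out.append(ch)
    return "".join(out)


def extract(text: str) -> str:
    """Read every apostrophe in the text back as a dot or a dash."""
    return "".join(
        "." if ch == STRAIGHT else "-"
        for ch in text
        if ch in (STRAIGHT, CURLY)
    )


marked = embed("don't can't won't it's", ".-..")
print(extract(marked))  # .-..
```

A scraper that normalizes every apostrophe to one kind wipes the mark, which is the cleanup step discussed above.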

3pt14159 · 6 years ago

There is a semantic difference between the two. The straight quote is a superset of the curly one.

So "rock 'n' roll" is correct. And "rock ’n’ roll" is correct. But "rock ‘n’ roll" is not correct, since the wrong apostrophe is used. We're not quoting the letter n, we're showing that the letters a and d were removed.

djur · 6 years ago

The vast majority of content consumed by this scraper is never closely inspected by a human, though.

learnstats2 · 6 years ago

On this quantity of data, you wouldn't be able to do this manually.

If you hope to avoid being caught this way, I'm going to assume you noticed this without the benefit of hindsight and plan to correct all out-of-place Unicode characters automatically. How will you avoid over-correcting?

There's also no reason to believe this is the only fingerprinting Genius has done (they only need to publish the most obvious fail). For example, I can use the same fingerprinting technique but switch between American and British spellings.

This is not a straightforward problem.

CydeWeys · 6 years ago

I'm not disputing that they proved their point, but this is triggering one of my pet peeves about common misunderstandings of Morse code.

Timing is critical in Morse code. You can't just write out a bunch of dashes and dots to transcribe it without clearly transcribing the rests between dots and dashes as well. They haven't given us the rests at all, so all the info they end up having is:

dot dash dot dot dash dot dot dot dot dot dot dot dash dash dot dash dot dot dot dash dot dot

And that can be interpreted in any number of different possible ways besides "REDHANDED". E.g. it could also be "AU5EWRFE", or any of thousands of different interpretations (actually probably a lot more than that; this would be a fun programming problem). They should have used a binary encoding; 22 bits (all they have given us) is not enough information to uniquely encode the string "REDHANDED". Once you include the short rests that are needed, we're talking 44 binary bits or 22 ternary bits. And if you want the long rests to distinguish properly the spaces between words, then 22 ternary bits won't do it; you need the full 44 binary bits.
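
Since this is a fun programming problem, here is a rough sketch of counting the readings with dynamic programming. It assumes only the 26 letters and 10 digits of International Morse code count as valid symbols (no punctuation or prosigns); `encode` and `count_decodings` are names made up for this example.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".",
    "F": "..-.", "G": "--.", "H": "....", "I": "..", "J": ".---",
    "K": "-.-", "L": ".-..", "M": "--", "N": "-.", "O": "---",
    "P": ".--.", "Q": "--.-", "R": ".-.", "S": "...", "T": "-",
    "U": "..-", "V": "...-", "W": ".--", "X": "-..-", "Y": "-.--",
    "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}
CODES = set(MORSE.values())
MAX_LEN = max(len(code) for code in CODES)


def encode(word: str) -> str:
    """Concatenate the Morse symbols of a word with the rests thrown away."""
    return "".join(MORSE[ch] for ch in word)


def count_decodings(signal: str) -> int:
    """Count the ways to split a rest-free dot/dash string into valid symbols."""
    ways = [0] * (len(signal) + 1)
    ways[0] = 1  # the empty prefix has exactly one reading
    for end in range(1, len(signal) + 1):
        for length in range(1, MAX_LEN + 1):
            start = end - length
            if start >= 0 and signal[start:end] in CODES:
                ways[end] += ways[start]
    return ways[-1]


signal = encode("REDHANDED")
print(len(signal))                    # 22
print(signal == encode("AU5EWRFE"))   # True
print(count_decodings(signal))
```

The last line prints the number of distinct letter/digit strings that collapse to the same 22 symbols once the rests are dropped.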

Arnavion · 6 years ago

>They should have used a binary encoding; 22 bits (all they have given us) is not enough information to uniquely encode the string "REDHANDED".

The fact that the sequence can be interpreted as REDHANDED with a particular way of grouping the input is just being cute. Regardless of the grouping, it is a binary encoding of a 22-bit number, and so would have a one-in-2^22 chance of being reproduced at random.

Edit: To clarify: You're saying they should've mentioned 22-bits in the context of binary digits without mentioning Morse code, and if they did want to bring up Morse code they should've used trits or more bits to encode the stops. I'm saying that the fact that their 22-bit sequence can be interpreted in Morse code as a relevant word is just dressing, and does not detract from the point that the sequence was likely copied. Put another way, if someone tried to counter by saying their sequence could've been generated independently because "AU5EWRFE" and many other strings also encode to the same sequence, it would not affect the facts at all.
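
To put a number on the one-in-2^22 figure, a quick back-of-the-envelope check. It assumes each of the 22 apostrophes is an independent 50/50 choice between straight and curly, which real text is not, so treat it as a rough idealization.

```python
bits = 22
patterns = 2 ** bits      # number of distinct 22-symbol dot/dash sequences
chance = 1 / patterns     # chance of hitting one specific sequence at random

print(patterns)  # 4194304
print(chance)
```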

DoubleGlazing · 6 years ago

You'd think Google would be wise to that since they do that themselves.

E.g. when they caught Bing copying them... https://www.wired.com/2011/02/bing-copies-google/

And they definitely do it with maps. There is a tiny little village I visit in rural Roscommon each year. Each year a new major retailer appears to have opened in this 500 population village, well according to Google Maps that is. At the moment there is a branch of New Look situated on a farm down a single track country lane.

shagie · 6 years ago

This is a variation on the trap street https://en.wikipedia.org/wiki/Trap_street

I recall coming across this in my travels. There was a named "town" at the intersection of two streets - upon passing through there, nothing. Later wondering where the town went I found that it was not ever there and was just present to identify people copying that map.

akersten · 6 years ago

This is very clever! Also, take a look at Claim 2 of this patent[1]. Do you think these are similar enough to constitute infringement?

[1]: https://patents.google.com/patent/US9881516B1/en

(Software patents should be abolished. I just like to point out their absurdity and how it's easy to independently develop a technique (steganography in a search engine result) that someone has already grubbed a "patent" on.)

efreak · 6 years ago

This bit at the end is the best part, I think:

> while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. _Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims._

If I'm reading this correctly, this patent is claiming things that are "apparent" (obvious?) to those "in the know". Computer or not, how did this get granted?

jcranmer · 6 years ago

If it's not implemented in PHP, it's not infringing!

The patent claim requires the program to "load available PHP server header information"

dehrmann · 6 years ago

Reminds me of a much older practice for the same reason: https://en.wikipedia.org/wiki/Agloe,_New_York

mikejb · 6 years ago

That's pretty clever! How the tables have turned since hiybbprqag

SilasX · 6 years ago

Wasn’t sure if you were having a seizure there, so for anyone else who was wondering:

https://www.urbandictionary.com/define.php?term=hiybbprqag

https://www.cbsnews.com/news/hiybbprqag-how-google-tripped-u...

teddyh · 6 years ago

pgnttrp!

sien · 6 years ago

Does anyone know if ebooks are watermarked in a similar way?

Polylactic_acid · 6 years ago

They almost certainly are for some sources. And probably with your unique user id to work out who leaked the copy on to torrent sites.

The way to check this is pretty easy though, get 2 users to bit for bit compare their books to work out if they are identical.
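
A rough sketch of that comparison, assuming both copies are plain files on disk. `first_difference` and `compare_files` are names made up for this example. A mismatch only shows that the copies differ; it could equally come from DRM wrappers or embedded timestamps rather than a per-user watermark.

```python
from pathlib import Path
from typing import Optional


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One copy is a strict prefix of the other.
        return min(len(a), len(b))
    return None


def compare_files(path_a: str, path_b: str) -> Optional[int]:
    """Compare two files byte for byte (reads both fully into memory)."""
    return first_difference(Path(path_a).read_bytes(), Path(path_b).read_bytes())


print(first_difference(b"same book", b"same book"))  # None
print(first_difference(b"user-0001", b"user-0002"))  # 8
```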

efreak · 6 years ago

Absolutely. It usually tends to be either more visible, or less visible though - some pdf files have a literal watermark on the pages, while other formats like epub contain a guid or other watermark content in the source (epub is zipped xhtml)

https://gizmodo.com/harpercollins-is-now-using-digital-water...

https://goodereader.com/blog/e-book-news/everything-you-need...

Amusingly, I believe if you use calibre's ebook conversion to replace stylesheets and add toc, it may also actually remove those markers that have no actual content and only exist to provide a unique ID.

rwmj · 6 years ago

Not ebooks, but interesting technique used to find who is leaking your sensitive documents: https://en.wikipedia.org/wiki/Canary_trap

worker767424 · 6 years ago

Not sure about ebooks, but I assume every workplace email sent to all does this to catch leaks.

lefrenchy · 6 years ago

Can someone explain to me the benefit here? Is it making it less likely for the google scrape to get a search hit?

ehsankia · 6 years ago

There's no "benefit", they were just looking for a nice unique way of watermarking textual content, to prove that what shows up on a Google search is indeed sourced from them and not some other transcription of the lyrics.

loa_in_ · 6 years ago

Watermarking, while similar, is not the correct word, though the purpose of watermarking and this steganographic embedding is the same.

zozbot234 · 6 years ago

You have to admit, it's genius!

HJain13 · 6 years ago

> LyricFind. LyricFind is a Google licensing partner, and may be the source of the Genius content appearing in Google’s search results. LyricFind published an explanation on its web site Monday, saying, “Some time ago, Ben Gross from Genius notified LyricFind that they believed they were seeing Genius lyrics in LyricFind’s database. As a courtesy to Genius, our content team was instructed not to consult Genius as a source. Recently, Genius raised the issue again and provided a few examples. All of those examples were also available on many other lyric sites and services, raising the possibility that our team unknowingly sourced Genius lyrics from another location. As a result, LyricFind offered to remove any lyrics Genius felt had originated from them, even though we did not source them from Genius’ site. Genius declined to respond to that offer. Despite that, our team is currently investigating the content in our database and removing any lyrics that seem to have originated from Genius.”

https://searchengineland.com/google-to-add-attribution-to-li...

The dismissal seems logical to me

Cthulhu_ · 6 years ago

Sounds like everyone and their mother is scraping stuff off Genius, not just Google; they went after Google specifically because they knew they couldn't just disappear and they had the financial means to pay for compensation, unlike the thousands of crappy lyrics websites.

That said, it would've been just if Google would pay for access to Genius' particular, well-curated, "source" database of lyrics, especially given that they're basically stealing traffic.

shadowgovt · 6 years ago

But it sounds like the issue is that Google really wasn't using Genius's data directly. The problem is that Google is sourcing from "The Internet," and everybody and their grandmother is 'stealing' from Genius.

Here's an interesting question: if Genius closed up shop tomorrow, how long would it take Google to become the primary source of song lyrics online (by rebuilding Genius's dataset from general Internet harvesting)?

ggggtez · 6 years ago

The same thing came up in the LinkedIn scraping case. The courts have defended website scraping.

Even if Google scraped it on purpose in order to steal traffic, it would likely be legal.

qppo · 6 years ago

Last I checked, ignorance wasn't usually a defense but I'm not a lawyer. I just know not to pretend the Keurig I bought off the back of a truck was a good deal for everyone involved.

But physical analogies to IP fall apart quickly so I'm not going to encourage people to read into that too deeply

jdm2212 · 6 years ago

So there's two different concepts. One is original creative output, which is copyrightable. The other is information, which is not copyrightable.

If you find something verbatim identical in a bunch of different places, you've got a strong case that it's just information, because if it were original creative output it wouldn't show up identically in multiple places.

If it turns out everyone was plagiarizing a single source, but you were unaware and took down the offending content when asked, you won't have much in the way of legal liability.

luckylion · 6 years ago

If that flies, it's a great tactic. You can't just use the data from site A, so you build anonymous sites B, C and D who use the data from site A, and use the data from those sites instead. "We didn't source from A".

It's not a great tactic if the owner of site A decides to sue you and subpoenas the hosting providers for B, C and D. You can only get away with it as long as you aren't successful enough to draw the attention of anyone with money to burn on legal fees.

jtxx · 6 years ago

pretty much data laundering

DigitalSea · 6 years ago

Google has a history of scraping content that they want; their business is built on the back of scraping other people's content. I recently read the story of what happened to Celebrity Net Worth: Google asked for an API, CNW refused, and Google just scraped the content anyway. There was no lawsuit, but CNW put up fake content and, sure enough, it made its way to Google.

It is all ironic given how aggressive Google are in blocking any attempts to scrape its content.

anonytrary · 6 years ago

Probably a silly question, but why not just use robots.txt? That was designed for preventing exactly this.

asutekku · 6 years ago

I’d say most of Genius’ visitors come from “song x lyrics” searches, so hiding those with robots.txt would ultimately make them lose almost all of their traffic.

dewey · 6 years ago

Not due to robots.txt, but you can see what happened to Genius (formerly Rap Genius) when they got removed from the index:

https://techcrunch.com/2013/12/25/google-rap-genius/

robots.txt is designed to keep garbage off search results. It has absolutely no power to prevent a bot from doing anything. Also, if the site added robots.txt they might as well shut down, because their entire userbase comes from people searching lyrics on Google.

AgloeDreams · 6 years ago

The problem is that Google is stealing content and placing it on search so the user never goes to the source. By blocking it with robots.txt they block themselves from Google results AND Google may still keep scraping the content.

robots.txt isn’t enforced by anything

Avamander · 6 years ago

They also scrape MusicBrainz, but even if they don't index MusicBrainz at least they donate to it

niknetniko · 6 years ago

They have a contract with MusicBrainz. They are listed on https://metabrainz.org/supporters/tiers/4.

> The Unicorn tier is for large companies or companies that would like to have a reciprocal relationship with our foundation. If you need special guarantees, indemnities or require us to sign your contract for a data license, please select this tier. If you have another creative idea you would like to propose, please also select the unicorn tier.

> For any of these cases, please detail your request in the company information field and we will work with you to fit your company's mythical situation. We will also find an appropriate monthly support amount to our non-profit foundation of $1500 or more per month. Please always consider enabling the growth of our non-profit foundation and the continuous growth of our metadata!

smabie · 6 years ago

That's like saying it's ironic that a soldier fights for his life when he tries to kill other people.

It's just the war that is being fought, not some sort of hypocrisy or irony.

harry8 · 6 years ago

Garbage.

We live in a society of laws. Even soldiers. Google have shown they have no respect for the law nor equality before it and will cheat while using the law as a cudgel. Recall law exists that the strongest might not always get their way. "Ironic" is the polite way of pointing this out.

Without law, Google cease to exist immediately. They are incapable of enforcing property rights without it.

Pardons aside, soldiers go to jail for taking an attitude like Google's.

dwheeler · 6 years ago

I am not a lawyer, but my understanding is that only the copyright holder can sue for copyright infringement. I am pretty certain Genius does not hold the copyright to those lyrics. It's odd Genius brought this case at all. This is briefly noted at the end of the original article, but it seems like the whole point. Did I miss something?

1vuio0pswjnm7 · 6 years ago

From genius.com: "Genius Media Group, Inc. (GMG) is fully licensed to display lyrics across all of its properties. In 2013, GMG entered into licenses with every major music publisher: Sony/ATV Music Publishing, EMI Music Publishing, Universal Music Publishing Group, and Warner/Chappell Music. In addition, GMG developed a form license with the National Music Publishers' Association (NMPA) which today covers more than 96% of the independent publisher market."

Original copyright holder could give someone else authorisation to sue on their behalf, e.g., through an assignment. Doubtful Genius got an assignment in the agreements they have with publishers.

Also, Google claimed it is sub-licensed to re-publish through a third party, LyricFind, which has licenses with "over 4000" music publishers.

sls · 6 years ago

>Copyright holder could give someone else authorisation to sue on their behalf, e.g., through a license.

They can't assign the bare right to sue. To have standing the plaintiff will need to hold at least one of the exclusive rights in 17 U.S. Code § 106 aiui. Cf Righthaven cases, Silvers v Sony Pictures

luma · 6 years ago

They are going to have a problem with standing for the exact reason you suggest. This case was one company who was scraping other people’s copyrighted works suing another company for doing the same.

SilkRoadie · 6 years ago

Isn’t their main argument unfair competition? Google, the starting point of the internet, decided to undermine their business by taking their collated content and publishing it at the top of results?

Google appears to do this for other things, asking questions often shows answers without needing to visit the website. Perhaps these are all licensed and there is a kick back for these sites...

Google appear to be serving ads on content other people have collated while eliminating the source of traffic to the original site. If that isn’t unfair business practice and taking advantage of their monopoly on search I don’t know what is.

noncoml · 6 years ago

It should be legal then for someone to run a meta engine on top of Google?

judge2020 · 6 years ago

You very well can, but that doesn't mean Google can't block you (CFAA protection). Genius here was reliant on Google for a large portion of their regular traffic so they couldn't just block Google without suffering revenue losses.

SpaceRaccoon · 6 years ago

Does Google hold the copyright on its search results? Why can't I scrape Google?

tanilama · 6 years ago

Here is the question: Does Google have a right to block you? I believe they do, it is their API after all.

In Genius's case, does it disallow Google from scraping?

From their robots.txt, I can't tell:

https://genius.com/robots.txt

adventured · 6 years ago

Because they'll block you. You can prevent Google from indexing your content using robots.txt (Google has a robots.txt on its site as well).

You don't have a right to access their service as many times as you want to, eg by automated means, although you can attempt it. Flip a coin on whether they sue to stop you if you become too annoying.

The Genius complaint is essentially that they want to be represented in Google search without having Google take lyrics from their service and use them in their own served-up content snippets (making a sizable part of the value of genius.com void). Genius knows Google can get lyrics elsewhere if they have to, the lawsuit is probably out of spite due to past conflict with Google and their annoyance at Google competing with them in a shady way (Google was de facto using Genius's service to reduce the value of Genius).

bogomipz · 6 years ago

That's correct; however, the publishing company that administers an artist's royalties is generally the one to bring the suit. This is the same type of royalty as sheet music.

I feel like this is a forgotten bit of history but for years Genius didn't pay royalties for reproducing lyrics instead choosing to claim that their own reprinting of lyrics fell under "fair use" guidelines:

>"David Lowery, frontman and songwriter for Cracker and Camper van Beethoven, is waging war on the sites he believes make money off song lyrics but don't pay the songwriter. Once he took a closer look at where his music was making money on the Internet, he realized: There were more people searching to find lyrics to his songs than searching to illegally download mp3s of his music. And he wasn't making money off those searches. Last November, after months of exhaustive and systematic Googling, he released something called The Undesirable Lyric Website List.

>"The National Music Publishers Association seized upon this list, and announced that it would be sending take-down notices to every single name. At the top of that list was the very popular Rap Genius."

>"Rap Genius has been around for a few years, and it's extremely popular. No ads, lots of traffic and, just recently, a major investment from one of the hottest venture capital firms in Silicon Valley. The founder of Rap Genius, Ilan Zechory, says the site doesn't belong on Lowery's list. Because it's way more than just transcribed lyrics. He says the site is more like a social network: a discussion board for music geeks and even some of the musicians themselves — prominent rappers like Nas and Rick Ross — to comment on their own lyrics. Artists, the founders say, love the site."

>"Just this week, Rap Genius announced that, despite its opinion that the site falls under the criteria for fair use, it's going to pay songwriters for posting their lyrics. It's just easier than fighting with music publishers, who've been very successful at going after other lyric sites in the past few years. ..."[1]

[1] https://www.npr.org/sections/money/2014/05/09/310462951/when...

TAForObvReasons · 6 years ago

Genius claims that Google’s actions caused a decline in traffic to its site. The lawsuit was probably a way to assuage nervous investors (who have poured >70M into the company)

rrdharan · 6 years ago

They're gonna annotate the web though! Any day now...

echelon · 6 years ago

$70M for lyrics not even owned seems absurd.

Still, Google is being very fucking evil here. It's as if they stole that $70M for themselves.

saagarjha · 6 years ago

Surely the investors are nervous now?

earthnail · 6 years ago

I think it's important to point out that when you license lyrics, you don't actually get the lyrics. I know, sounds ridiculous. You'll get the license to display them, and when you ask the rightsholders of these lyrics (the publishers) for the actual lyrics they'll tell you "oh, we don't have the actual text, just the rights. You need to find the text somewhere else."

As a result, creating an accurate lyrics database like Genius has done is an enormous amount of work, and my non-lawyer gut-feeling says that in this case, Google is screwing over Genius big time. Too bad the legal system doesn't support that.

abdulhaq · 6 years ago

It's for this sort of thing that Google had to get rid of their "do no evil" spiel.

kyle_morris_ · 6 years ago

If Google can scrape my site, am I allowed to scrape Google results? Could I create a Google clone by scraping?

If I scraped the most common search results from Google, front page only, and removed all the ads what would Google's argument against that be?

On one hand, so many sites make finding information difficult, on the other it feels pretty scuzzy that Google prevents searchers from clicking through to the site that put the work into generating content.

"If Google can scrape my site, am I allowed to scrape Google results."

You are allowed. Google would not likely try to sue you. They will try to block you however.

Bing was created by copying Google results. Google did not sue Microsoft, but they did try to expose the copying.

bransonf · 6 years ago

> Bing was created by scraping Google results.

Do you have a source? Just curious about the back story.

gscott · 6 years ago

Google was pretty unhappy with Bing for doing just that.

https://www.wired.com/2011/02/bing-copies-google/

galkk · 6 years ago

The amount of Google captchas that you needed to solve when searching on Google from a Microsoft office made me think that it was some kind of psychological warfare.

bawolff · 6 years ago

Being unhappy and being illegal are two very different things

From the article, Genius lost this case because:

Both Google and Genius are licensing the lyrics. Ironically, Genius ended up having to settle a case years ago because they were using lyrics without the appropriate licensing [1].

[1] https://www.nytimes.com/2014/05/07/business/media/rap-genius...

wombatmobile · 6 years ago

It is scuzzy that Google steps on other sites' air hoses.

Your idea to scrape Google's search results is pithy, ironic counter-innovation at its dastardly best.

All you need to pull this off is funding for a top legal team, and deep reserves of emotional energy.

Go for it!

bynormous · 6 years ago

I think the legal argument services like serpapi make is that as long as you don't create a google account and/or accept google's terms then you are free to scrape and clone what is publicly accessible (at least in the US). I have no idea though.

anonymousab · 6 years ago

I would assume Google respects robots.txt, so you should be fine non-abusively scraping their site insofar as you respect their robots.txt

Google's robots.txt does not tell the full story.

For example, if you include a User-Agent header and put certain strings in it, e.g., "curl/7.47", you will be blocked.

   echo -e 'GET /search?q=robots.txt HTTP/1.1\r\nHost: www.google.com\r\nUser-Agent: curl/7.47\r\nConnection: close\r\n\r\n' |socat -,ignoreeof ssl:www.google.com,verify=0

The problem with the robots.txt "standard", e.g., ones like Google's with no "crawl-delay" directives, is that it does not define what a "robot" is. The query above is obviously not a "robot", but Google, with all its resources, still treats it as such.

Google probably does more (abusive) scraping than any other entity. Web scraping is in their DNA. It is in their web pages, too.

   curl https://www.google.com/search/static/gs/animal/m05py0.html|grep scrape

tempestn · 6 years ago

Yeah, the tricky thing for those scraped by google is that given google's search monopoly, the sites can't block their scraping entirely, since they need to be shown in search results.

oefrha · 6 years ago

From https://www.google.com/robots.txt:

  User-agent: *
  Disallow: /search
  Allow: /search/about
  Allow: /search/static
  Allow: /search/howsearchworks
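
To make those rules concrete, here is a simplified sketch of how a crawler might evaluate them. It assumes the longest-match rule described in RFC 9309 (the most specific matching path wins, and Allow wins a tie) and ignores wildcards and per-agent groups. `RULES` and `is_allowed` are made-up names, and this is not how Google's own crawler is implemented.

```python
# The five rules quoted above, as (directive, path prefix) pairs.
RULES = [
    ("disallow", "/search"),
    ("allow", "/search/about"),
    ("allow", "/search/static"),
    ("allow", "/search/howsearchworks"),
]


def is_allowed(path: str, rules=RULES) -> bool:
    """Return True if the path may be fetched under longest-prefix matching."""
    best_len = -1
    allowed = True  # a path that matches no rule is allowed by default
    for kind, prefix in rules:
        if not path.startswith(prefix):
            continue
        longer = len(prefix) > best_len
        tie_for_allow = len(prefix) == best_len and kind == "allow"
        if longer or tie_for_allow:
            best_len = len(prefix)
            allowed = kind == "allow"
    return allowed


print(is_allowed("/search?q=lyrics"))                      # False
print(is_allowed("/search/static/gs/animal/m05py0.html"))  # True
print(is_allowed("/maps"))                                 # True
```

Note that Python's built-in `urllib.robotparser` applies the first matching rule rather than the longest one, so it can give a different answer for rules ordered like these.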

volume · 6 years ago

If you succeed then make something that is a proxy for gmail, and then for sheets and docs and chat! google-nextgen.com is not taken... yet.

emptyparadise · 6 years ago

I wonder if it's even possible to fix Google search in the framework of a for-profit company. It seems like the trajectory of any ad-supported service eventually lands it in a "don't let the user out no matter the cost" phase. Perhaps such a service really does need to operate as a non-profit foundation of some sorts.

There was a post about regulating Google like a public utility recently, but perhaps we should also consider looking at other less conventional internet "public utilities" - things like the Internet Archive, Wikipedia or essential open source projects like Debian. I think a search engine that's transparent both in terms of its logic and how it's maintained and managed might be the only way.

dannyw · 6 years ago

The other option would be strong antitrust enforcement; allowing competitors to emerge and compete with incumbents.

How would that work? What should an antitrust order demand Google to do in order to allow competitors to emerge?