This is a bizarre take. The most substantial point is buried near the end of the article: Genius does not own the copyright to the lyrics. Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it. At best, you could claim your copy is a derivative work, but that only grants you protection for your additional creative contributions on top of the original work, which for a straight transcription is... well, nothing.
Genius knows this, which is why they didn't file a copyright suit. Instead, they claimed other things like unfair competition and breach of contract. However, Title 17 Section 301 of the US Code says that "all legal or equitable rights that are equivalent to any of the exclusive rights within the general scope of copyright [...] are governed exclusively by this title". To avoid this, Genius needed to prove that their claims weren't "equivalent" – i.e. weren't just copyright claims dressed up as something else. They failed to do this, and so their case was thrown out.
You seem focused on whether this case was the legally correct decision, which it sure seems to be. This article, like many readers, is more focused on whether this was a fair result. Nothing bizarre about that.
The judge may have done the correct thing, but readers may feel that Congress didn't. This case will doubtless be used in the future to argue for sui generis database rights like the EU has.
(My view is that in principle, some form of sui generis database rights makes sense, but for the things that US copyright law already covers it is currently far, FAR too restrictive and lasts too long, so I would vehemently oppose expansion of existing US copyright law to cover sui generis database rights.
However, if US copyright law were reformed such that it mandated blanket licensing (see [the EFF proposal]), strengthened fair use protections, and shortened copyright duration, then I would totally support similar rights for sui generis databases.)
> Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it.
Not really all true though. Genius started out by stealing lyrics from other sites. In the early days many of the lyrics had the exact same errors as other more established sites. That may have changed since.
Aren't collections of facts copyrightable? So Google has copyright over Google Maps and I cannot copy that, but I can go out and record exactly the same data since I collected it myself.
No, in the US you cannot copyright facts, only expression. So you have control over word-for-word copies of your article about a bird; but you have no control over dissemination of the facts you discovered about the bird. SCOTUS decided this in 1991, Feist v. Rural: https://en.wikipedia.org/wiki/Feist_Publications,_Inc.,_v._R....
Compilations of facts are copyrightable in the US, but they can't be just raw collections - there has to be a choice made about what to include.
> The Act also provides copyright protection to compilations, but only to the extent that there has been a contribution of originality in assembling that compilation.
Map copyright is based on the idea there are decisions made around what to include and how to display it.
You can't photocopy a map and claim copyright. However, a human can trace the same map and claim copyright.
Regarding compilations of facts, the general doctrine is that copyright protects the semi-arbitrary choices of what to include in the compilation (e.g. judgement of relevance: which words to put in a dictionary, what detail to include or exclude in a map) and disallows copying the compilation as a whole. It explicitly does not protect the "work and sweat" required to gather the data, and it allows people to copy particular facts out of the compilation, for example if they are making their own selection with different criteria, since the underlying facts are not protected no matter how much effort it took to obtain them.
In this regard, copying lyrics of some particular song does not violate the rights of Genius - they don't have copyright to that particular song and the compilation-of-facts rights don't apply for that particular single item.
I would guess maps are a different case because those are their own works. There are decisions on design being made, such as how to show overlays. But that’s just my assumption.
> Genius does not own the copyright to the lyrics. Yes, they may have taken billions of painstaking person-years to hand-transcribe them onto artisanal silks, but no matter how much effort you put into copying someone else's work, it doesn't make it your work, and you can't sue people over it.
Apparently they license the lyrics now:
> Genius isn’t the copyright holder for these lyrics, it just licenses them itself.
It's not a case of someone copying without permission and then suing another person who copied them. It's a valid licensee suing someone who is copying them.
Imagine if a McDonald's franchisee sued someone running a rogue/unlicensed McDonald's around the corner. Would we have no sympathy for them also?
Legally speaking, it appears the right to sue requires at least some exclusive copyright rights, [1] which Genius surely didn't have (and a McDonald's franchisee also would not have). This is presumably why they didn't bring a copyright suit.
I'm not sure what your background is, but your analysis sounds quite strongly opinionated.
There are examples in law where 'work' can be protected; just because you don't have the copyright doesn't mean that someone else is simply allowed to use the results of your work.
Apparently in this specific case it's not protected.
If I was going to scrape this data and re-purpose it, I would've absolutely cleaned up those apostrophes. The pivoting between straight and curly would certainly be a pet peeve. Unless there's a semantic difference between the two I'm unaware of.
Yeah, makes sense, but this is still a pretty good approach. Inserting invisible or unusual Unicode symbols would prompt the scraper to carefully clean up the files it read (maybe even fixing these apostrophes as a result). Unusual whitespace is also likely to be removed and cleaned up.
On the other hand, these alternating apostrophes have a chance to stay unnoticed (or neglected), falling through the cracks.
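For anyone curious how this kind of apostrophe watermark could work mechanically, here is a minimal sketch. The bit convention (straight = 0, curly = 1) and the function names are my own assumptions for illustration; nothing here is claimed to be Genius's actual scheme. It also shows why a scraper that normalizes apostrophes wipes the mark.

```python
STRAIGHT = "'"      # U+0027 APOSTROPHE
CURLY = "\u2019"    # U+2019 RIGHT SINGLE QUOTATION MARK


def embed(text: str, bits: str) -> str:
    """Rewrite the first len(bits) apostrophes: '0' -> straight, '1' -> curly."""
    out = []
    used = 0
    for ch in text:
        if ch in (STRAIGHT, CURLY) and used < len(bits):
            out.append(CURLY if bits[used] == "1" else STRAIGHT)
            used += 1
        else:
            out.append(ch)
    if used < len(bits):
        raise ValueError("not enough apostrophes to carry the pattern")
    return "".join(out)


def extract(text: str) -> str:
    """Read the apostrophe styles back as a bit string."""
    return "".join(
        "1" if ch == CURLY else "0"
        for ch in text
        if ch in (STRAIGHT, CURLY)
    )


def normalize(text: str) -> str:
    """What a careful scraper would do: force every apostrophe to straight."""
    return text.replace(CURLY, STRAIGHT)


lyric = "I can't say I won't, but I don't and I didn't"
marked = embed(lyric, "1001")
print(extract(marked))             # the pattern survives a verbatim copy
print(extract(normalize(marked)))  # the pattern is gone after normalization
```

The text reads identically to a human either way, which is the whole point: the mark only survives if the copier does not bother to clean up.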
There is a semantic difference between the two. The straight quote is a superset of the curly one.
So "rock 'n' roll" is correct. And "rock ’n’ roll" is correct. But "rock ‘n’ roll" is not correct, since the wrong apostrophe is used. We're not quoting the letter n, we're showing that the letters a and d were removed from "and".
On this quantity of data, you wouldn't be able to do this manually.
If you hope to avoid being caught this way, I'm going to assume you noticed this without the benefit of hindsight and plan to correct all out-of-place Unicode characters automatically. How will you avoid over-correcting?
There's also no reason to believe this is the only fingerprinting Genius has done (they only need to publish the most obvious fail). For example, I can use the same fingerprinting technique but switch between American and British spellings.
I'm not disputing that they proved their point, but this is triggering one of my pet peeves about common misunderstandings of Morse code.
Timing is critical in Morse code. You can't just write out a bunch of dashes and dots to transcribe it without clearly transcribing the rests between dots and dashes as well. They haven't given us the rests at all, so all the info they end up having is:
dot dash dot dot dash dot dot dot dot dot dot dot dash dash dot dash dot dot dot dash dot dot
And that can be interpreted in any number of different possible ways besides "REDHANDED". E.g. it could also be "AU5EWRFE", or any of thousands of different interpretations (actually probably a lot more than that; this would be a fun programming problem). They should have used a binary encoding; 22 bits (all they have given us) is not enough information to uniquely encode the string "REDHANDED". Once you include the short rests that are needed, we're talking 44 binary bits or 22 ternary bits. And if you want the long rests to distinguish properly the spaces between words, then 22 ternary bits won't do it; you need the full 44 binary bits.
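Since this is called out as a fun programming problem, here is a rough sketch of how one might count the readings. It assumes the standard International Morse table restricted to letters A-Z and digits 0-9, with no word gaps; punctuation codes are left out, so the true number would be higher still. The function names are mine.

```python
MORSE = {
    "A": ".-", "B": "-...", "C": "-.-.", "D": "-..", "E": ".", "F": "..-.",
    "G": "--.", "H": "....", "I": "..", "J": ".---", "K": "-.-", "L": ".-..",
    "M": "--", "N": "-.", "O": "---", "P": ".--.", "Q": "--.-", "R": ".-.",
    "S": "...", "T": "-", "U": "..-", "V": "...-", "W": ".--", "X": "-..-",
    "Y": "-.--", "Z": "--..",
    "0": "-----", "1": ".----", "2": "..---", "3": "...--", "4": "....-",
    "5": ".....", "6": "-....", "7": "--...", "8": "---..", "9": "----.",
}
CODES = frozenset(MORSE.values())
MAX_LEN = max(len(code) for code in CODES)


def encode(text: str) -> str:
    """Concatenate Morse codes with no rests, as in the published pattern."""
    return "".join(MORSE[ch] for ch in text.upper())


def count_decodings(signal: str) -> int:
    """Count the ways to split `signal` into valid letter/digit codes."""
    # ways[i] = number of ways to decode the first i symbols
    ways = [0] * (len(signal) + 1)
    ways[0] = 1
    for end in range(1, len(signal) + 1):
        for length in range(1, min(MAX_LEN, end) + 1):
            if signal[end - length:end] in CODES:
                ways[end] += ways[end - length]
    return ways[-1]


SEQUENCE = encode("REDHANDED")
print(len(SEQUENCE), encode("AU5EWRFE") == SEQUENCE)
print(count_decodings(SEQUENCE))
```

The dynamic program runs in time linear in the length of the signal, so enumerating the count is cheap even though listing every reading would not be.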
>They should have used a binary encoding; 22 bits (all they have given us) is not enough information to uniquely encode the string "REDHANDED".
The fact that the sequence can be interpreted as REDHANDED with a particular way of grouping the input is just being cute. Regardless of the grouping, it is a binary encoding of a 22-bit number, and so would have a one-in-2^22 chance of being reproduced at random.
Edit: To clarify: You're saying they should've mentioned 22 bits in the context of binary digits without mentioning Morse code, and if they did want to bring up Morse code they should've used trits or more bits to encode the stops. I'm saying that the fact that their 22-bit sequence can be interpreted in Morse code as a relevant word is just dressing, and does not detract from the point that the sequence was likely copied. Put another way, if someone tried to counter by saying their sequence could've been generated independently because "AU5EWRFE" and many other strings also encode to the same sequence, it would not affect the facts at all.
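To put rough numbers on the two positions, here is a small back-of-the-envelope sketch. It assumes each of the 22 marks is an independent fair coin flip when produced "at random", which is a simplification (real typing habits are not uniform), and the variable names are mine.

```python
import math

SYMBOLS = 22  # number of dots/dashes in the published pattern

binary_patterns = 2 ** SYMBOLS    # dot/dash only, no rests
ternary_patterns = 3 ** SYMBOLS   # dot/dash/rest in each position
chance_of_match = 1 / binary_patterns

# Information carried by 22 ternary symbols, expressed in bits.
ternary_as_bits = SYMBOLS * math.log2(3)

print(f"{binary_patterns:,} binary patterns")
print(f"chance of an accidental match: {chance_of_match:.2e}")
print(f"{ternary_patterns:,} ternary patterns (~{ternary_as_bits:.1f} bits)")
```

Either way the coincidence argument holds: the ambiguity of the Morse reading changes what the pattern spells, not how unlikely it is to show up by accident.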
And they definitely do it with maps. There is a tiny little village I visit in rural Roscommon each year. Each year a new major retailer appears to have opened in this 500 population village, well according to Google Maps that is. At the moment there is a branch of New Look situated on a farm down a single track country lane.
I recall coming across this in my travels. There was a named "town" at the intersection of two streets - upon passing through there, nothing. Later wondering where the town went I found that it was not ever there and was just present to identify people copying that map.
(Software patents should be abolished. I just like to point out their absurdity and how it's easy to independently develop a technique (steganography in a search engine result) that someone has already grubbed a "patent" on.)
> while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. _Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims._
If I'm reading this correctly, this patent is claiming things that are "apparent" (obvious?) to those "in the know". Computer or not, how did this get granted?
Absolutely. It usually tends to be either more visible, or less visible though - some pdf files have a literal watermark on the pages, while other formats like epub contain a guid or other watermark content in the source (epub is zipped xhtml)
Amusingly, I believe if you use calibre's ebook conversion to replace stylesheets and add toc, it may also actually remove those markers that have no actual content and only exist to provide a unique ID.
There's no "benefit", they were just looking for a nice unique way of watermarking textual content, to prove that what shows up on a Google search is indeed sourced from them and not some other transcription of the lyrics.
> LyricFind. LyricFind is a Google licensing partner, and may be the source of the Genius content appearing in Google’s search results. LyricFind published an explanation on its web site Monday, saying, “Some time ago, Ben Gross from Genius notified LyricFind that they believed they were seeing Genius lyrics in LyricFind’s database. As a courtesy to Genius, our content team was instructed not to consult Genius as a source. Recently, Genius raised the issue again and provided a few examples. All of those examples were also available on many other lyric sites and services, raising the possibility that our team unknowingly sourced Genius lyrics from another location. As a result, LyricFind offered to remove any lyrics Genius felt had originated from them, even though we did not source them from Genius’ site. Genius declined to respond to that offer. Despite that, our team is currently investigating the content in our database and removing any lyrics that seem to have originated from Genius.”
Sounds like everyone and their mother is scraping stuff off Genius, not just Google; Genius went after Google specifically because they knew Google couldn't just disappear and had the financial means to pay compensation, unlike the thousands of crappy lyrics websites.
That said, it would've been just for Google to pay for access to Genius' particular, well-curated, "source" database of lyrics, especially given that they're basically stealing traffic.
But it sounds like the issue is that Google really wasn't using Genius's data directly. The problem is that Google is sourcing from "The Internet," and everybody and their grandmother is 'stealing' from Genius.
Here's an interesting question: if Genius closed up shop tomorrow, how long would it take Google to become the primary source of song lyrics online (by rebuilding Genius's dataset from general Internet harvesting)?
Last I checked, ignorance wasn't usually a defense but I'm not a lawyer. I just know not to pretend the Keurig I bought off the back of a truck was a good deal for everyone involved.
But physical analogies to IP fall apart quickly so I'm not going to encourage people to read into that too deeply
So there's two different concepts. One is original creative output, which is copyrightable. The other is information, which is not copyrightable.
If you find something verbatim identical in a bunch of different places, you've got a strong case that it's just information, because if it were original creative output it wouldn't show up identically in multiple places.
If it turns out everyone was plagiarizing a single source, but you were unaware and took down the offending content when asked, you won't have much in the way of legal liability.
If that flies, it's a great tactic. You can't just use the data from site A, so you build anonymous sites B, C and D who use the data from site A, and use the data from those sites instead. "We didn't source from A".
It's not a great tactic if the owner of site A decides to sue you and subpoenas the hosting providers for B, C and D. You can only get away with it as long as you aren't successful enough to draw the attention of anyone with money to burn on legal fees.
Google has a history of scraping content that they want; their business is built on the back of scraping other people's content. The story I read just recently of what happened to Celebrity Net Worth was an interesting one: Google asked for an API, they refused, and Google just scraped the content anyway. There was no lawsuit, but CNW put up fake content and sure enough, it made its way to Google.
It is all ironic given how aggressive Google are in blocking any attempts to scrape its content.
I’d say most of Genius’ visitors come from “song x lyrics” searches, so hiding those with robots.txt would ultimately make them lose almost all of their traffic.
robots.txt is designed to keep garbage off search results. It has absolutely no power to prevent a bot from doing anything. Also, if the site added robots.txt they might as well shut down, because their entire userbase comes from people searching lyrics on Google.
The problem is that Google is stealing content and placing it on search so the user never goes to the source. By blocking it with robots.txt they block themselves from Google results AND Google may still keep scraping the content.
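To make the "no power" point concrete, here is a small sketch using Python's standard `urllib.robotparser`. The rules, the `/lyrics/` path, the `example.com` URLs and the `OtherBot` name are all made up for illustration; this is not Genius's or Google's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: keep one named crawler out of /lyrics/, allow everyone else.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow: /lyrics/

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report what the rules ask of this crawler; nothing here enforces it."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(allowed("Googlebot", "https://example.com/lyrics/some-song"))
print(allowed("Googlebot", "https://example.com/about"))
print(allowed("OtherBot", "https://example.com/lyrics/some-song"))
```

All the parser does is answer a question. Whether a crawler asks that question, or respects the answer, is entirely up to the crawler, and a site that disallows the dominant search crawler also drops out of that crawler's index.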
> The Unicorn tier is for large companies or companies that would like to have a reciprocal relationship with our foundation. If you need special guarantees, indemnities or require us to sign your contract for a data license, please select this tier. If you have another creative idea you would like to propose, please also select the unicorn tier.
> For any of these cases, please detail your request in the company information field and we will work with you to fit your company's mythical situation. We will also find an appropriate monthly support amount to our non-profit foundation of $1500 or more per month. Please always consider enabling the growth of our non-profit foundation and the continuous growth of our metadata!
We live in a society of laws. Even soldiers. Google have shown they have no respect for the law nor equality before it, and will cheat while using the law as a cudgel. Recall that law exists so that the strongest might not always get their way. "Ironic" is the polite way of pointing this out.
Without law, Google cease to exist immediately. They are incapable of enforcing property rights without it.
Pardons aside, soldiers go to jail for taking an attitude like Google's.
I am not a lawyer, but my understanding is that only the copyright holder can sue for copyright infringement. I am pretty certain Genius does not hold the copyright to those lyrics. It's odd Genius brought this case at all. This is briefly noted at the end of the original article, but it seems like the whole point. Did I miss something?
From genius.com: "Genius Media Group, Inc. (GMG) is fully licensed to display lyrics across all of its properties. In 2013, GMG entered into licenses with every major music publisher: Sony/ATV Music Publishing, EMI Music Publishing, Universal Music Publishing Group, and Warner/Chappell Music. In addition, GMG developed a form license with the National Music Publishers' Association (NMPA) which today covers more than 96% of the independent publisher market."
Original copyright holder could give someone else authorisation to sue on their behalf, e.g., through an assignment. Doubtful Genius got an assignment in the agreements they have with publishers.
Also, Google claimed it is sub-licensed to re-publish through a third party, LyricFind, which has licenses with "over 4000" music publishers.
>Copyright holder could give someone else authorisation to sue on their behalf, e.g., through a license.
They can't assign the bare right to sue. To have standing the plaintiff will need to hold at least one of the exclusive rights in 17 U.S. Code § 106 aiui. Cf Righthaven cases, Silvers v Sony Pictures
They are going to have a problem with standing for the exact reason you suggest. This case was one company who was scraping other people’s copyrighted works suing another company for doing the same.
Isn’t their main argument unfair competition? Google, the starting point of the internet, decided to undermine their business by taking their collated content and publishing it at the top of results?
Google appears to do this for other things, asking questions often shows answers without needing to visit the website. Perhaps these are all licensed and there is a kick back for these sites...
Google appear to be serving ads on content other people have collated while eliminating the source of traffic to the original site.. If that isn’t unfair business practice and taking advantage of their monopoly on search I don’t know what is.
You very well can, but that doesn't mean Google can't block you (CFAA protection). Genius here was reliant on Google for a large portion of their regular traffic so they couldn't just block Google without suffering revenue losses.
Because they'll block you. You can prevent Google from indexing your content using robots.txt (Google has a robots.txt on its site as well).
You don't have a right to access their service as many times as you want to, eg by automated means, although you can attempt it. Flip a coin on whether they sue to stop you if you become too annoying.
The Genius complaint is essentially that they want to be represented in Google search without having Google take lyrics from their service and use them in their own served-up content snippets (making a sizable part of the value of genius.com void). Genius knows Google can get lyrics elsewhere if they have to, the lawsuit is probably out of spite due to past conflict with Google and their annoyance at Google competing with them in a shady way (Google was de facto using Genius's service to reduce the value of Genius).
That's correct; however, the publishing company that administers an artist's royalties is generally the one to bring the suit. This is the same type of royalty as sheet music.
I feel like this is a forgotten bit of history, but for years Genius didn't pay royalties for reproducing lyrics, instead choosing to claim that their own reprinting of lyrics fell under "fair use" guidelines:
>"David Lowery, frontman and songwriter for Cracker and Camper van Beethoven, is waging war on the sites he believes make money off song lyrics but don't pay the songwriter. Once he took a closer look at where his music was making money on the Internet, he realized: There were more people searching to find lyrics to his songs than searching to illegally download mp3s of his music. And he wasn't making money off those searches. Last November, after months of exhaustive and systematic Googling, he released something called The Undesirable Lyric Website List.
>"The National Music Publishers Association seized upon this list, and announced that it would be sending take-down notices to every single name. At the top of that list was the very popular Rap Genius."
>"Rap Genius has been around for a few years, and it's extremely popular. No ads, lots of traffic and, just recently, a major investment from one of the hottest venture capital firms in Silicon Valley. The founder of Rap Genius, Ilan Zechory, says the site doesn't belong on Lowery's list. Because it's way more than just transcribed lyrics. He says the site is more like a social network: a discussion board for music geeks and even some of the musicians themselves — prominent rappers like Nas and Rick Ross — to comment on their own lyrics. Artists, the founders say, love the site."
>"Just this week, Rap Genius announced that, despite its opinion that the site falls under the criteria for fair use, it's going to pay songwriters for posting their lyrics. It's just easier than fighting with music publishers, who've been very successful at going after other lyric sites in the past few years. ..."[1]
Genius claims that Google’s actions caused a decline in traffic to its site. The lawsuit was probably a way to assuage nervous investors (who have poured >70M into the company)
I think it's important to point out that when you license lyrics, you don't actually get the lyrics. I know, sounds ridiculous. You'll get the license to display them, and when you ask the rightsholders of these lyrics (the publishers) for the actual lyrics they'll tell you "oh, we don't have the actual text, just the rights. You need to find the text somewhere else."
As a result, creating an accurate lyrics database like Genius has done is an enormous amount of work, and my non-lawyer gut-feeling says that in this case, Google is screwing over Genius big time. Too bad the legal system doesn't support that.
If Google can scrape my site, am I allowed to scrape Google results? Could I create a Google clone by scraping?
If I scraped the most common search results from Google, front page only, and removed all the ads what would Google's argument against that be?
On one hand, so many sites make finding information difficult, on the other it feels pretty scuzzy that Google prevents searchers from clicking through to the site that put the work into generating content.
The number of Google captchas that you needed to solve when searching on Google from Microsoft Office made me think that it was some kind of psychological warfare.
> Genius isn’t the copyright holder for these lyrics, it just licenses them itself.
Both Google and Genius are licensing the lyrics. Ironically, Genius ended up having to settle a case years ago because they were using lyrics without the appropriate licensing [1].
I think the legal argument services like serpapi make is that as long as you don't create a google account and/or accept google's terms then you are free to scrape and clone what is publicly accessible (at least in the US). I have no idea though.
The problem with the robots.txt "standard", e.g., ones like Google's with no "crawl-delay" directives, is that it does not define what a "robot" is. The query above is obviously not a "robot", but Google, with all its resources, still treats it as such.
Google probably does more (abusive) scraping than any other entity. Web scraping is in their DNA. It is in their web pages, too.
Yeah, the tricky thing for those scraped by google is that given google's search monopoly, the sites can't block their scraping entirely, since they need to be shown in search results.
I wonder if it's even possible to fix Google search in the framework of a for-profit company. It seems like the trajectory of any ad-supported service eventually lands it in a "don't let the user out no matter the cost" phase. Perhaps such a service really does need to operate as a non-profit foundation of some sort.
There was a post about regulating Google like a public utility recently, but perhaps we should also consider looking at other less conventional internet "public utilities" - things like the Internet Archive, Wikipedia or essential open source projects like Debian. I think a search engine that's transparent both in terms of its logic and how it's maintained and managed might be the only way.
[EFF proposal for blanket licensing]: https://www.eff.org/deeplinks/2020/05/plan-pay-artists-encou...
It's much better now, but that's only because of unpaid volunteer editors who do most of the corrections and annotations on their site.
You might be thinking of sui generis database rights, which DO cover collections of facts. The EU, Russia, and Brazil recognize this right, but the US doesn't: https://en.wikipedia.org/wiki/Database_right#United_States
See City of New York v. GeoData Plus, and the discussion in https://wiki.openstreetmap.org/w/images/6/6f/Protection_of_C...
https://www.lexology.com/library/detail.aspx?g=7d4ea127-0fb0...
The fact that Genius collated those works is meaningful work in its own right.
https://imgur.com/IGs0sg7
This is not a straightforward problem.
Timing is critical in Morse code. You can't just write out a bunch of dashes and dots to transcribe it without clearly transcribing the rests between dots and dashes as well. They haven't given us the rests at all, so all the info they end up having is:
dot dash dot dot dash dot dot dot dot dot dot dot dash dash dot dash dot dot dot dash dot dot
And that can be interpreted in any number of different possible ways besides "REDHANDED". E.g. it could also be "AU5EWRFE", or any of thousands of different interpretations (actually probably a lot more than that; this would be a fun programming problem). They should have used a binary encoding; 22 bits (all they have given us) is not enough information to uniquely encode the string "REDHANDED". Once you include the short rests that are needed, we're talking 44 binary bits or 22 ternary bits. And if you want the long rests to distinguish properly the spaces between words, then 22 ternary bits won't do it; you need the full 44 binary bits.
E.g., when they caught Bing copying them: https://www.wired.com/2011/02/bing-copies-google/
And they definitely do it with maps. There is a tiny little village I visit in rural Roscommon each year. Each year a new major retailer appears to have opened in this 500 population village, well according to Google Maps that is. At the moment there is a branch of New Look situated on a farm down a single track country lane.
I recall coming across this in my travels. There was a named "town" at the intersection of two streets - upon passing through there, nothing. Later wondering where the town went I found that it was not ever there and was just present to identify people copying that map.
[1]: https://patents.google.com/patent/US9881516B1/en
(Software patents should be abolished. I just like to point out their absurdity and how it's easy to independently develop a technique (steganography in a search engine result) that someone has already grubbed a "patent" on.)
> while particular embodiments and applications have been illustrated and described, it is to be understood that the disclosed embodiments are not limited to the precise construction and components disclosed herein. _Various modifications, changes and variations, which will be apparent to those skilled in the art, may be made in the arrangement, operation and details of the method and apparatus disclosed herein without departing from the spirit and scope defined in the appended claims._
If I'm reading this correctly, this patent is claiming things that are "apparent" (obvious?) to those "in the know". Computer or not, how did this get granted?
The patent claim requires the program to "load available PHP server header information"
https://www.urbandictionary.com/define.php?term=hiybbprqag
https://www.cbsnews.com/news/hiybbprqag-how-google-tripped-u...
The way to check this is pretty easy though, get 2 users to bit for bit compare their books to work out if they are identical.
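A quick sketch of that bit-for-bit check, assuming both users can share their files: it reports the offset of the first differing byte, or `None` if the copies are identical. The function names and file paths are placeholders.

```python
from typing import Optional

def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Return the offset of the first differing byte, or None if identical."""
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    if len(a) != len(b):
        # One copy is a strict prefix of the other.
        return min(len(a), len(b))
    return None

def compare_files(path_a: str, path_b: str) -> Optional[int]:
    """Read both files fully and compare them byte by byte."""
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        return first_difference(fa.read(), fb.read())

print(first_difference(b"same book", b"same book"))
print(first_difference(b"same book", b"same b00k"))
```

Identical files would only show that those two copies carry the same marker (or none); it says nothing about copies sold to other buyers.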
https://gizmodo.com/harpercollins-is-now-using-digital-water...
https://goodereader.com/blog/e-book-news/everything-you-need...
Amusingly, I believe if you use calibre's ebook conversion to replace stylesheets and add toc, it may also actually remove those markers that have no actual content and only exist to provide a unique ID.
https://searchengineland.com/google-to-add-attribution-to-li...
The dismissal seems logical to me
That said, it would have been just for Google to pay for access to Genius' particular, well-curated, "source" database of lyrics, especially given that they're basically stealing traffic.
Here's an interesting question: if Genius closed up shop tomorrow, how long would it take Google to become the primary source of song lyrics online (by rebuilding Genius's dataset from general Internet harvesting)?
Even if Google scraped it on purpose in order to steal traffic, it would likely be legal.
But physical analogies to IP fall apart quickly so I'm not going to encourage people to read into that too deeply
If you find something verbatim identical in a bunch of different places, you've got a strong case that it's just information, because if it were original creative output it wouldn't show up identically in multiple places.
If it turns out everyone was plagiarizing a single source, but you were unaware and took down the offending content when asked, you won't have much in the way of legal liability.
It is all ironic given how aggressive Google are in blocking any attempts to scrape its content.
https://techcrunch.com/2013/12/25/google-rap-genius/
> The Unicorn tier is for large companies or companies that would like to have a reciprocal relationship with our foundation. If you need special guarantees, indemnities or require us to sign your contract for a data license, please select this tier. If you have another creative idea you would like to propose, please also select the unicorn tier.
> For any of these cases, please detail your request in the company information field and we will work with you to fit your company's mythical situation. We will also find an appropriate monthly support amount to our non-profit foundation of $1500 or more per month. Please always consider enabling the growth of our non-profit foundation and the continuous growth of our metadata!
It's just the war that is being fought, not some sort of hypocrisy or irony.
We live in a society of laws. Even soldiers. Google have shown they have no respect for the law nor for equality before it, and will cheat while using the law as a cudgel. Recall that law exists so that the strongest might not always get their way. "Ironic" is the polite way of pointing this out.
Without law, Google cease to exist immediately. They are incapable of enforcing property rights without it.
Pardons aside, soldiers go to jail for taking an attitude like Google's.
Original copyright holder could give someone else authorisation to sue on their behalf, e.g., through an assignment. Doubtful Genius got an assignment in the agreements they have with publishers.
Also, Google claimed it is sub-licensed to re-publish through a third party, LyricFind, which has licenses with "over 4000" music publishers.
They can't assign the bare right to sue. To have standing the plaintiff will need to hold at least one of the exclusive rights in 17 U.S. Code § 106 aiui. Cf Righthaven cases, Silvers v Sony Pictures
Google appears to do this for other things, asking questions often shows answers without needing to visit the website. Perhaps these are all licensed and there is a kick back for these sites...
Google appear to be serving ads on content other people have collated while eliminating the source of traffic to the original site. If that isn’t unfair business practice and taking advantage of their monopoly on search I don’t know what is.
In Genius's case, does it disallow Google from scraping?
From their robots.txt, I can't tell:
https://genius.com/robots.txt
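One way to stop eyeballing it is to let a parser answer the question. Below is a small sketch using Python's standard `urllib.robotparser`; `SAMPLE_RULES` is a made-up rule set for illustration, not the contents of the Genius file, and `may_fetch` is a hypothetical helper.

```python
from urllib.robotparser import RobotFileParser

def may_fetch(robots_lines, user_agent: str, url: str) -> bool:
    """Parse robots.txt lines and ask whether `user_agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser.can_fetch(user_agent, url)

# Hypothetical rules, NOT the real file at the URL above.
SAMPLE_RULES = [
    "User-agent: *",
    "Disallow: /private/",
]

print(may_fetch(SAMPLE_RULES, "Googlebot", "https://example.com/private/page"))
print(may_fetch(SAMPLE_RULES, "Googlebot", "https://example.com/lyrics/song"))

# To check a live file instead, RobotFileParser also offers set_url() and read(),
# which download the file over the network before can_fetch() is called.
```

Keep in mind robots.txt is only a request to crawlers; it has no bearing on the licensing questions in the rest of the thread.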
You don't have a right to access their service as many times as you want to, eg by automated means, although you can attempt it. Flip a coin on whether they sue to stop you if you become too annoying.
The Genius complaint is essentially that they want to be represented in Google search without having Google take lyrics from their service and use them in their own served-up content snippets (making a sizable part of the value of genius.com void). Genius knows Google can get lyrics elsewhere if they have to, the lawsuit is probably out of spite due to past conflict with Google and their annoyance at Google competing with them in a shady way (Google was de facto using Genius's service to reduce the value of Genius).
I feel like this is a forgotten bit of history but for years Genius didn't pay royalties for reproducing lyrics instead choosing to claim that their own reprinting of lyrics fell under "fair use" guidelines:
>"David Lowery, frontman and songwriter for Cracker and Camper van Beethoven, is waging war on the sites he believes make money off song lyrics but don't pay the songwriter. Once he took a closer look at where his music was making money on the Internet, he realized: There were more people searching to find lyrics to his songs than searching to illegally download mp3s of his music. And he wasn't making money off those searches. Last November, after months of exhaustive and systematic Googling, he released something called The Undesirable Lyric Website List.
>"The National Music Publishers Association seized upon this list, and announced that it would be sending take-down notices to every single name. At the top of that list was the very popular Rap Genius."
>"Rap Genius has been around for a few years, and it's extremely popular. No ads, lots of traffic and, just recently, a major investment from one of the hottest venture capital firms in Silicon Valley. The founder of Rap Genius, Ilan Zechory, says the site doesn't belong on Lowery's list. Because it's way more than just transcribed lyrics. He says the site is more like a social network: a discussion board for music geeks and even some of the musicians themselves — prominent rappers like Nas and Rick Ross — to comment on their own lyrics. Artists, the founders say, love the site."
>"Just this week, Rap Genius announced that, despite its opinion that the site falls under the criteria for fair use, it's going to pay songwriters for posting their lyrics. It's just easier than fighting with music publishers, who've been very successful at going after other lyric sites in the past few years. ..."[1]
[1] https://www.npr.org/sections/money/2014/05/09/310462951/when...
Still, Google is being very fucking evil here. It's as if they stole that $70M for themselves.
As a result, creating an accurate lyrics database like Genius has done is an enormous amount of work, and my non-lawyer gut-feeling says that in this case, Google is screwing over Genius big time. Too bad the legal system doesn't support that.
If I scraped the most common search results from Google, front page only, and removed all the ads what would Google's argument against that be?
On one hand, so many sites make finding information difficult, on the other it feels pretty scuzzy that Google prevents searchers from clicking through to the site that put the work into generating content.
You are allowed. Google would likely not try to sue you. They will try to block you, however.
Bing was created by copying Google results. Google did not sue Microsoft, but they did try to expose the copying.
Do you have a source? Just curious about the back story.
https://www.wired.com/2011/02/bing-copies-google/
> Genius isn’t the copyright holder for these lyrics, it just licenses them itself.
Both Google and Genius are licensing the lyrics. Ironically, Genius ended up having to settle a case years ago because they were using lyrics without the appropriate licensing [1].
[1] https://www.nytimes.com/2014/05/07/business/media/rap-genius...
Your idea to scrape Google's search results is pithy, ironic counter-innovation at its dastardly best.
All you need to pull this off is funding for a top legal team, and deep reserves of emotional energy.
Go for it!
For example, if you include a User-Agent header and put certain strings in it, e.g., "curl/7.47", you will be blocked.
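For anyone who wants to try it, here is a sketch of how such a request could be built with Python's standard library. Nothing is sent over the network, the URL is a placeholder, and whether a given server actually blocks this string is the claim above, not something this code verifies.

```python
from urllib.request import Request

def build_request(url: str, user_agent: str) -> Request:
    """Create (but do not send) a request carrying a custom User-Agent header."""
    return Request(url, headers={"User-Agent": user_agent})

# Placeholder URL; the User-Agent string is the one quoted above.
req = build_request("https://example.com/search?q=test", "curl/7.47")
print(req.header_items())
```

Passing `req` to `urllib.request.urlopen` would actually send it, at which point the server's response (or block page) can be inspected.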
The problem with the robots.txt "standard", e.g., ones like Google's with no "crawl-delay" directives, is that it does not define what a "robot" is. The query above is obviously not a "robot", but Google, with all its resources, still treats it as such. Google probably does more (abusive) scraping than any other entity. Web scraping is in their DNA. It is in their web pages, too.
There was a post about regulating Google like a public utility recently, but perhaps we should also consider looking at other less conventional internet "public utilities" - things like the Internet Archive, Wikipedia or essential open source projects like Debian. I think a search engine that's transparent both in terms of its logic and how it's maintained and managed might be the only way.