The Mueller Report Can’t Be Copyrighted, Is Flagged by Copyright Bots Anyway

How about some sort of legal punishment against Scribd/YouTube etc. for false positives?

chrisseaton · 7 years ago

Who do you think Scribd are? A government agency? They're a private-sector business. If they don't want to host your document for a good reason, a bad reason, a reason you don't agree with, a mistaken reason, a false reason, or no reason at all, then that's their decision (except for protected categories for the purpose of discrimination, which do not apply here.)

They’re not stopping you copying anything - they just don’t want to host documents they’re not sure of the status of in their automated system.

Host your own documents if you don't think they’re right.

Meph504 · 7 years ago

These companies aren't violating the law; at most they are violating their own policies regarding the content they host.

dmix · 7 years ago

Who defines what is a "violation" of their "own" policy? Or what the policy is exactly? Are there no repercussions if their "policy" is not directly reflecting the goals and intentions of original copyright law?

bob_doggo · 7 years ago

Isn't the problem here that there is a punishment for not removing copyrighted material? So any time some bot says there is an X% chance that something is copyrighted, it's safer to just remove the material.

And these are private companies, not public institutions. They are not letting you upload stuff for the sake of some common good; they do it because it's their business model. Why should they let you upload something that has a 0.1% chance of being copyrighted by someone who could then demand money from them?

Theodores · 7 years ago

Curious to see Scribd mentioned. Last time I used anything on Scribd was a decade ago to embed some PDFs that should have been web pages really.

Then they made it all paywalled so that was that.

I think their business has 'mySpaced' and fundamentally does not go with how people consume content. Nobody wants a PDF these days.

So why are they mentioned now?

Well, the report is wrong!!! It is a scanned document. So it is not digital. They could have written it with HTML5 but they are stuck in the past, twenty years out of date. PDF was okay back then but not now. Scanned images in a PDF are not accessible. You can't search it.

People who like PDF because they used it for legal documents back in the day have their arguments about why to use PDF. But if this report came from a single URL and was a tenth the size in HTML then nobody would have problems determining if the page was genuine or a fake - it would be in the URL.

Nobody would need to disperse copies of it over the internets. Just the one URL would do.

Really I think that this Mueller chap and the whole freaking government needs to be sacked and prosecuted for not making their work accessible. It is important to democracy.

tantalor · 7 years ago

You'd have to show a harm.

It's about as bad as the mailman losing your letter. So what? If it was important, you would have insured/certified it.

ajuc · 7 years ago

> It's about as bad as the mailman losing your letter

No it's not. If the mailman loses your letter, one person won't be able to read what you wrote to them. If YouTube blocks your channel, nobody will, and if you depend on YouTube for money your business defaults.

> If it was important, you would have insured/certified it.

There's no way to insure against copyright strikes as far as I know.

Is Scribd under any legal obligation to display content that users upload? I cannot find any in the Terms of Use.

I did find some terms that allow Scribd to remove content for any reason and terms which purport to limit any remedy for such removal to "Don't like it? Then don't use it."

12.1 Scribd. You agree that Scribd, in its sole discretion, for any or no reason, and without penalty, may terminate any account (or any part thereof) You may have with Scribd or Your use of the Scribd Platform and remove and discard all or any part of Your account, User profile, and any content, at any time and without notice to You.

12.2 You. Your only remedy with respect to any dissatisfaction with (i) the Scribd Platform, (ii) any term of these Terms, (iii) any policy or practice of Scribd in operating the Scribd Platform, or (iv) any content or information transmitted through the Scribd Platform, is to cancel Your account and stop using Scribd.

https://support.scribd.com/hc/en-us/articles/210129326-Gener...

7. Removal of Content. Regardless of which purchase option You have selected, Scribd reserves the right to modify or withdraw at any time any Scribd Commercial Content from access by You at the request of its publisher or for any other reason.

https://support.scribd.com/hc/en-us/articles/210129486-Scrib...

sfifs · 7 years ago

While this doesn't seem to be DMCA, a question for the lawyers here on DMCA takedown requests. Given that issuing a request requires the requester to make a "good faith" affidavit, if they're reporting something flagged by a bot that a human check would have shown to be obviously wrong, like this, can that be used as evidence of bad faith and committing perjury?

mutagenesis · 7 years ago

While filing a false DMCA takedown request can lead to fines and ultimately jail time, most of the case law on this involves clear malicious intent. These are parties that send out a notice just for articles or posts that are critical of said party. In these cases, you had a human on one side knowingly filing a single false DMCA takedown.

http://www.aaronkellylaw.com/consequences-of-filing-a-false-...

These are cases of programmers creating takedown bots with false positives. Do the programmers know that there will be false positives? Yes. Do they not make a good faith effort to prevent those false positives? Probably. Good luck proving this in court though.

pavel_lishin · 7 years ago

This is an excellent example of why @qntm called AI "money laundering for responsibility":

https://twitter.com/qntm/status/1030846375213379584

inflatableDodo · 7 years ago

Given there is a contentID system for explicitly copyrighted works, it seems insane that nobody involved in these systems has seen fit to do the same for explicitly public works.

edgineer · 7 years ago

These systems are good at picking up small portions of copyrighted works within some larger context, but guaranteeing that 100% of a YouTube video or Scribd document is in the public domain is a different problem.

For published works like the Mueller report, one wouldn't need contentID. Matching the document's hash would suffice.

gizmo686 · 7 years ago

A hash would still only work for the exact document. Suppose someone uploads a new version with a better table of contents, or with some added annotations [0]; now your hash no longer matches, while contentID probably will.

[0] Both things which could be copyrighted, but I'll assume will not trigger the actual flag.
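To make the exact-match limitation concrete, here is a minimal sketch of hash-based allowlisting. The byte strings stand in for real files, and the `known_public_domain` set and `is_known_public` helper are hypothetical names, not part of any platform's actual system.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Placeholder bytes standing in for the published report (assumption).
original = b"%PDF-1.4 hypothetical public-domain report"
# Same document with a small addition, e.g. an annotation or new table of contents.
annotated = original + b" plus one added annotation"

# Hypothetical allowlist of hashes for documents known to be public domain.
known_public_domain = {sha256_hex(original)}


def is_known_public(data: bytes) -> bool:
    """True only when the upload is byte-for-byte identical to a listed document."""
    return sha256_hex(data) in known_public_domain


print(is_known_public(original))   # exact copy matches
print(is_known_public(annotated))  # any modification breaks the match
```

The exact copy matches and the annotated copy does not, which is the trade-off being discussed: a hash is cheap and precise but only recognizes identical files, whereas fingerprint-style matching tolerates edits.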

itronitron · 7 years ago

These systems seem to have no concept of non-copyrighted works, which is a fundamental flaw.

minikites · 7 years ago

There's no profit motive.

There is an excellent profit motive. It is just not as directly obvious, as it is mostly realised in money not spent, and humans seem notoriously prone to preferring small gains over large savings.

whym · 7 years ago

The issue seems to be a skewed incentive regarding false positives vs. false negatives. If leaving false positives (materials being taken down based on a false claim) gives much lower risk to the platform than false negatives (materials that should be removed but have not been removed), they would naturally prioritize reducing false negatives.
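A rough sketch of that incentive as an expected-cost comparison. Every number here is made up for illustration, and the `should_remove` helper is hypothetical; it is not how any real platform is known to decide.

```python
def should_remove(p_infringing: float, cost_false_negative: float,
                  cost_false_positive: float) -> bool:
    """Remove when the expected cost of leaving the item up exceeds
    the expected cost of wrongly taking it down."""
    expected_cost_if_kept = p_infringing * cost_false_negative
    expected_cost_if_removed = (1.0 - p_infringing) * cost_false_positive
    return expected_cost_if_kept > expected_cost_if_removed


# Hypothetical figures: a missed infringement is assumed to cost the platform
# 100,000 units (legal exposure), a wrongful takedown only 1 unit (an annoyed user).
print(should_remove(0.001, 100_000, 1))

# If both mistakes were assumed to cost the same, a 0.1% chance would not justify removal.
print(should_remove(0.001, 1, 1))
```

Under the lopsided costs, even a 0.1% chance of infringement tips the decision toward removal; with symmetric costs it does not. That asymmetry, not the accuracy of the bot, is what drives the over-removal.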

fouc · 7 years ago

vojta_letal · 7 years ago

Well, the EU has just passed the Copyright Directive. That makes reading such articles both funny and terrifying.

3xblah · 7 years ago

josteink · 7 years ago

I don’t understand why people use something like Scribd to host a PDF a few megs in size and then complain about takedowns.

Host yourself. The operational and BW cost for hosting static content is ridiculously low.

lostmyoldone · 7 years ago

I believe that the particular report was 139MB of scanned printouts of a redacted PDF. Depending on your site hosting, I imagine it could actually become somewhat expensive if it turns out to be popular.

colejohnson66 · 7 years ago

This. 10,000 downloads would already be more than a terabyte of bandwidth.
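The back-of-the-envelope arithmetic, taking the 139MB figure quoted above at face value and assuming decimal units (1 TB = 1,000,000 MB):

```python
FILE_SIZE_MB = 139        # size of the scanned report, as quoted in the thread
DOWNLOADS = 10_000        # download count used in the estimate
MB_PER_TB = 1_000_000     # decimal units (assumption)

total_mb = FILE_SIZE_MB * DOWNLOADS
total_tb = total_mb / MB_PER_TB

print(f"{total_mb:,} MB is about {total_tb:.2f} TB")
```

That works out to roughly 1.39 TB, before any protocol overhead or partial and repeated downloads.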

exo762 · 7 years ago

We need a reform. One that would remove the idea that one can collect royalties from the work. Royalties are incompatible with the way information is disseminated.

Authors don't collect royalties. The absolute majority of authors don't live off royalties. Let's just kill this monstrosity.

If your business model comes with huge externalities for society as a whole... your business model must go.