Befuddling that this happened again. It’s not the first time:
- Paul Manafort court filing (U.S., 2019)
Manafort’s lawyers filed a PDF where the “redacted” parts were basically black highlighting/boxes over live text. Reporters could recover the hidden text (e.g., via copy/paste).
- TSA “Standard Operating Procedures” manual (U.S., 2009)
A publicly posted TSA screening document used black rectangles that did not remove the underlying text; the concealed content could be extracted. This led to extensive discussion and an Inspector General review.
- UK Ministry of Defence submarine security document (UK, 2011)
A MoD report had “redacted” sections that could be revealed by copying/pasting the “blacked out” text—because the text was still present, just visually obscured.
- Apple v. Samsung ruling (U.S., 2011)
A federal judge’s opinion attempted to redact passages, but the content was still recoverable due to the way the PDF was formatted; copying text out revealed the “redacted” parts.
- Associated Press + Facebook valuation estimate in court transcript (U.S., 2009)
The AP reported it could read “redacted” portions of a court transcript by cut-and-paste (classic overlay-style failure). Secondary coverage notes the mechanism explicitly.
- A broader “history of failures” compilation (multiple orgs / years)
The PDF Association collected multiple incidents (including several above) and describes the common failure mode: black shapes drawn over text without deleting/sanitizing the underlying content.
https://pdfa.org/wp-content/uploads/2020/06/High-Security-PD...
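For anyone who hasn't seen the mechanism up close, here is a minimal sketch of it in Python. The content stream is a hand-written, simplified example (not taken from any of the filings above), and the regexes only handle plain `(...) Tj` strings and one specific way of drawing a black rectangle. Real PDFs add compression, fonts and encodings, so treat this as an illustration rather than an extraction tool.

```python
import re

# Hand-written, simplified PDF page content stream (hypothetical example).
# The text is drawn first, then a black filled rectangle is painted on top.
CONTENT_STREAM = b"""
BT /F1 12 Tf 72 700 Td (Balance: 1,234,567 USD) Tj ET
0 0 0 rg
70 696 160 14 re
f
"""

def extract_shown_text(stream: bytes) -> list[str]:
    """Return every literal string passed to the Tj (show text) operator."""
    # Only handles simple "(...) Tj" with no nested or escaped parentheses.
    return [m.decode("latin-1") for m in re.findall(rb"\(([^()]*)\)\s*Tj", stream)]

def has_black_fill_rect(stream: bytes) -> bool:
    """True if the stream sets a black fill colour and then fills a rectangle."""
    return re.search(rb"0 0 0 rg\s+[\d.\s-]+re\s+f\b", stream) is not None

# The rectangle is there, and so is the text it was supposed to hide.
print(has_black_fill_rect(CONTENT_STREAM), extract_shown_text(CONTENT_STREAM))
```

The rectangle only changes what gets painted last; the `Tj` operand is still sitting in the file for any copy/paste or text-extraction tool to read.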
Never trust a lawyer with a redact tool any more complicated than a marker.
I've seen lawyers at major, high-priced law firms make this same mistake. Once it was a huge list of individuals' names and bank account balances. Fortunately I was able to intervene just before the uploaded documents were made public.
Folks around here blame incompetence, but I say the frequency of this kind of cock-up is crystal clear telemetry telling you the software tools suck.
If the software is going to leverage the familiarity of using a blackout marker to give you a simple mechanism to redact text, it should honour that analogy and work the way any regular user would expect, by killing off the underlying text you're obscuring, and any other corresponding hidden bits. Or it should surface those hidden bits so you can see what could come back to bite you later. E.g., it wouldn't be hard to make the redact tool simultaneously act as a highlighter that temporarily turns proximate text in the OCR layer a vibrant yellow as you use it.
It often comes down to not using the right software and training issues. They have to use Acrobat, which has a redaction tool. This is expensive so some places cheap out on other tools that don’t have a real redaction feature. They highlight with black and think it does the same thing whereas the redaction tool completely removes the content and any associated metadata from the document.
This was basically the only reason we were willing to cough up like $400 for each Acrobat license for a few hundred people. One redaction fuckup could cost you whatever you saved by buying something else.
I would like to believe that the DOJ lacking the proper software might have something to do with DOGE. That would be sweet irony.
> Folks around here blame incompetence, but I say the frequency of this kind of cock-up is crystal clear telemetry telling you the software tools suck.
Absolutely. They know this is confusing, and they're bound and determined not to fix it. At the least, they need a pop-up to let you know that it's not doing what you might think it's doing.
Apple’s Preview app (which has a very thorough PDF markup tool) does this right: it has an explicit “redact” tool which deletes the content it’s used on.
Of course we can blame incompetence. It's incompetent not to realise your own incompetencies, also known as overconfidence.
Any lawyer should be like "I don't know what I'm doing here I'll get an expert to help" just like as a software developer I'd ask a lawyer for their help with law stuff...because IANAL uwu
Always worth remembering that PDF is basically a graphic design/page-layout format from the early 90s, descended from 80s PostScript. It was never intended for securely redacting documents and while it can be done, that’s not the default behaviour.
No surprise non-experts muck it up and I don’t see that changing until they move to special-purpose tools.
In 2025, never attribute to incompetence what you could to a conspiracy. [sarcasm]
They fired/drove away/reassigned most of those who are competent in the executive branch generally, it is pretty easy to believe that none of those managing the document release and few of those working on it are actually experienced or skilled in how you do omissions in a document release correctly. Those people are gone.
> - Associated Press + Facebook valuation estimate in court transcript (U.S., 2009) The AP reported it could read “redacted” portions of a court transcript by cut-and-paste (classic overlay-style failure). Secondary coverage notes the mechanism explicitly.
What happens in a court case when this occurs? Does the receiving party get to review and use the redacted information (assuming it’s not gagged by other means) or do they have to immediately report the error and clean room it?
Edit: after reading up on this it looks like attorneys have strict ethical standards to not use the information (for what little that may be worth), but the Associated Press was a third party who unredacted public court documents in a separate Facebook case.
> What happens in a court case when this occurs? Does the receiving party get to review and use the redacted information (assuming it’s not gagged by other means) or do they have to immediately report the error and clean room it?
Typically, two copies of a redacted document are submitted via ECF. One is an unredacted but sealed copy that is visible to the judge and all parties to the case. The other is a redacted copy that is visible to the general public.
So, to answer what I believe to be your question: the opposing party in a case would typically have an unredacted copy regardless of whether information is leaked to the general public via improper redaction, so the issue you raise is moot.
My guess would be that if the benefitting legal party didn't need to declare they also benefitted from this (because they legally can't be caught, etc.) they wouldn't.
I know and am friends with a lot of lawyers. They're pretty ruthless when it comes to this kind of thing.
Legally, I would think both parties get copies of everything. I don't know if that was the case here.
> strict ethical standards to not use the information (for what little that may be worth)
If it's worth so little to your eyes/comprehension you will have no problem citing a huge count of cases where lawyers do not respect their obligations towards the courts and their clients...
That snide remark is used to discredit a profession in passing, but the reason you won't find a lot of examples of this happening is because the trust clients have to put in lawyers and the legal system in general is what makes it work, and betraying that trust is a literal professional suicide (suspension, disbarment, reputational ruin, and often civil liability) for any lawyer... that's why "strict" doesn't mean anything "little" in this case.
> Edit: after reading up on this it looks like attorneys have strict ethical standards to not use the information (for what little that may be worth), but the Associated Press was a third party who unredacted public court documents in a separate Facebook case.
Curious. I am not a litigator but this is surprising if you found support for it. My gut was that the general obligation to be a zealous advocate for your client would require a litigant to use inadvertently disclosed information unless it was somehow barred by the court. Confidentiality obligations would remain owed to the client, and there might be some tension there but it would be resolvable.
Here in NL if confidential information about offenders leaks from court documents, it usually leads to a reduction in sentencing because the leak of classified information is weighed as part of the punishment. If the leak was proven to be intentional, it might lead to a mistrial or even acquittal. Leaking of victims' information usually only results in a groveling public apology from the Minister/Secretary of Justice du jour.
This has happened so many times I feel like the DoJ must have some sort of standardised redaction pipeline to prevent it by now. Assuming they do, why wasn't it used?
I am happy with their lack of expertise and hope it stays that way, because I cannot remember a single case where redactions put the citizenry at a better place for it.
Of course if it's in the middle of an investigation it can spoil the investigation, allow criminals to cover their tracks, allow escape.
In such case the document should be vetted by competent and honest officials to judge whether it is timely to release it, or whether suppressing it just ensures that investigation is never concluded, extending a forever renewed cover to the criminals.
There was also a process on how to communicate top secret information, but these idiots preferred to use Signal.
I'm completely lost on how you can be surprised by this at all? Trump is in there, tells some FBI faboon to black everything out, they collect a group of people they can find and start going through these files as fast as they can.
"When a clown moves into a palace, he doesn't become a king; the palace instead becomes a circus."
Secure systems are not exactly the right environment for quick release and handling. So documents invariably get onto regular desktops with off the shelf software used by untrained personnel.
Given the context and the baldly political direction behind the redactions, it's not at all unlikely that this is the result of deliberate sabotage or malicious compliance. Bondi isn't blacking these things out herself, she's ordering people to do it who aren't true believers. Purges take time (and often blood). She's stuck with the staff trained under previous administrations.
Not to mention when the White House published Obama's birth certificate as a PDF. I remember being able to open it and turn the different layers off and on.
Are you trying to suggest that indicated it was fraudulent? That has very much been debunked -- it's just an artifact of OCR and compression, something that many scanners do automatically [1].
"There are major differences between the Trump 1.0 and 2.0 administrations. In the Trump 1.0 administration, many of the most important officials were very competent men. One example would be then-Attorney General William Barr. Barr is contemptible, yes, but smart AF. When Barr’s DOJ released a redacted version of the Mueller Report, they printed the whole thing, made their redactions with actual ink, and then re-scanned every page to generate a new PDF with absolutely no digital trace of the original PDF file. There are ways to properly redact a PDF digitally, but going analog is foolproof.
The Trump 2.0 administration, in contrast, is staffed top to bottom with fools."
It's like Russian spies being caught in the Netherlands with taxi receipts showing they took a taxi from their Moscow HQ to the airport: corrupt organizations attract/can only hire incompetent people...
Anyone remember how the Trump I regime had staff who couldn't figure out the lighting in the White House, or mistitled Australia's Prime Minister as President?
The bigger difference from my perspective is that they have competent people doing the strategy this time. The last Trump administration failed to use the obvious levers available to accomplish fascism, while this one has been wildly successful on that end. In a few years they will have realigned the whole power dynamic in the country, and unfortunately more and more competent people will choose to work for them in order to receive the benefits of doing so.
> but smart AF. When Barr’s DOJ released a redacted version of the Mueller Report, they printed the whole thing, made their redactions with actual ink, and then re-scanned every page to generate a new PDF with absolutely no digital trace of the original PDF file.
This is a dumb way of doing that, exactly what "stupid" people do when they are somewhat aware of the limits of their competence, or only as smart as the tech they grew up with. Also, this type of redaction eliminates the possibility of changing the text length, which is a very common leak, especially for names and official positions. And it doesn't eliminate the risk of missing a redaction, since you can't simply search & replace with machine precision but have to manually map each match to its printed position.
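To make the text-length leak concrete: a black bar that keeps the original width narrows down what can be under it. A minimal sketch, assuming a monospaced font; the character width, the tolerance and the candidate names are all made up for illustration.

```python
def candidates_matching_width(box_width_pt: float, candidates: list[str],
                              char_width_pt: float = 6.0,
                              tolerance_pt: float = 1.0) -> list[str]:
    """Keep the candidate strings whose rendered width fits the redaction bar.

    Assumes a monospaced font where every glyph is char_width_pt wide
    (hypothetical value); proportional fonts need a per-glyph width table.
    """
    return [c for c in candidates
            if abs(len(c) * char_width_pt - box_width_pt) <= tolerance_pt]

# Hypothetical names; the bar is 60 pt wide, i.e. 10 characters at 6 pt each.
names = ["Alice Smith", "Bob Jones", "Carol Reed", "Dan Wu"]
print(candidates_matching_width(60.0, names))
```

Replacing the text with a fixed-length placeholder before rendering removes this signal, which is not possible once the bar is inked onto a printout.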
IIRC there was a Slashdot discussion about it that went "Oh yeah, obviously you need to black out the face entirely, or use a randomized Gaussian blur." "Yeah, or just not molest kids."
Similar to pressing delete or emptying the recycle bin, in that all that happens is the operating system is told that section of the hard drive is now blank, but the underlying files are still there and available to recover.
The covid origins Slack messages discovery material (Anderson & Holmes) were famously poorly redacted pdfs, allowing their unredacting by Gilles Demaneuf, benefiting all of us.
The U.S. federal government is bad at redactions on purpose.
The offices responsible for redactions are usually in-house legal shops (e.g., an Office of Chief Counsel inside an agency like CBP) and the agency’s FOIA office. They’re often doing redactions manually in Adobe, which is slow, tedious, and error-prone. Because the process is error prone, the federal government gets multiple layers of review, justified (as DOJ lawyers regularly tell courts) by the need to “protect the information of innocent U.S. citizens.”
But the “bad at redactions” part isn’t an accident. It functions as a litigation tactic. Make production slow, make FOIA responses slow, and then point to that slow, manual process as the reason the timeline has to be slow. The government could easily buy the kind of redaction tools that most law firms have used for decades. Purpose-built redaction tools speed the work up and reduce mistakes. But the government doesn't buy those tools because faster, cleaner production benefits the requester.
The downside for the government is that every so often a judge gets fed up and orders a normal timeline. Then agencies go into panic mode and initiate an “all hands on deck.” Then you end up with untrained, non-attorney staff doing rushed redactions by hand in Adobe. Some of them can barely use a mouse. That’s when you see the classic technical failures: someone draws a black rectangle that looks like a redaction, instead of applying a real redaction that actually removes the underlying text.
This is an extremely interesting perspective. I haven't really heard it before, but I once worked for the state in a technical capacity and watched as they spent entire workdays and scheduled multiple meetings with the sole purpose of figuring out ways to slow down or narrow FOIA requests.
I didn't really know how they slept at night, but I don't know how a lot of people sleep at night. I only had to be involved because I had to do the actual trawling through the emails. They spent their time trying to narrow the keywords that I'd have to search, and trying to figure out new definitions of the words "related to."
There’s a great Australian comedy called Utopia about a government department that has a whole episode B-plot of the characters working on the Aussie equivalent of a FOIA request. It’s pretty funny and in the end one of the workers just finds it easier to leak the document to the requesting journalist rather than deal with the official process, even though it was mundane contract details on a carpark that came in ahead of schedule and under budget.
In another episode they’re trying to find out the length of a stealth submarine for construction planning purposes of a port or something, and they have to go through endless layers of security checks with the military that lead nowhere. In the end a reporter filming a documentary episode on the government agency tells them the length because they were allowed to film the submarine on another program.
Definitely recommend the show and my friends in government say it’s scarily accurate.
No, there isn't an enormous cohort of bureaucrats going to work every day, collectively wringing their hands and saying "haha, we're going to be STUPID today!"
It's funny seeing this play out because in my personal life anytime I'm sharing a sensitive document where someone needs to see part of it but I don't want them to see the rest that's not relevant, I'll first block out/redact the text I don't want them to see (covering it, using a redacting highlighter thing, etc.), and then I'll screenshot the page and make that image a PDF.
I always felt paranoid (without any real evidence, just a guess) that there would always be a chance that anything done in software could be reversed somehow.
If it's not done properly, and you happen at any point in the chain to put black blocks on a compressed image (and PDFs do compress internal images), you are leaking some bits of information in the shadow cast by the compression algorithm. (Self-plug: https://github.com/unrealwill/jpguncrop )
And that's just in the non-adversarial simple case.
If you don't know the provenance of the images you are putting black boxes on (for example because a rogue employee intentionally wants to leak them, or because the image sensor of your target has been compromised by another team to leak some info), your redaction can be rendered ineffective, as some images can be made uncroppable by construction.
Somewhat related, I once sent a FOI request to a government agency that decided the most secure way to redact documents was to print them, use a permanent marker, and then scan them. Unfortunately they used dye based markers over laser print, so simply throwing the document into Photoshop and turning up the contrast made it readable.
I was thinking I understand what's going on but then I came to the image showing the diff and I don't understand at all how that diff can unredact anything.
I'm trying to understand this cause it sounds fascinating but I don't get it. I don't have an advanced understanding of compression so that might be part of why.
If you compare an image to another image, you could guess by compression what is under the blocked part, that makes some sense to me conceptually, what I don't get is for the PDF specifically why does it compressing the black boxes I put have any risk? It's compressing the internal image which is just the black box part? Or are you saying the whole screenshot is an internal image?
There's also metadata in the image files. What specifically would be sensitive in the metadata of a PDF made from screenshots that is not also present in the screenshot image metadata?
It's absolutely bewildering how ridiculous everything has been so far in terms of competence, and this really is the cherry on top, near Christmas too.
USA is still very high, so they can go much much lower, but I think they might go to some still lower places, finding them where we didn't even know such places could exist. Some ideas:
This low https://en.wikipedia.org/wiki/Child_abuse_in_Pakistan aka a society where child abuse is simply accepted and mainstream, with the child abuse of child labour and dhijhadism being just additional nightmare fuel on top.
Normally, I'd never attribute to intention what can be blamed on incompetence. Especially if the government is doing it. But sure, if I were the intern tasked with this job...
I learned that a long time ago when I was a student and wanted to submit a pdf generated by a trial version of some software as an assignment and was trying to be clever and cover the watermark that said unregistered with a white box.
When opening the file in my slow computer, I could see all the rendering of the watermark happening in slow motion until the white box would pop up on top of the text.
When I was a student, and using a shareware or trial version of some software and wanted some printed output from it without a watermark, I printed to postscript (chose a printer that supported postscript and the driver used it instead of rasterized images), but using a file instead of a printer.
I could then open up the postscript, delete the commands that rendered the watermark, save it, then I converted it to PDF so it would be easy to print.
It's actually quite easy to open the pdf and see that there are several different elements per page to the document, eg the main text, an image, the footer, the title.
Randomly removing these by trial and error will usually quite easily allow you to find the watermark and nix it, with the advantage that even a sophisticated recipient will not be able to find out from the pdf file what the watermark was.
I then convert the image to grayscale only. Then I apply a filter so that only 16 colors are used. And I then adjust brightness/contrast so that "white is really white". It's all scripted: "screenshot to PDF". One of my oldest shell scripts.
16 shades of grey (not 50) is plenty enough for text to still be smooth.
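The per-pixel part of that pipeline can be sketched roughly like this. This is not the shell script itself (which isn't shown), and it does the contrast stretch before the quantisation so the output lands exactly on 16 grey values. The luma weights, the white threshold and the function names are assumptions made for illustration.

```python
def to_gray(rgb: tuple[int, int, int]) -> int:
    """Convert an 8-bit RGB pixel to 8-bit grey using Rec. 601 luma weights."""
    r, g, b = rgb
    return round(0.299 * r + 0.587 * g + 0.114 * b)

def clean_pixel(rgb: tuple[int, int, int], levels: int = 16,
                white_point: int = 230) -> int:
    """Grey -> stretch so white_point and above become pure white -> quantise."""
    gray = to_gray(rgb)
    # Contrast stretch: anything at or above white_point (assumed) becomes 255.
    stretched = min(255, gray * 255 // white_point)
    # Snap to one of `levels` evenly spaced grey values (0, 17, 34, ..., 255).
    step = 255 / (levels - 1)
    return round(round(stretched / step) * step)

# An off-white scanner background and a mid-grey pixel.
print(clean_pixel((240, 240, 240)), clean_pixel((100, 100, 100)))
```

Snapping to a small palette also flattens faint show-through from the back of a thin page, as long as it is lighter than the chosen white point.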
I do it for several reasons, one of them being I often take manual notes on official documents (which infuriates my wife btw) but then sometimes I need to scan the documents and send them (local IRS / notary / bank / whatever). So I'll just scan, then fill rectangles with white where I took handwritten notes. Another reason is when paper is printed on both sides: at scan time, if the paper is thin or the ink is thick, the other side sometimes shows through.
I wonder how that'd work vs adversarial inputs: never really thought about it.
Personally, I only trust an image manipulation tool to put down solid colored blocks, or something that does not involve the source pixels when deciding on the redacted pixel. Formats like PDF are just so complicated to trust.
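A minimal sketch of what "does not involve the source pixels" means, on a plain greyscale raster held as a list of rows. The function name and the half-open bounds convention are assumptions for illustration, not any particular tool's API.

```python
def redact_region(pixels: list[list[int]], left: int, top: int,
                  right: int, bottom: int, fill: int = 0) -> list[list[int]]:
    """Return a copy of a greyscale image with a rectangle set to a constant.

    The fill value never depends on the original pixels (unlike blur or
    pixelation), so the covered pixels of this raster carry nothing over.
    Bounds are half-open: columns left..right-1, rows top..bottom-1.
    """
    out = [row[:] for row in pixels]
    for y in range(max(top, 0), min(bottom, len(out))):
        for x in range(max(left, 0), min(right, len(out[y]))):
            out[y][x] = fill
    return out

image = [[10, 20, 30],
         [40, 50, 60],
         [70, 80, 90]]
print(redact_region(image, 1, 1, 3, 3))
```

This only covers the raster itself; the result still has to be saved as a fresh file so that no metadata or earlier layers tag along.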
This is what I do while sharing such images. I crop out those parts first and then take another screenshot. I do not even risk painting over and then take another screenshot. I have been doing this forever.
In practical terms, a more convenient way to achieve this is just printing the document to a PDF, which rasterises the visible layer into what the printer would see. Most pdf tools support this.
That seems like a dangerous approach. Though printer drivers do often use rasterization, especially when targeting cheap printers, many printers can render vector graphics and text as well. Print-to-PDF will often use the latter approach, unless of course the source program always rasterizes its output when sending it to the printer driver, or the Print-to-PDF driver in use is particularly stupid.
You can, but I don't trust software for these types of niche but critical tasks hah. Next thing I know I'd be reading a headline about how "bug in print to PDF actually retains XYZ metadata"
They're not 'hacks'; it's the people doing the redaction making the beginner mistake of not properly removing the selectable text under the redactions. They're either drawing black rectangles over the text or highlighting it black, neither of which prevents the underlying text from being selected.
Keeping that secret would require spontaneous silence from everyone looking at these docs which is just not possible.
The whole thing is just too suspicious. Too good to be true. What's the chance of this being some 4D chess where the government has already edited the files, and then presented them as redacted so the "unredacted (but edited)" version looks more genuine?
> What's the chance of this being some 4D chess where the government has already edited the files, and then presented them as redacted so the "unredacted (but edited)" version looks more genuine?
With how they have pushed out any career public servants who were good at their jobs in favor of sycophants and loyalists, I'm not sure government organizations are still capable of playing 4D chess, if they ever were.
Please share your redacting tricks as loudly as you can, but only the ones that allow retrieving the original text. I'd love Google and the AIs to spout bad censoring tricks as much as possible.
This was my initial reaction to this news. I mean think about it
The Trump team knows that nobody is gonna buy whatever they put out as being the full story. Isn't this just the perfect way to make people feel like they got something they weren't supposed to see? They can increase trust in the output without having to increase trust in the source of it
And as far as I've heard there hasn't been anything "unredacted" that's been of any consequence. It all just feels a little too perfect.
The difference between a black square and a redaction tool is well known to anyone whose job involves redacting PDFs or just working with PDFs. Most likely, additional staff were pulled in and weren't given enough training.
Colleagues whose full time job is doing this sort of thing for various bits of the government have told me this is exactly the case here. People from all over the government have been deputized to redact these documents with little or no prior training.
Let people believe it's deliberate sabotage. Unfortunately, in real life, minions of a dictator serve the dictator; they don't risk their lives or safety for a noble cause. Any screw-ups are a result of gross incompetence that is typical for every dictatorship.
It seems insane that nobody at the other end runs something as simple as MAT or imagick (twice) over it to take the text layers out before uploading though. I hope this is at least partially intentional.
My understanding is that many people were fired and replaced by loyalists at the FBI. I think there are a lot of incompetent people working there right now.
Any major documents/files have been removed all together. Then the rest was farmed out to anyone they could find with basic instructions to redact anything embarrassing.
Since there's absolutely zero chance anyone in the administration will ever be held accountable for what's left, they're not overly concerned.
The thing that I've been waiting to see for years is the actual video recordings. There were supposedly cameras everywhere, for years. I'm not even talking about the disgusting stuff, I'm talking security for entrances, hallways, etc.
The FBI definitely has them, where are they?
What about Maxwell's media files? There was nothing found there? Did they subpoena security companies and cloud providers?
The documents are all deniable. Yes video evidence can now be easily faked, but real video will have details that are hard to invent. Regardless, videos are worth millions of words.
Reporting is that they had a basically impossible deadline and they took lawyers off of counterintelligence work to do this. So a conscious act of resistance is possible, but it's a situation where mistakes are likely - people working very quickly trying to meet a deadline and doing work they aren't that familiar with and don't really want to be doing.
It seems like a common tactic by this administration is to just not do what they are required to do until they have been told 50 times and criminal charges are being filed. I suspect the actual truth here is 'don't do this' turned into 'you have 1 day to do this and keep my name out of the release' which led to lots of issues. They probably spent more time deciding the order of pages to release, and how to avoid releasing the things damaging to the administration, than actually doing the work needed to release it. Now they will say 'look, see! You didn't give us enough time and our incompetence is the proof'
Given the sheer number of people they had to pull in and work overtime to redact Trump's name as well as those of prominent Republicans and donors as per numerous sources within the FBI and the administration itself, incompetence is likely for a chunk of it.
It’s funny that this effort, the largest exertion of FBI agents second only to 9/11, seems to be unprepared to redact. Cynically, I’m prepared for it to be part of a generative set of PDFs derived from the prompt “create court documents consistent with these 16 PDFs which obscure the role of Donald Trump between 1993 and 1998.”
For context, lawyers deal with this all the time. In discovery, there is an extensive document ("doc") review process to determine if documents are responsive or non-responsive. For example, let's say I subpoenaed all communication between Bob and Alice between 1 Jan 2019 and 1 Jan 2020 in relation to the purchase of ABC Inc as part of litigation. Every email would be reviewed and if it's relevant to the subpoena, it's marked as responsive, given an identifier and handed over to the other side. Communication that isn't handed over might be, e.g., attorney-client communications.
It can go further and parts of documents can be viewed as non-responsive and otherwise be blacked out eg the minutes of a meeting that discussed 4 topics and only 1 of them was about the company purchase. That may be commercially sensitive and beyond the scope of the subpoena.
Every such redaction and exclusion has to be logged and a reason given for it being non-responsive where a judge can review that and decide if the reason is good or not, should it ever be an issue. Can lawyers find something damaging and not want to hand it over and just mark it non-responsive? Technically, yes. Kind of. It's a good way to get disbarred or even jailed.
My point with this is that lawyers, which the Department of Justice is full of, are no strangers to this process so should be able to do it adequately. If they reveal something damaging to their client this way, they themselves can get sued for whatever the damages are. So it's something they're careful about, for good reason.
So in my opinion, it's unlikely that this is an act of resistance. Lawyers won't generally commit overt illegal acts, particularly when the only incentive is keeping their job and the downside is losing their career. It could happen.
What I suspect is happening is all the good lawyers simply aren't engaging in this redaction process because they know better, so the DoJ had to wheel out some bad and/or unethical ones who would.
What they're doing is in blatant violation of the law passed last month and good lawyers know it.
There's a lot of this going on at the DoJ currently. Take the recent political prosecutions of James Comey, Letitia James, etc. No good prosecutor is putting their name to those indictments so the administration was forced to bring in incompetent stooges who would. This included former Trump personal attorneys who got improperly appointed as US Attorneys. This got the Comey indictment thrown out.
The law that Ro Khanna and Thomas Massie co-sponsored was sweeping and clear about what needs to be released. The DoJ is trying to protect both members of the administration and powerful people, some of whom are likely big donors and/or foreign government officials or even heads of state.
That's also why this process is so slow I imagine. There are only so many ethically compromised lackeys they can find.
Fine, but the teeth of this act belong to some future justice department. I predict Trump will issue blanket pardons for everyone involved, up to Bondi; and that none of them will respect a congressional subpoena.
> My point with this is that lawyers, which the Department of Justice is full of, are no strangers to this process so should be able to do it adequately. If they reveal something damaging to their client this way, they themselves can get sued for whatever the damages are. So it's something they're careful about, for good reason.
> So in my opinion, it's unlikely that this is an act of resistance. Lawyers won't generally commit overt illegal acts,
Political redaction in this release under the Epstein Transparency Act is an overt, illegal act.
Does that reconfigure your estimation of whether DoJ attorneys who aren't the Trump inner-circle loyalists installed in leadership roles might engage in resistance against it (or at least fail to point out methodological flaws in its implementation)?
I tried to reproduce this - turns out the affected files weren't in the data sets recently released, but other files on the DOJ site (now taken down).
I guess the big take-away is scrape everything ASAP when it comes out. I haven't found any meaningful differences yet, but file hashes are different in the published data set zip files available today versus when Archive.org took a snapshot a few days ago.
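For anyone wanting to repeat the hash comparison, here is a minimal sketch using only the Python standard library. It assumes you have two local directories (say, an extracted Archive.org snapshot and today's download); the function names are made up for illustration and are not from any existing tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def manifest(root: Path) -> dict[str, str]:
    """Map each file's path (relative to root) to its SHA-256 digest."""
    return {
        p.relative_to(root).as_posix(): sha256_of(p)
        for p in sorted(root.rglob("*"))
        if p.is_file()
    }


def changed_files(old: dict[str, str], new: dict[str, str]) -> dict[str, str]:
    """Label each differing path as 'added', 'removed', or 'modified'."""
    report = {}
    for name in sorted(old.keys() | new.keys()):
        if name not in old:
            report[name] = "added"
        elif name not in new:
            report[name] = "removed"
        elif old[name] != new[name]:
            report[name] = "modified"
    return report
```

A differing hash only says the bytes changed, not why; a re-zipped archive with new timestamps will differ even if every PDF inside is identical, so it is worth hashing the extracted files rather than the zip itself.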
I did write a bit of a tool which will detect, log, and dump the text of affected PDFs, since redacting via drawing black boxes and redacting via dark-colored highlights are both programmatically detectable. Pretty trivial to do so. Happy Holidays to anyone else who has the day off!
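This is not the tool mentioned above, just a rough sketch of the kind of check such a tool could make. It assumes some PDF library has already given you the text spans and filled rectangles on a page, with bounding boxes in the same coordinate space; `Box`, `TextSpan`, `FilledRect`, and the thresholds are all hypothetical names and values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    x0: float
    y0: float
    x1: float
    y1: float

    def area(self) -> float:
        return max(0.0, self.x1 - self.x0) * max(0.0, self.y1 - self.y0)

    def intersection(self, other: "Box") -> "Box":
        return Box(max(self.x0, other.x0), max(self.y0, other.y0),
                   min(self.x1, other.x1), min(self.y1, other.y1))


@dataclass(frozen=True)
class TextSpan:
    text: str
    box: Box


@dataclass(frozen=True)
class FilledRect:
    box: Box
    rgb: tuple[float, float, float]  # each channel in the range 0.0-1.0


def is_dark(rgb: tuple[float, float, float], threshold: float = 0.2) -> bool:
    """Treat a fill as 'dark' when its relative luminance is low."""
    r, g, b = rgb
    return 0.2126 * r + 0.7152 * g + 0.0722 * b <= threshold


def covered_text(spans: list[TextSpan], rects: list[FilledRect],
                 min_overlap: float = 0.5) -> list[str]:
    """Return text that is still present but mostly hidden under a dark fill."""
    dark = [r.box for r in rects if is_dark(r.rgb)]
    hits = []
    for span in spans:
        area = span.box.area()
        if area == 0:
            continue
        if any(span.box.intersection(d).area() / area >= min_overlap for d in dark):
            hits.append(span.text)
    return hits
```

Any span this returns is text that a reader cannot see but copy/paste or a text extractor can, which is exactly the failure being discussed.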
Ah, you new 'round these parts? It's unfashionable to speak directly--we must fragment, hypothesize, add complexity and nuance rather than simply leave someone's slightly vague statement uncorrected. -_-
HN rewards "technical discussion" in controversial threads, even when it's not salient or intellectually gratifying. Touching on the political implications is enough to split opinion and guarantee your place towards the middle/bottom of the thread.
I noticed my most recent nitpick comment got a significant number of upvotes. I spent some time today wondering if HN needed a way to indicate something is a nitpick and cause the votes on it to carry less weight in the sorting. Because if the nitpick is valid I don't think downvotes are appropriate since people might end up seeing it and having misconceptions corrected, but it also shouldn't detract from discussions on the meat of the post.
Of course I'm probably the odd one out, wanting to apply that modifier to my own nitpick comments, so that idea probably wouldn't end up being very useful in general.
(There is also some irony in me commenting on your comment here where it's completely unrelated to the actual post...)
It's not a hack to copy and paste text that is part of the document data. The incompetence of the people responsible for complying with the law doesn't mean it's reasonable to label something a hack.
I’m not an attorney or anything, but the relevant federal statute is explicitly about unauthorized access of computer systems (18 USC 1030).
Opening someone else’s laptop and guessing the password would absolutely fall under that definition, but I think it’s very much questionable if poking around a document that you have legitimately obtained would do so.
If someone sends me a document with text in it that they meant to remove but didn't, and then I read that text, I haven't hacked anything; they're just incompetent.
Hacking is unauthorised use of a system. Reading a document that was not adequately redacted can hardly be considered hacking.
But copying and pasting text of publicly released documents is not illegal. Accessing someone’s computer is illegal.
While maybe it could fall under the umbrella of hacking in some general way, articles, and especially titles, should be more precise.
You guessing my password is not the same as a known and expected behavior of a program. Adobe has a specific feature to redact. PDF is a format known to have layers. Lawyers are trained on day one not to make this mistake. (I am a recovering lawyer.) This is either incompetence or deliberate disclosure.
Yes, this is the digital equivalent of sticking a blank Post-it over text and calling it “redacted”. Mind-boggling that the same mistake has been made over and over again.
Hacking is any use of a technology in a way that it wasn’t intended. The redaction is so stupid as to almost appear intentional, so maybe you’re right, this isn’t hacking because maybe the information was intended to be discovered.
I also had this first thought, but then a hack could just be a way around a limit or a lack of authorization; it doesn't have to be unknown or sophisticated, so copying the text under black boxes fits.
By serving up the PDF file I am being authorized to receive, view, process, etc etc the entire contents. Not just some limited subset. If I wasn't authorized to receive some portion of the file then that needed to be withheld to begin with.
That's entirely different from gaining unauthorized entry to a system and copying out files that were never publicly available to begin with.
To put it simply, I am not responsible for the other party's incompetence.
But this isn’t an unexpected technique; it’s literally the core design of the PDF format. It’s a layered format that preserves the layers on any machine. Adobe has a redaction feature to overcome the default behavior, in which each layer can be accessed even if there is a top layer in front.
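To make that concrete: roughly speaking, a PDF page is a sequence of drawing operators painted in order, so a black rectangle drawn after some text hides it on screen without removing it from the file. The sketch below uses a hand-written, heavily simplified content stream (an assumption for illustration; real streams are usually compressed and use font encodings that a regex like this would not handle) to show why a text extractor still sees the words.

```python
import re

# Hypothetical, simplified page content: show some text, then paint a black
# rectangle over part of it. "Tj" shows a string, "rg" sets the fill colour,
# "re" defines a rectangle, and "f" fills it.
CONTENT_STREAM = (
    b"BT /F1 12 Tf 72 700 Td (Account holder: Jane Doe) Tj ET\n"
    b"0 0 0 rg\n"
    b"150 696 60 14 re f\n"
)

# Matches a literal string operand followed by the Tj operator.
SHOW_TEXT = re.compile(rb"\(((?:[^()\\]|\\.)*)\)\s*Tj")


def naive_extract_text(stream: bytes) -> list[str]:
    """Collect every string shown with Tj, ignoring whatever is painted later."""
    return [m.group(1).decode("latin-1") for m in SHOW_TEXT.finditer(stream)]


print(naive_extract_text(CONTENT_STREAM))
```

The rectangle never enters the extraction at all. A real redaction has to delete or rewrite the text-showing operators themselves, which is what dedicated redaction features are meant to do.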
I've seen lawyers at major, high-priced law firms make this same mistake. Once it was a huge list of individuals names and bank account balances. Fortunately I was able to intervene just before the uploaded documents were made public.
Folks around here blame incompetence, but I say the frequency of this kind of cock-up is crystal clear telemetry telling you the software tools suck.
If the software is going to leverage the familiarity of using a blackout marker to give you a simple mechanism to redact text, it should honour that analogy and work the way any regular user would expect, by killing off the underlying text you're obscuring and any other corresponding hidden bits. Or it should surface those hidden bits so you can see what could come back to bite you later. E.g., it wouldn't be hard to make the redact tool simultaneously act as a highlighter that temporarily turns proximate text in the OCR layer a vibrant yellow as you use it.
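As a toy illustration of the behaviour being asked for, here is a sketch in which the "marker" removes whatever text it touches and reports what it removed so the user can review it. It works on a made-up in-memory page model (`Span` and the rectangle tuple are hypothetical), not on a real PDF, and it is not how any particular product implements redaction.

```python
from typing import NamedTuple


class Span(NamedTuple):
    text: str
    x0: float
    y0: float
    x1: float
    y1: float


Rect = tuple[float, float, float, float]  # (x0, y0, x1, y1)


def overlaps(span: Span, rect: Rect) -> bool:
    """True when the span's bounding box touches the redaction rectangle."""
    rx0, ry0, rx1, ry1 = rect
    return span.x0 < rx1 and rx0 < span.x1 and span.y0 < ry1 and ry0 < span.y1


def apply_redaction(spans: list[Span], rect: Rect) -> tuple[list[Span], list[Span]]:
    """Drop every span under the rectangle; return (kept, removed)."""
    kept = [s for s in spans if not overlaps(s, rect)]
    removed = [s for s in spans if overlaps(s, rect)]
    return kept, removed
```

Returning the removed spans is the "surface the hidden bits" half of the idea: the tool can show the user exactly which text (visible or in an OCR layer) the box is about to destroy.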
This was basically the only reason we were willing to cough up like $400 for each Acrobat license for a few hundred people. One redaction fuckup could cost you whatever you saved by buying something else.
I would like to believe that the DOJ lacking the proper software might have something to do with DOGE. That would be sweet irony.
Absolutely. They know this is confusing, and they're bound and determined not to fix it. At the least, they need a pop-up to let you know that it's not doing what you might think it's doing.
Any lawyer should be like "I don't know what I'm doing here I'll get an expert to help" just like as a software developer I'd ask a lawyer for their help with law stuff...because IANAL uwu
No surprise non-experts muck it up and I don’t see that changing until they move to special-purpose tools.
Placing a black rectangle on a PDF is easier than modifying an image or removing text from that same PDF.
https://en.wikipedia.org/wiki/Hanlon%27s_razor
They fired/drove away/reassigned most of those who are competent in the executive branch generally, it is pretty easy to believe that none of those managing the document release and few of those working on it are actually experienced or skilled in how you do omissions in a document release correctly. Those people are gone.
What happens in a court case when this occurs? Does the receiving party get to review and use the redacted information (assuming it’s not gagged by other means) or do they have to immediately report the error and clean room it?
Edit: after reading up on this it looks like attorneys have strict ethical standards to not use the information (for what little that may be worth), but the Associated Press was a third party who unredacted public court documents in a separate Facebook case.
Typically, two copies of a redacted document are submitted via ECF. One is an unredacted but sealed copy that is visible to the judge and all parties to the case. The other is a redacted copy that is visible to the general public.
So, to answer what I believe to be your question: the opposing party in a case would typically have an unredacted copy regardless of whether information is leaked to the general public via improper redaction, so the issue you raise is moot.
I know and am friends with a lot of lawyers. They're pretty ruthless when it comes to this kind of thing.
Legally, I would think both parties get copies of everything. I don't know if that was the case here.
If it's worth so little to your eyes/comprehension you will have no problem citing a huge count of cases where lawyers do not respect their obligations towards the courts and their clients...
That snide remark is used to discredit a profession in passing, but the reason you won't find a lot of examples of this happening is because the trust clients have to put in lawyers and the legal system in general is what makes it work, and betraying that trust is a literal professional suicide (suspension, disbarment, reputational ruin, and often civil liability) for any lawyer... that's why "strict" doesn't mean anything "little" in this case.
Curious. I am not a litigator but this is surprising if you found support for it. My gut was that the general obligation to be a zealous advocate for your client would require a litigant to use inadvertently disclosed information unless it was somehow barred by the court. Confidentiality obligations would remain owed to the client, and there might be some tension there but it would be resolvable.
Of course if it's in the middle of an investigation it can spoil the investigation, allow criminals to cover their tracks, allow escape.
In such case the document should be vetted by competent and honest officials to judge whether it is timely to release it, or whether suppressing it just ensures that investigation is never concluded, extending a forever renewed cover to the criminals.
There was also a process on how to communicate top secret information, but these idiots preferred to use Signal.
I'm completely lost on how you can be surprised by this at all. Trump is in there, tells some FBI faboon to black everything out, they collect a group of people they can find and start going through these files as fast as they can.
"When a clown moves into a palace, he doesn't become a king; the palace instead becomes a circus."
You can still open it with Illustrator if you want to see: https://obamawhitehouse.archives.gov/sites/default/files/rss...
[1] https://www.snopes.com/fact-check/birth-certificate/
The Trump 2.0 administration, in contrast, is staffed top to bottom with fools."
https://daringfireball.net/linked/2025/12/23/trump-doj-pdf-r...
That's not very competent.
> going analog is foolproof
Absolutely not. There are many ways to f this up. Just the smallest variation in places that have been inked twice will reveal the clear text.
https://www.theverge.com/2023/6/28/23777298/sony-ftc-microso...
https://www.vice.com/en/article/russian-spies-chemical-weapo...
Anyone remember how the Trump I regime had staff who couldn't figure out the lighting in the White House, or mistitled Australia's Prime Minister as President?
You mean the guy who covered up for Epstein's 'suicide' and expected us morons to believe it?
This is a dumb way of doing that, exactly what "stupid" people do when they are somewhat aware of the limits of their competence, or only as smart as the tech they grew up with. Also, this type of redaction eliminates the possibility of changing the text length, which is a very common leak, especially for names and official positions. And it doesn't eliminate the risk of missing a redaction, since you can't simply search-and-replace with machine precision but have to manually map each match to its printed position.
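The length leak is easy to picture with a toy example: if the box keeps the width of the original text, the width alone narrows down which names could be underneath. This sketch assumes a monospaced font at 7.2 pt per character (Courier at 12 pt) and a small fixed padding around the box; the function name and the numbers are illustrative assumptions, and real documents with proportional fonts need per-glyph widths.

```python
def plausible_candidates(box_width: float, candidates: list[str],
                         char_width: float = 7.2, padding: float = 2.0) -> list[str]:
    """Keep the candidates whose rendered width roughly matches the box width."""
    tolerance = char_width / 2  # allow up to half a character of slack
    matches = []
    for name in candidates:
        expected = len(name) * char_width + 2 * padding
        if abs(box_width - expected) <= tolerance:
            matches.append(name)
    return matches


names = ["Al Smith", "Jonathan Q. Public", "Jane Doe"]
print(plausible_candidates(61.6, names))
```

With a short list of suspected names, even this crude filter rules most of them out, which is why replacing the text with a fixed-length placeholder is safer than a box that keeps the original width.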
https://www.minnpost.com/politics-policy/2007/11/you-can-swi...
IIRC there was a Slashdot discussion about it that went "Oh yeah, obviously you need to black out the face entirely, or use a randomized Gaussian blur." "Yeah, or just not molest kids."
Not that it matters much what the law says if the goal is to protect the man who hands out pardons...
The offices responsible for redactions are usually in-house legal shops (e.g., an Office of Chief Counsel inside an agency like CBP) and the agency’s FOIA office. They’re often doing redactions manually in Adobe, which is slow, tedious, and error-prone. Because the process is error prone, the federal government gets multiple layers of review, justified (as DOJ lawyers regularly tell courts) by the need to “protect the information of innocent U.S. citizens.”
But the “bad at redactions” part isn’t an accident. It functions as a litigation tactic. Make production slow, make FOIA responses slow, and then point to that slow, manual process as the reason the timeline has to be slow. The government could easily buy the kind of redaction tools that most law firms have used for decades. Purpose-built redaction tools speed the work up and reduce mistakes. But the government doesn't buy those tools because faster, cleaner production benefits the requester.
The downside for the government is that every so often a judge gets fed up and orders a normal timeline. Then agencies go into panic mode and initiate an “all hands on deck.” Then you end up with untrained, non-attorney staff doing rushed redactions by hand in Adobe. Some of them can barely use a mouse. That’s when you see the classic technical failures: someone draws a black rectangle that looks like a redaction, instead of applying a real redaction that actually removes the underlying text.
I didn't really know how they slept at night, but I don't know how a lot of people sleep at night. I only had to be involved because I had to do the actual trawling through the emails. They spent their time trying to narrow the keywords that I'd have to search, and trying to figure out new definitions of the words "related to."
In another episode they’re trying to find out the length of a stealth submarine for construction planning purposes of a port or something, and they have to go through endless layers of security checks with the military that lead nowhere. In the end a reporter filming a documentary episode on the government agency tells them the length because they were allowed to film the submarine on another program.
Definitely recommend the show and my friends in government say it’s scarily accurate.
No, there isn't an enormous cohort of bureaucrats going to work every day, collectively wringing their hands and saying "haha, we're going to be STUPID today!"
I always felt paranoid (without any real evidence, just a guess) that there would always be a chance that anything done in software could be reversed somehow.
If you don't know the provenance of the images you are putting black boxes on (for example, because a rogue employee intentionally wants to leak them, or because the image sensor of your target has been compromised by another team to leak some info), your redaction can be rendered ineffective, as some images can be made uncroppable by construction.
(Self-plug: https://github.com/unrealwill/uncroppable )
And also be aware that compression is hiding everywhere: https://en.wikipedia.org/wiki/Compressed_sensing
If you compare an image to another image, you could guess from the compression what is under the blocked part; that makes some sense to me conceptually. What I don't get is, for the PDF specifically, why compressing the black boxes I put on carries any risk. Isn't it compressing the internal image, which is just the black box part? Or are you saying the whole screenshot is an internal image?
(Note there's also other metadata in a PDF, which you may not want your recipient to know either.)
How much lower can they go?!
- Leave NATO
- Start openly supporting Russia and North Korea
- Arrest whole International Criminal Court
- Preventively invade China
I'm more concerned with them dragging everyone else down, and someone much worse taking their place.
When opening the file in my slow computer, I could see all the rendering of the watermark happening in slow motion until the white box would pop up on top of the text.
I could then open up the postscript, delete the commands that rendered the watermark, save it, then I converted it to PDF so it would be easy to print.
Randomly removing these by trial and error will usually quite easily allow you to find the watermark and nix it, with the advantage that even a sophisticated recipient will not be able to find out from the pdf file what the watermark was.
16 shades of grey (not 50) is plenty enough for text to still be smooth.
I do it for several reasons, one of them being that I often take manual notes on official documents (which infuriates my wife, btw), but then sometimes I need to scan the documents and send them (local IRS / notary / bank / whatever). So I'll just scan and then fill rectangles with white where I took hand notes. Another reason is paper printed on two sides: when scanning, if the paper is thin or the ink is thick, the other side sometimes shows through.
I wonder how that'd work vs adversarial inputs: never really thought about it.
Let all the files get released first.
Then show your hacks.
Keeping that secret would require spontaneous silence from everyone looking at these docs, which is just not possible.
With how they have pushed out any career public servants who were good at their jobs in favor of sycophants and loyalists, I'm not sure government organizations are still capable of playing 4D chess, if they ever were.
The Trump team knows that nobody is gonna buy whatever they put out as being the full story. Isn't this just the perfect way to make people feel like they got something they weren't supposed to see? They can increase trust in the output without having to increase trust in the source of it
And as far as I've heard there hasn't been anything "unredacted" that's been of any consequence. It all just feels a little too perfect.
And yes, I've heard of Hanlon's Razor haha
https://en.wikipedia.org/wiki/Hanlon%27s_razor
Any major documents/files have been removed altogether. Then the rest was farmed out to anyone they could find with basic instructions to redact anything embarrassing.
Since there's absolutely zero chance anyone in the administration will ever be held accountable for what's left, they're not overly concerned.
The thing that I've been waiting to see for years is the actual video recordings. There were supposedly cameras everywhere, for years. I'm not even talking about the disgusting stuff, I'm talking security for entrances, hallways, etc.
The FBI definitely has them, where are they?
What about Maxwell's media files? There was nothing found there? Did they subpoena security companies and cloud providers?
The documents are all deniable. Yes video evidence can now be easily faked, but real video will have details that are hard to invent. Regardless, videos are worth millions of words.
For context, lawyers deal with this all the time. In discovery, there is an extensive document ("doc") review process to determine if documents are responsive or non-responsive. For example, let's say I subpoenaed all communication between Bob and Alice between 1 Jan 2019 and 1 Jan 2020 in relation to the purchase of ABC Inc as part of litigation. Every email would be reviewed and if it's relevant to the subpoena, it's marked as responsive, given an identifier and handed over to the other side. Non-responsive communication might not be eg attorney-client communications.
It can go further and parts of documents can be viewed as non-responsive and otherwise be blacked out eg the minutes of a meeting that discussed 4 topics and only 1 of them was about the company purchase. That may be commercially sensitive and beyond the scope of the subpoena.
Every such redaction and exclusion has to be logged and a reason given for it being non-responsive where a judge can review that and decide if the reason is good or not, should it ever be an issue. Can lawyers find something damaging and not want to hand it over and just mark it non-responsive? Technically, yes. Kind of. It's a good way to get disbarred or even jailed.
My point with this is that lawyers, which the Department of Justice is full of, are no strangers to this process so should be able to do it adequately. If they reveal something damaging to their client this way, they themselves can get sued for whatever the damages are. So it's something they're careful about, for good reason.
Please change the title.
You don't need to do some sophisticated thing for it to be considered hacking
HN discourages editorializing headlines.
While I wouldn't call it a "hack," common usage even here on HN isn't limited to "to gain illegal access to (a computer network, system, etc.)" [0]
[0] https://www.merriam-webster.com/dictionary/hack
Still technically a hack.