Readit News
gyomu · a month ago
Like other commenters point out, automatic OCR on Apple platforms is a godsend, and it's such a great use of our modern AI capabilities that it should be a standard feature in every document viewer on every platform.

Another thing I wish was more common is metadata in screenshots, especially on phones. Eg if I take a screenshot of a picture in Instagram, I wish a URL of the picture was embedded (eg instagram.com/p/ABCD1234/). If I take a screenshot in the browser, include the URL that's being viewed (+ path to the DOM element in the viewport). If I take a screenshot in a maps app, include the bounding coordinates. If I take a screenshot in a PDF viewer, include a SHA1 hash of the document being viewed + offset in the document so that if I send the screenshot to someone else with the same document, it can seamlessly link to it. Etc etc.
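A minimal sketch of the PDF case, assuming the screenshot is a PNG and the metadata travels in a standard `tEXt` chunk. The keyword `ScreenshotSource`, the JSON field names, and the helper names are all hypothetical; no platform defines them as far as this thread establishes.

```python
import hashlib
import json
import struct
import zlib


def source_record(document: bytes, page: int, offset: int) -> str:
    """Describe what was on screen: document SHA-1 plus a position in it."""
    return json.dumps(
        {"sha1": hashlib.sha1(document).hexdigest(), "page": page, "offset": offset},
        sort_keys=True,
    )


def png_text_chunk(keyword: str, text: str) -> bytes:
    """Build a PNG tEXt chunk: length, type, keyword + NUL + text, CRC-32."""
    data = keyword.encode("latin-1") + b"\x00" + text.encode("latin-1")
    body = b"tEXt" + data
    # The CRC covers the chunk type and data, but not the length field.
    return struct.pack(">I", len(data)) + body + struct.pack(">I", zlib.crc32(body))


def embed(png: bytes, chunk: bytes) -> bytes:
    """Insert the chunk just before the 12-byte IEND chunk that ends a PNG."""
    if png[-8:-4] != b"IEND":
        raise ValueError("not a well-formed PNG")
    return png[:-12] + chunk + png[-12:]


# Hypothetical usage: tag a screenshot with the document it was taken from.
# chunk = png_text_chunk("ScreenshotSource", source_record(pdf_bytes, page=3, offset=120))
# tagged = embed(screenshot_png_bytes, chunk)
```

A viewer holding the same document could hash its own copy, compare it against `sha1`, and jump to `offset` if the two match.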

There are probably privacy concerns to solve here, but no idea is new in computer science and I'm pretty sure some grad student somewhere has already explored the topic in depth (it just never made it to mainstream computing platforms).

It feels like screenshots have become the de facto common denominator in our mobile computing era, since platforms have abstracted files away from us. Lots of people who have only ever used phones as their main computing devices are confused when it comes to files, but everyone seems to understand screenshots.

Also, necessary shout out to Screenshot Conf! https://screenshot.arquipelago.org

avree · a month ago
OCR is a godsend, 100% agree. Not a fan of the metadata idea personally, 'screenshotting' is done by the operating system, and exposing ways to allow apps to know that they were 'in' the screenshot plus expose some metadata of their choosing (like your examples of GPS coordinates for a maps app, url for browser) sounds like a privacy nightmare, and like something that will make a very reliable core feature much harder to use.

There are companies like Evernote/Zight/CloudApp that at one point tried some things like this, but they never really caught on - I think because it's pretty easy to add annotations yourself or some note of your own - and a screenshot not "trying to do everything" is part of what makes them useful & ubiquitous.

manwe150 · a month ago
But apps (most notably Snapchat comes to mind) have been doing exactly that analysis though. Theoretically they could then [offer to] edit the photo immediately afterwards to add context, since they had access to the photo roll or files https://android.stackexchange.com/a/119767
aexer0e · a month ago
> 'screenshotting' is done by the operating system, and exposing ways to allow apps to know that they were 'in' the screenshot plus expose some metadata of their choosing sounds like a privacy nightmare

The apps don't have to know a screenshot was taken for this feature to exist; they could write into a passive "in case a screenshot is taken, use this as metadata" object data field that the OS uses when the user takes a screenshot
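One way to picture this, as a sketch only: the app writes a hint into a slot the OS owns, and the OS reads that slot at capture time without calling back into the app. The class and method names below are hypothetical and do not correspond to any real platform API.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class ScreenshotHintStore:
    """OS-owned store of per-app hints; apps write to it but are never notified of reads."""

    _hints: Dict[str, dict] = field(default_factory=dict)

    def set_hint(self, app_id: str, hint: dict) -> None:
        # Called by the app whenever its visible content changes.
        self._hints[app_id] = dict(hint)

    def clear_hint(self, app_id: str) -> None:
        # Called by the app when it has nothing it wants attached.
        self._hints.pop(app_id, None)

    def capture(self, foreground_app: str, pixels: bytes) -> dict:
        # Called by the OS only. No callback fires, so the app is not
        # told that a screenshot happened.
        hint: Optional[dict] = self._hints.get(foreground_app)
        return {"pixels": pixels, "metadata": dict(hint) if hint else {}}


store = ScreenshotHintStore()
store.set_hint("browser", {"url": "https://example.com/article"})
shot = store.capture("browser", pixels=b"...")
print(shot["metadata"])
```

The design choice here is that information flows one way, from app to OS, which is what keeps the app from learning that a capture occurred.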

m463 · a month ago
I agree

deep linking allows apps to know/intercept known URLs and do "things". I don't know if the screenshot mechanism would involve this.

I do know that some things cannot be screenshotted. On macs this is any HDCP image on the screen (shows up as a blank rectangle). On android I believe some apps cannot be captured in a screenshot. Don't know about ios.

paulmooreparks · a month ago
OP here. You raised a point that I should have mentioned in the article: screenshots of web pages that don't include the URL. I'm perfectly fine with screenshots of browser windows, since the context is almost always relevant. The system I work on right now puts a lot of useful context into the URL, but it's almost never included in the initial screenshot, so I have to ask for that. Of course, I generally ask for it as text so that I don't have to try to type the whole thing without making a mistake.
heddycrow · a month ago
I was content to write the original off as "to each his own", but this one I feel you on.

Maybe the problem is sharing without caring and/or without being aware.

Case in point, folks capture large blocks of text as you mentioned and paste it into slack which converts certain characters unless included in a code block. This can be much worse than sharing a screenshot.
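As a small sketch of the code-block route: wrapping pasted text in a backtick fence keeps clients that support Markdown-style code blocks from reinterpreting its characters. The helper name `as_code_block` is made up for illustration. The fence-length rule (one backtick longer than any run inside the text) follows the CommonMark convention; whether a particular chat client honors fences longer than three backticks is an assumption.

```python
import re


def as_code_block(text: str) -> str:
    """Wrap text in a backtick fence longer than any backtick run it contains."""
    longest_run = max((len(run) for run in re.findall(r"`+", text)), default=0)
    fence = "`" * max(3, longest_run + 1)
    return f"{fence}\n{text}\n{fence}"


# Asterisks and underscores like these are what chat clients tend to
# turn into bold or italic when pasted as plain text.
print(as_code_block("grep -E '^\\*\\*|_id_' app.log"))
```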

Please know the best way to share what you are sharing when you share. I've had to come to expect this request will not be honored.

I also might be guilty of not honoring sharing with caring myself. For example, I didn't read this entire thread before posting; others may have made this exact point already.

pests · a month ago
> It feels like screenshots have become the de facto common denominator in our mobile computing era,

Google/Apple have taken notice. Both have recently redone their full-screen post-screenshot UI to include AI insights / automatic product searches / direct chat with Gemini/LLM / etc.

It's true: everyone uses screenshots to save things they are interested in, want to look up or search more of, or want to keep for whatever reason, and this UI is the perfect place for Google/Apple to insert themselves.

NooneAtAll3 · a month ago
> Eg if I take a screenshot of a picture in Instagram, I wish a URL of the picture was embedded

bloody hell of all privacy concerns

gyomu · a month ago
Why? Either it's public content, and it can be traced back manually anyways (screenshots from social media posts typically include the username), or it's private content and knowing the URL slug doesn't change anything (the fact that you're sharing a screenshot of private content is the privacy breach, not the fact that some UUID is embedded).
flemhans · a month ago
Fun side-fact: The original MacPaint, while in development, had an "ocr" copy feature, albeit much simpler of course.

It didn't make it in the release version out of fear that people would use MacPaint as a Word Processor.

ggirelli · a month ago
Why spend electricity and time to read the text in a screenshot, and then more time making sure there are no mistakes, when the sender could have just copied the original text?
pjc50 · a month ago
> metadata in screenshots

Interesting idea, but I think this understates how often screenshots are "slightly adversarial". I'm taking a screenshot because the app or webpage has deliberately made it hard to select text for some reason. Or the UI is just annoying about selection (e.g. trying to select the text from a link anchor without being considered as having clicked on it, which is fiddly on Android).

Then there's the question of fully adversarial screenshots. I can definitely see why people want "I want to send this to someone and discourage them from seamlessly resharing it", but at the same time: it's my screen. Not generally a problem on desktops unless you're dealing with video content.

PeterStuer · a month ago
Honestly, why are you developing software if you are "confused when it comes to files"?

Your OCR isn't going to help you for the missing off-screenshot clipped parts.

epigramx · a month ago
OCR is not AI
1gn15 · a month ago

    AI is whatever hasn't been done yet.
        — Larry Tesler, 1970
Source: https://en.wikipedia.org/wiki/AI_effect

MathMonkeyMan · a month ago
Yes but they're quite good at it. Reliable OCR is font dependent, whereas I think a lot of models just kind of figure it out regardless.
pylotlight · a month ago
But AI can OCR
ponector · a month ago
On apple platforms it definitely is an AI. Apple intelligence!
9rx · a month ago
AI says that OCR is AI.
radarsat1 · a month ago
God of the gaps
crazygringo · a month ago
I disagree. I use screenshots all the time, because it:

- Preserves the full 80 character width without line-wrapping, which destroys readability

- Guarantees monospace, so tabular data doesn't get all misaligned

- Preserves a good coding font, so it doesn't come out as some hairline-width Courier on the other end

- Preserves syntax highlighting, very helpful

Obviously if somebody needs a whole file or whole log, then send the whole thing as an attachment. But very often I'll still include a screenshot of the relevant part. With line numbers, it's not difficult to jump to the right part of the attached file.

Screenshots are incredibly useful for keeping code and terminal output looking like code and terminal output, and not getting completely mangled in an e-mail or chat message being read on a mobile device or in a narrow column.

tom_ · a month ago
Key things required for posting to the chat: people reading can read it, people reading can copy and paste it, and people searching can actually find it. It doesn't need to exactly match what you might see in a text editor. Anybody wanting to look at the actual text in context won't be doing it in the chat, but will rather be opening the file of interest in the appropriate tool, and examining it that way; anybody stuck reading the text only in the chat is probably on their phone or something and will be best served by being able to easily see all of it.

For reading purposes, the question of screen width is best left to the reader. They will have the window set to their preferred width, possibly limited by screen size. If the text has to wrap, so be it. It's better that than having to try to squint at your 3713x211 screen grab on an iPhone (portrait orientation). Also bear in mind that even the most basic of font and colour choices (large/small font, dark/light mode) can cause accessibility issues for some readers.

For copying and pasting purposes, images suck. Yes, macOS can do it, sort of, and I expect Windows 11 can do it too, probably to about the same extent. But it's not as easy as having the text right there in copyable form.

For searching purposes, ditto - only worse, because at least when you copy and paste and it comes out wrong, you'll notice. When you search: you just won't find the thing. You'll never know.

crazygringo · a month ago
> people reading can read it

Which is why screenshots help, for the reasons I gave

> people reading can copy and paste it

Why? If there's something like a user ID or error code that the person needs as text, I'll paste that separately. Stuff I include in a screenshot is for understanding, not copying and pasting.

> and people searching can actually find it.

Which is what the message text around the screenshot is for. Which actually includes the relevant keywords, not random tabular data or lines of code which just add noise to search.

> Anybody wanting to look at the actual text in context won't be doing it in the chat, but will rather be opening the file of interest in the appropriate tool, and examining it that way;

Except when they aren't/can't. The whole point of screenshots is for when they can't access something easily that way, which happens for a million different reasons.

> anybody stuck reading the text only in the chat is probably on their phone or something and will be best served by being able to easily see all of it.

Which is what images make far easier to read without being messed up.

> For reading purposes, the question of screen width is best left to the reader. They will have the window set to their preferred width, possibly limited by screen size. If the text has to wrap, so be it.

No it's not. Wrapping destroys indentation and alignment. It's not "so be it", it goes from readable to literally unreadable. I can't change the width of my phone or a lot of viewing areas. I can always scroll an image horizontally though.

> It's better that than having to try to squint at your 3713x211 screen grab on an iPhone (portrait orientation).

Which is why zooming and panning exist. I don't know where you're getting something silly like 3713 pixels though. But if that's the width of some massive table whose layout needs to be preserved, then so be it.

dietr1ch · a month ago
> - Preserves the full 80-character width without line-wrapping, which destroys readability

Readability is on the eyes of the final user, they are free to use whatever narrow column width they prefer.

> - Guarantees monospace, so tabular data doesn't get all misaligned

When was the last time a computer shipped without a monospace font? This points at the rare occasion where there's a problem with the setup, but you could also argue that maybe there's a system with a broken image decompressor.

> Screenshots are incredibly useful for keeping code and terminal output looking like code and terminal output, and not getting completely mangled in an e-mail or chat message being read on a mobile device or in a narrow column.

Are you complaining about GMail's rendering maybe? It's awful[^0], but that's more of a GMail problem that could be solved if they wanted.

[^0]: Column width unbounded even on 4k monitors. Weird and inconsistent font sizes across different fonts (monospace is smaller). Reads poorly on phones too.

crazygringo · a month ago
> Readability is on the eyes of the final user, they are free to use whatever narrow column width they prefer.

For plaintext sure. Not for code or tabular data. It destroys indentation and destroys column alignment and interleaves parts of rows. It's a horrid mess.

> When was the last time a computer shipped without a monospace font?

When was the last time I had to read something in a font I can't control that is forced to be proportional? Oh, constantly. Literally all the time.

> Are you complaining about GMail's rendering maybe?

Yes, and messaging clients, and chat clients, and everything unless it has actual dedicated code blocks that render with a horizontal scroll bar. Which are the exception as opposed to the rule.

eviks · a month ago
All of the points are mostly personal and thus should never be forced on anyone else via a screenshot (as a general rule, though depends on the content)

- no line wrapping destroys readability more since you can't toggle it in a screenshot. Imagine that url on the screenshot taking 3 lines instead of 1 and pushing useful text off screen. Also, forcing 80 on a user of a wider monitor is barbaric.

- And if there is no tabular data (and autoformatted code doesn't do tabular code) you've just lost nice proportional text for nothing

- Syntax highlighting as is commonly used (and as is shown in the blog screenshot) is useless, and is anyway unlikely to match reader's convention

> being read on a mobile device or in a narrow column.

So it can't even be read properly, you have to scroll the screenshot left and right... instead of just reading

muppetman · a month ago
100% this. I fully disagree with the post - screenshots show context/colour/formatting etc that often doesn't even translate properly if you DO try to paste it into some IM or other "text swapping" application.

Sure, if you want someone to reproduce the text of course you'd send them actual text. But to show a problem, a picture is, as they say, worth 1000 words.

autoexec · a month ago
Most of the time if someone is sending me code as text (which is by far preferable to a screenshot) I'm copying it out and pasting it into my own editor.

That way I get a width appropriate for my screen (which may be different from yours), text that's still aligned correctly, and uses the font of my choosing (which may differ from yours), and still has syntax highlighting (using the sizes/colors/styles that I'm accustomed to).

Sending the whole file (or a link to it) works well too but screenshots are absolutely likely to be some level of annoying for anyone who isn't you no matter how helpful you think you're being.

Forcing someone else to view code the way you like seeing it isn't always going to be completely obnoxious for them (although you might be surprised by what some people find acceptable) but it does make it difficult/impossible to view it the way I like seeing it (in addition to losing the ability to search/edit)

unixplumber · a month ago
> still has syntax highlighting (using the sizes/colors/styles that I'm accustomed to)

Where I work I find it's usually the youngins using a ridiculous light on dark color scheme that post screenshots of code. Are we still stuck in the '80s? And are they pining for a time they never experienced themselves? Computer hardware has been capable of displaying the more civilized and easier-on-the-eyes dark on light color schemes since then.

crazygringo · a month ago
This isn't about sending 300 lines of code in a screenshot or something.

This is about, "hey, look at these 6 lines which is where I think the problem might be". It's not for pasting in a separate editor, why would you do that? It's about providing quick context even if you're on your phone.

If you want to go inspect that spot in the file once you're back at your computer then go do that. The screenshot is to save you time because often you can answer just based on it.

rester324 · a month ago
I think slack and other mail/chat clients rescale the image and apply aggressive compression on it. Sometimes they even crop the image or make it so that you need to scroll left and right. Also your syntax highlighting might be annoying to others and might make legibility worse for the receiver, and as other people pointed out most chat/mail clients support monospace code blocks. Plus I agree with all the things that the blog post author pointed out.
crazygringo · a month ago
> or make it so that you need to scroll left and right

That's the point.

If I have an ASCII table that is 150 character columns wide, I'm sending you a screenshot so that you can scroll left and right, rather than have everything end up in a jumble of interleaved overflowing lines that turn into unreadable spaghetti.

This is a feature, not a bug. Not everyone is opening the message on a full-width monitor.
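A rough sketch of the failure mode being described, assuming a viewer that simply breaks each line at the viewport width. `soft_wrap` and the sample table are made up for illustration.

```python
from typing import List


def soft_wrap(line: str, width: int) -> List[str]:
    """Split one logical line into the visual rows a narrow viewport would show."""
    return [line[i:i + width] for i in range(0, len(line), width)] or [""]


table = [
    "id   name       status   region     latency_ms",
    "1    alpha      ok       us-east    12",
    "2    beta       failed   eu-west    340",
]

# Render the table as a 24-column viewport would: every row overflows,
# so the tail of each row is pushed onto its own visual line.
for row in table:
    for visual_row in soft_wrap(row, 24):
        print(visual_row)
```

At 24 columns, the tail of each row lands between the starts of the rows, so the columns no longer line up. An image of the same table keeps its layout and is scrolled instead.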

skydhash · a month ago
My only use of code screenshots is to emulate the "take a look at my screen" workflow. It's only meant for the other person to take a quick glance at. Anything further than that is transmitted as a code block or text file.
riazrizvi · a month ago
Yeah. OP has an egocentric bias - it’s not the norm in the world of work sharing that you can faithfully reproduce the live/contextual environment of the sender given the raw string.

(OP’s blog purports to be pertinent to freelance software development).

jahsome · a month ago
What about accessibility?
hamasho · a month ago
I genuinely thought this was a satire until I read `Preserves syntax highlighting, very helpful`.
pwdisswordfishy · a month ago
> Preserves the full 80 character width without line-wrapping, which destroys readability

How line wrapping interacts with readability is for the reader software to worry about, not the control-freak author. Line length higher than the device width can handle can be even worse for readability than lines wrapped in the wrong places. It's one of the reasons I loathe PDFs.

> Preserves a good coding font, so it doesn't come out as some hairline-width Courier on the other end

If the reader wants to have their code in hairline-width Courier, that's their right. It's not for the control-freak with awful taste in fonts to decide.

> Preserves syntax highlighting, very helpful

Forces a particular style of syntax highlighting upon the reader without giving them an easy recourse to change it. No thanks.

> Guarantees monospace, so tabular data doesn't get all misaligned

The closest thing to a decent argument. Except pretty much any text input that accepts embedded images will usually also provide a monospace formatting option, so there is no need to screenshot text here either.

jolmg · a month ago
It also allows drawing on top. I find it convenient to screenshot / take picture of some code / error log / terminal output then circle some bit, draw some arrows, or do other types of drawings to draw attention to things.

Emphasizing bits on code-formatted text is not as straightforward and would typically be ambiguous (was this punctuation meant for emphasis or was it part of the original text?).

Pictures are also quick to make and grasp, which is a plus when having to quickly diagnose something with others.

The main complaint in the article about having to type into a search bar instead of copying and pasting doesn't make sense. It's like a word or two at most which you'd have to search. It might even be faster to type than to copy and paste (moving cursor around, etc.).

The error log complaint would also be valid had it been text instead of a screenshot. That was a problem of not sharing enough context, not the format of the message.

nrhrjrjrjtntbt · a month ago
Triple backticks (```) are widely supported for adding code, e.g. in Slack, Confluence...

Osyris · a month ago
Both examples you gave have pretty rough or nonexistent syntax highlighting support.
crazygringo · a month ago
It's not widely supported. It's not in e-mail or SMS or Gmail or Docs or Word or a hundred other pieces of software where I communicate.

Yes, I use that wherever it exists. It's great, and you're lucky when it's there. I wish it was everywhere. But as long as it's not, for everything else, there's screenshots.

phito · a month ago
It's so bad in teams and they put a rather small character limit on it....
nvllsvm · a month ago
Slack seems to always wrap code blocks. It makes python particularly shit to read.
smallerize · a month ago
Even Google chat can do it.
thomasfromcdnjs · a month ago
I feel like I've seen good solutions to both problems before. Aren't there VS Code extensions that let you just select the code and create a shareable link, with all the view options available to everyone who opens it?

e.g. https://snippetshare.dev/

kelnos · a month ago
I do not care about any of those criteria you mention. I want something I can copy and paste myself. Send me text. Just the text.
johannes1234321 · a month ago
Except it doesn't use my preferred font, my font size, or my colors, I can't copy parts of it as easily, and then the stupid chat app scales the image for some reason or another.

nacozarina · a month ago
Many greatly appreciate receiving an accurate image instead of garbled text.
bluedino · a month ago
> Preserves a good coding font, so it doesn't come out as some hairline-width Courier on the other end

Let me introduce you to Putty users who never change the default font...

Affric · a month ago
See, imo this is why having a good embedding for code is so important. The best of both worlds is available.
sen · a month ago
The ability to highlight/copy/etc text on Macs/iOS these days is such a killer feature. I use it almost every day, both for copying/translating text in screenshots or taking photos of text to then copy it into my notes later (eg school notice boards or event posters etc).
sqrt_1 · a month ago
Windows built-in snipping tool (shortcut Win + Shift + S) also has a text actions button to extract text.
vismit2000 · a month ago
jiri · a month ago
I am using this tool all the time and I did not know this! Thanks!
whycome · a month ago
I have to say, the ability to quickly copy and paste between macbook and iphone is such a great flow
tylerflick · a month ago
Totally agree. It’s one of those features that feels like magic. So handy for those digital purchase codes you get with blu-rays.
Jedd · a month ago
Yup - I recall when this feature was released, maybe a dozen years ago, with KDEConnect. Real QoL improvement. Glad to hear some other OS's are catching up.
ca_tech · a month ago
It gave me a "living in the future" feeling the day someone sent me a picture of a phone number through imessage. Barely thinking, I pressed on the phone number in the image and I was prompted to call it. It was like technology and primitive intuition teamed up to create that moment.
cosmic_cheese · a month ago
Part of what makes it so good is that it's everywhere. Preview, QuickLook, QuickTime Player (yes, videos get OCR'd too!), any app that uses the system frameworks for displaying media.

This includes Safari, where not only do images (inline or otherwise) have selectable text, but the built in translator leverages that text and uses it to translate images, too! This is super useful for translating Japanese webpages in particular, which tend to have tons of text baked into images.

internetter · a month ago
I use Shottr, I take a screenshot of a screenshot and hit “O” immediately after. Saves me from first saving the file to open it in the native viewer
alyyousuf7 · a month ago
I have Shottr keyboard shortcut (cmd+opt+control+o) setup to allow me to OCR from whatever is on the screen and copy the text to clipboard. So whether someone shares code or error log as screenshot on slack, it’s 3 steps: 1. cmd+opt+control+o 2. select the area to OCR 3. cmd+v in vscode or google
WorldPeas · a month ago
this. makes me wish more image viewers would ocr->png special field->have location-attached selectable text like a pdf
ivewonyoung · a month ago
OneNote had this for a long time.
sumnole · a month ago
Aside from copying text from images, OneNote can also make text in images searchable.
romaniitedomum · a month ago
Screenshots of text! Luxury! In my day, the screenshots were embedded in a Word document too.

But I can't be the only one appalled at the suggestion to use an LLM to parse the text. The sheer, prodigious waste of computing power, just to round-trip text to an image and back to text, when what's really missing is a computer user interface that makes it as simple to send text or other snippets as it is to send screenshots.

RedShift1 · a month ago
It can get worse. People use their phones to send a literal photo of their screen. This happens almost daily in one of the programming Discords I'm a member of and it drives me crazy. These are supposed to be (future) programmers and they don't even know how to take a proper screenshot??
tonyedgecombe · a month ago
When I'd get that from a customer I knew I was in for a whole lot of pain.
PeterStuer · a month ago
I knew an enterprise where Word documents were used as "folders" for Word documents. The "top" document would have a collection of files pasted into it (the actual file objects, not just the file content).
FabHK · a month ago
Obligatory XKCD:

https://xkcd.com/2116/

thundergolfer · a month ago
This is rule 7 of 10 in my post 'How to Ask for help in Slack'[1]

1. I have ‘rubber duck debugged’ my own question.

2. I checked that this question hasn’t been asked before.

3. I have noted in my message what I’ve tried.

4. I have avoided the ‘XY problem’ by clearly detailing the core problem, X.

5. I have provided specifics of my issue, not vague references or descriptions.

6. I have provided URL links to relevant content, and where possible the URL links are immutable.

7. I have not included screenshots of text in my message.

8. I have not used obscure acronyms or abbreviations.

9. I have formatted my message well, particularly paying attention to code formatting and headings.

10. I have not just said “hi” and waited for a reply.

Like other posters, I don't think Apple OCR is sufficient to make up for screenshotting. The biggest problem is search.

1. https://thundergolfer.com/communication/slack/2021/02/24/how...

icehawk · a month ago
> Like other posters, I don't think Apple OCR is sufficient to make up for screenshotting. The biggest problem is search.

Both Spotlight and Photos will find text in screenshots.

superdocs1 · a month ago
https://screenshotocr.app does this on Windows (I'm the founder).
thundergolfer · a month ago
I don't think Slack does, nor Github, which is where I most find screenshotting being a problem.
VladVladikoff · a month ago
Pretty much every new programmer I’ve ever hired has done this in their first few weeks. Every time I have to tell them why it’s so unhelpful to share screenshots of text instead of just pasting the text. Usually they learn. When they don’t I usually end up firing them, not for that reason but for others.

The reason I personally hate it is I am often working from my phone. And it’s much easier to read text rendered properly than pinch zooming text in an image. What’s worse is slack will downgrade images for mobile and you can’t even pinch zoom in fully.

albert_e · a month ago
My preference -- Link or attachment to the full document or code in context (if needed) ... along with screenshot of a relevant portion. (Many times the former is optional because there is enough context already.)

It is extra work to do both but I like to be thorough even when asking for help. Even if the other side doesn't need it -- because I myself might not remember all the nuances when I refer to that conversation later.

Also screenshot preserves (before any fixes) the exact way things looked when I confronted a certain situation. The visual of the screenshot serves as a much stronger reminder of that situation and my thinking ...way better than mere copy pasted text.

kjellsbells · a month ago
Thing is, screenshots are fast, and the same technique works across every app. If you work in a field where you might be filing trouble reports from web app A, native app B, website C,... it's just easier to use the same technique across the board. Win+Shift+S or Cmd+Shift+4, and move on with your life.

Not nice for the recipient maybe, but hella efficient for everyone else, and there are many more people in the latter camp than the former.