I use the same script as Dibby053, copied from stackoverflow but with some tweaks to work on kde,gnome and wayland as well as x11 and with some notifications on what state it is in.
I didn't test the x11/wayland check yet, but feel free to use it and report back.
#!/bin/bash
# Dependencies: tesseract-ocr imagemagick
# on gnome: gnome-screenshot
# on kde: spectacle
# on x11: xsel
# on wayland: wl-clipboard
die(){
notify-send "$1"
exit 1
}
cleanup(){
[[ -n $1 ]] && rm -rf "$1"
}
SCR_IMG=$(mktemp) || die "failed to take screenshot"
# shellcheck disable=SC2064
trap "cleanup '$SCR_IMG'" EXIT
notify-send "Select the area of the text"
if which "spectacle" &> /dev/null
then
spectacle -r -o "$SCR_IMG.png" || die "failed to take screenshot"
else
gnome-screenshot -a -f "$SCR_IMG.png" || die "failed to take screenshot"
fi
# increase image quality with option -q from default 75 to 100
mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png" || die "failed to convert image"
#should increase detection rate
tesseract "$SCR_IMG.png" "$SCR_IMG" &> /dev/null || die "failed to extract text"
if [ "$XDG_SESSION_TYPE" == "wayland" ]
then
wl-copy < "$SCR_IMG.txt" || die "failed to copy text to clipboard"
else
xsel -b -i < "$SCR_IMG.txt" || die "failed to copy text to clipboard"
fi
notify-send "Text extracted"
exit
I slightly modified your script to:
1. Clean up properly
2. Run spectacle in BG mode, so the window does not pop up after screenshotting.
#!/bin/bash
# Dependencies: tesseract-ocr imagemagick
# on gnome: gnome-screenshot
# on kde: spectacle
# on x11: xsel
# on wayland: wl-clipboard
die(){
notify-send "$1"
exit 1
}
cleanup(){
[[ -n $1 ]] && rm -r "$1"
}
SCR_IMG=$(mktemp -d) || die "failed to take screenshot"
# shellcheck disable=SC2064
trap "cleanup '$SCR_IMG'" EXIT
#notify-send "Select the area of the text"
if which "spectacle" &> /dev/null
then
spectacle -b -r -o "$SCR_IMG/scr.png" || die "failed to take screenshot"
else
gnome-screenshot -a -f "$SCR_IMG/scr.png" || die "failed to take screenshot"
fi
# increase image quality with option -q from default 75 to 100
mogrify -modulate 100,0 -resize 400% "$SCR_IMG/scr.png" || die "failed to convert image"
#should increase detection rate
tesseract "$SCR_IMG/scr.png" "$SCR_IMG/scr" &> /dev/null || die "failed to extract text"
if [ "$XDG_SESSION_TYPE" == "wayland" ]
then
wl-copy < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
else
xsel -b -i < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
fi
notify-send "Text extracted"
exit
Also made some minor modifications: replaced `xsel` with `xclip` and added truncated version of the copied text to the `notify-send`:
#!/bin/bash
# Dependencies: tesseract-ocr imagemagick
# on gnome: gnome-screenshot
# on kde: spectacle
# on x11: xsel
# on wayland: wl-clipboard
die(){
notify-send "$1"
exit 1
}
cleanup(){
[[ -n $1 ]] && rm -r "$1"
}
SCR_IMG=$(mktemp -d) || die "failed to take screenshot"
# shellcheck disable=SC2064
trap "cleanup '$SCR_IMG'" EXIT
#notify-send "Select the area of the text"
if which "spectacle" &> /dev/null
then
spectacle -n -b -r -o "$SCR_IMG/scr.png" || die "failed to take screenshot"
else
gnome-screenshot -a -f "$SCR_IMG/scr.png" || die "failed to take screenshot"
fi
# increase image quality with option -q from default 75 to 100
mogrify -modulate 100,0 -resize 400% "$SCR_IMG/scr.png" || die "failed to convert image"
#should increase detection rate
tesseract "$SCR_IMG/scr.png" "$SCR_IMG/scr" &> /dev/null || die "failed to extract text"
if [ "$XDG_SESSION_TYPE" == "wayland" ]
then
wl-copy < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
else
# xsel -b -i < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
xclip -selection clipboard -i < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
fi
# Notify the user what was copied but truncate the text to 100 characters
notify-send "Text extracted from image" "$(head -c 100 "$SCR_IMG/scr.txt")" || die "failed to send notification"
exit
If you just put `set -o errexit -o pipefail -o nounset` in the first line after the shebang your script will have proper error-handling as well. Currently if any fails, notify-send will still be triggered.
This version looks nice and short, any thoughts on prober error reporting to the end user?
My version has more feedback for the user which was important because the user was somebody not familiar with linux/bash, but even my version "swallows" errors.
In the spirit of sharing, cuz I think this is a great script (thank you), I prefer using maim over scrot simply because it has a --nodrag option. Personally feels better when making selections from a trackpad. Click once, move cursor, click again.
I was using something like this for awhile, but I found tesseract did poorly quite often. That resize trick didn't seem to affect much. I'm not sure what pre-processing would make it better.
I'd love to if TextSnatcher does anything to improve on this. The github page is opaque.
For my fellow Windows-using plebians, the official Microsoft PowerToys add-in [0] has a feature that does this (it's also been added to the stock screenshot tool, but I personally find the one keyboard shortcut in PowerToys more pleasant to use).
Snipping tool build in OCR works for multiple languages (English, Russian, Chinese, Japanese etc.) without the need to install any language OCR packs though
It's bugging me for a long time now. Is tesseract actually the state-of-the-art solution here?
I just really don't know, it feels like it's, uh, subpar. Isn't it? I never seriously worked in that domain, but it somehow felt to me in the 2019, that with all recent advancements in computer vision, text recognition must be essentially a solved problem. I'd expect it to be better than human. Yet I still cannot accurately convert a low-res scan (scan! not even a photo!) of a receipt with tesseract, especially if it isn't in English. Maybe I just cannot properly use it?
I use Tesseract semi-regularly and only rarely have recognition issues, including with receipt scans (or even photos). How are you specifically using it?
Myself I tried it probably 10-15 years ago on scanned scientific papers (decent scanning quality). The results were disappointing. The manual postprocessing required was not much less than typing it directly. So tesseract became a synonym of "not worth trying" to me.
Maybe things have improved over the years, so I should give it a new try. (No particular use case at the moment, but those tend to appear occasionally.)
It’s good now _if_ you OCR only scanned documents or otherwise have a lot of control over how you prepare the images before it’s OCR’ed. For more general purpose recognition with weird fonts and bad image quality EasyOCR gave me much better results
It's way better now. I used it 15 years ago and had to do quite a bit of preprocessing to get not-entirely-terrible results, but now I use it with great success and no preprocessing.
Being a Flatpak app, it will require desktop portals to fully work. That said, it worked absolutely fine out of the box for me with my existing xdg-desktop-portal-wlr setup. So, it should work fine in any X11 or Wayland setup where you have an xdg-desktop-portal setup that supports the Screenshot API.
The results are mixed, but not bad by any means. Cleanly readable text comes out mostly fine with maybe only whitespace issues and the occasional error, which makes this still potentially very useful for copying text out of error dialogs and whatnot. (Though, I've found that on Linux, error dialogs are far more likely to have selectable text in the first place. And on Windows, standard MessageBox responds to Ctrl+C.)
There is a utility available for macOS that extends beyond simply opening a document in Preview and attempting to select the text: https://github.com/schappim/macOCR
FWIW one can skip Preview and just do Cmd-Shift-3, click the thumbnail, and interact with the text in the quicklook. Then, delete the image (trashcan in top right). Cmd-A works, too. Here's me using it on that comment: https://imgur.com/a/q0NvcS6
I didn't test the x11/wayland check yet, but feel free to use it and report back.
edit:Formatting
Also made some minor modifications: replaced `xsel` with `xclip` and added truncated version of the copied text to the `notify-send`:
Why did you remove the -f parameter?
My version has more feedback for the user which was important because the user was somebody not familiar with linux/bash, but even my version "swallows" errors.
The command passed to `trap` is evaluated normally, so variable expansions do take place.
Will behave correctly, and the expansion of SCR_IMG won't be susceptible to issues relating to unquoted shell characters.Alternatively, if you're using a modern bash (this probably won't work on a mac by default), then this is an option too:
I just press the hotkey (Super+O), drag the selection over whatever I want to OCR, then immediately get a popup dialog containing the captured text.
Deleted Comment
I'd love to if TextSnatcher does anything to improve on this. The github page is opaque.
https://github.com/RajSolai/TextSnatcher/blob/master/src/ser...
The script calls Tesseract in default page segmentation mode (PSM 3). [1]
Depending on the input text, PSM mode 11 for disconnected text would probably work much better. That uses the flag "--psm 11".
[1] From the original repo: string tess_command = "tesseract " + file_path + " " + out_path + @" -l $lang" ;
Apologies to all of my fellow Unix-Windows borderers.
also, use mktemp -d and recursively delete the directory
[0] https://github.com/microsoft/PowerToys
WIN+SHIFT+S
If it doesn't have the "Text actions" icon (dashed square with paragraph lines in it), you can update it via windows store to get the latest version.
I just really don't know, it feels like it's, uh, subpar. Isn't it? I never seriously worked in that domain, but it somehow felt to me in the 2019, that with all recent advancements in computer vision, text recognition must be essentially a solved problem. I'd expect it to be better than human. Yet I still cannot accurately convert a low-res scan (scan! not even a photo!) of a receipt with tesseract, especially if it isn't in English. Maybe I just cannot properly use it?
Myself I tried it probably 10-15 years ago on scanned scientific papers (decent scanning quality). The results were disappointing. The manual postprocessing required was not much less than typing it directly. So tesseract became a synonym of "not worth trying" to me.
Maybe things have improved over the years, so I should give it a new try. (No particular use case at the moment, but those tend to appear occasionally.)
Being a Flatpak app, it will require desktop portals to fully work. That said, it worked absolutely fine out of the box for me with my existing xdg-desktop-portal-wlr setup. So, it should work fine in any X11 or Wayland setup where you have an xdg-desktop-portal setup that supports the Screenshot API.
The results are mixed, but not bad by any means. Cleanly readable text comes out mostly fine with maybe only whitespace issues and the occasional error, which makes this still potentially very useful for copying text out of error dialogs and whatnot. (Though, I've found that on Linux, error dialogs are far more likely to have selectable text in the first place. And on Windows, standard MessageBox responds to Ctrl+C.)
I like the author.