TextSnatcher: Copy text from images, for the Linux Desktop

I use the same script as Dibby053 (copied from Stack Overflow), but with some tweaks so it works on KDE, GNOME, Wayland, and X11, and with notifications showing what state it is in.

I haven't tested the X11/Wayland check yet, but feel free to use it and report back.

  #!/bin/bash
  # Dependencies: tesseract-ocr imagemagick
  # on gnome: gnome-screenshot
  # on kde: spectacle
  # on x11: xsel
  # on wayland: wl-clipboard

  die(){
    notify-send "$1"
    exit 1
  }
  cleanup(){
    [[ -n $1 ]] && rm -rf "$1"
  }

  SCR_IMG=$(mktemp) || die "failed to create temporary file"

  # shellcheck disable=SC2064
  trap "cleanup '$SCR_IMG'" EXIT

  notify-send "Select the area of the text" 
  if  which "spectacle" &> /dev/null
  then
    spectacle -r -o "$SCR_IMG.png" || die "failed to take screenshot"
  else
    gnome-screenshot -a -f "$SCR_IMG.png" || die "failed to take screenshot"
  fi

  # convert to grayscale and upscale 4x; should increase detection rate
  mogrify -modulate 100,0 -resize 400% "$SCR_IMG.png" || die "failed to convert image"

  tesseract "$SCR_IMG.png" "$SCR_IMG" &> /dev/null || die "failed to extract text"
  if [ "$XDG_SESSION_TYPE" == "wayland" ]
  then
    wl-copy < "$SCR_IMG.txt" || die "failed to copy text to clipboard"
  else
    xsel -b -i < "$SCR_IMG.txt" || die "failed to copy text to clipboard"
  fi
  notify-send "Text extracted"
  exit

Edit: formatting
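Since the X11/Wayland check is untested, here is one way to make it a bit more defensive. This is a sketch that assumes the compositor exports `WAYLAND_DISPLAY` to its clients (Wayland compositors normally do). `XDG_SESSION_TYPE` can read `tty` when a compositor is launched from a console login, so the sketch checks both variables. `clipboard_tool` and `copy_to_clipboard` are hypothetical helper names and are not part of the script.

```shell
#!/usr/bin/env bash
# Sketch (assumption: Wayland compositors export WAYLAND_DISPLAY).
# XDG_SESSION_TYPE may say "tty" if the compositor was started from a
# console login, so either signal is treated as "this is Wayland".
clipboard_tool() {
  if [[ ${XDG_SESSION_TYPE:-} == wayland || -n ${WAYLAND_DISPLAY:-} ]]; then
    echo wl-copy
  else
    echo xsel
  fi
}

# Read text on stdin and put it on the clipboard with the chosen tool.
copy_to_clipboard() {
  case $(clipboard_tool) in
    wl-copy) wl-copy ;;
    xsel)    xsel -b -i ;;
  esac
}
```

In the script this would replace the `if`/`else` block with `copy_to_clipboard < "$SCR_IMG.txt" || die "failed to copy text to clipboard"`.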

guipsp · 2 years ago

I slightly modified your script to (1) clean up properly and (2) run spectacle in background mode (`-b`), so the window does not pop up after taking the screenshot.

  #!/bin/bash 
  # Dependencies: tesseract-ocr imagemagick 
  # on gnome: gnome-screenshot 
  # on kde: spectacle
  # on x11: xsel
  # on wayland: wl-clipboard
  
  die(){
    notify-send "$1"
    exit 1
  }
  cleanup(){
    [[ -n $1 ]] && rm -r "$1"
  }
  
  SCR_IMG=$(mktemp -d) || die "failed to create temporary directory"
  
  # shellcheck disable=SC2064
  trap "cleanup '$SCR_IMG'" EXIT
  
  #notify-send "Select the area of the text" 
  if  which "spectacle" &> /dev/null
  then
    spectacle -b -r -o "$SCR_IMG/scr.png" || die "failed to take screenshot"
  else
    gnome-screenshot -a -f "$SCR_IMG/scr.png" || die "failed to take screenshot"
  fi
  
  # convert to grayscale and upscale 4x; should increase detection rate
  mogrify -modulate 100,0 -resize 400% "$SCR_IMG/scr.png" || die "failed to convert image"
  
  tesseract "$SCR_IMG/scr.png" "$SCR_IMG/scr" &> /dev/null || die "failed to extract text"
  if [ "$XDG_SESSION_TYPE" == "wayland" ]
  then 
    wl-copy < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
  else
    xsel -b -i  < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
  fi
  notify-send "Text extracted"
  exit

palmy · 2 years ago

This is great!

I also made some minor modifications: replaced `xsel` with `xclip` and added a truncated version of the copied text to the `notify-send` call:

  #!/bin/bash 
  # Dependencies: tesseract-ocr imagemagick 
  # on gnome: gnome-screenshot 
  # on kde: spectacle
  # on x11: xclip
  # on wayland: wl-clipboard

  die(){
    notify-send "$1"
    exit 1
  }
  cleanup(){
    [[ -n $1 ]] && rm -r "$1"
  }

  SCR_IMG=$(mktemp -d) || die "failed to create temporary directory"

  # shellcheck disable=SC2064
  trap "cleanup '$SCR_IMG'" EXIT

  #notify-send "Select the area of the text" 
  if  which "spectacle" &> /dev/null
  then
    spectacle -n -b -r -o "$SCR_IMG/scr.png" || die "failed to take screenshot"
  else
    gnome-screenshot -a -f "$SCR_IMG/scr.png" || die "failed to take screenshot"
  fi

  # convert to grayscale and upscale 4x; should increase detection rate
  mogrify -modulate 100,0 -resize 400% "$SCR_IMG/scr.png" || die "failed to convert image"

  tesseract "$SCR_IMG/scr.png" "$SCR_IMG/scr" &> /dev/null || die "failed to extract text"
  if [ "$XDG_SESSION_TYPE" == "wayland" ]
  then 
    wl-copy < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
  else
    # xsel -b -i  < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"
    xclip -selection clipboard -i < "$SCR_IMG/scr.txt" || die "failed to copy text to clipboard"  
  fi
  # Notify the user what was copied, truncated to the first 100 bytes
  notify-send "Text extracted from image" "$(head -c 100 "$SCR_IMG/scr.txt")" || die "failed to send notification"
  exit
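`head -c 100` counts bytes, so with non-ASCII text (umlauts, Cyrillic, CJK) it can cut a multibyte character in half. Here is a sketch of character-based truncation using bash substring expansion. It assumes a UTF-8 locale; under the C locale bash counts bytes as well. `preview` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: shorten OCR output for a notification by characters, not bytes.
# ${#text} and ${text:0:max} count characters according to the current
# locale (assumed UTF-8), so a multibyte character is never split.
preview() {
  local text=$1 max=${2:-100}
  if (( ${#text} > max )); then
    printf '%s...\n' "${text:0:max}"
  else
    printf '%s\n' "$text"
  fi
}
```

Usage: `notify-send "Text extracted from image" "$(preview "$(< "$SCR_IMG/scr.txt")")"`.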

bpfrh · 2 years ago

Good catch with spectacle, I thought I fixed that already.

Why did you remove the -f parameter?

rjzzleep · 2 years ago

I like all the error handling, but you could skip the temp files if you just pipe everything through:

    #!/usr/bin/env bash
    langs=(eng ara fas chi_sim chi_tra deu ell fin heb hun jpn kor nld rus tur)
    lang=$(printf '%s\n' "${langs[@]}" | fuzzel -d "$@")
    grim -g "$(slurp)" - | mogrify -modulate 100,0 -resize 400% png:- | tesseract -l "eng+${lang}" - - | wl-copy
    notify-send "Text extracted"
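This version has one small wrinkle. If the menu is dismissed, `$lang` is empty and the `-l` argument becomes `eng+`. Picking `eng` from the list gives `eng+eng`. Here is a sketch of a hypothetical `lang_arg` helper that falls back to plain `eng` in both cases:

```shell
#!/usr/bin/env bash
# Sketch (hypothetical helper): build the value for tesseract's -l flag.
# English is always included; the menu choice is appended unless it is
# empty (menu dismissed) or "eng" itself.
lang_arg() {
  local picked=${1:-}
  if [[ -z $picked || $picked == eng ]]; then
    printf 'eng\n'
  else
    printf 'eng+%s\n' "$picked"
  fi
}
```

In the pipeline: `... | tesseract -l "$(lang_arg "$lang")" - - | wl-copy`.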

miduil · 2 years ago

If you just put `set -o errexit -o pipefail -o nounset` on the first line after the shebang, your script will have proper error handling as well. Currently, if any command fails, notify-send will still be triggered.
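Here is a minimal sketch of how that could look while still telling a non-technical user what happened. It assumes bash. `notify` is a hypothetical stand-in for `notify-send`, so the sketch runs without a desktop. `failing_step` and `passing_step` stand in for the real screenshot/OCR pipeline. The strict options are set inside a subshell so they don't leak into the caller. An EXIT trap reports the failing status instead of the unconditional "Text extracted".

```shell
#!/usr/bin/env bash
# notify() is a hypothetical stand-in for notify-send.
notify() { printf '%s\n' "$*"; }

# Run one step under strict mode in a subshell and report the outcome.
run_strict() {
  (
    set -o errexit -o pipefail -o nounset
    # $? at the start of the EXIT trap is the subshell's exit status.
    trap 'rc=$?; if (( rc != 0 )); then notify "OCR failed (exit $rc)"; fi' EXIT
    "$@"
    notify "Text extracted"
  )
}

# Stand-ins for e.g. `grim ... | tesseract - - | wl-copy`: the first stage
# fails but the last succeeds, so only pipefail makes the pipeline fail.
failing_step() { false | cat; }
passing_step() { printf 'some text\n' | cat > /dev/null; }
```

One caveat: bash ignores `errexit` inside anything run as an `if` condition or as a non-final part of an `&&`/`||` list. For example, `run_strict some_step || ...` would report success even when the step fails, so call it as a plain command.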

bpfrh · 2 years ago

This version looks nice and short; any thoughts on proper error reporting to the end user?

My version gives the user more feedback, which was important because the user was someone not familiar with Linux/bash, but even my version "swallows" errors.

tmerse · 2 years ago

I also used the very same script until I stumbled upon this on HN [0].

    #!/usr/bin/env bash
    langs=(eng ara fas chi_sim chi_tra deu ell fin heb hun jpn kor nld rus tur)
    lang=$(printf '%s\n' "${langs[@]}" | dmenu "$@")
    maim -us | tesseract --dpi 145 -l "eng+${lang}" - - | xsel -bi

[0]: https://news.ycombinator.com/item?id=33704483#33705272

tmerse · 2 years ago

Ah just saw rjzzleep posted an updated version here. Happy to steal this one again :)

begueradj · 2 years ago

Looks nice

Arch-TK · 2 years ago

    # shellcheck disable=SC2064
    trap "cleanup '$SCR_IMG'" EXIT

While ShellCheck can have false positives, and SCR_IMG probably doesn't contain any characters that need escaping, it's not exactly wrong in this case.

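The command string passed to `trap` is evaluated normally when the trap fires, so variable expansions still take place at that point, even if the string was single-quoted when the trap was set.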

    trap 'cleanup "$SCR_IMG"' EXIT

Will behave correctly, and the expansion of SCR_IMG won't be susceptible to issues relating to unquoted shell characters.

Alternatively, if you're using bash 4.4 or newer (the `@Q` expansion won't work with the bash 3.2 that macOS ships by default), this is an option too:

    trap "cleanup ${SCR_IMG@Q}" EXIT
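To see the difference in practice, this sketch sets each of the three traps inside a subshell, so the trap fires as soon as the subshell exits. Each trap targets a temp directory whose name contains a single quote. The directory name and the `try_style` helper are contrived for illustration. `cleanup` mirrors the function from the script.

```shell
#!/usr/bin/env bash
cleanup() { rm -r -- "$1"; }

# Returns 0 if the EXIT trap removed the directory, 1 if it did not.
try_style() {
  local style=$1 dir
  dir=$(mktemp -d "${TMPDIR:-/tmp}/it's.XXXXXX") || return 2
  (
    case $style in
      early)  trap "cleanup '$dir'" EXIT ;;   # expanded now; the quote breaks parsing
      late)   trap 'cleanup "$dir"' EXIT ;;   # expanded when the trap fires
      quoted) trap "cleanup ${dir@Q}" EXIT ;; # bash 4.4+: expanded now, safely quoted
    esac
  ) 2>/dev/null   # hide the syntax error from the broken variant
  if [[ -d $dir ]]; then
    rm -r -- "$dir"   # tidy up what the broken trap left behind
    return 1
  fi
}
```

The double-quoted `early` form leaves the directory behind because its trap command fails to parse. The `late` and `quoted` forms remove it.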

bpfrh · 2 years ago

Thanks for fixing and explaining that; I thought '' would work and forgot about characters that need escaping.

Gormo · 2 years ago

Binding a hotkey to `bash -c 'flameshot gui -s -r | tesseract - - | gxmessage -title "Decoded Data" -fn "Consolas 12" -wrap -geometry 640x480 -file -'` does the job for me.

I just press the hotkey (Super+O), drag the selection over whatever I want to OCR, then immediately get a popup dialog containing the captured text.

jonquark · 2 years ago

The Wayland leg works fine for me on gnome+wayland.

bpfrh · 2 years ago

thanks!


  #!/bin/bash
  # Dependencies: tesseract-ocr imagemagick scrot xsel
  IMG=`mktemp`
  trap "rm $IMG*" EXIT
  scrot -s $IMG.png -q 100 # increase image quality with option -q from default 75 to 100
  mogrify -modulate 100,0 -resize 400% $IMG.png #should increase detection rate
  tesseract $IMG.png $IMG &> /dev/null
  cat $IMG.txt | xsel -bi
  notify-send "Text copied" "$(cat $IMG.txt)"
  exit

I see tesseract mentioned more and more.

I tried it myself probably 10-15 years ago on scanned scientific papers (decent scanning quality). The results were disappointing: the manual post-processing required was not much less work than typing the text directly. So tesseract became synonymous with "not worth trying" to me.

Maybe things have improved over the years, so I should give it a new try. (No particular use case at the moment, but those tend to appear occasionally.)

graynk · 2 years ago

It’s good now _if_ you OCR only scanned documents or otherwise have a lot of control over how the images are prepared before OCR. For more general-purpose recognition with weird fonts and bad image quality, EasyOCR gave me much better results.

sp332 · 2 years ago

This project is including Tesseract 4.1.1 which is at least a couple years old.

mellutussa · 2 years ago

Try https://github.com/ocrmypdf/OCRmyPDF - it uses Tesseract behind the scenes and is absolutely brilliant.

mkl · 2 years ago

It's way better now. I used it 15 years ago and had to do quite a bit of preprocessing to get not-entirely-terrible results, but now I use it with great success and no preprocessing.

walteweiss · 2 years ago

The first time I used it, 3 to 4 years ago, it was good.

Dibby053 · 2 years ago

A while back I copied this script from somewhere; it does the job nicely.

grimgrin · 2 years ago

In the spirit of sharing, cuz I think this is a great script (thank you): I prefer using maim over scrot simply because it has a --nodrag option. Personally, it feels better when making selections from a trackpad: click once, move the cursor, click again.

    maim -s --nodrag --quality=10 $IMG.png

maim's `--quality` goes from 1 to 10, so 10 is the equivalent of scrot's 100.

raphman · 2 years ago

Yet another variation I have been using for ages, using ImageMagick's `import` tool (which probably only works on X11):

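    # $tempfile was not defined in the original snippet; creating it with mktemp is an assumption
    tempfile=$(mktemp --suffix=.png)
    trap 'rm -f "$tempfile"' EXIT
    import "$tempfile"
    TEXT=$(tesseract -l eng+deu "$tempfile" stdout)
    printf '%s\n' "$TEXT" | xsel -i -b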

dsp_person · 2 years ago

I was using something like this for a while, but I found tesseract did poorly quite often. That resize trick didn't seem to help much. I'm not sure what pre-processing would make it better.

I'd love to know if TextSnatcher does anything to improve on this. The GitHub page is opaque.

mappu · 2 years ago

The source is pretty straightforward: it calls `scrot -s -o` to write a temp file, and then `tesseract` with no further preprocessing.

https://github.com/RajSolai/TextSnatcher/blob/master/src/ser...

stevesimmons · 2 years ago

> I found tesseract did poorly quite often

The script calls Tesseract in default page segmentation mode (PSM 3). [1]

Depending on the input text, PSM 11 (sparse text, suited to disconnected text) would probably work much better. That uses the flag `--psm 11`.

[1] From the original repo: string tess_command = "tesseract " + file_path + " " + out_path + @" -l $lang" ;
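Here is a sketch of that suggestion as a small wrapper. `ocr_image` and the `TESSERACT` override variable are hypothetical names for illustration. Passing `stdout` as the output base makes tesseract print the text instead of writing a `.txt` file.

```shell
#!/usr/bin/env bash
# Sketch (hypothetical wrapper): OCR one image to stdout with an explicit
# page segmentation mode. Tesseract's default is --psm 3; 11 is "sparse
# text". TESSERACT can point at another binary, e.g. a stub for testing.
ocr_image() {
  local img=$1 psm=${2:-11} lang=${3:-eng}
  "${TESSERACT:-tesseract}" "$img" stdout -l "$lang" --psm "$psm"
}
```

Usage: `ocr_image "$SCR_IMG/scr.png" 11 | wl-copy`. Passing `3` as the second argument restores the default behaviour, which makes it easy to compare the two modes on the same screenshot.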

hiAndrewQuinn · 2 years ago

I had a PowerShell script which did this as well, but alas, it was lost to time with the rest of my little scripts from my last job.

Apologies to all of my fellow Unix-Windows borderers.

  trap "rm $IMG*" EXIT

see https://www.shellcheck.net/wiki/SC2064

Also, use mktemp -d and recursively delete the directory.
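Here is a sketch of that pattern on its own. It assumes bash and a `mktemp` that supports `-d` (GNU coreutils and BSD mktemp both do). The screenshot and the tesseract output all land in one directory, and a single-quoted EXIT trap removes it. `with_workdir` and `fake_ocr` are hypothetical helpers; `fake_ocr` only creates empty files in place of the real tools.

```shell
#!/usr/bin/env bash
# Run a command with a private temp directory as its last argument and
# remove the whole directory when the subshell exits.
with_workdir() {
  (
    workdir=$(mktemp -d) || exit 1
    trap 'rm -rf -- "$workdir"' EXIT   # single quotes: expanded at exit time
    "$@" "$workdir"
  )
}

# Hypothetical stand-in for the screenshot + tesseract steps.
fake_ocr() {
  touch "$1/scr.png" "$1/scr.txt"
  printf '%s\n' "$1"   # report where the files were written
}
```

Because every intermediate file lives under `$workdir`, there is no glob like `$IMG*` to get wrong, and one `rm -r` catches everything.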

doix · 2 years ago

This is perfect for me! Having a window with a button that I need to click is much worse than just binding a script to a hotkey.

cfiggers · 2 years ago

For my fellow Windows-using plebeians, the official Microsoft PowerToys add-in [0] has a feature that does this (it's also been added to the stock screenshot tool, but I personally find the one keyboard shortcut in PowerToys more pleasant to use).

[0] https://github.com/microsoft/PowerToys

fredzel · 2 years ago

The Snipping Tool's built-in OCR works for multiple languages (English, Russian, Chinese, Japanese, etc.) without the need to install any OCR language packs, though.

lysp · 2 years ago

Inbuilt snip tool does that too.

WIN+SHIFT+S

If it doesn't have the "Text actions" icon (dashed square with paragraph lines in it), you can update it via the Windows Store to get the latest version.

krick · 2 years ago

It's been bugging me for a long time now: is tesseract actually the state-of-the-art solution here?

I just really don't know; it feels, uh, subpar. Isn't it? I never seriously worked in that domain, but in 2019 it somehow felt to me that, with all the recent advancements in computer vision, text recognition must be essentially a solved problem. I'd expect it to be better than a human. Yet I still cannot accurately convert a low-res scan (a scan! not even a photo!) of a receipt with tesseract, especially if it isn't in English. Maybe I just don't know how to use it properly?

I use Tesseract semi-regularly and only rarely have recognition issues, including with receipt scans (or even photos). How are you specifically using it?

usr1106 · 2 years ago

jchw · 2 years ago

I gave it a try. Works pretty well.

Being a Flatpak app, it will require desktop portals to fully work. That said, it worked absolutely fine out of the box for me with my existing xdg-desktop-portal-wlr setup. So, it should work fine in any X11 or Wayland setup where you have an xdg-desktop-portal setup that supports the Screenshot API.

The results are mixed, but not bad by any means. Cleanly readable text comes out mostly fine with maybe only whitespace issues and the occasional error, which makes this still potentially very useful for copying text out of error dialogs and whatnot. (Though, I've found that on Linux, error dialogs are far more likely to have selectable text in the first place. And on Windows, standard MessageBox responds to Ctrl+C.)

rvdca · 2 years ago

A similar app I use is Frog (https://getfrog.app), with great success.

mathfailure · 2 years ago

No AppImage, no .deb, not even brew.

ssernikk · 2 years ago

It's on nixpkgs under name `gnome-frog` (for nix users)

schappim · 2 years ago

There is a utility available for macOS that extends beyond simply opening a document in Preview and attempting to select the text: https://github.com/schappim/macOCR

I like the author.

lelandfe · 2 years ago

FWIW one can skip Preview and just do Cmd-Shift-3, click the thumbnail, and interact with the text in the quicklook. Then, delete the image (trashcan in top right). Cmd-A works, too. Here's me using it on that comment: https://imgur.com/a/q0NvcS6

helsinkiandrew · 2 years ago

Thank You!