I hope that in the future, when chip manufacturing cost is no longer the bottleneck for AI, we will have more options.
LLMs that are attached to normal CPUs need lots of fast memory because they are doing very large matrix operations with very few arithmetic units, which implies a lot of data motion. Changing that architecture might reduce the need to move so much data, but it isn't at all clear what these people are proposing.
It also isn't at all obvious why their stuff would be any better than an ordinary vectorized arithmetic unit (often provocatively called a "tensor" chip).
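For a rough sense of the scale of that data motion, here is a back-of-envelope sketch; the model size, precision, and token rate are illustrative assumptions, not measurements of any particular chip:

# Back-of-envelope: in dense LLM inference every weight is read roughly once
# per generated token, so memory traffic per token ~= parameters * bytes/weight.
params=7000000000      # assumed 7B-parameter model
bytes_per_weight=2     # fp16
tokens_per_sec=20      # illustrative generation rate
echo "$(( params * bytes_per_weight * tokens_per_sec / 1000000000 )) GB/s"   # ~280 GB/s

That is well beyond ordinary desktop DRAM bandwidth, which is roughly why LLM inference leans on lots of fast memory sitting next to the arithmetic units.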
#!/bin/bash
# Temp files: raw screenshot, raw OCR output, cleaned text
screenshot=$(mktemp)
decoded_data=$(mktemp)
processed_data=$(mktemp)
# Remove the temp files when the script exits
cleanup() {
    rm -f "$screenshot" "$decoded_data" "$processed_data"
}
trap cleanup EXIT
# Capture a region interactively with flameshot and save the raw image
flameshot gui -s -r > "$screenshot"
# Preprocess for OCR: grayscale, normalize size, sharpen, then upscale
convert "$screenshot" \
    -colorspace Gray \
    -scale 1191x2000 \
    -unsharp 6.8x2.69+0 \
    -resize 500% \
    "$screenshot"
# Run tesseract (LSTM engine, --oem 1) on the preprocessed image
tesseract \
    --dpi 300 \
    --oem 1 "$screenshot" - > "$decoded_data"
# Drop blank lines from the OCR output
grep -v '^\s*$' "$decoded_data" > "$processed_data"
# Copy the cleaned text to the clipboard
xclip -selection clipboard < "$processed_data"
# Show the text in an editable window for quick review and corrections
yad --text-info --title="Decoded Data" \
    --width=940 \
    --height=580 \
    --wrap \
    --fontname="Iosevka 14" \
    --editable \
    --filename="$processed_data"
So I spent some time adding an OCR function to flameshot. Rather than compiling tesseract into flameshot, I have it call a server running remotely over a REST API. The reason for this approach is that I also added a llama.cpp translation feature after the OCR step.
Here are the GitHub repositories for my fork of flameshot and for the OCR and translation server, which is written casually in Rust.
https://github.com/jason-ni/flameshot
https://github.com/jason-ni/flameshot-ocr-server
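For illustration, a client-side call to such a server could look roughly like the snippet below; the host, port, endpoint path, and content type are assumptions made for the sketch, not the actual API of flameshot-ocr-server.

# Hypothetical sketch: POST the captured image to a remote OCR/translation
# server and print the returned text. Endpoint and port are assumptions.
curl -s -X POST "http://ocr-host:8080/ocr" \
    -H "Content-Type: application/octet-stream" \
    --data-binary @screenshot.png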