Posted by u/clamlady 4 months ago
Ask HN: Is there an OCR that might be able to handle field datasheets?
I am an ecologist looking for OCR that can take PDF scans of my Rite-in-the-Rain field notebooks (which are sometimes quite dirty) and extract the length measurements from them. I've tried Tesseract in R, but it doesn't handle them well. I plan on using this as an additional QC step after I enter the data by hand. Thanks in advance!
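A minimal sketch of what that QC step could look like once some OCR tool has produced raw text for a page. It is written in Python rather than R, and everything in it is an assumption for illustration: the helper names `extract_lengths` and `flag_mismatches`, the idea that measurements are plain decimal numbers, the 0.05 tolerance, and the sample values.

```python
import re

# Assumption: measurements appear as plain decimal numbers in the OCR text.
NUMBER = re.compile(r"\d+(?:\.\d+)?")


def extract_lengths(ocr_text):
    """Pull every decimal number out of raw OCR text, in reading order."""
    return [float(m) for m in NUMBER.findall(ocr_text)]


def flag_mismatches(hand_entered, ocr_values, tol=0.05):
    """Return (index, hand, ocr) for each position where the values disagree.

    Values are compared position by position. A hand-entered value with no
    OCR counterpart is reported with None as the OCR value.
    """
    flagged = []
    for i, hand in enumerate(hand_entered):
        ocr = ocr_values[i] if i < len(ocr_values) else None
        if ocr is None or abs(hand - ocr) > tol:
            flagged.append((i, hand, ocr))
    return flagged


# Made-up example data, not real field measurements.
ocr_text = "12.4 mm  8.1 mm  15.0 mm"
hand = [12.4, 8.7, 15.0]
print(flag_mismatches(hand, extract_lengths(ocr_text)))
# prints [(1, 8.7, 8.1)]
```

The comparison is purely positional, so a number that the OCR drops or invents shifts every later value. On dirty pages it would probably be safer to compare row by row instead of over a whole page at once.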
keepsweet · 4 months ago
I've also tried Tesseract in the past with handwritten notes, and the results weren't very accurate. Then I started looking into commercial solutions and tried many different tools, but the only one that could handle my handwriting was Klippa DocHorizon: https://www.klippa.com/en/ocr/ It uses machine learning and OCR instead of just plain OCR like Tesseract does, so it might be an option to look into. You could also test it out at https://www.klippa.com/en/ocr/tools/

I've been using it for a while and would highly recommend it. Hopefully it works out for your use case.

solardev · 4 months ago
In my limited experience, Google Cloud Vision API was much better than Tesseract: https://cloud.google.com/vision#demo
atsaloli · 4 months ago
Have you tried AI chatbots? They are pretty good at OCR nowadays.