Readit News
adammajcher commented on Show HN: Ocrbase – pdf → .md/.json document OCR and structured extraction API   github.com/majcheradam/oc... · Posted by u/adammajcher
cess11 · 21 days ago
Why is 12GB+ VRAM a requirement? The OCR model looks fairly small (https://huggingface.co/PaddlePaddle/PaddleOCR-VL/tree/main), so I'm assuming the VRAM is needed for some processing step that runs afterwards.
adammajcher · 21 days ago
fixed
mechazawa · 21 days ago
Is only Bun supported, or regular Node as well?
adammajcher · 21 days ago
it's Bun-first because of performance
tuwtuwtuwtuw · 21 days ago
Sure. The self-host guide tells me to put my GitHub secret, in plain text, in a .env file. But it doesn't tell me why I should do that.

Do people actually store their secrets in plain text on the file system in production environments? Just seems a bit wild to me.

adammajcher · 21 days ago
well, you can use a secrets manager as well
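One common way to keep a secret out of a plain-text .env file while still supporting env-based config is the `<NAME>_FILE` convention used by many official Docker images: the app reads the secret from a file mounted by a secrets manager (for example Docker secrets under `/run/secrets/`) and only falls back to a plain environment variable. This is a minimal sketch, not ocrbase's actual loader; `GITHUB_CLIENT_SECRET` is a hypothetical variable name, and ocrbase itself may not support the `_FILE` convention.

```python
import os
from pathlib import Path


def load_secret(name: str) -> str:
    """Return a secret, preferring a file mounted by a secrets manager.

    Looks for <name>_FILE first (a path such as /run/secrets/<name>), then
    falls back to a plain <name> environment variable.
    """
    file_path = os.environ.get(f"{name}_FILE")
    if file_path:
        # Strip the trailing newline that secret files commonly end with.
        return Path(file_path).read_text(encoding="utf-8").strip()
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"secret {name!r} is not set via {name}_FILE or {name}")
    return value


# Hypothetical variable name; check the self-host guide for the real one.
# secret = load_secret("GITHUB_CLIENT_SECRET")
```

With this pattern the .env file (or orchestrator config) only holds a path, and the secret value itself lives wherever the secrets manager mounts it.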
binalpatel · 21 days ago
This is admittedly dated, but even back in December 2023, GPT-4 with its Vision preview could do structured extraction very reliably, and I'd imagine Gemini 3 Flash is much better than that.

https://binal.pub/2023/12/structured-ocr-with-gpt-vision/

Back-of-the-napkin math (which I could be getting completely wrong), but I think you could process a 100-page PDF for ~$0.50 or less using Gemini 3 Flash?

> 560 input tokens per page * 100 pages = 56,000 tokens = $0.028 input ($0.50/M input tokens)
> ~1,000 output tokens per page * 100 pages = 100,000 tokens = $0.30 output ($3/M output tokens)

(https://ai.google.dev/gemini-api/docs/gemini-3#media_resolut...)
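The napkin math above can be written as a small helper so the assumptions are easy to tweak. This is a sketch: the function name is hypothetical, and the per-page token counts and per-million-token prices are the commenter's estimates, not authoritative pricing.

```python
def estimate_cost_usd(pages, input_tokens_per_page, output_tokens_per_page,
                      input_price_per_m, output_price_per_m):
    """Return (input_cost, output_cost, total_cost) in US dollars.

    Prices are given per million tokens, as on typical API pricing pages.
    """
    input_tokens = pages * input_tokens_per_page
    output_tokens = pages * output_tokens_per_page
    input_cost = input_tokens * input_price_per_m / 1_000_000
    output_cost = output_tokens * output_price_per_m / 1_000_000
    return input_cost, output_cost, input_cost + output_cost


# Figures from the comment: 560 in / ~1,000 out tokens per page,
# $0.50/M input and $3/M output.
inp, out, total = estimate_cost_usd(100, 560, 1000, 0.50, 3.00)
print(f"input ${inp:.3f}, output ${out:.2f}, total ${total:.3f}")
```

The total comes to roughly $0.33 per 100 pages, consistent with the "~$0.50 or less" estimate; output tokens dominate, so the per-page output length is the number most worth checking.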

adammajcher · 21 days ago
sure, for some small projects I recommend my friends use Gemini 3 Flash. ocrbase is aimed more at scale and self-hosting: fixed infra cost, high throughput, and no data leaving your environment. at large volumes, that tradeoff starts to matter more than per-100-page pricing
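The fixed-cost versus per-page tradeoff described above can be sketched as a simple break-even calculation. This is an illustration under stated assumptions: the function is hypothetical, the $500/month infrastructure figure is made up purely to show the arithmetic, and the per-page API cost reuses the thread's ~$0.33 per 100 pages estimate.

```python
def break_even_pages(monthly_infra_usd: float, api_cost_per_page_usd: float) -> float:
    """Pages per month at which a fixed self-hosting bill equals API spend.

    Ignores the self-hosted setup's own per-page costs (power, ops time),
    so the real break-even volume is at least this high.
    """
    if api_cost_per_page_usd <= 0:
        raise ValueError("API cost per page must be positive")
    return monthly_infra_usd / api_cost_per_page_usd


# Hypothetical $500/month GPU box vs. ~$0.328 per 100 pages via the API.
pages = break_even_pages(500.0, 0.328 / 100)
print(f"break-even at about {pages:,.0f} pages/month")
```

Below that volume the pay-per-token API is cheaper on cost alone; above it, the fixed infrastructure bill is spread over enough pages to win, before even counting the data-residency benefit.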

u/adammajcher

Karma: 28 · Cake day: September 13, 2024
About
https://github.com/ocrbase-hq/ocrbase https://x.com/adammajcher20