I have a scanner and some OCR processes I run things through. My automated process gets me to about 85% accuracy.
The pain of going from 85% to 99%, though, is considerable (and in my case manual, although Perl helps).
I tried this AI on one of the short poem manuscripts I have.
I told the prompt I wanted PDF to Markdown, and it said sure, go ahead, give me the PDF. I went to upload it, and it spent a long time spinning. Then a quick message came up, something like
"Failed to count tokens"
but it just flashed and went away.
I guess the PDF is too big? Weird, though, since it's not a lot of pages.
Genuine question; I don't know whether this is the case or not.
I guess my point is the project should be providing a clear path that doesn't involve AWS instead of just stopping short.
I wish the Bottlerocket team would do one of two things: either own up that this is just an AWS project, or start solving for things like this and actually be a product that "runs in the cloud or in your datacenter", as they suggest on their website.