diptanu commented on Benchmarking the Most Reliable Document Parsing API   tensorlake.ai/blog/benchm... · Posted by u/calavera
kissgyorgy · a month ago
I just tried it out, and docling finished the same document in 20s (with pretty good results), while in Tensorlake it has been pending for 10 minutes. I won't even wait for the results.
diptanu · a month ago
There was an unusual traffic spike around that time; if you try now it should be a lot faster. We were scaling up, but there was not enough GPU capacity at that moment.
diptanu commented on Benchmarking the Most Reliable Document Parsing API   tensorlake.ai/blog/benchm... · Posted by u/calavera
recursive4 · a month ago
Curious how it compares to https://github.com/datalab-to/chandra
diptanu · a month ago
We haven’t tested Chandra yet because it’s very new. Under the hood, Tensorlake is very similar to Marker: it’s a pipeline-based OCR API. We do layout detection, text detection and recognition, table structure understanding, etc., and then use VLMs to enrich the results. Our models are much bigger than Marker’s, and thus take a little longer to parse documents. We optimized for accuracy. We will have a faster API soon.
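The staged flow described in that comment (layout detection, then text recognition per region, then VLM enrichment) can be sketched roughly as below. All stage functions and the element schema here are hypothetical placeholders for illustration, not Tensorlake's actual API or models:

```python
# Minimal sketch of a pipeline-based document parser, assuming three stages:
# layout detection -> per-region OCR -> VLM enrichment of tables/figures.
from dataclasses import dataclass

@dataclass
class Element:
    kind: str        # "paragraph", "table", "figure", ...
    bbox: tuple      # (x0, y0, x1, y1) in page coordinates
    text: str = ""
    enrichment: str = ""

def detect_layout(page_image):
    # Stage 1 (stub): a layout model would return typed regions with boxes.
    return [Element("paragraph", (50, 80, 550, 140)),
            Element("table", (50, 160, 550, 400))]

def recognize_text(page_image, element):
    # Stage 2 (stub): text detection + recognition cropped to the region.
    element.text = f"<ocr text for {element.kind}>"
    return element

def enrich_with_vlm(element):
    # Stage 3 (stub): a VLM summarizes visual elements; prose passes through.
    if element.kind in ("table", "figure", "chart"):
        element.enrichment = f"<VLM summary of {element.kind}>"
    return element

def parse_page(page_image):
    elements = detect_layout(page_image)
    return [enrich_with_vlm(recognize_text(page_image, e)) for e in elements]

result = parse_page(page_image=None)  # a real pipeline would pass pixels
```

The key property of this shape, as opposed to a single end-to-end VLM call, is that every output element keeps its bounding box from stage 1, which is what later enables grounding and citations.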
diptanu commented on Benchmarking the Most Reliable Document Parsing API   tensorlake.ai/blog/benchm... · Posted by u/calavera
ianhawes · a month ago
I just tested a non-English document and it rendered English text. Does your model not support anything other than English?
diptanu · a month ago
It does; we have users in Europe and Asia using it with non-English languages. Can you please send me a message at diptanu at tensorlake dot ai? I'd love to see why it didn’t work.
diptanu commented on Benchmarking the Most Reliable Document Parsing API   tensorlake.ai/blog/benchm... · Posted by u/calavera
coderintherye · a month ago
Google's Vertex API for document processing absolutely does bounding boxes. In fact, some of the document processors are just a wrap around Google's product.
diptanu · a month ago
OP mentioned Gemini, not Google’s Vertex OCR API, which has very different performance and accuracy characteristics than Gemini.
diptanu commented on Benchmarking the Most Reliable Document Parsing API   tensorlake.ai/blog/benchm... · Posted by u/calavera
serjester · a month ago
This is just a company advertisement, not even one that’s well done. They didn’t benchmark any of the real leaders in the space (reducto, extend, etc) and left Gemini out of the first two tests, presumably because it was the best performer (while also being multiple orders of magnitude cheaper).
diptanu · a month ago
Hey! I am the founder of Tensorlake. We benchmarked the models that our customers consider using in enterprises or regulated industries, where there is a big need for processing documents for various automation workflows. Benchmarking takes a lot of time, so we focused on the ones we get asked about.

On Gemini and other VLMs: we excluded these models because they don't do visual grounding, i.e., they don't provide page layouts or bounding boxes of elements on the page. This is a table-stakes feature for the use cases customers are building with Tensorlake. It wouldn't be possible to build citations without bounding boxes.
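To make the citations point concrete, here is a hedged sketch of why boxes matter: once parsed elements carry bounding boxes, an extracted answer can be traced back to the exact page region it came from. The element schema and helper below are illustrative, not Tensorlake's actual output format:

```python
# Sketch: mapping an extracted snippet back to its source region,
# assuming each parsed element carries its text and bounding box.

def find_citation(snippet, parsed_pages):
    """Return (page_number, bbox) of the element containing the snippet."""
    for page_no, elements in enumerate(parsed_pages, start=1):
        for el in elements:
            if snippet in el["text"]:
                return page_no, el["bbox"]
    return None  # snippet not grounded anywhere in the document

# Toy parsed output: one element per page, with page coordinates.
parsed = [
    [{"text": "Revenue grew 12% year over year.", "bbox": (72, 100, 540, 130)}],
    [{"text": "Net income declined in Q3.", "bbox": (72, 210, 540, 240)}],
]

cite = find_citation("Net income declined", parsed)
# cite -> (2, (72, 210, 540, 240)): page 2, plus the region to highlight
```

Without per-element boxes (the Gemini-style plain-markdown output), the second half of that return value simply does not exist, so there is nothing to highlight in the source PDF.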

On pricing: we are probably the only company offering pure on-demand pricing without any tiers. With Tensorlake, you can get back markdown from every page, summaries of figures, tables, and charts, structured data, page classification, etc. in ONE API call. This means we are running a bunch of different models under the hood. If you add up the token count and the complexity of the infrastructure needed to build a comparable pipeline around Gemini plus other OCR/layout-detection models, I bet the price you end up with won't be any cheaper than what we provide :) Plus, doing this at scale is very complex; it requires a lot of sophisticated infrastructure, which is another source of cost behind modern document ingestion services.

diptanu commented on So you want to parse a PDF?   eliot-jones.com/2025/8/pd... · Posted by u/UglyToad
throwaway4496 · 4 months ago
So you parse PDFs, but also OCR images, to somehow get better results?

Do you know you could just use the parsing engine that renders the PDF to get the output? I mean, why rasterize it, OCR it, and then use AI? Sounds like creating a problem just to use AI to solve it.

diptanu · 4 months ago
We parse PDFs to convert them to text in a linearized fashion, so the content can feed downstream use cases: search engines, structured extraction, etc.
diptanu commented on So you want to parse a PDF?   eliot-jones.com/2025/8/pd... · Posted by u/UglyToad
Alex3917 · 4 months ago
> This is exactly the reason why Computer Vision approaches for parsing PDFs works so well in the real world.

One of the biggest benefits of PDFs though is that they can contain invisible data. E.g. the spec allows me to embed cryptographic proof that I've worked at the companies I claim to have worked at within my resume. But a vision-based approach obviously isn't going to be able to capture that.

diptanu · 4 months ago
Yeah we don't handle this yet.

u/diptanu

Karma: 468 · Cake day: July 23, 2008
About
Founder of @tensorlake. In the past I designed and worked on HashiCorp's cluster scheduler Nomad, the Titan cluster scheduler and Mesos at Netflix, and FBLearner at Facebook. Email - diptanu@tensorlake.ai