PixelPanda (u/PixelPanda)

PixelPanda commented on Open source 3B param model for documents better than Gemini 2.5 huggingface.co/nanonets/N... · Posted by u/PixelPanda

PixelPanda · 4 months ago

Results from our benchmarking -> https://nanonets.com/research/nanonets-ocr-2/

PixelPanda commented on Nanonets-OCR2-3B – OCR model that transforms documents into structured markdown huggingface.co/nanonets/N... · Posted by u/PixelPanda

PixelPanda · 4 months ago

Excited to share Nanonets-OCR2, a state-of-the-art suite of models designed for advanced image-to-markdown conversion and Visual Question Answering (VQA).

Live Demo -> https://docstrange.nanonets.com/

Blog -> https://nanonets.com/research/nanonets-ocr-2/

PixelPanda commented on Nanonets-OCR-s – OCR model that transforms documents into structured markdown huggingface.co/nanonets/N... · Posted by u/PixelPanda

PixelPanda · 8 months ago

Full disclosure: I work at Nanonets

Excited to share Nanonets-OCR-s, a powerful and lightweight (3B) VLM that converts documents into clean, structured Markdown. The model is trained to understand document structure and content context (tables, equations, images, plots, watermarks, checkboxes, etc.).

Key features:

LaTeX Equation Recognition: Converts inline and block-level math into properly formatted LaTeX, distinguishing between $...$ and $$...$$.

Image Descriptions for LLMs: Describes embedded images using structured <img> tags. Handles logos, charts, plots, and so on.

Signature Detection & Isolation: Finds and tags signatures in scanned documents, outputting them in <signature> blocks.

Watermark Extraction: Extracts watermark text and stores it within a <watermark> tag for traceability.

Smart Checkbox & Radio Button Handling: Converts checkboxes to Unicode symbols like ☐, ☑, and ☒ for reliable parsing in downstream apps.

Complex Table Extraction: Handles multi-row/column tables, preserving structure and outputting both Markdown and HTML formats.
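
To give a feel for how the tagged output could be consumed downstream, here is a minimal sketch in Python. The tag names (<img>, <signature>, <watermark>) and the $...$ / $$...$$ distinction come from the feature list above; the sample text, the function names, the checkbox symbols (☐, ☑, ☒) and the state labels attached to them are assumptions made for illustration, not actual model output or an official parser.

```python
import re

# Hypothetical sample; the real model output may be laid out differently.
SAMPLE = """\
# Invoice

<watermark>CONFIDENTIAL</watermark>

☑ Paid
☐ Overdue
☒ Disputed

The total is $x + y$ where

$$x = 2y$$

<img>Company logo</img>
<signature>J. Doe</signature>
"""

# Matches <img>, <signature> and <watermark> blocks, including multi-line bodies.
TAG_RE = re.compile(r"<(img|signature|watermark)>(.*?)</\1>", re.DOTALL)
# Matches a line that starts with one of the assumed checkbox symbols.
CHECKBOX_RE = re.compile(r"^([☐☑☒])[ \t]*(.+)$", re.MULTILINE)
# Block math is delimited by $$...$$; inline math by single $...$.
BLOCK_MATH_RE = re.compile(r"\$\$(.+?)\$\$", re.DOTALL)
INLINE_MATH_RE = re.compile(r"(?<!\$)\$(?!\$)(.+?)(?<!\$)\$(?!\$)")

# Assumed meaning of each symbol; adjust to whatever your documents use.
CHECKBOX_STATES = {"☐": "unchecked", "☑": "checked", "☒": "crossed"}


def extract_tags(markdown: str) -> dict[str, list[str]]:
    """Collect the text inside each tagged block, grouped by tag name."""
    found: dict[str, list[str]] = {"img": [], "signature": [], "watermark": []}
    for name, body in TAG_RE.findall(markdown):
        found[name].append(body.strip())
    return found


def extract_checkboxes(markdown: str) -> list[tuple[str, str]]:
    """Return (label, state) pairs for lines that start with a checkbox symbol."""
    return [
        (label.strip(), CHECKBOX_STATES[symbol])
        for symbol, label in CHECKBOX_RE.findall(markdown)
    ]


def extract_math(markdown: str) -> dict[str, list[str]]:
    """Separate block-level ($$...$$) from inline ($...$) LaTeX snippets."""
    return {
        "inline": [m.strip() for m in INLINE_MATH_RE.findall(markdown)],
        "block": [m.strip() for m in BLOCK_MATH_RE.findall(markdown)],
    }


if __name__ == "__main__":
    print(extract_tags(SAMPLE))
    print(extract_checkboxes(SAMPLE))
    print(extract_math(SAMPLE))
```

Regexes are enough for a quick check like this, but real documents with nested tags or dollar signs in running text would likely need a proper Markdown/HTML parser.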

Huggingface / GitHub / Try it out: https://huggingface.co/nanonets/Nanonets-OCR-s

Try it with Docext in Colab: https://github.com/NanoNets/docext/blob/main/PDF2MD_README.m...