adit_a commented on Launch HN: Reducto Studio (YC W24) – Build accurate document pipelines, fast    · Posted by u/adit_a
think4coffee · 6 months ago
Are you guys Harry Potter fans? Curious how you came up with the name Reducto
adit_a · 6 months ago
Hahaha, a while ago (even before choosing this idea space) we said we would build "magical tools for developers" and Reducto was the name we landed on out of a long list of magic adjacent things
zeld4 · 6 months ago
Looks promising.

Do you store the uploaded doc from free/test account?

adit_a · 6 months ago
Yes in the sense that we have features that will create persisted share links, and by default you can revisit results in your free account until you decide to delete them.

If helpful, we also offer free trial accounts with zero data retention if that's important for your use case

techguy06 · 6 months ago
Your landing page looks absolutely beautiful, did you use Framer or any other landing page builder or is it code?
adit_a · 6 months ago
Code!
jhuguet · 6 months ago
Founder of anyformat.ai here, building from Madrid, Spain, with a specific focus on Europe and its unique market and regulation dynamics.

Just want to say how energizing it is to see this space maturing through thoughtful products like Extend and Reducto. Congrats to both for your Series A. I’d also mention GetOmni, as they’re doing great work leading the open-source front with their ZeroX project. We’ve learned a lot by observing your execution, and frankly, anyone serious about document intelligence tracks this ecosystem closely. It’s been encouraging to see ideas we were exploring early last year reflected in your recent successes. No shame there; good ideas often converge over time.

When we started fundraising (before GPT-4o), few investors believed LLMs would meaningfully disrupt this space. Finding the right supporters meant enduring a lot of rejection and delayed us quite a bit. Raising is always hard, and especially in Spain, where even a modest €500K pre-seed round typically requires proven MRR on the order of €10K.

We’re earlier-stage, but strongly aligned in product philosophy. Especially in the belief that the challenge isn’t just parsing PDFs. It’s building a feedback loop so fast and intuitive that deploying new workflows feels like development, not consulting. That’s what enables no-code teams to actually own automation.

From our experience in Europe, the market feels slower. Legacy tools like Textract still hold surprising inertia, and even €0.04/page can trigger pushback, signaling deeper friction tied to organizational change. Curious if US-based teams see the same, or whether pricing and adoption are more elastic. We’ve also heard “we’ll build this internally in 3 weeks” more times than we can count—usually underestimating what it takes to scale AI-based workflows reliably.

One experiment we’re excited about is using AI agents to ease the “blank page” problem in workflow design. You type: “Given a document, split it into subdocuments (contract, ID, bank account proof), extract key fields, and export everything into Excel.” The agent drafts the initial pipeline automatically. It helps DocOps teams skip the fiddly config and get straight to value. Again, no magic—just about removing friction and surfacing intent.

Some broader observations that align with what others here have said:

- Parsing/extraction isn’t a long-term moat. Foundation models keep improving and are beginning to yield bounding boxes. Not perfect yet, but close.
- Moats come from orchestration-first strategies and self-adaptive systems: rapid iteration, versioning, observability, and agent-assisted configuration using visual tools like ReactFlow or Langflow. Basically, making life easier for the pipeline owner.
- Prompt-tuning (via DSPy, human feedback, QA) holds promise for adaptability but is still hard to expose through intuitive UX, especially for semi-technical DocOps users without ML knowledge.
- Extraction confidence remains a challenge. No method fully prevents hallucinations. We shared our mitigation approach here: http://bit.ly/3T5nB3h. OCR errors are a major contributor: we’ve seen extractions marked high-confidence despite poor OCR input. The extraction logic was right, but we failed to penalize for OCR confidence (we’re fixing that).
- Excel files are still a nightmare. We’re experimenting with methods like this one (https://arxiv.org/html/2407.09025v1), but large, messy files (90+ tabs, 100K+ rows) still break most approaches.
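
To make the OCR-confidence point above concrete, here is a minimal sketch of one possible way to penalize an extraction score by the quality of the OCR text it was read from. It is an illustration only, not the method described in the linked post: the `ExtractedField` type, the multiply-by-weakest-word rule, and the 0.8 review threshold are all assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExtractedField:
    """Hypothetical container for one extracted value and its evidence."""
    name: str
    value: str
    extraction_confidence: float       # extractor's own score, 0..1
    ocr_word_confidences: List[float]  # per-word OCR scores for the source span, 0..1


def combined_confidence(field: ExtractedField) -> float:
    """Scale the extraction score by the weakest OCR word in the source span.

    Assumption: one badly recognized word is enough to distrust the value,
    so the minimum is used rather than the mean.
    """
    if not field.ocr_word_confidences:
        return 0.0  # no OCR evidence at all: treat as unverified
    return field.extraction_confidence * min(field.ocr_word_confidences)


def needs_review(field: ExtractedField, threshold: float = 0.8) -> bool:
    """Flag a field for human review when the combined score is below threshold."""
    return combined_confidence(field) < threshold
```

With this rule, a field the extractor scores at 0.95 but whose source text contains a word recognized at 0.5 drops to 0.475 and gets flagged, which is the failure case described above.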

I’d love to connect with other founders in this space. Competition is energizing, and the market is big enough for multiple winners. From what I see, you guys, along with llamaparse, are spearheading the movement. Incumbents are also moving fast, as with the Snowflake + Landing AI partnership, but fragmentation is probably inevitable. Feels like the space will stratify fast: some will vanish, some will thrive quietly, and a few might become the core infrastructure layer.

We’re small, building hard, and proud to be part of this wave. Kudos again to @kbyatnal and @adit_a for raising the bar, would be great to chat anytime or even offer some workspace if you ever visit Spain!

adit_a · 6 months ago
Appreciate the thoughtful note and want to wish you guys the best as well!
skadamat · 6 months ago
Congrats on the launch! How do you guys compare with Datalab with regards to accuracy?

https://www.datalab.to/

adit_a · 6 months ago
Thanks! We have a lot of respect for the work VikP and his team did on Surya but we haven't benchmarked his newer pipeline so I don't want to make a 1:1 claim.

If you want to do a side by side with your use case we'd be happy to set you up with free trial access.

b0a04gl · 6 months ago
if reducto leans in fully as the layer that remembers every correction, every edge case, every shift in layout or wording across document versions it starts becoming more than a pipeline. it becomes institutional memory for unstructured data. none of the other players really do that. they extract, maybe evaluate once, then forget.

but the real pain is always in the second and third batch. when formats change subtly. if reducto becomes the system that adapts without you babysitting it, that's where it may win. continuity's the moat imo among the competitors

adit_a · 6 months ago
Yeah, we're extremely excited about the potential of building a flywheel for each individual customer's pipeline.
Fraaaank · 6 months ago
Why do you only get a data processing agreement when on the enterprise plan? It's a legal requirement for any European company.
adit_a · 6 months ago
We have a default DPA we're willing to sign on all tiers -- the note in the pricing page is meant to refer to custom/redlined DPAs that become complex to manage over time

We'll edit that to make it more clear

c_moscardi · 6 months ago
We chatted a few months back -- congrats on launch! Looks like a great UX.
adit_a · 6 months ago
Ah yeah I remember! Great to hear from you and thanks :)
willwjack · 6 months ago
This would have saved me so much pain back when I was working on RAG workflows. Great to see.
adit_a · 6 months ago
Would love to help if you end up having any use cases in the future!
kbyatnal · 6 months ago
Founder of Extend (https://www.extend.ai/) here, it's a great question and thanks for the tag. There definitely are a lot of document processing companies, but it's a large market and more competition is always better for users.

In this case, the Reducto team seems to have cloned us down to the small details [1][2], which is a bit disappointing to see. But imitation is the best form of flattery I suppose! We thought deeply about how to build an ergonomic configuration experience for recursive type definitions (which is deceptively complex), and concluded that a recursive spreadsheet-like experience would be the best form factor (which we shipped over a year ago).

> "How do you see the space evolving as LLMs commoditize PDF extraction?"

Having worked with a ton of startups & F500s, we've seen that there's still a large gap for businesses in going from raw OCR outputs —> document pipelines deployed in prod for mission-critical use cases. LLMs and VLMs aren't magic, and anyone who goes in expecting 100% automation is in for a surprise.

The prompt engineering / schema definition is only the start. You still need to build and label datasets, orchestrate pipelines (classify -> split -> extract), detect uncertainty and correct with human-in-the-loop, fine-tune, and a lot more. You can certainly get close to full automation over time, but it takes time and effort — and that's where we come in. Our goal is to give AI teams all of that tooling on day 1, so they hit accuracy quickly and focus on the complex downstream post-processing of that data.
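
As a rough illustration of the classify -> split -> extract flow with human-in-the-loop correction mentioned above, here is a minimal sketch. It does not reflect Extend's or Reducto's actual APIs: `run_pipeline`, the `Extraction` and `PipelineResult` types, the caller-supplied `classify` and `extract` functions, and the 0.9 threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class Extraction:
    doc_type: str
    fields: Dict[str, str]
    confidence: float  # extractor's score for this sub-document, 0..1


@dataclass
class PipelineResult:
    accepted: List[Extraction] = field(default_factory=list)
    review_queue: List[Extraction] = field(default_factory=list)


def run_pipeline(
    pages: List[str],
    classify: Callable[[str], str],
    extract: Callable[[str, List[str]], Extraction],
    review_threshold: float = 0.9,
) -> PipelineResult:
    """Classify each page, split into sub-documents, extract, then route."""
    # Split: consecutive pages with the same label form one sub-document.
    segments: List[Tuple[str, List[str]]] = []
    for page in pages:
        label = classify(page)
        if segments and segments[-1][0] == label:
            segments[-1][1].append(page)
        else:
            segments.append((label, [page]))

    # Extract each sub-document; uncertain results go to human review.
    result = PipelineResult()
    for label, segment_pages in segments:
        extraction = extract(label, segment_pages)
        if extraction.confidence >= review_threshold:
            result.accepted.append(extraction)
        else:
            result.review_queue.append(extraction)
    return result
```

In practice the `classify` and `extract` callables would wrap model calls, and corrections made in the review queue would feed back into datasets or fine-tuning, which this sketch leaves out.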

[1] https://dub.sh/ojv9b7p

[2] https://dub.sh/X7GFlDd

adit_a · 6 months ago
Hey, we've never used or even attempted to use your platform. Respectfully I think you know that, and that you also know that your team has tried to get access to ours using personal gmail accounts dating back to 2024.

A schema builder with nested array fields has been part of our playground (and nearly every structured extraction solution) for a very long time and is just not something that we even view as a defining part of the platform.
