- the OmniAI benchmark is bad
- Check out OmniDocBench[1] instead
- Mistral OCR is far, far behind most open-source OCR models, and even further behind Gemini
- End to End OCR is still extremely tricky
- composed pipelines work better (layout detection -> reading order -> OCR every element)
- complex table parsing is still extremely difficult
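
To make the composed-pipeline point concrete, here is a rough sketch of the three stages (layout detection -> reading order -> OCR per element). Everything in it is hypothetical: `Region`, `reading_order`, `run_pipeline`, and the `detect_layout` / `recognize` callables are placeholder names, not any real library's API, and the reading-order step is a naive top-to-bottom, left-to-right sort that would only hold up on simple single-column pages.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Region:
    """One detected layout element (hypothetical structure)."""
    kind: str      # e.g. "title", "paragraph", "table"
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float


def reading_order(regions: List[Region], row_tolerance: float = 10.0) -> List[Region]:
    """Naive ordering: bucket regions into rows by y, then sort left to right.

    Assumption: a single-column page. Real reading-order models handle
    multi-column and nested layouts, which this does not.
    """
    return sorted(regions, key=lambda r: (int(r.y // row_tolerance), r.x))


def run_pipeline(
    page: Any,
    detect_layout: Callable[[Any], List[Region]],
    recognize: Callable[[Any, Region], str],
) -> List[str]:
    """Compose the stages: detect regions, order them, OCR each one separately."""
    regions = detect_layout(page)          # stage 1: layout detection
    ordered = reading_order(regions)       # stage 2: reading order
    return [recognize(page, r) for r in ordered]  # stage 3: per-element OCR
```

The upside of splitting it this way is that each stage can be swapped or evaluated on its own, e.g. routing `kind == "table"` regions to a dedicated table model inside `recognize`.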
So you have approx 1MM concurrent customers? That's a big number. You should definitely be able to get preferred pricing from AWS at that scale.
These seem like problems that LLMs are especially well-suited for. I might have spent a fraction of the time if there were some system that could "index" my content library and intelligently pull relevant clips into a cohesive storyline.
I also spent an ungodly amount of time on animations - it felt like "1 hour of work for 1 minute of animation". I would gladly pay for a tool which reduces the time investment required to be a citizen documentarian.
We have a couple of investigative journalists and lawyers using us for a similar use case.