Readit News
breadislove commented on Launch HN: Mosaic (YC W25) – Agentic Video Editing   mosaic.so... · Posted by u/adishj
primitivesuave · a month ago
Last year, I made a YouTube documentary series showcasing the prolific corruption in a small city government. I downloaded all the city government meetings, used Whisper to transcribe them, and then set up a basic RAG so I could query across a decade of committee meetings (around 1 TB of video). Once I had the timestamps I was interested in, I then had to embark on a tedious manual process of locating the file, cutting out a few seconds or minutes from a multi-hour video, and then ordering all the clips into a cohesive narrative.

These seem like problems that LLMs are especially well-suited for. I might have spent a fraction of the time if there were some system that could "index" my content library and intelligently pull relevant clips into a cohesive storyline.

I also spent an ungodly amount of time on animations - it felt like "1 hour of work for 1 minute of animation". I would gladly pay for a tool which reduces the time investment required to be a citizen documentarian.
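
The manual clip-cutting step described above (take a timestamp range, cut it from a long video, order the clips) could be scripted roughly as follows. This is a minimal sketch, not the commenter's actual workflow: the `Clip` type, the function names, and the file names are hypothetical, and it only builds `ffmpeg` command lines and a concat list rather than running anything. It assumes `ffmpeg` is installed and uses stream copy (`-c copy`), which is fast but cuts on keyframes, so clip boundaries may be off by a little.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Clip:
    """A time range inside one source video (hypothetical helper type)."""
    source: str   # path to the full meeting recording
    start: float  # start time in seconds
    end: float    # end time in seconds


def cut_command(clip: Clip, out_path: str) -> List[str]:
    """Build an ffmpeg argument list that extracts one clip without re-encoding."""
    if clip.end <= clip.start:
        raise ValueError("clip end must be after clip start")
    duration = clip.end - clip.start
    return [
        "ffmpeg",
        "-ss", f"{clip.start:.3f}",  # seek to the start before opening the input
        "-i", clip.source,
        "-t", f"{duration:.3f}",     # keep this many seconds
        "-c", "copy",                # stream copy: no re-encode, keyframe-accurate only
        out_path,
    ]


def concat_list(clip_paths: List[str]) -> str:
    """Build the text of a list file for ffmpeg's concat demuxer, in narrative order."""
    return "".join(f"file '{path}'\n" for path in clip_paths)
```

Each command list can be passed to `subprocess.run`, and the list file can then be joined with `ffmpeg -f concat -safe 0 -i list.txt -c copy story.mp4`.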

breadislove · a month ago
you should check mixedbread out. we support indexing multimodal data and making data ready for ai. we are adding video and audio support by the end of the year. might be interesting for the OP as well.

we have a couple of investigative journalists and lawyers using us for a similar use case.

breadislove commented on BERT is just a single text diffusion step   nathan.rs/posts/roberta-d... · Posted by u/nathan-barry
alansaber · 2 months ago
Interested in how this compares to electra
breadislove · 2 months ago
or deberta but nevertheless super interesting!
breadislove commented on DeepSeek OCR   github.com/deepseek-ai/De... · Posted by u/pierre
breadislove · 2 months ago
For everyone wondering how good this and other benchmarks are:

- the OmniAI benchmark is bad

- Instead check OmniDocBench[1] out

- Mistral OCR is far far behind most Open Source OCR models and even further behind Gemini

- End to End OCR is still extremely tricky

- composed pipelines work better (layout detection -> reading order -> OCR every element)

- complex table parsing is still extremely difficult

[1]: https://github.com/opendatalab/OmniDocBench
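
The composed pipeline mentioned in the list above (layout detection -> reading order -> OCR every element) can be sketched as a chain of three stages. This is only an illustration of the shape, under loud assumptions: `Region`, `reading_order`, and `run_pipeline` are hypothetical names, the layout detector and the per-region OCR are passed in as callables because no specific model is implied, and the reading-order step is a naive top-to-bottom, left-to-right sort that would not handle multi-column pages.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class Region:
    """One detected layout element on a page (hypothetical type)."""
    kind: str    # e.g. "title", "paragraph", "table"
    x: int       # left edge in pixels
    y: int       # top edge in pixels
    width: int
    height: int


def reading_order(regions: List[Region]) -> List[Region]:
    """Naive reading order: top-to-bottom, then left-to-right."""
    return sorted(regions, key=lambda r: (r.y, r.x))


def run_pipeline(
    page: Any,
    detect_layout: Callable[[Any], List[Region]],
    ocr_region: Callable[[Any, Region], str],
) -> List[str]:
    """Detect regions, order them, then OCR each region on its own."""
    regions = detect_layout(page)
    return [ocr_region(page, region) for region in reading_order(regions)]
```

Keeping each stage behind its own function means a weak stage (for example table parsing) can be swapped out without touching the others.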

breadislove commented on Migrating from AWS to Hetzner   digitalsociety.coop/posts... · Posted by u/pingoo101010
V__ · 2 months ago
This sounds really intriguing, and I am really curious. What kind of service do you run where you need 100s of VMs? Was there a reason for not going dedicated? Looking at their offering, their biggest VM is (48 CPU, 192 GB RAM, 960 GB SSD). I can't even imagine using that much. Again, I'm really curious.
breadislove · 2 months ago
we have extremely processing heavy jobs where users upload large collections of files (audios, pdfs, videos etc.) and expect to get fast processing. it's just that we need to fan out sometimes, since a lot of our users are sensitive to processing times.
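
As a scaled-down illustration of the fan-out idea in the comment above, the sketch below spreads one upload's files across a pool of workers and collects the results in order. It is an in-process stand-in using Python's standard thread pool, not the commenter's multi-VM setup; `fan_out` and the worker function are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def fan_out(items: Iterable[T], worker: Callable[[T], R], max_workers: int = 8) -> List[R]:
    """Run worker on every item concurrently and return results in input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Executor.map preserves the order of the inputs in its results.
        return list(pool.map(worker, items))
```

With more workers, the wall-clock time for a large upload approaches the time of its slowest file rather than the sum of all files, which is the reason to fan out when users are sensitive to processing time.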
breadislove commented on Migrating from AWS to Hetzner   digitalsociety.coop/posts... · Posted by u/pingoo101010
jgalt212 · 2 months ago
1000 VMs?

So you have approx 1MM concurrent customers? That's a big number. You should definitely be able to get preferred pricing from AWS at that scale.

breadislove · 2 months ago
We have extremely processing heavy jobs where users upload large collections of files (PDFs, audios, videos etc.) and expect to get fast processing.

u/breadislove

Karma: 152 · Cake day: March 14, 2024