These things are getting really good at plain transcription (as long as you don't need it strictly verbatim), but every additional dimension you add (timestamps, speaker assignment, etc.) makes the others worse. These work much better as independent processes whose outputs then get reconciled and refined by a multimodal LLM.
Company description: We do construction compliance AI. We're at Series B with about $40M raised, and growing a lot! Feel free to reply or DM me if you want more info.
Tech Stack: TypeScript across the full stack except for Python for ML; NestJS backend, Next.js frontend, React Native mobile, IaC in Pulumi using TS.
Open Jobs:
- Senior/Staff/Principal Frontend Engineer (fullstack TypeScript nice to have)
- Senior/Staff/Principal Node Engineer (fullstack TypeScript nice to have)
- Senior/Staff/Principal QA Engineer
Location: Austin or elsewhere in Texas preferred, but open to remote for the right candidate in the US.
Jobs page: https://www.documentcrunch.com/careers#open-roles