Readit News
Posted by u/anonbuilder 3 years ago
Show HN: PodText.ai – Search anything said on a podcast, highlight text to play (podtext.ai...)
Hi HN, wanted to share a project that I’ve been working on recently.

PodText allows users to find anything said on a podcast. You can also listen and share clips to a specific part of the podcast audio, simply by highlighting the text of that part. Currently there are just over 25k podcast episodes and I’m adding a lot more in the coming weeks (yes my GPU bill is painful).
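
For readers curious how highlight-to-play could work, here is a minimal sketch. It assumes the transcript is stored as words with start and end times in seconds; the `Word` class, `build_offsets`, and `clip_for_selection` are hypothetical names for illustration, not PodText's actual code.

```python
from dataclasses import dataclass


@dataclass
class Word:
    text: str
    start: float  # seconds into the audio
    end: float


def build_offsets(words):
    """Join words with single spaces and record each word's character span."""
    spans = []
    pos = 0
    for w in words:
        spans.append((pos, pos + len(w.text)))
        pos += len(w.text) + 1  # +1 for the joining space
    return " ".join(w.text for w in words), spans


def clip_for_selection(words, spans, sel_start, sel_end):
    """Return (start, end) audio times for words overlapping [sel_start, sel_end)."""
    hit = [w for w, (a, b) in zip(words, spans) if a < sel_end and b > sel_start]
    if not hit:
        return None
    return hit[0].start, hit[-1].end


words = [Word("search", 0.0, 0.4), Word("anything", 0.4, 0.9), Word("said", 0.9, 1.2)]
text, spans = build_offsets(words)
sel_start = text.index("anything")
clip = clip_for_selection(words, spans, sel_start, len(text))
```

The returned pair can then be turned into a shareable link or handed to an audio player as a seek position plus a stop time.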

In order to monetize it, I’m building a sponsorship database to help sponsors find podcasts and vice versa. This will be sold in the form of a $99/month “PodText Business” subscription. I bet I could charge a lot more to large sponsors but I’ll tweak that as I talk to potential customers.

Right now the UI is very bare bones (doesn’t even have pagination) but I’ll polish it once the data pipeline is working well. Please let me know if you run into any bugs or have any questions about the site or business model.

PS: I'm a regular on HN under my real name but can't post from that account since my employer would fire me if they found out about this project :-)

lappa · 3 years ago
This is a really interesting project with a lot of potential.

If I were a sponsor looking for a podcast I would want my search process to look something like this:

- Search for a term relevant to my line of business

- See a list of podcasts ordered by % of utterances which contain my key phrase throughout their last N episodes

- Annotation of how many listeners each podcast had in last N episodes
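
The ranking in the list above can be sketched roughly as follows. This assumes each podcast is a list of episodes (oldest first) and each episode is a list of utterance strings; `rank_podcasts` and the `listeners` mapping are hypothetical, and real listener counts would have to come from some other source.

```python
def rank_podcasts(podcasts, phrase, last_n=10, listeners=None):
    """Rank podcasts by the % of utterances in their last N episodes containing phrase.

    podcasts: dict of name -> list of episodes, each episode a list of utterances.
    listeners: optional dict of name -> listener count, used only as an annotation.
    """
    listeners = listeners or {}
    needle = phrase.lower()
    rows = []
    for name, episodes in podcasts.items():
        recent = episodes[-last_n:]
        utterances = [u for ep in recent for u in ep]
        if not utterances:
            continue  # nothing to score
        hits = sum(needle in u.lower() for u in utterances)
        pct = 100.0 * hits / len(utterances)
        rows.append((name, pct, listeners.get(name)))
    # Highest share of matching utterances first
    return sorted(rows, key=lambda r: r[1], reverse=True)


podcasts = {
    "A": [["we love databases", "hello"], ["databases again", "bye"]],
    "B": [["cooking", "database talk", "x", "y"]],
}
ranking = rank_podcasts(podcasts, "database", last_n=2, listeners={"A": 1200})
```

Here the match is a plain substring test, so "database" also counts "databases"; a stricter tokenizer would change the percentages.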

llambda · 3 years ago
They could and probably should offer semantic search, which would be far more powerful than searching exact match keywords.

If you could identify podcasts that often talk about a domain more broadly, you'll have a higher hit rate and overall a better audience fit.

jcutrell · 3 years ago
I’ve been creating a semantic search using embeddings tonight against my own podcast transcripts. I’d be happy to have my own content surfacing mechanism like this!
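
A minimal sketch of the embedding approach mentioned above: transcripts and the query are each turned into vectors, and results are ranked by cosine similarity. The vectors here are made-up toy values; in practice they would come from whatever embedding model you choose, and `semantic_search` is a hypothetical helper name.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def semantic_search(query_vec, index, top_k=3):
    """index: list of (text, vector). Returns the top_k (score, text) pairs."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_k]


# Toy 2-d vectors standing in for real embeddings (assumption for illustration)
index = [
    ("episode about startups", [1.0, 0.0]),
    ("episode about gardening", [0.0, 1.0]),
    ("episode about farm startups", [1.0, 1.0]),
]
results = semantic_search([1.0, 0.1], index, top_k=2)
```

A linear scan like this is fine for one show's transcripts; a large catalog would usually call for an approximate nearest-neighbor index instead.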
anonbuilder · 3 years ago
Appreciate the feedback! I'll keep these use cases in mind as I build out PodText Business
drusepth · 3 years ago
Building out the search a little more to support exact matches would also be super useful in this flow. For example, I've been on several podcasts talking about Notebook.ai, but searching for the name also matches "notebook", which results in an unusable signal-to-noise ratio (seeing every podcast that says the word "notebook"). Likewise, it'd be great to quote-search exact matches for "Andrew Brown", instead of seeing all podcasts that mention "Andrew" or "brown".
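
As a rough illustration of quote-searching, the sketch below treats anything in double quotes as a phrase that must appear verbatim (case-insensitively, on word boundaries) and requires every unquoted term to appear as a whole word. `parse_query` and `matches` are hypothetical helpers, and the all-terms-required rule is an assumption, not how PodText behaves today.

```python
import re


def parse_query(query):
    """Split a query into quoted phrases and loose (unquoted) terms."""
    phrases = re.findall(r'"([^"]+)"', query)
    rest = re.sub(r'"[^"]*"', " ", query)
    return phrases, rest.split()


def contains_exact(text, needle):
    """True if needle occurs in text as-is, not as part of a longer word."""
    pattern = r"(?<!\w)" + re.escape(needle) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def matches(text, query):
    """All quoted phrases and all loose terms must be present in text."""
    phrases, terms = parse_query(query)
    return all(contains_exact(text, p) for p in phrases + terms)
```

With this, `matches("I use a notebook daily", '"Notebook.ai"')` is false, while a transcript that actually says "Notebook.ai" matches; likewise `"Andrew Brown"` in quotes only matches the two words side by side.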
pablomendes · 3 years ago
Happy to exchange notes with you on our learnings building https://podsearch.page
johnthescott · 3 years ago
heads up.

the "How it Works" link is broken on home page of podsearch.page

sva_ · 3 years ago
I thought about making something like this, but one important part, which seems to be missing here, is speaker diarization (identifying who says what).

In a world of increasingly automated content generation, the "who" might become just as important as the "what" of information.
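
To make the diarization idea concrete, here is a small sketch that merges two inputs: word timings from a transcript and speaker turns from a diarization step. Both input shapes, and the `label_words` and `group_by_turn` helpers, are assumptions for illustration; they are not the output format of any particular tool.

```python
def label_words(words, turns):
    """Attach a speaker to each word using the word's midpoint time.

    words: list of (text, start, end); turns: list of (speaker, start, end).
    """
    labeled = []
    for text, ws, we in words:
        mid = (ws + we) / 2
        speaker = next((s for s, ts, te in turns if ts <= mid < te), "unknown")
        labeled.append((speaker, text))
    return labeled


def group_by_turn(labeled):
    """Collapse consecutive words from the same speaker into one line."""
    lines = []
    for speaker, text in labeled:
        if lines and lines[-1][0] == speaker:
            lines[-1][1].append(text)
        else:
            lines.append((speaker, [text]))
    return [(s, " ".join(ws)) for s, ws in lines]


words = [("hi", 0.0, 0.5), ("there", 0.5, 1.0), ("hello", 1.2, 1.6)]
turns = [("S1", 0.0, 1.1), ("S2", 1.1, 2.0)]
transcript = group_by_turn(label_words(words, turns))
```

Once each line carries a speaker label, search can be restricted to what a given person said rather than to everything in the episode.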

mike_d · 3 years ago
Amazing. I was literally just building the same thing. I have a little over 50k podcast episodes with karaoke style word by word transcription.

On to the next project I guess.

ProcNetDev · 3 years ago
There are dozens of us! This looks nice and the UI is better than I would have done. Good job, OP. I guess I can stop running my GPU every night.

I might still try to release a nice bundled-up docker container that can STT a podcast RSS into a text RSS. Some podcasts are enjoyable to listen to, but some I just want to skim the text.
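
A rough sketch of the RSS-to-text-RSS step, using only the standard library. The `transcribe` argument is a hypothetical callable that takes an audio URL and returns text; the actual speech-to-text model is left out, and writing the transcript into `<description>` is just one possible choice.

```python
import xml.etree.ElementTree as ET


def text_feed(rss_xml, transcribe):
    """Return a copy of the feed with each item's description set to its transcript.

    transcribe: callable taking an audio URL and returning transcript text
    (an assumption; plug in whatever STT pipeline you run).
    """
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        enclosure = item.find("enclosure")
        if enclosure is None:
            continue  # no audio attached to this item
        desc = item.find("description")
        if desc is None:
            desc = ET.SubElement(item, "description")
        desc.text = transcribe(enclosure.get("url"))
    return ET.tostring(root, encoding="unicode")


rss = (
    '<rss version="2.0"><channel><title>T</title>'
    '<item><title>E1</title>'
    '<enclosure url="http://example.com/e1.mp3" type="audio/mpeg"/>'
    '</item></channel></rss>'
)
out = text_feed(rss, lambda url: "transcript of " + url)
```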

ramraj07 · 3 years ago
Tbh this is like the single most common idea every tech guy gets when they’re frustrated with the problem. Many like me move on when we realize how niche and potentially small a TAM it could be (of course any idea can generalize in creative ways). So no harm as long as it was fun. Also many options can co-exist!
jerrygoyal · 3 years ago
I don't like that attitude ;)
mritchie712 · 3 years ago
This is amazing. How do you search for two terms at once? e.g. "aboriginal" and "origin". It doesn't seem possible to require that both terms be present.
anonbuilder · 3 years ago
Thanks! The search function is built with Algolia. I'm sure they support boolean ops like "AND" but I'll need to dig into their API. I think if you search both terms, transcripts containing both should be ranked higher.
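
For anyone wondering what a strict "AND" looks like under the hood, here is a generic inverted-index sketch. It is not Algolia's API, just plain Python with hypothetical `build_index` and `search_all` helpers, showing that requiring both terms is a set intersection.

```python
import re
from collections import defaultdict


def build_index(docs):
    """Map each lowercase token to the set of document ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in re.findall(r"\w+", text.lower()):
            index[token].add(doc_id)
    return index


def search_all(index, terms):
    """Return ids of documents that contain every term (boolean AND)."""
    sets = [index.get(t.lower(), set()) for t in terms]
    return set.intersection(*sets) if sets else set()


docs = {
    1: "aboriginal origin stories",
    2: "origin of species",
    3: "aboriginal art",
}
index = build_index(docs)
both = search_all(index, ["aboriginal", "origin"])
```

Because tokens are whole words here, "origin" does not match inside "aboriginal", so only the first transcript qualifies.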
kwerk · 3 years ago
I’m doing a similar personal product. Highly recommend switching to Typesense before your Algolia trial is up. I’ve heard good things about Meilisearch but Typesense has been rock solid for me.
pablomendes · 3 years ago
You might want to try semantic search instead of fiddling with keywords. Disclaimer: I'm building a plug-and-play semantic search API at https://kailualabs.com
culi · 3 years ago
Surprised to see how many other small projects are doing this same thing. Kinda seems like a solved problem with ListenNotes. Not affiliated, but I use the service a lot and they have a lot more features than just transcripts, including publicly accessible APIs (some of which could probably be utilized by some of the projects posted here)
jonmc12 · 3 years ago
I'd echo this. ListenNotes API is great and I thought podcast search was already solved for devs.

I've always enjoyed this question on their FAQ that gives some tips for potential competitors - https://www.listennotes.com/api/faq/#faq2

> There are at least 3,035,027 podcasts and 156,316,374 episodes on the Internet...

culi · 3 years ago
In addition, it's free for users and the API's free plan is sufficient for most personal uses. Hard to see how any of these newcomers could compete unless they had funding behind them
tootie · 3 years ago
Please take this as encouragement that this is a real business, not as discouragement that it's been done: there are similar products in the market. We use podscribe at my place. It's more about better targeting than discovery because we have an ad sales team, but suffice to say there is an established market for this kind of thing.
paulryanrogers · 3 years ago
Do you get permission from podcast creators?

Because these transcripts are probably derivative works.

mike_d · 3 years ago
It hasn't been litigated yet, but the closest analog is closed captioning. You can legally create closed captions for someone else's work, and you then own the copyright.
Geee · 3 years ago
I think the closest analog is audiobooks. You can't create audiobooks without permission from the original creator of the book. Your derivative work has its own copyright, but it only applies to the work you've done, i.e. the audio. You can't steal someone's original work just by adding something to it. You get copyright on the addition only.