Posted by u/knowsuchagency a year ago
Show HN: PDF to Podcast – Convert Any PDF into a Podcast Episode (pdf-to-podcast.com...)
Hi HN!

I'm stoked to share a project I've been working on called PDF to Podcast. It's a free, open-source tool that automatically converts PDF documents into engaging, informative podcast-style audio content using large language models and text-to-speech tech.

Inspiration: The idea for this project came from the NotebookLM demo at Google I/O, where they showcased generating audio dialogue from uploaded PDFs and other sources. However, that audio feature hasn't been publicly released yet, and I wanted to challenge myself to build something similar using existing tools and APIs.

How it works:

1. The user uploads a PDF.
2. The tool extracts the text and feeds it into Google's Gemini Flash language model.
3. Gemini Flash generates a natural, engaging podcast dialogue script based on the key information in the document.
4. This script is then converted to audio using OpenAI's text-to-speech API.
5. The user can listen to the generated "podcast episode" and read along with the transcript.

I chose to use Gemini Flash for the language model because it's good at writing high-quality prose while being fast and cheap. We use OpenAI's TTS API to then bring the dialogue to life.
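
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `DialogueLine`, `build_episode`, and the three injected callables are hypothetical names, and the stub functions stand in for the real PDF extraction, Gemini Flash, and OpenAI TTS calls so the sketch runs without any API keys.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class DialogueLine:
    speaker: str  # hypothetical role label, e.g. "host" or "guest"
    text: str


def build_episode(
    pdf_bytes: bytes,
    extract_text: Callable[[bytes], str],
    write_script: Callable[[str], List[DialogueLine]],
    synthesize: Callable[[DialogueLine], bytes],
) -> Tuple[bytes, str]:
    """Run the three stages and return (audio, transcript)."""
    text = extract_text(pdf_bytes)   # 1. PDF -> raw text
    script = write_script(text)      # 2. raw text -> dialogue script (the LLM step)
    # 3. dialogue script -> audio (the TTS step), one clip per line, joined naively
    audio = b"".join(synthesize(line) for line in script)
    transcript = "\n".join(f"{line.speaker}: {line.text}" for line in script)
    return audio, transcript


# Stand-in stages (assumptions for illustration only).
def fake_extract(pdf_bytes: bytes) -> str:
    return pdf_bytes.decode("utf-8", errors="ignore")


def fake_script(text: str) -> List[DialogueLine]:
    return [
        DialogueLine("host", "Today: " + text),
        DialogueLine("guest", "Thanks for having me."),
    ]


def fake_tts(line: DialogueLine) -> bytes:
    return line.text.encode("utf-8")


audio, transcript = build_episode(b"attention paper", fake_extract, fake_script, fake_tts)
print(transcript)
```

Passing the stages in as callables keeps each one swappable, so a different model or TTS provider could be dropped in without touching the orchestration.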

Under the hood, it's built with Python, FastAPI, Gradio for the web UI, and my own library, promptic, for calling the LLM and getting structured output. The code is open-source and available on GitHub.

Apart from the tool's practical utility, I'm hoping this project can serve as a helpful example for others looking to build applications on top of large language models. It demonstrates an end-to-end flow from document intake to language model usage to audio output, with a simple web interface on top.

I would love to hear any feedback or ideas from the HN community! I think there's a lot of potential to expand on this concept and make all sorts of written content more accessible and engaging through audio conversion. Let me know what you think :)

simonw · a year ago
I always go straight for the prompt with this kind of thing - it's here: https://github.com/knowsuchagency/pdf-to-podcast/blob/512bfb...

It starts like this:

    Your task is to take the input text provided and turn it into
    an engaging, informative podcast dialogue. The input text may
    be messy or unstructured, as it could come from a variety of
    sources like PDFs or web pages. Don't worry about the
    formatting issues or any irrelevant information; your goal is
    to extract the key points and interesting facts that could be
    discussed in a podcast.
The way this uses different OpenAI TTS voices for the different roles is really neat!
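
One way to give each role its own voice is a plain lookup from speaker label to voice name. The sketch below is an assumption about the general shape, not the project's code: the role labels, `VOICES`, and `voice_for` are hypothetical, while `onyx`, `nova`, and `alloy` are among the voice names OpenAI's TTS API accepts.

```python
# Hypothetical role labels mapped to OpenAI TTS voice names.
VOICES = {"host": "onyx", "guest": "nova"}
DEFAULT_VOICE = "alloy"  # fallback for any speaker not listed above


def voice_for(speaker: str) -> str:
    """Pick a voice for a speaker label, ignoring case and stray whitespace."""
    return VOICES.get(speaker.strip().lower(), DEFAULT_VOICE)


script = [
    ("Host", "Welcome to the show."),
    ("Guest", "Glad to be here."),
    ("Narrator", "That wraps up this episode."),
]

# Each (voice, text) pair would become one TTS request.
plan = [(voice_for(speaker), text) for speaker, text in script]
for voice, text in plan:
    print(f"{voice}: {text}")
```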

zachthewf · a year ago
I wonder what (if anything) is the impact of the leading spaces on each line of the multiline string, which are an artifact of wanting to keep the prompt pretty within code.

Hopefully not much, but I've heard horror stories about trailing spaces...

simonw · a year ago
As far as I can tell that only really affects the smaller models - GPT-4 / Claude / Gemini all seem pretty much impervious to weird whitespace in my experience.
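
If the indentation is a concern anyway, Python's standard `textwrap.dedent` removes the whitespace common to every line of a triple-quoted string, so the prompt can stay indented in the source without the extra spaces reaching the model. A small sketch (the `build_prompt` name and the shortened prompt text are illustrative only, not the project's code):

```python
import textwrap


def build_prompt() -> str:
    raw = """
        Your task is to take the input text provided and turn it into
        an engaging, informative podcast dialogue.
    """
    # dedent strips the indentation shared by all lines; strip() then drops
    # the blank first and last lines left over from the triple-quote layout.
    return textwrap.dedent(raw).strip()


prompt = build_prompt()
print(prompt)
```
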
cchance · a year ago
I imagine you could force this even further by specifying the names of the researcher and interviewer, and giving details of the structure of the episode
cush · a year ago
It might be a good idea to toss some kind of audio disclaimer at the beginning of the podcast that cites the source and states that the audio is completely fabricated. Reason being, the "Attention is All You Need" example on your site has Anya Sharma (an actual AI researcher who is unrelated to the Attention paper) on as a guest. Not sure if this is intentional or a hallucination, but it seems like a huge liability.
weare138 · a year ago
I tried the example physics article and it just made up a physicist to 'interview' that wasn't mentioned in the article.
xzjis · a year ago
Human podcasters hallucinate too.
wenbin · a year ago
Awesome project!

However, I find that when I realize a podcast is generated using AI and synthetic audio, I immediately lose interest. For me, the value of podcasts lies in authentic human conversations, and AI-generated content just doesn’t have the same appeal.

Probably it's just me being obsessed with old-school podcasts, though. I do believe there are listeners (not sure if many or few) who don't mind if a podcast is AI-generated.

malloryerik · a year ago
Funny, I've been using even primitive text-to-speech on PDFs for years, and while nothing compares to an excellent human reader, I find TTS often better than a mediocre human reading. This is mainly because I don't get upset at (and then have to forgive) a machine when it says "the Loovree" instead of the Louvre, or in an economic history book pronounces John Maynard Keynes as "Keens" (it sounds like "Kaynes"). Also, the dead neutrality of a machine's reading can jar me less than a numbskull and/or phony human rendition. I must say, though, that excellent voice actors are heaven to me.
stufffer · a year ago
What extensions/apps would you recommend?

I have tried to set up something similar with text-to-speech browser extensions, but I lose my place if I have to close and reopen.

satvikpendem · a year ago
That's interesting. For me, podcasts are just news articles or books that I don't have the time to sit down and read. The only time I listen to podcasts and audiobooks is when I am walking around or doing chores. Yes, many podcasts have a human element that is nice, but just as many are still useful without one. For those, I'm primarily there for the information itself, not who conveys it.
keiferski · a year ago
It’s almost certainly the case that the most profitable and popular podcasts succeed because of the personality of their host(s), not merely because the content is in audio form. So while this tool is useful for listening to information instead of reading it, the likelihood of a major podcast being entirely AI-generated is pretty low.
spaceship__sun · a year ago
Just a tangent: fans are obsessed with certain artists, say, TSwift, because of their personality rather than pure voice and lyrics. That's why concerts are so fucking popular.
leobg · a year ago
I tried the same thing for my kids:

Take some article or book written for adults. Maybe some archaeological discovery, interesting stuff from HN. Or science books from the 1960s.

Then have it turned into a conversation between a father and a curious seven-year-old daughter, and convert it to audio with two different speakers.

While it’s been fun to build this, I never ended up letting my kids use it. It just feels wrong. The educational equivalent of Harlow’s Monkeys.

randomcarbloke · a year ago
Why does this feel wrong? It seems like, with supervision, it could be very beneficial.
david1542 · a year ago
Looks good. As other people said, it's risky to give you my OpenAI key, so I'd make the app run locally with React maybe. Moreover, it'd be good to give an approximation of the price. It's kinda scary to click "Submit" and later on see that I was charged $3 by OpenAI.
rahimnathwani · a year ago
The page has a link to the code, so I guess you can self-host it: https://github.com/knowsuchagency/pdf-to-podcast
edward-ca · a year ago
Looks like a fun project!

Do you have any samples of the audio? It would be great to hear what it's like before trying it out.

Also, have you considered doing this all in client-side JS? It would be a good way to protect the API key (at least in this demo case).

elicash · a year ago
At the bottom of the page, there are examples.
eggbrain · a year ago
I think it would probably help to take the PDF up front, check the DPI and page count to get an estimated word count (as OCRing to get an exact word count might be costly on your end), and then return a “price preview”, at which point the customer just pays the price to get their podcast.

Like others have mentioned, I’d be scared to accidentally upload a 100-page PDF only for it to cost me $100 without me really knowing up front.
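
A rough version of such a price preview could look like the sketch below. Everything numeric here is an assumption for illustration: the words-per-page and characters-per-word averages are guesses, the per-character rate is a placeholder to be replaced with the provider's current pricing, and `estimate_tts_cost` is a hypothetical helper. It also prices reading the whole document aloud, which is only an upper-bound style estimate, since the generated script may be much shorter than the source.

```python
def estimate_tts_cost(
    page_count: int,
    words_per_page: int = 500,             # assumed average for a text-heavy page
    chars_per_word: float = 6.0,           # assumed, including the trailing space
    usd_per_million_chars: float = 15.0,   # placeholder rate; check current pricing
) -> float:
    """Estimate the TTS cost in USD from the page count alone (no OCR needed)."""
    if page_count < 0:
        raise ValueError("page_count must be non-negative")
    chars = page_count * words_per_page * chars_per_word
    return round(chars / 1_000_000 * usd_per_million_chars, 2)


preview = estimate_tts_cost(page_count=100)
print(f"Estimated cost: ${preview:.2f}")
```

Showing this number before the "Submit" step, and letting the user cancel, would address the surprise-bill worry without needing an exact word count.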

unraveller · a year ago
Sounds exactly like the way the Simply News podcast is put together. That is 100% AI for each topic (AI, tech, business, science, etc.) and combines multiple recent papers/stories for hundreds of daily podcasts.

https://simply-ai.podbean.com

https://www.simplynews.ai