NotebookLM audio overviews/podcasts have been an absolute boon for my homeschooled kids. They devour audiobooks and podcasts, and they love learning by listening to these first. Then when we come together for class, we discuss what was covered, and can spend time diving into specifics or doing activities based on the content. It’s super nice to have another option for a learning medium here.
To generate them, we scanned the physical book pages, fed the images into GCP’s Document AI with a simple Python script to extract the text en masse, and concatenated the results into a text-only version of each chapter. Give that text to NotebookLM and run with it.
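For anyone curious what such a script might look like, here is a rough sketch rather than the actual script. It assumes the `google-cloud-documentai` package, an OCR processor already created in your GCP project, and scans saved as zero-padded JPEGs (`page_001.jpg`, `page_002.jpg`, ...) so that sorting by filename gives page order. The project, location, and processor IDs are placeholders, and `ocr_with_document_ai` and `build_chapter_text` are made-up names for illustration.

```python
from pathlib import Path
from typing import Callable


def ocr_with_document_ai(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """OCR a single scanned page with Document AI and return its plain text."""
    # Imported lazily so the concatenation step below works without the library.
    from google.cloud import documentai

    # Placeholder identifiers (assumptions): replace with your own values.
    project_id, location, processor_id = "my-project", "us", "my-processor-id"

    client = documentai.DocumentProcessorServiceClient()
    name = client.processor_path(project_id, location, processor_id)
    request = documentai.ProcessRequest(
        name=name,
        raw_document=documentai.RawDocument(content=image_bytes, mime_type=mime_type),
    )
    return client.process_document(request=request).document.text


def build_chapter_text(page_dir: Path, ocr: Callable[[bytes], str]) -> str:
    """Run OCR on every page image in filename order and join the results."""
    pages = sorted(page_dir.glob("*.jpg"))
    texts = [ocr(page.read_bytes()).strip() for page in pages]
    # A blank line between pages keeps the text readable in one file.
    return "\n\n".join(texts)
```

Calling `build_chapter_text(Path("scans/chapter_03"), ocr_with_document_ai)` and writing the result to a `.txt` file would give you something to upload to NotebookLM. Passing the OCR function in as an argument also makes it easy to swap in a different OCR tool or a stub for testing.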
Why not simply upload the PDF version of the scanned book or document? Extracting the text from a scanned document via the GCP Document AI API sounds like an unnecessary use of resources.