My problem statement is:
- Ingest PDFs, summarize them, and extract the important information.
- Have some way to overlay the extracted information on the PDF in the UI.
- Users can give feedback on the overlaid info by accepting or rejecting each highlight as useful or not.
- That feedback goes back into the model for reinforcement learning.
Hoping to find something that can make this more manageable.
The tricky part is maintaining a mapping between your LLM extractions and the coordinates of the text on the page.
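If you don't already have coordinates, a library like PyMuPDF can give you text blocks with bounding boxes up front. A minimal sketch (the filename is a placeholder):

```python
import fitz  # PyMuPDF

doc = fitz.open("report.pdf")
chunks = []
for page_num, page in enumerate(doc):
    # get_text("blocks") yields (x0, y0, x1, y1, text, block_no, block_type)
    for x0, y0, x1, y1, text, block_no, block_type in page.get_text("blocks"):
        if block_type == 0:  # 0 = text block, 1 = image block
            chunks.append({
                "page": page_num,
                "bbox": (x0, y0, x1, y1),
                "text": text.strip(),
            })
```

Those bboxes are what your UI would draw the highlights from.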
One way to do it would be with two LLM passes:
1. First pass: Extract all important information from the PDF
2. Second pass: "Hey LLM, find where each extraction appears in these bounded text chunks"
Not the cheapest approach since you're hitting the API twice, but it's straightforward! A rough sketch of both passes is below.
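This assumes a hypothetical `call_llm(prompt) -> str` helper wrapping whatever API you're using, and that the model returns clean JSON (real code would want validation and retries):

```python
import json

def two_pass_extract(chunks, call_llm):
    """Pass 1: extract facts. Pass 2: map each fact back to a chunk index."""
    full_text = "\n".join(c["text"] for c in chunks)

    # Pass 1: pull out the important information as a JSON array of strings.
    extractions = json.loads(call_llm(
        "Extract the important facts from this document as a JSON "
        f"array of strings:\n\n{full_text}"
    ))

    # Pass 2: ask the model which numbered chunk each extraction came from.
    numbered = "\n".join(f"[{i}] {c['text']}" for i, c in enumerate(chunks))
    mapping = json.loads(call_llm(
        "For each fact below, return a JSON object mapping the fact to the "
        "index of the chunk it appears in.\n\n"
        f"Chunks:\n{numbered}\n\nFacts:\n{json.dumps(extractions)}"
    ))

    # Join the model's answer back to the stored bounding boxes for the overlay.
    return [
        {"fact": fact, "page": chunks[idx]["page"], "bbox": chunks[idx]["bbox"]}
        for fact, idx in mapping.items()
        if isinstance(idx, int) and 0 <= idx < len(chunks)
    ]
```

If the extractions are near-verbatim, you could also skip the second API call and fuzzy-match the extracted strings against the chunks directly, at the cost of missing paraphrased extractions.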
"Balls. We want the finest wines available to humanity. We want them here, and we want them now!"
"Here Hare Here"
"I feel like a pig shat in my head."
"I don't advise a haircut, man. All hairdressers are in the employment of the government. Hair are your aerials. They pick up signals from the cosmos and transmit them directly into the brain. This is the reason bald-headed men are uptight."