Have you considered https://github.com/containerd/overlaybd? It seems to offer very similar features to blockdiff.
What are character triples? Are they trigrams?
1. Prompt with examples. I included an example image with an example transcription as part of the prompt. This made GPT make fewer mistakes and improved output accuracy.
2. Confidence score. I extracted the embedded text from the PDF and compared the frequency of character triples in the source text and GPT’s output. If there was a significant difference (less than 90% overlap) I would log a warning. This helped detect cases when GPT omitted entire paragraphs of text.