hugodutka (u/hugodutka)

hugodutka commented on Blockdiff: We built our own file format for VM disk snapshots cognition.ai/blog/blockdi... · Posted by u/cyanf

hugodutka · 3 months ago

Have you considered https://github.com/containerd/overlaybd? It seems to offer very similar features to blockdiff.

hugodutka commented on Show HN: AgentAPI – HTTP API for Claude Code, Goose, Aider, and Codex github.com/coder/agentapi... · Posted by u/hugodutka

andrewfromx · 8 months ago

can you compare this to https://github.com/eyaltoledano/claude-task-master ?

hugodutka · 8 months ago

I haven't used claude-task-master before, but based on the README, it looks like it's an AI agent that integrates well with IDEs. In contrast, AgentAPI lets you control other agents - like Claude Code or OpenAI Codex - using HTTP calls instead of typing commands into the terminal. For example, you could use AgentAPI to control Claude Code from a custom frontend, such as a native desktop application.

hugodutka commented on Show HN: Zerox – Document OCR with GPT-mini github.com/getomni-ai/zer... · Posted by u/themanmaran

nbbaier · a year ago

> I extracted the embedded text from the PDF

What did you use to extract the embedded text during this step, other than some other OCR tech?

hugodutka · a year ago

PyMuPDF, a PDF library for Python.

hugodutka commented on Show HN: Zerox – Document OCR with GPT-mini github.com/getomni-ai/zer... · Posted by u/themanmaran

sidmitra · a year ago

>frequency of character triples

What are character triples? Are they trigrams?

hugodutka · a year ago

I think so. I'd normalize the text first: lowercase it and remove all non-alphanumeric characters. E.g., for the phrase "What now?" I'd create these trigrams: wha, hat, atn, tno, now.
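A minimal sketch of that normalization and trigram step, in Python. The function name `char_trigrams` is hypothetical, and using `str.isalnum` to decide what counts as alphanumeric is an assumption (it also keeps non-ASCII letters and digits):

```python
def char_trigrams(text: str) -> list[str]:
    """Return overlapping character trigrams of the normalized text."""
    # Normalize: lowercase, then keep only letters and digits
    # (assumption: str.isalnum defines "alphanumeric" here).
    normalized = "".join(ch for ch in text.lower() if ch.isalnum())
    # Slide a window of width 3 across the normalized string.
    return [normalized[i:i + 3] for i in range(len(normalized) - 2)]


print(char_trigrams("What now?"))
# ['wha', 'hat', 'atn', 'tno', 'now']
```

Because whitespace is removed before windowing, trigrams such as "atn" span word boundaries, which matches the example above.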

hugodutka commented on Show HN: Zerox – Document OCR with GPT-mini github.com/getomni-ai/zer... · Posted by u/themanmaran

hugodutka · a year ago

I used this approach extensively over the past couple of months with GPT-4 and GPT-4o while building https://hotseatai.com. Two things that helped me:

1. Prompt with examples. I included an example image with an example transcription as part of the prompt. This made GPT make fewer mistakes and improved output accuracy.

2. Confidence score. I extracted the embedded text from the PDF and compared the frequency of character triples in the source text and GPT’s output. If there was a significant difference (less than 90% overlap), I would log a warning. This helped detect cases where GPT omitted entire paragraphs of text.
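The confidence score in point 2 could be sketched roughly as follows. The comment does not say how the overlap was computed, so the metric here is an assumption: the share of the source text's trigram occurrences that also appear in the model output, counted as a multiset intersection. The names `trigram_counts`, `trigram_overlap`, and `check_transcription` are hypothetical, and the 0.9 threshold mirrors the 90% figure mentioned above.

```python
import logging
from collections import Counter


def trigram_counts(text: str) -> Counter:
    """Count character trigrams after lowercasing and dropping non-alphanumerics."""
    normalized = "".join(ch for ch in text.lower() if ch.isalnum())
    return Counter(normalized[i:i + 3] for i in range(len(normalized) - 2))


def trigram_overlap(source_text: str, output_text: str) -> float:
    """Fraction of source trigram occurrences also present in the output.

    Assumed metric: multiset intersection size divided by the source total.
    """
    source = trigram_counts(source_text)
    output = trigram_counts(output_text)
    total = sum(source.values())
    if total == 0:
        # No embedded text to compare against, so nothing can be missing.
        return 1.0
    # Counter & Counter keeps the minimum count for each shared trigram.
    shared = sum((source & output).values())
    return shared / total


def check_transcription(source_text: str, output_text: str,
                        threshold: float = 0.9) -> float:
    """Log a warning when the overlap falls below the threshold."""
    score = trigram_overlap(source_text, output_text)
    if score < threshold:
        logging.warning("Low trigram overlap: %.2f (threshold %.2f)",
                        score, threshold)
    return score
```

Dividing by the source total makes the score sensitive to omissions: if the output drops a whole paragraph, those trigrams go unmatched and the score falls, while extra text in the output does not lower it.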