My problem statement is:
- Ingest PDFs, summarize them, and extract the important information.
- Have some way to overlay the extracted information on the PDF in the UI.
- Users can give feedback on the overlaid info by accepting or rejecting each highlight as useful or not.
- That feedback goes back into the model for reinforcement learning.
Hoping to find something that can make this more manageable.
The tricky part is maintaining a mapping between your LLM extractions and the coordinates of the text on the page.
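If you don't already have coordinates, a library like PyMuPDF can give you text blocks with bounding boxes up front. A minimal sketch (the filename is a placeholder):

```python
import fitz  # PyMuPDF

doc = fitz.open("report.pdf")
chunks = []
for page_num, page in enumerate(doc):
    # get_text("blocks") yields (x0, y0, x1, y1, text, block_no, block_type)
    for x0, y0, x1, y1, text, block_no, block_type in page.get_text("blocks"):
        if block_type == 0:  # 0 = text block, 1 = image block
            chunks.append({
                "page": page_num,
                "bbox": (x0, y0, x1, y1),
                "text": text.strip(),
            })
```

Those bboxes are what your UI would draw the highlights from.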
One way to do it would be with two LLM passes:
1. First pass: Extract all important information from the PDF
2. Second pass: "Hey LLM, find where each extraction appears in these bounded text chunks"
Not the cheapest approach since you're hitting the API twice, but it's straightforward! A rough sketch of both passes is below.
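This assumes a hypothetical `call_llm(prompt) -> str` helper wrapping whatever API you're using, and that the model returns clean JSON (real code would want validation and retries):

```python
import json

def two_pass_extract(chunks, call_llm):
    """Pass 1: extract facts. Pass 2: map each fact back to a chunk index."""
    full_text = "\n".join(c["text"] for c in chunks)

    # Pass 1: pull out the important information as a JSON array of strings.
    extractions = json.loads(call_llm(
        "Extract the important facts from this document as a JSON "
        f"array of strings:\n\n{full_text}"
    ))

    # Pass 2: ask the model which numbered chunk each extraction came from.
    numbered = "\n".join(f"[{i}] {c['text']}" for i, c in enumerate(chunks))
    mapping = json.loads(call_llm(
        "For each fact below, return a JSON object mapping the fact to the "
        "index of the chunk it appears in.\n\n"
        f"Chunks:\n{numbered}\n\nFacts:\n{json.dumps(extractions)}"
    ))

    # Join the model's answer back to the stored bounding boxes for the overlay.
    return [
        {"fact": fact, "page": chunks[idx]["page"], "bbox": chunks[idx]["bbox"]}
        for fact, idx in mapping.items()
        if isinstance(idx, int) and 0 <= idx < len(chunks)
    ]
```

If the extractions are near-verbatim, you could also skip the second API call and fuzzy-match the extracted strings against the chunks directly, at the cost of missing paraphrased extractions.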
"Balls. We want the finest wines available to humanity. We want them here, and we want them now!"
"Here Hare Here"
"I feel like a pig shat in my head."
"I don't advise a haircut, man. All hairdressers are in the employment of the government. Hair are your aerials. They pick up signals from the cosmos and transmit them directly into the brain. This is the reason bald-headed men are uptight."