Readit News
jonathan-adly commented on Are OpenAI and Anthropic losing money on inference?   martinalderson.com/posts/... · Posted by u/martinald
jonathan-adly · 14 days ago
Basically- the same math as modern automated manufacturing. Super expensive and complex build-out - then a money printer once running and optimized.

I know there is a lot of bearish sentiment here. Lots of people correctly point out that this is not the same math as FAANG products - then they make the jump that it must therefore be bad.

But - my guess is these companies end up with margins better than Tesla (a modern manufacturer), but below the 80%-90% margins of "pure" software. Somewhere in the middle, which is still pretty good.

Also - once the Nvidia monopoly gets broken, the initial build out becomes a lot cheaper as well.

jonathan-adly commented on Malleable Software   mdubakov.me/malleable-sof... · Posted by u/tablet
jonathan-adly · 15 days ago
We have been running this playbook for the last 2 years in healthcare, and we have been super successful. Doubling every quarter over the last year. 70%+ profitability, almost 7 figures of revenue. 100% bootstrapped.

People are still mentally locked into a world where code was expensive. Code now is extremely cheap. And if it is cheap, then it makes sense that every customer gets their own.

Before - we built factories to give people heavy machinery. Now, we run a 3D printer.

Every day I thank the SV product-led-growth cargo cults for telling, and sometimes even forcing, our competition not to go there.

jonathan-adly commented on Locality of Behaviour (2020)   htmx.org/essays/locality-... · Posted by u/jstanley
jonathan-adly · 2 months ago
One of the most pleasant experiences I have had writing code was in the early AI days, when we did hyperscript SSE. Super locality of behavior, and a super interesting way of writing Server-Sent Events code.

    eventsource demo from http://server/demo

    on message as string
        put it into #div
    end

    on open
        log "connection opened."
    end

    on close
        log "connection closed."
    end

    on error
        log "handle error here..."
    end
    end

https://hyperscript.org/features/event-source/
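For the server side of that demo, here is a toy SSE endpoint sketch using only the Python standard library (hypothetical names and routes, not the stack we actually ran):

```python
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

def format_sse(data: str, event: str = "") -> str:
    """Format a payload per the Server-Sent Events wire protocol."""
    msg = ""
    if event:
        msg += f"event: {event}\n"
    msg += f"data: {data}\n\n"
    return msg

class DemoHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/demo":
            self.send_error(404)
            return
        self.send_response(200)
        self.send_header("Content-Type", "text/event-stream")
        self.send_header("Cache-Control", "no-cache")
        self.end_headers()
        # Push one message per second; the hyperscript `on message`
        # handler above receives each `data:` payload as a string.
        for i in range(5):
            self.wfile.write(format_sse(f"tick {i}").encode())
            self.wfile.flush()
            time.sleep(1)

# To run: HTTPServer(("", 8000), DemoHandler).serve_forever()
```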

jonathan-adly commented on Ask HN: How did Soham Parekh get so many jobs?    · Posted by u/jshchnz
jonathan-adly · 2 months ago
Lots of YC companies copy each other's process and selection criteria. Basically- they all have the same blind spots and look for the same type of engineer.

So, super easy to scam all of them with the same skillset and mannerisms.

jonathan-adly commented on The Grug Brained Developer (2022)   grugbrain.dev/... · Posted by u/smartmic
jonathan-adly · 3 months ago
I send this article as part of onboarding for all new devs we hire. It is super helpful for keeping a fast-growing team from falling into the typical cycle of more people, more complexity.
jonathan-adly commented on How we used GPT-4o for image detection with 350 similar illustrations   olup-blog.pages.dev/stori... · Posted by u/olup
ResearchAtPlay · 8 months ago
Thanks for the link to the ColPali implementation - interesting! I am specifically interested in evaluation benchmarks for different image embedding models.

I see the ColiVara-Eval repo in your link. If I understand correctly, ColQwen2 is the current leader followed closely by ColPali when applying those models for RAG with documents.

But how do those models compare to each other and to the llama3.2-vision embeddings when applied to, for example, sentiment analysis for photos? Do benchmarks like that exist?

jonathan-adly · 8 months ago
The “equivalent” here would be Jina-Clip (architecture-wise), not necessarily performance.

The ColPali paper (1) does a good job explaining why you don't really want to use vision embeddings directly, and how you are much better off optimizing for RAG with a ColPali-like setup. Basically, a plain vision embedding is not optimized for textual understanding: it works if you are searching for the word "bird" and images of birds, but it doesn't work well for pulling a document that is a paper about birds.

1. https://arxiv.org/abs/2407.01449
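The scoring the paper describes is late interaction (MaxSim): every query-token embedding is compared against every page-patch embedding, and the best match per token is summed. A minimal NumPy sketch of just that scoring step, with toy vectors standing in for real model output:

```python
import numpy as np

def maxsim_score(query_tokens: np.ndarray, page_patches: np.ndarray) -> float:
    """Late-interaction (ColBERT/ColPali-style) relevance score.

    query_tokens: (num_query_tokens, dim), rows L2-normalized
    page_patches: (num_patches, dim), rows L2-normalized
    """
    # Cosine similarity of every query token against every page patch.
    sim = query_tokens @ page_patches.T          # (tokens, patches)
    # For each query token, keep its best-matching patch, then sum.
    return float(sim.max(axis=1).sum())
```

At query time you compute this score against every candidate page and rank by it; the per-token matching is what lets a textual query land on the right page region.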

jonathan-adly commented on How we used GPT-4o for image detection with 350 similar illustrations   olup-blog.pages.dev/stori... · Posted by u/olup
ResearchAtPlay · 8 months ago
Yes, you could implement image similarity search using embeddings: Create embeddings for the entire image set, save the embeddings in a database, and add embeddings incrementally as new images come in. To search for a similar image, create the embedding for the image that you are looking for and compute the cosine similarity between that embedding and the embeddings in your database. The closer the cosine similarity is to 1.0 the more similar the images.

For choosing a model, the article mentions the AWS Titan multimodal model, but you’d have to pay for API access to create the embeddings. Alternatively, self-hosting the CLIP model [0] to create embeddings would avoid API costs.
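The store-and-search loop described above can be sketched in a few lines, with the embedding model left abstract (any CLIP-style encoder produces the vectors; the class and names here are hypothetical):

```python
import numpy as np

class ImageIndex:
    """Toy in-memory store for the embed -> store -> cosine-search loop."""

    def __init__(self):
        self.ids, self.vecs = [], []

    def add(self, image_id: str, embedding: np.ndarray) -> None:
        # Normalize once at insert time so search is a plain dot product.
        self.ids.append(image_id)
        self.vecs.append(embedding / np.linalg.norm(embedding))

    def search(self, query: np.ndarray, k: int = 5):
        q = query / np.linalg.norm(query)
        sims = np.stack(self.vecs) @ q           # cosine similarities
        top = np.argsort(-sims)[:k]              # highest similarity first
        return [(self.ids[i], float(sims[i])) for i in top]
```

A real deployment would swap the Python lists for a vector database, but the cosine-similarity ranking is the same.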

Follow-up question: Would the embeddings from the llama3.2-vision models be of higher quality (contain more information) than the original CLIP model?

The llama vision models use CLIP under the hood, but they add a projection head to align with the text model and the CLIP weights are mutated during alignment training, so I assume the llama vision embeddings would be of higher quality, but I don’t know for sure. Does anybody know?

(I would love to test this quality myself but Ollama does not yet support creating image embeddings from the llama vision models - a feature request with several upvotes has been opened [1].)

[0] https://github.com/openai/CLIP

[1] https://github.com/ollama/ollama/issues/5304

jonathan-adly · 8 months ago
So, there is a whole world with vision based RAG/search.

We have a good open-source repo here with a ColPali implementation: https://github.com/tjmlabs/ColiVara

jonathan-adly commented on Show HN: IR_evaluation – Information retrieval evaluation metrics in pure Python   github.com/plurch/ir_eval... · Posted by u/plurch
jonathan-adly · 8 months ago
Great work! Honestly, it helps so much just to have these metrics explained for folks.

Early on, RAG was an art; now that things have stabilized a bit, it's more of a science - and vendors should at a minimum publish some benchmarks.
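For readers new to these metrics, two of the standard ones such a library covers are recall@k and MRR; minimal versions of my own for illustration, not the repo's actual API:

```python
def recall_at_k(relevant: set, ranked: list, k: int) -> float:
    """Fraction of the relevant items that appear in the top-k results."""
    if not relevant:
        return 0.0
    return len(relevant & set(ranked[:k])) / len(relevant)

def mrr(relevant: set, ranked: list) -> float:
    """Reciprocal rank of the first relevant hit (0.0 if none found)."""
    for pos, item in enumerate(ranked, start=1):
        if item in relevant:
            return 1.0 / pos
    return 0.0
```

Even these two numbers, reported on a fixed query set, are enough to compare retrieval setups apples to apples.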

jonathan-adly commented on Nvidia-Ingest: Multi-modal data extraction   github.com/NVIDIA/nv-inge... · Posted by u/mihaid150
hammersbald · 8 months ago
Is there an OCR toolkit or an ML model that is able to reliably extract tables from invoices?
jonathan-adly · 8 months ago
I would like to throw our project in the ring. We use ColQwen2 over a ColPali implementation. Basically, a search & extract pipeline: https://docs.colivara.com/guide/markdown

u/jonathan-adly

Karma: 1273 · Cake day: January 18, 2021

About
Founder, TJM Labs