What was the total token count? The output is impressive (if accurate). I'm also curious whether doing things like omitting function bodies would alter the output (that obviously makes the process cheaper and would enable larger projects, but may lead to worse analyses).
From the Anthropic logs it looks like 1827 input tokens and 1038 output tokens.
I'm still on the free trial API plan, but at Opus price of $15 per million input tokens and $75 per million output tokens that comes to about 10.5 cents.
Do you think it's possible to play a game of 20 questions and go from not knowing what HNSW is or in what context it is used, to becoming a sophisticated user of any library that implements it?
Thanks for the notes, I thought it would be a more detailed prompt :) Any reason why you chose Opus instead of other LLMs? Was it because of the 200k context window?
I'm defaulting to Opus at the moment partly because it's brand new and so I need to spend time with it to get a feel for it - but also because so far it seems to be better than GPT-4 for code stuff. I've had a bunch of examples of it writing mistake-free versions of code that GPT-4 had generated with small bugs in.
From a quick survey of the implementation, probably not very well: for example, it uses dynamic dispatch for all distance calculations, and there are a lot of allocations in the hot path.
Maybe it would be better to post this repository as a reference / teaching implementation of HNSW.
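To make the dispatch point concrete, here is a hypothetical sketch (not code from this repository) contrasting a distance call through a trait object with a monomorphized generic one; the trait, struct, and function names are all invented for illustration:

```rust
// Hypothetical sketch, not code from this repository.
trait Distance {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32;
}

struct Euclidean;

impl Distance for Euclidean {
    fn distance(&self, a: &[f32], b: &[f32]) -> f32 {
        a.iter().zip(b).map(|(x, y)| (x - y) * (x - y)).sum::<f32>().sqrt()
    }
}

// Dynamic dispatch: every distance call goes through a vtable pointer,
// which blocks inlining and auto-vectorization in the hot loop.
fn score_dyn(metric: &dyn Distance, query: &[f32], points: &[Vec<f32>]) -> Vec<f32> {
    points.iter().map(|p| metric.distance(query, p)).collect()
}

// Static dispatch: monomorphized per concrete metric, so the distance code
// can be inlined into the loop and vectorized by the compiler.
fn score_static<D: Distance>(metric: &D, query: &[f32], points: &[Vec<f32>]) -> Vec<f32> {
    points.iter().map(|p| metric.distance(query, p)).collect()
}

fn main() {
    let points = vec![vec![0.0, 1.0], vec![3.0, 4.0]];
    let query = vec![0.0, 0.0];
    assert_eq!(
        score_dyn(&Euclidean, &query, &points),
        score_static(&Euclidean, &query, &points)
    );
}
```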
This is at a pretty early stage. You might consider instant-distance, which I wrote a few years ago and which is in production use at instantdomainsearch.com:
I can answer #3: HNSW allows for incremental index updates rather than full rebuilds, so each additional insert is a sublinear (but greater than constant time) operation.
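For intuition on where the sublinearity comes from, here is a minimal sketch of the layer-assignment step from the HNSW paper (my own simplification using the rand crate; M = 16 is an assumed default, not taken from any particular library): each new node gets an exponentially distributed top level, so the layer count, and with it the work per insert, grows roughly logarithmically with the index size.

```rust
use rand::Rng;

// Level assignment from the HNSW paper: level = floor(-ln(U(0,1)) * mL).
// With mL = 1 / ln(M), most nodes land on layer 0 and the expected number
// of layers grows logarithmically, so an insert only does a short greedy
// descent plus a small beam search instead of touching the whole index.
fn sample_level(m_l: f64, rng: &mut impl Rng) -> usize {
    let u: f64 = rng.gen_range(f64::EPSILON..1.0);
    (-u.ln() * m_l).floor() as usize
}

fn main() {
    let m = 16.0_f64;           // assumed max neighbors per node
    let m_l = 1.0 / m.ln();     // level multiplier recommended by the paper
    let mut rng = rand::thread_rng();
    let levels: Vec<usize> = (0..10).map(|_| sample_level(m_l, &mut rng)).collect();
    println!("sampled levels: {levels:?}");
}
```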
I can answer how it would look in Qdrant, if you're interested. The index will take around 70GB of RAM. New vectors are first placed in a non-indexed segment and are immediately available for search while the index is being built. The vectors and the index can be offloaded to disk. Search takes a few milliseconds.
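As a back-of-the-envelope check on that RAM figure (my own arithmetic, not Qdrant's accounting; M = 16 and 4-byte node ids are assumptions):

```rust
fn main() {
    let vectors: u64 = 15_000_000;
    let dims: u64 = 768;

    // Raw vector data: f32 = 4 bytes per dimension.
    let raw = vectors * dims * 4;
    println!("raw vectors:   {:.1} GB", raw as f64 / 1e9); // ~46.1 GB

    // HNSW layer-0 links, assuming M = 16 (so up to 2*M = 32 links per node)
    // and 4-byte node ids; upper layers add comparatively little.
    let links = vectors * 32 * 4;
    println!("layer-0 links: {:.1} GB", links as f64 / 1e9); // ~1.9 GB

    // The remaining gap up to ~70 GB would be upper layers, metadata,
    // and construction-time working memory.
}
```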
https://gist.github.com/simonw/9ff9a0ab8ab64e8aa8d160c4294c0...
You don't have a license on the code yet (weirdly Claude hallucinated MIT).
Notes on how I generated this in the comments on that Gist.
As far as HNSW implementations go, this one appears to be almost entirely unfinished. Node insertion logic is missing (https://github.com/swapneel/hnsw-rust/blob/b8ef946bd76112250...) and so is the base layer beam search.
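For reference, the missing base-layer search is Algorithm 2 (SEARCH-LAYER) in the HNSW paper. Below is a minimal illustrative sketch against a plain adjacency list, using the ordered-float crate for comparable distances; it is not this repository's data structures or a drop-in completion of them:

```rust
use ordered_float::OrderedFloat;
use std::cmp::Reverse;
use std::collections::{BinaryHeap, HashSet};

/// Greedy beam search on one layer (SEARCH-LAYER from Malkov & Yashunin).
/// `neighbors` is the adjacency list for that layer; `ef` is the beam width.
fn search_layer(
    query: &[f32],
    entry: usize,
    ef: usize,
    vectors: &[Vec<f32>],
    neighbors: &[Vec<usize>],
) -> Vec<usize> {
    let dist = |i: usize| {
        OrderedFloat(
            vectors[i]
                .iter()
                .zip(query)
                .map(|(a, b)| (a - b) * (a - b))
                .sum::<f32>(),
        )
    };

    let mut visited = HashSet::new();
    visited.insert(entry);
    // Min-heap of candidates to expand (closest first).
    let mut candidates = BinaryHeap::new();
    candidates.push(Reverse((dist(entry), entry)));
    // Max-heap of the best `ef` results found so far (worst on top).
    let mut results = BinaryHeap::new();
    results.push((dist(entry), entry));

    while let Some(Reverse((d, node))) = candidates.pop() {
        // Stop once the closest unexpanded candidate is further away than
        // the worst result we are keeping.
        if d > results.peek().unwrap().0 {
            break;
        }
        for &next in &neighbors[node] {
            if !visited.insert(next) {
                continue;
            }
            let d_next = dist(next);
            if results.len() < ef || d_next < results.peek().unwrap().0 {
                candidates.push(Reverse((d_next, next)));
                results.push((d_next, next));
                if results.len() > ef {
                    results.pop(); // drop the current worst result
                }
            }
        }
    }

    results.into_sorted_vec().into_iter().map(|(_, i)| i).collect()
}

fn main() {
    // Toy graph: four points on a line, each linked to its neighbors.
    let vectors = vec![vec![0.0], vec![1.0], vec![2.0], vec![3.0]];
    let neighbors = vec![vec![1], vec![0, 2], vec![1, 3], vec![2]];
    let hits = search_layer(&[2.2], 0, 2, &vectors, &neighbors);
    println!("closest ids: {hits:?}"); // [2, 3]
}
```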
https://github.com/instant-labs/instant-distance
There are Python bindings, too:
https://pypi.org/project/instant-distance/
- how much time does it take to insert 15 million 768-dimensional f32 vectors?
- how much RAM is needed for this operation?
- when inserting another vector, how incremental is the insertion? Is it faster than reindexing the 15M + 1 vectors from scratch?
- does the structure need to stay in RAM, or can it be efficiently queried from a serialized representation?
- how fast is the search over the 15M vectors on average?
Indexing time isn't great, but query time is surprisingly good given that it's written in unoptimized Python and NumPy.