I'm always curious how the licensing aspects will play out. It seems quite clear to me that most LLMs contain copyrighted material without the proper rights. And then they turn around and put restrictive licensing on the result (for example the Salesforce model here).
I know they add valuable input to it, but CC-BY-NC is really rubbing me the wrong way.
It's really two different things. If using data to train is either fair use or not a use at all (I believe it is; for all intents and purposes it's the same as reading it), then the copyright on the training data is irrelevant.
Whether weights can be copyrighted at all (which is the basis of these licenses) is also unclear. Again, I think they should be: for any nontrivial model release they are just as much a creative work as a computer program (though really, what I care about most is being able to enforce copyleft on them).
Also, all these laws work to the benefit of whoever has the deepest pockets, so Salesforce will win against most others by virtue of that, regardless of how the law shakes out.
I'd be really impressed with Mozilla if they could do the entire thing (llamafile + llamaindex) in one, or even two, files. Having to set up a separate Python install just for this task and pull in all the llamaindex Python deps defeats the point of using llamafile.
I'd love it if Firefox would locally index the text content of each website I visit and let me RAG-search that database. So often I want to revisit a website from weeks earlier but can't find it again.
You might not want them to have that information, but I think Google's history search now supports that for Chrome users: https://myactivity.google.com/myactivity
I built a Chrome extension to do this a year ago: [0]
Here is the list of technological problems:
1. When is a page ready to be indexed? Many websites are dynamic.
2. How to find the relevant content? (To avoid indexing noise)
3. How to keep performance acceptable? Computing embeddings on each page is enough to turn a laptop into a small helicopter, fans and all. (I used 384 as the embedding dimension; below that is too imprecise, above too compute-heavy.)
4. How to chunk a page? It is not enough to split the content into sentences; you must add context to them.
5. How to rank the results of a search? PageRank is not applicable here.
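Point 4 above (chunking with context) can be sketched in a few lines. This is just an illustration of the idea, not the extension's actual code; the function name, the title-prefix scheme, and the naive sentence split are all my own assumptions:

```python
def chunk_with_context(title, paragraphs, max_chars=500):
    """Split paragraphs into chunks, prefixing each chunk with the
    page title so a downstream embedding model sees where the text
    came from, rather than an isolated sentence."""
    chunks = []
    for para in paragraphs:
        # Naive sentence split; a real implementation would use a
        # proper sentence segmenter.
        sentences = [s.strip() for s in para.split(". ") if s.strip()]
        buf = ""
        for s in sentences:
            # Flush the buffer once adding a sentence would overflow.
            if buf and len(buf) + len(s) > max_chars:
                chunks.append(f"{title} | {buf}")
                buf = ""
            buf = (buf + " " + s).strip()
        if buf:
            chunks.append(f"{title} | {buf}")
    return chunks
```

The same prefix trick works with section headings or the previous sentence as context; anything that disambiguates a chunk at query time helps.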
I'm working on something like this! It's simple in concept, but there are lots of fiddly bits. A big one is performance (at least, without spending $$$$$ on GPUs.) I haven't found that much in terms of how to tune/deploy LLMs on commodity cloud hardware, which is what I'm trying this out on.
You can use ONNX versions of embedding models. Those run faster on CPU.
Also, don't discount plain old BM25 and fastText. For many queries, keyword or bag-of-words search works just as well as fancy 1536-dim vectors.
You can also do things like tokenize your text using the tokenizer that GPT-4 uses (via tiktoken for instance) and then index those tokens instead of words in BM25.
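BM25 itself fits in a screenful of pure Python. This is a minimal sketch of the standard Okapi BM25 scoring formula; it takes pre-tokenized documents, so the tokens can come from a whitespace split, tiktoken IDs as suggested above, or anything else:

```python
import math
from collections import Counter

def bm25_scores(query_tokens, docs_tokens, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query
    using the classic Okapi BM25 formula."""
    N = len(docs_tokens)
    avgdl = sum(len(d) for d in docs_tokens) / N
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for d in docs_tokens:
        for t in set(d):
            df[t] += 1
    scores = []
    for d in docs_tokens:
        tf = Counter(d)
        score = 0.0
        for t in query_tokens:
            if t not in tf:
                continue
            idf = math.log((N - df[t] + 0.5) / (df[t] + 0.5) + 1)
            norm = tf[t] + k1 * (1 - b + b * len(d) / avgdl)
            score += idf * tf[t] * (k1 + 1) / norm
        scores.append(score)
    return scores
```

For anything beyond a toy, a library like rank_bm25 or a full-text index (SQLite FTS5, Tantivy) will be faster, but the formula above is all there is to the ranking itself.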
[0] https://www.youtube.com/watch?v=GYwJu5Kv-rA
From what I can see, they both have the ability to archive / FTS your bookmarks.
But in terms of API access, historious only allows WRITE access (ugh), whereas Pinboard at least allows read/write.
What else am I missing?