ianand commented on The spectrum of isolation: From bare metal to WebAssembly   buildsoftwaresystems.com/... · Posted by u/ThierryBuilds
ThierryBuilds · 2 months ago
I wrote this because I kept seeing developers (myself included) confuse language-level isolation like Python venv with OS-level isolation like Docker. I wanted to trace the actual technical boundaries between them.

The article maps out the differences between common execution environments—from physical bare metal and VMs to containers, process sandboxes, and virtual environments—to create a mental model of where the "isolation boundary" actually sits for each tool.

ianand · 2 months ago
Since you mention serverless, it might be worth mentioning Firecracker and V8 isolates.
ianand commented on The Q, K, V Matrices   arpitbhayani.me/blogs/qkv... · Posted by u/yashsngh
libraryofbabel · 2 months ago
This is ok (could use some diagrams!), but I don't think anyone coming to this for the first time will be able to use it to really teach themselves the LLM attention mechanism. It's a hard topic and requires two or three book chapters at least if you really want to start grokking it!

For anyone serious about coming to grips with this stuff, I would strongly recommend Sebastian Raschka's excellent book Build a Large Language Model (From Scratch), which I just finished reading. It's approachable and also detailed.

As an aside, does anyone else find the whole "database lookup" motivation for QKV kind of confusing? (in the article, "Query (Q): What am I looking for? Key (K): What do I contain? Value (V): What information do I actually hold?"). I've never really got it and I just switched to thinking of QKV as a way to construct a fairly general series of linear algebra transformations on the input of a sequence of token embedding vectors x that is quadratic in x and ensures that every token can relate to every other token in the NxN attention matrix. After all, the actual contents and "meaning" of QKV are very opaque: the weights that are used to construct them are learned during training. Furthermore, there is a lot of symmetry between Q and K in the algebra, which gets broken only by the causal mask. Or do people find this motivation useful and meaningful in some deeper way? What am I missing?

[edit: on this last question, the article on "Attention is just Kernel Smoothing" that roadside_picnic posted below looks really interesting in terms of giving a clean generalized mathematical approach to this, and also affirms that I'm not completely off the mark by being a bit suspicious about the whole hand-wavy "database lookup" Queries/Keys/Values interpretation]
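The "just linear algebra" view described above can be sketched in a few lines of NumPy. This is a toy illustration with made-up shapes and random weights, not any particular implementation: three learned projections of the same input, an NxN score matrix relating every token to every other, and a causal mask that breaks the Q/K symmetry.

```python
import numpy as np

def causal_attention(x, Wq, Wk, Wv):
    """Scaled dot-product self-attention with a causal mask.

    x: (N, d) sequence of token embeddings; Wq/Wk/Wv: (d, d) learned projections.
    """
    Q, K, V = x @ Wq, x @ Wk, x @ Wv          # three linear maps of the same input
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # (N, N): every token vs. every other token
    mask = np.triu(np.ones(scores.shape, dtype=bool), k=1)
    scores[mask] = -np.inf                    # causal mask breaks the Q/K symmetry
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                        # each output is a weighted mix of values

rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))                   # 4 tokens, 8-dim embeddings (arbitrary)
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
out = causal_attention(x, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

Note that the first token can only attend to itself, so its output is just its own value vector; everything "meaningful" about Q, K, and V comes from the learned weights, which is exactly why the lookup analogy feels opaque.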

ianand · 2 months ago
I'm not a fan of the database lookup analogy either.

The analogy I prefer when teaching attention is celestial mechanics. Tokens are like planets in (latent) space. The attention mechanism is like a kind of "gravity" in which each token influences every other, pushing and pulling each other around in latent space to refine their meaning. But instead of being driven by distance and mass, this gravity is proportional to semantic inter-relatedness, and instead of physical space, it plays out in a latent space.

https://www.youtube.com/watch?v=ZuiJjkbX0Og&t=3569s
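The gravity picture can be made concrete with a toy sketch (my framing of the analogy, not code from the video): each token drifts toward the tokens it is most related to, with the "pull" given by softmax-normalized similarity rather than mass and distance.

```python
import numpy as np

rng = np.random.default_rng(1)
tokens = rng.normal(size=(5, 3))             # 5 tokens as points in a 3-d latent space

sim = tokens @ tokens.T                      # "semantic relatedness" of every pair
pull = np.exp(sim - sim.max(axis=-1, keepdims=True))
pull /= pull.sum(axis=-1, keepdims=True)     # softmax: normalized attraction strengths

# Each token takes a small step toward the weighted average of its neighbors,
# refining its position (meaning) in latent space.
refined = tokens + 0.1 * (pull @ tokens - tokens)
print(np.linalg.norm(refined - tokens))      # small, nonzero displacement
```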

ianand commented on Intel: Winning and Losing   abortretry.fail/p/intel-w... · Posted by u/rbanffy
ianand · 10 months ago
The site’s domain name is the best use of a .fail tld ever.
ianand commented on Show HN: GPT-2 implemented using graphics shaders   github.com/nathan-barry/g... · Posted by u/nathan-barry
divan · 10 months ago
Someone needs to implement Excel using graphics shaders now.
ianand commented on Show HN: GPT-2 implemented using graphics shaders   github.com/nathan-barry/g... · Posted by u/nathan-barry
_dijs · 10 months ago
ianand, I immediately thought of you when I saw this post. Miss you friend.
ianand · 10 months ago
Dude, been forever. Thanks. Will DM you.
ianand commented on Show HN: GPT-2 implemented using graphics shaders   github.com/nathan-barry/g... · Posted by u/nathan-barry
ianand · 10 months ago
As the guy who did GPT2 in Excel, very cool and kudos!!

Curious why you chose WebGL over WebGPU? Just to show it can be done?

(Also see my other comment about fetching weights from huggingface)

ianand commented on Show HN: GPT-2 implemented using graphics shaders   github.com/nathan-barry/g... · Posted by u/nathan-barry
Philpax · 10 months ago
You may be able to fetch the weights directly from Hugging Face. I'd try that first.
ianand · 10 months ago
Check out https://github.com/jseeio/gpt2-tfjs, which fetches the weights for GPT-2 from Hugging Face on the fly.
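The on-the-fly approach boils down to the hub's "resolve" URL pattern for raw files. A minimal sketch of building such a URL (the exact file names vary per repo, so check the repo's file listing first):

```python
from urllib.parse import quote

def hf_weight_url(repo_id, filename, revision="main"):
    """Build the Hugging Face 'resolve' URL for a file in a model repo.

    This mirrors the URL pattern the hub uses for raw file downloads;
    fetch it with fetch() in the browser or requests in a script.
    """
    return f"https://huggingface.co/{repo_id}/resolve/{quote(revision)}/{quote(filename)}"

# e.g. a GPT-2 checkpoint file (file name assumed; verify against the repo)
print(hf_weight_url("gpt2", "model.safetensors"))
# https://huggingface.co/gpt2/resolve/main/model.safetensors
```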
ianand commented on Local LLM inference – impressive but too hard to work with   medium.com/@aazo11/local-... · Posted by u/aazo11
larodi · a year ago
Well, not sure about the final doc that went to the university, but this is the almost final draft.

https://docs.google.com/document/d/e/2PACX-1vSyWbtX700kYJgqe...

Since it's in Cyrillic, you should perhaps use a translation service. There are some screens showing results, though since I was on a really tight deadline, and it's a master's thesis rather than a PhD, I decided not to do an in-depth evaluation of the proposed methodology against SPIDER (https://yale-lily.github.io/spider). You can still find the simplified GBNF grammar, as well as some of the outputs. Interestingly, the grammar benefits from (exploits, really) a bug in llama.cpp that allows some sort of recursively chained rules. The bibliography is in English, but honestly, there is so much written on the topic that it is by no means comprehensive.

Sadly, no open inference engine (at the time of writing) was good enough at both beam search and grammars, so this whole thing perhaps needs to be redone in PyTorch.

If I find myself in a position to do this for commercial goals, I'd also explore the possibility of having human-curated SQL queries against the particular schema, in order to guide the model better, and then do RAG on the DB for more context. Note: I'm already reducing the E/R model to the minimal connected graph that includes all entities of particular interest to the present query.

And finally, since you got this far: the real problem with restricting LLM output with grammars is tokenization. Parsers read one character at a time, while tokens very often span several characters, so the parser in a way needs to be able to "look ahead", which it normally does not. I believe OpenAI wrote that they ran into this as well, but I can't find the article at the moment.
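The char-vs-token mismatch can be shown with a toy example (a hypothetical token vocabulary and a trivially simple "grammar", not a real tokenizer): because a token carries several characters, the constrainer has to simulate the parser through the whole token before it can allow or mask it.

```python
# Toy grammar: the output may contain only digits. Tokens, however, are
# multi-character strings, so each candidate must be checked char by char.
VOCAB = ["12", "34", "5a", "abc", "6"]   # hypothetical multi-character tokens

def char_ok(c):
    """The per-character 'parser' rule for our toy digits-only grammar."""
    return c.isdigit()

def token_allowed(token):
    # The parser effectively "looks ahead" through the whole token:
    # one bad character anywhere disqualifies the entire token.
    return all(char_ok(c) for c in token)

allowed = [t for t in VOCAB if token_allowed(t)]
print(allowed)  # ['12', '34', '6']
```

A real constrained decoder does this against a full CFG state machine for every token in a ~50k-entry vocabulary at every decoding step, which is where the engineering pain comes from.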

ianand · a year ago
Thanks. Took a quick look, and I definitely needed Google Translate, but it seems to have worked well enough to get the gist of it.
ianand commented on Local LLM inference – impressive but too hard to work with   medium.com/@aazo11/local-... · Posted by u/aazo11
larodi · a year ago
Having done my masters on the topic of grammar-assisted text2sql let me add some additional context here:

- first of all, local inference can never beat cloud inference, for the very simple reason that costs go down with batching. It took me two years to actually understand what batching is: the LLM tensors flowing through the transformer layers have a dimension designed specifically for processing data in parallel, so whether you process 1 sequence or 128 sequences, the cost is roughly the same. I've read very few articles that state this, so bear in mind: this is the primary obstacle to local inference competing with cloud inference.
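The batch dimension is literally just the leading axis of the activations: one matmul applies the same weights to the whole batch, so the (expensive) weight reads are amortized across every sequence. A shape-level sketch with illustrative sizes, not a real LLM layer:

```python
import numpy as np

# The same weight matrix W serves the whole batch in a single matmul,
# so W is read from memory once whether the batch holds 1 row or 128.
d_model, d_ff = 512, 2048
W = np.random.randn(d_model, d_ff).astype(np.float32)

single = np.random.randn(1, d_model).astype(np.float32)    # "batch" of 1 sequence
batch  = np.random.randn(128, d_model).astype(np.float32)  # batch of 128 sequences

out1 = single @ W     # (1, 2048)
out128 = batch @ W    # (128, 2048) -- one pass over W serves all 128 rows
print(out1.shape, out128.shape)
```

A cloud provider keeps that batch axis full with other users' requests; a local user mostly runs at batch size 1, paying the full memory-bandwidth cost for a single sequence.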

- second, and this is not a light one to take: LLM-assisted text2sql is not trivial, not at all. You may think it is, you may expect cutting-edge models to do it right, but there are plenty of reasons models fail so badly at this seemingly trivial task. You may start with an arbitrary article such as https://arxiv.org/pdf/2408.14717 and dig through the references; sooner or later you will stumble on one of dozens of overview papers, mostly by Chinese researchers (such as https://arxiv.org/abs/2407.10956), where the approaches are summarized. Caution: you may either feel inspired that AI will not take over your job, or feel miserable at how much effort is spent on this task and how badly everything fails in real-world scenarios.

- finally, something I agreed on with a professor advising a doctoral candidate whose thesis, surprisingly, was on the same topic: LLMs lean much better on GraphQL and other structured formats such as JSON than on the complex grammar of SQL, which is not a regular grammar but a context-free one, requiring more complex machines to parse and, very often, recursion.

- which brings us to the most important question: why do commercial GPTs fare so much better at this than local models? Well, it is presumed the top players not only use MoEs but also employ beam search, perhaps speculative decoding, and all sorts of hardware-level optimizations. While all this is not beyond the comprehension of a casual researcher at a casual university (like myself), you don't get to easily run it all locally. I have not written an inference engine myself, but I imagine MoE plus beam search is super complex, as beam search basically means you fork the whole LLM execution state and go back and forth. Not sure how this even works together with batching.

So basically, this is too expensive. Besides, at the moment (to my knowledge) only vLLM has some sort of reasonably working local beam search. I would have loved to see llama.cpp's beam search get a rewrite, but it stalled. Getting beam search working with current Python libraries is nearly impossible on commodity hardware, even with 48 gigs of memory, which already means a very powerful GPU.
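The core beam-search loop itself is short; what makes it expensive in practice is that each surviving beam needs its own copy of the model's decoding state (KV cache etc.). A minimal sketch over a hypothetical toy next-token distribution, purely to show the expand/prune structure:

```python
import math

def toy_logprobs(prefix):
    """Hypothetical next-token log-probabilities (a real model conditions on prefix)."""
    probs = {"a": 0.5, "b": 0.3, "<eos>": 0.2}
    return {t: math.log(p) for t, p in probs.items()}

def beam_search(beam_width=2, max_len=3):
    beams = [("", 0.0)]                          # (sequence, cumulative logprob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            if seq.endswith("<eos>"):
                candidates.append((seq, score))  # finished beams carry over unchanged
                continue
            # Expand: every beam forks into one candidate per next token.
            for tok, lp in toy_logprobs(seq).items():
                candidates.append((seq + tok, score + lp))
        # Prune: keep only the top-k scoring partial sequences.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:beam_width]
    return beams

print(beam_search())
```

In a real engine, the "expand" step means duplicating per-beam GPU state, and the "prune" step means discarding work already done, which is exactly the forking back and forth described above and why it composes poorly with batching.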

ianand · a year ago
Sounds like an interesting master's thesis. Is it available online somewhere?
ianand commented on Xee: A Modern XPath and XSLT Engine in Rust   blog.startifact.com/posts... · Posted by u/robin_reala
ianand · a year ago
Fun fact: a decade ago, the designer of Haml and Sass created a modern alternative to XSLT. https://en.wikipedia.org/wiki/Tritium_(programming_language)
