rao-v (u/rao-v) - Readit News

rao-v commented on Tesla insiders have sold more than 50% of their shares in the last year electrek.co/2025/08/18/te... · Posted by u/MilnerRoute

Arubis · a day ago

It’s a liability on most index funds. I’m too lazy to manage active trading and shorting but would buy an ETF that tracks the S&P 500 minus TSLA.

rao-v · a day ago

Why not just buy a put on TSLA of the right size? It's probably cheaper to Vanguard + a put than to buy a more expensive specialized ETF

rao-v commented on Shamelessness as a strategy (2019) nadia.xyz/shameless... · Posted by u/wdaher

rao-v · 7 days ago

This generally is a version of what economics and game theory knows as countersignalling. A classic paper is “Too Cool for School” https://host.kelley.iu.edu/riharbau/cs-randfinal.pdf

Always worth pondering when it works, and when, for whom, and how it fails.

rao-v commented on Launch HN: Embedder (YC S25) – Claude code for embedded software · Posted by u/bobwei1

rao-v · 9 days ago

How good is your esp32 support? This could make a lot of home automation hobbyists very happy!

rao-v commented on Gemma 3 270M: Compact model for hyper-efficient AI developers.googleblog.com... · Posted by u/meetpateltech

nh43215rgb · 11 days ago

270M is nice (and rare) addition. Is there a reason why this is not categorized as gemma3n model? I thought small models go under gemma3n category

rao-v · 10 days ago

Not at Google (anymore), but Gemma3n is a radically different (and very cool) architecture. The MatFormer approach essentially lets you efficiently change how many parameters of the model you use while inferencing. The 2B model they released is just the sub model embedded in the original 4B model. You can also fiddle with the model and pull a 2.5 or 3B version pu also!

This is a more traditional LLM architecture (like the original Gemma 3 4B but smaller) and trained on an insane (for the size) number of tokens.

rao-v commented on Gemma 3 270M: Compact model for hyper-efficient AI developers.googleblog.com... · Posted by u/meetpateltech

canyon289 · 11 days ago

Hi all, I built these models with a great team. They're available for download across the open model ecosystem so give them a try! I built these models with a great team and am thrilled to get them out to you.

From our side we designed these models to be strong for their size out of the box, and with the goal you'll all finetune it for your use case. With the small size it'll fit on a wide range of hardware and cost much less to finetune. You can try finetuning them yourself in a free colab in under 5 minutes

For picking a Gemma size this is a video I recorded for the 1b to 27b sizes earlier this year, 270m being the newest addition

https://www.youtube.com/watch?v=qcjrduz_YS8

Hacker News Disclaimer I really like working at Google so with that; All my opinions here are my own, I'm a researcher so I'll largely focus on technical questions, and I'll share what I can.

rao-v · 10 days ago

Fabulous stuff!

Oh my request … the vision head on the Gemma models is super slow on CPU inferencing (and via Vulcan), even via llama.cpp. Any chance your team can figure out a solve? Other ViTs don’t have the same problem.

rao-v commented on Windows XP Professional win32.run/... · Posted by u/pentagrama

tczMUFlmoNk · 18 days ago

A classic article about a no-delay solution to this problem, not mentioned in the linked thread:

https://bjk5.com/post/44698559168/breaking-down-amazons-mega...

rao-v · 18 days ago

Lovely and simple … you’d think it would have become the best practice in most libraries by now

rao-v commented on Gemini Embedding: Powering RAG and context engineering developers.googleblog.com... · Posted by u/simonpure

stillpointlab · 25 days ago

> Embeddings are crucial here, as they efficiently identify and integrate vital information—like documents, conversation history, and tool definitions—directly into a model's working memory.

I feel like I'm falling behind here, but can someone explain this to me?

My high-level view of embedding is that I send some text to the provider, they tokenize the text and then run it through some NN that spits out a vector of numbers of a particular size (looks to be variable in this case including 768, 1536 and 3072). I can then use those embeddings in places like a vector DB where I might want to do some kind of similarity search (e.g. cosine difference). I can also use them to do clustering on that similarity which can give me some classification capabilities.

But how does this translate to these things being "directly into a model's working memory'? My understanding is that with RAG I just throw a bunch of the embeddings into a vector DB as keys but the ultimate text I send in the context to the LLM is the source text that the keys represent. I don't actually send the embeddings themselves to the LLM.

So what is is marketing stuff about "directly into a model's working memory."? Is my mental view wrong?

rao-v · 25 days ago

The directly into working memory bit is nonsense of course, but it does point to a problem that is probably worth solving.

What would it take to make the KV cache more portable and cut/paste vs. highly specific to the query?

In theory today, I should be able to process <long quote from document> <specific query> and just stop after the long document and save the KV cache right? The next time around, I can just load it in, and continue from <new query>?

To keep going, you should be able to train the model to operate so that you can have discontinous KV cache segments that are unrelated, so you can drop in <cached KV from doc 1> <cached KV from doc 2> with <query related to both> and have it just work ... but I don't think you can do that today.

I seem remember seeing some papers that tried to "unRoPE" the KV and then "re-RoPE" it, so it can be reused ... but I have not seen the latest. Anybody know what the current state is?

Seems crazy to have to re-process the same context multiple times just to ask it a new query.

rao-v commented on Gemini with Deep Think achieves gold-medal standard at the IMO deepmind.google/discover/... · Posted by u/meetpateltech

jeffbee · a month ago

Advanced Gemini, not Gemini Advanced. Thanks, Google. Maybe they should have named it MathBard.

rao-v · a month ago

Don’t give them ideas

rao-v commented on XMLUI blog.jonudell.net/2025/07... · Posted by u/mpweiher

rao-v · a month ago

I kind of want Python to have a desktop / web neutral library that does this and always am annoyed to learn it does not exist.

Do folks understand why?

rao-v commented on Code-GUI bidirectional editing via LSP jamesbvaughan.com/bidirec... · Posted by u/jamesbvaughan

ethan_smith · 2 months ago

The interoperability problem stems from CAD kernels using proprietary B-rep representations and constraint solvers, with STEP's AP242 standard attempting to address this by including Product Manufacturing Information (PMI) and semantic annotations, though adoption remains fragmented.

rao-v · 2 months ago

I’d have assumed that somebody had a format for just the basic operations … but it looks like even that is just being considered