Even if I take the best models and agents, most hard coding benchmarks sit below 50%, and even SWE-bench Verified is at maybe 75-80%. Not 95. Assuming agents just solve most problems is incorrect, even though they're really good at first prototypes.
Also, in my experience agents are great up to a point and then fall off a cliff. Not gradually. Past that point, the types of errors you get are so diverse that it's hard to even characterize them.
I would have expected good providers like Together, Fireworks, etc. to support it, but I can't find it anywhere, short of running vLLM myself on self-hosted instances.
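For reference, the self-hosted route I mean is just vLLM's offline Python API (or its OpenAI-compatible server via `vllm serve`). A minimal sketch; the model name is only a placeholder, swap in whatever checkpoint you actually need:

```python
from vllm import LLM, SamplingParams

# Load the model once; vLLM handles batching and paged KV-cache internally.
# Placeholder checkpoint, not a recommendation.
llm = LLM(model="mistralai/Mistral-7B-Instruct-v0.2")

# Low-temperature sampling for a quick smoke test.
params = SamplingParams(temperature=0.2, max_tokens=256)

outputs = llm.generate(["Write a function that reverses a linked list."], params)
for out in outputs:
    print(out.outputs[0].text)
```

That works on a single self-hosted GPU box, but it's exactly the operational overhead I'd rather pay a provider to handle.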