stingraycharles (u/stingraycharles)

stingraycharles commented on Kimi K2 1T model runs on 2 512GB M3 Ultras twitter.com/awnihannun/st... · Posted by u/jeudesprits

websiteapi · 16 hours ago

my issue is once you have it in your workflow I'd be pretty latency sensitive. imagine those record-it-all apps working well. eventually you'd become pretty reliant on it. I don't want to necessarily be at the whims of the cloud

stingraycharles · 15 hours ago

Aren’t those “record it all” applications implemented as a RAG and injected into the context based on embedding similarity?

Obviously you’re not going to always inject everything into the context window.

stingraycharles commented on Go Proposal: Secret Mode antonz.org/accepted/runti... · Posted by u/enz

Thorrez · 17 hours ago

> If an offset in an array is itself secret (you have a data array and the secret key always starts at data[100]), don't create a pointer to that location (don't create a pointer p to &data[100]). Otherwise, the garbage collector might store this pointer, since it needs to know about all active pointers to do its job. If someone launches an attack to access the GC's memory, your secret offset could be exposed.

That doesn't make sense to me. How can the "offset in an array itself" be "secret" if it's "always" 100? 100 isn't secret.

stingraycharles · 15 hours ago

I think it may be about the absolute memory address to the secret being stored, which may itself be exploitable (ie you’re thinking about the offset value, rather than the pointer value). it’s about leaking even indirect information that could be exploited in different ways. From my understanding, this type of cryptography goes to extremely lengths to basically hide everything.

That’s my hunch at least, but I’m not a security expert.

The example could probably have been better phrased.

stingraycharles commented on Kimi K2 1T model runs on 2 512GB M3 Ultras twitter.com/awnihannun/st... · Posted by u/jeudesprits

Kim_Bruning · 16 hours ago

I can't help you then. You can find a close analogue in the OSS/CIA Simple Sabotage Field Manual. [1]

For that reason, I don't trust Agents (human or ai, secret or overt :-P) who don't push back.

[1] https://www.cia.gov/static/5c875f3ec660e092cf893f60b4a288df/... esp. Section 5(11)(b)(14): "Apply all regulations to the last letter." - [as a form of sabotage]

stingraycharles · 16 hours ago

How is asking for clarification before pushing back a bad thing?

stingraycharles commented on Kimi K2 1T model runs on 2 512GB M3 Ultras twitter.com/awnihannun/st... · Posted by u/jeudesprits

IgorPartola · 16 hours ago

Full instruction following looks like monkey’s paw/malicious compliance. A good way to eliminate a bug from a codebase is to delete the codebase, that type of thing. You want the model to have enough creative freedom to solve the problem otherwise you are just coding using an imprecise language spec.

I know what you mean: a lot of my prompts include “never use em-dashes” but all models forget this sooner or later. But in other circumstances I do want it to push back on something I am asking. “I can implement what you are asking but I just want to confirm that you are ok with this feature introducing an SQL injection attack into this API endpoint”

stingraycharles · 16 hours ago

My point is that it’s better that the model asks questions to better understand what’s going on before pushing back.

stingraycharles commented on Kimi K2 1T model runs on 2 512GB M3 Ultras twitter.com/awnihannun/st... · Posted by u/jeudesprits

InsideOutSanta · 16 hours ago

I would assume that if the model made no assumptions, it would be unable to complete most requests given in natural language.

stingraycharles · 16 hours ago

Well yes, but asking the model to ask questions to resolve ambiguities is critical if you want to have any success in eg a coding assistant.

There are shitloads of ambiguities. Most of the problems people have with LLMs is the implicit assumptions being made.

Phrased differently, telling the model to ask questions before responding to resolve ambiguities is an extremely easy way to get a lot more success.

stingraycharles commented on Kimi K2 1T model runs on 2 512GB M3 Ultras twitter.com/awnihannun/st... · Posted by u/jeudesprits

givinguflac · 17 hours ago

I think you’re missing the whole point, which is not using cloud compute.

stingraycharles · 16 hours ago

Because of privacy reasons? Yeah I’m not going to spend a small fortune for that to be able to use these types of models.

stingraycharles commented on Kimi K2 1T model runs on 2 512GB M3 Ultras twitter.com/awnihannun/st... · Posted by u/jeudesprits

lordswork · 16 hours ago

As long as you're willing to wait up to an hour for your GPU to get scheduled when you do want to use it.

stingraycharles · 16 hours ago

I don’t understand what you’re saying. What’s preventing you from using eg OpenRouter to run a query against Kimi-K2 from whatever provider?

stingraycharles commented on Kimi K2 1T model runs on 2 512GB M3 Ultras twitter.com/awnihannun/st... · Posted by u/jeudesprits

Kim_Bruning · 17 hours ago

> Isn’t “instruction following” the most important thing you’d want out of a model in general,

No. And for the same reason that pure "instruction following" in humans is considered a form of protest/sabotage.

https://en.wikipedia.org/wiki/Work-to-rule

stingraycharles · 17 hours ago

I don’t understand the point you’re trying to make. LLMs are not humans.

From my perspective, the whole problem with LLMs (at least for writing code) is that it shouldn’t assume anything, follow the instructions faithfully, and ask the user for clarification if there is ambiguity in the request.

I find it extremely annoying when the model pushes back / disagrees, instead of asking for clarification. For this reason, I’m not a big fan of Sonnet 4.5.

stingraycharles commented on Kimi K2 1T model runs on 2 512GB M3 Ultras twitter.com/awnihannun/st... · Posted by u/jeudesprits

websiteapi · 17 hours ago

I get tempted to buy a couple of these, but I just feel like the amortization doesn’t make sense yet. Surely in the next few years this will be orders of magnitude cheaper.

stingraycharles · 17 hours ago

I don’t think it will ever make sense; you can buy so much cloud based usage for this type of price.

From my perspective, the biggest problem is that I am just not going to be using it 24/7. Which means I’m not getting nearly as much value out of it as the cloud based vendors do from their hardware.

Last but not least, if I want to run queries against open source models, I prefer to use a provider like Groq or Cerebras as it’s extremely convenient to have the query results nearly instantly.

stingraycharles commented on Kimi K2 1T model runs on 2 512GB M3 Ultras twitter.com/awnihannun/st... · Posted by u/jeudesprits

A_D_E_P_T · 17 hours ago

Kimi K2 is a really weird model, just in general.

It's not nearly as smart as Opus 4.5 or 5.2-Pro or whatever, but it has a very distinct writing style and also a much more direct "interpersonal" style. As a writer of very-short-form stuff like emails, it's probably the best model available right now. As a chatbot, it's the only one that seems to really relish calling you out on mistakes or nonsense, and it doesn't hesitate to be blunt with you.

I get the feeling that it was trained very differently from the other models, which makes it situationally useful even if it's not very good for data analysis or working through complex questions. For instance, as it's both a good prose stylist and very direct/blunt, it's an extremely good editor.

I like it enough that I actually pay for a Kimi subscription.

stingraycharles · 17 hours ago

> As a chatbot, it's the only one that seems to really relish calling you out on mistakes or nonsense, and it doesn't hesitate to be blunt with you.

My experience is that Sonnet 4.5 does this a lot as well, but this is more often than not due to a lack of full context, eg accusing the user of not doing X or Y when it just wasn’t told that was already done, and proceeding to apologize.

How is Kimi K2 in this regard?

Isn’t “instruction following” the most important thing you’d want out of a model in general, and a model pushing back more likely than not being wrong?