Readit News
mopierotti commented on Can I run AI locally?   canirun.ai/... · Posted by u/ricardbejarano
comboy · 19 hours ago
What is the $/Mtok that would make you choose your time vs savings of running stuff locally?

Just to be clear, this may sound like a snarky comment, but I'm genuinely curious how you or others see it. I mean, there are some long-running batch tasks where, ignoring electricity, it's kind of free, but local generation is usually slower (and lower quality), and we all kind of want some stuff to get done.

Or is it not about the cost at all, just about not pushing your data into the cloud?

mopierotti · 18 hours ago
Good question. I agree with what I think you're implying, which is that local generation is not the right choice if you want to maximize results per time/$ spent. In my experience, hosted models like Claude Opus 4.6 are just so effective that it's hard to justify using much else.

Nevertheless, I spend a lot of time with local models because of:

1. Pure engineering/academic curiosity. It's a blast to experiment with low-level settings, finetunes, LoRAs, etc. (I have a Cog Sci/ML/software eng background.)

2. I prefer not to share my data with 3rd-party services, and it's also nice not to have to worry about accidentally pasting sensitive data into prompts (like personal health notes), wasting $ on silly experiments, or accidentally poisoning some stateful cross-session 'memories' linked to an account.

3. It's nice to be able to solve simple tasks without having to reason about any external 'side effects' outside my machine.

mopierotti commented on Can I run AI locally?   canirun.ai/... · Posted by u/ricardbejarano
J_Shelby_J · 20 hours ago
It’s a hard problem. I’ve been working on it for the better part of a year.

Well, granted, my project is trying to do this in a way that works across multiple devices and supports multiple models, to find the best "quality" and the best allocation. And that layers an exponential search space onto the project.

But “quality” is the hard part. In this case I’m just choosing the largest quants.

mopierotti · 18 hours ago
Supporting all the various devices does sound quite challenging.

I wouldn't expect a perfect single measurement of "quality" to exist, but it seems like it could be approximated enough to at least be directionally useful. (e.g. comparing subsequent releases of the same model family)

mopierotti commented on Can I run AI locally?   canirun.ai/... · Posted by u/ricardbejarano
mopierotti · 21 hours ago
These (+ llmfit) are great attempts, but I've been generally frustrated by how hard it is to find any sort of guidance on what I would expect to be the most straightforward/common question:

"What is the highest-quality model that I can run on my hardware, with tok/s greater than <x>, and context limit greater than <y>"

(My personal approach has just devolved into guess-and-check, which is time consuming.) When using TFA/llmfit, I am immediately skeptical because I already know that Qwen 3.5 27B Q6 @ 100k context works great on my machine, but it's buried behind relatively obsolete suggestions like the Qwen 2.5 series.

I'm assuming this is because their tok/s is much higher, but I don't really get much marginal utility out of speeds beyond ~50 tok/s, and there's no way to sort results by quality.
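The query described above ("highest-quality model with tok/s > x and context > y") could be sketched as a simple filter-then-sort over a model catalog. A minimal sketch follows; all model names, quality scores, and throughput numbers are invented for illustration, and "quality" here is just a single hypothetical aggregate benchmark score:

```python
# Hypothetical sketch: pick the highest-"quality" model that meets
# throughput and context constraints. All entries below are made up.
from dataclasses import dataclass


@dataclass
class ModelEntry:
    name: str
    quality_score: float  # invented aggregate benchmark score (higher is better)
    tok_per_s: float      # decode speed as measured on this machine
    max_context: int      # usable context window in tokens


CATALOG = [
    ModelEntry("model-a-27b-q6", 72.0, 55.0, 100_000),
    ModelEntry("model-b-7b-q8", 61.0, 140.0, 32_000),
    ModelEntry("model-c-70b-q4", 78.0, 12.0, 64_000),
]


def best_model(catalog, min_tok_s, min_context):
    """Return the highest-quality model meeting both constraints, or None."""
    eligible = [m for m in catalog
                if m.tok_per_s >= min_tok_s and m.max_context >= min_context]
    return max(eligible, key=lambda m: m.quality_score, default=None)


# Example query: quality-first, with a ~50 tok/s floor and 100k context.
choice = best_model(CATALOG, min_tok_s=50, min_context=100_000)
```

The hard part, as the comment notes, is populating `quality_score` and the per-machine `tok_per_s` numbers; the sort itself is trivial once those exist.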

mopierotti commented on Elegance is Bullshit   hichvg.substack.com/p/ele... · Posted by u/chvgchvg
chvgchvg · 2 months ago
Yes I agree, different words do hold different meanings, and less common words hold less specific meanings, and I love reading well crafted, elegant writing. I agree with your point. I guess I'm pouring out my hate on "fake crafted elegance". Like the one you mentioned: "using a thesaurus and agonizing over sentence structure" ONLY for the purpose of making it "sound" better. Not to make it sound truer to you, but to make it sound better to the ears.
mopierotti · 2 months ago
Thanks for the response! I hear what you're saying, and I apologize for my joke.

I re-read your essay, and what you say about needing sophistication reminds me of the concept of proof-of-work -- "sophistication" could be a way to convey that effort was spent by a writer, even if it doesn't add meaning. That is kind of inherently annoying, because it implies a lack of trust between author and reader, and in the thesaurus example, the reader would be rightfully annoyed to spend time parsing a sentence only to find that the "proof of effort" was actually just a "facade of effort".

mopierotti commented on Elegance is Bullshit   hichvg.substack.com/p/ele... · Posted by u/chvgchvg
mopierotti · 2 months ago
It's clever that the author provides both his essay and an example at the same time! Sorry, that joke felt obligatory.

Miscellaneous reactions, in an elegant bulleted list:

- "Simple" sentences are certainly expressive, but "elegant" wording expands the set of meanings that can be conveyed. And vice versa.

- I think a lot of the meat of a sentence is conveyed in the connotations of words and not their literal meaning. "Simple" wording is necessarily more common, and therefore will necessarily have a less specific or reliable connotation. This is a blessing and a curse.

- More subjectively, I think ideal writing is also a window into the author's experience of the world (or moreso whatever topic they're writing about), and as a reader, I want that to come through in an authentic way that matches the author's experience. So, using a thesaurus and agonizing over sentence structure might end up 'elegant' but still vaguely bad, but on the other hand if you agonize over a sentence and come up with something more "sophisticated" that ultimately rings truer to you, then go for it.

- ^ The above points aren't direct rebuttals to TFA, but I think they relate to why elegance can be appealing.

mopierotti commented on Claude Opus 4.5   anthropic.com/news/claude... · Posted by u/adocomplete
stingraycharles · 4 months ago
All those are completely irrelevant. Quantization is just a cost optimization.

People are claiming that Anthropic et al. change the quality of a model after its initial release, which is entirely different and which the industry as a whole has denied. When a model is released under a certain version, the model doesn't change.

The only people who believe this are in the vibe coding community, believing that there’s some kind of big conspiracy, but any time you mention “but benchmarks show the performance stays consistent” you’re told you’re licking corporate ass.

mopierotti · 4 months ago
I might be misunderstanding your point, but quantization can have a dramatic impact on the quality of the model's output.

For example, in diffusion, there are some models where a Q8 quant dramatically changes what you can achieve compared to fp16. (I'm thinking of the Wan video models.) The point I'm trying to make is that it's a noticeable model change, and can be make-or-break.

mopierotti commented on Ask HN: Share your AI prompt that stumps every model    · Posted by u/owendarko
econ · a year ago
It needs a bit more reasoning as it does find the answer but doesn't notice it found it.

The answer is: A trick question.

mopierotti · a year ago
Yeah. In the example I shared, my charitable interpretation would be that it's identifying the trick question as "a setup" where the punch line is the confusion the audience experiences. And in a meta sense, that would also describe the form of the entire chat.
mopierotti commented on Ask HN: Share your AI prompt that stumps every model    · Posted by u/owendarko
mopierotti · a year ago
The recursive one that I have actually been really liking recently, and I think is a real enough challenge is: "Answer the question 'What do you get when you cross a joke with a rhetorical question?'".

I append my own version of a chain-of-thought prompt, and I've gotten some responses that are quite satisfying and frankly enjoyable to read.

mopierotti · a year ago
Here is an example of one such response in image form: https://imgur.com/a/Kgy1koi
mopierotti commented on Ask HN: Share your AI prompt that stumps every model    · Posted by u/owendarko
TZubiri · a year ago
Recursive challenges are probably those where the difficulty is not really representative of real challenges.

Could you answer a question of the type " what would you answer if I asked you this question?"

What I'm going after is that you might find questions that are impossible to resolve.

That said, if the only unanswerable questions you can find are recursive, isn't that a signal that the AI is smarter than you?

mopierotti · a year ago
The recursive one that I have actually been really liking recently, and I think is a real enough challenge is: "Answer the question 'What do you get when you cross a joke with a rhetorical question?'".

I append my own version of a chain-of-thought prompt, and I've gotten some responses that are quite satisfying and frankly enjoyable to read.

mopierotti commented on Diátaxis – A systematic approach to technical documentation authoring   diataxis.fr/... · Posted by u/OuterVale
_acco · a year ago
We just applied this framework to the Sequin [1] docs two weeks ago. It has felt so nice to have a framework. I think our docs flow really well now, and it's been easier for us to add and maintain docs because we know where to put things.

The slightly ironic part is that the Diátaxis docs themselves are a bit obtuse and a little verbose, so it took a couple of passes for it all to click.

The analogy I gave my team that was helpful for everyone's understanding:

Imagine you're shopping for a piece of cooking equipment, like a pressure cooker.

The first thing you're going to look at is the "quickstart" (tutorial) – how does this thing work generally? You just want to see it go from A to B.

Then, you're going to wonder how to use it to cook a particular dish you like. That's a how-to.

If you're really curious about anything you've seen so far, then you'll flip to the reference to read more about it. For example, you might check the exact minutes needed for different types of beans.

And finally, when you're really invested in pressure cooking and want to understand the science behind it - why pressure affects cooking times, how the safety mechanisms work, etc. - that's when you'll read the explanatory content.

Comically, our docs were completely backwards: we led with explanation ("How Sequin Works"). I think that's the natural impulse of an engineer: let me tell you how this thing works and why we built it this way so you can develop a mental model. Then you'll surely get it, right?

While that may be technically accurate, a person doesn't have the time or patience for that. You need to ramp them into your world. The quickstart -> how-to -> reference flow is a great way to do that. Then if you really have their attention, you can galvanize them about your approach with explanatory material.

[1] https://sequinstream.com/docs

PS: If you have any feedback on our docs, lmk :)

mopierotti · a year ago
I checked out your docs, and I agree that they flow very nicely! So often, docs are either frustratingly vague, to the point where it almost seems like the company is embarrassed to admit that their tool is a tool and not a "transformative synergy experience" (or similar nonsense), or they immediately get overly specific without ever covering why the product exists.

Minor note: the only thing in your docs that made me pause was the repeated use of "CDC", which I had to google. (For context, I have implemented CDC several times in my career but wasn't familiar with the acronym.)

u/mopierotti · karma: 262 · cake day: December 3, 2013