nowittyusername (u/nowittyusername)

nowittyusername commented on DeepConf: Scaling LLM reasoning with confidence, not just compute arxiviq.substack.com/p/de... · Posted by u/che_shr_cat

nowittyusername · 2 days ago

Correct me if I'm wrong, but judging by that chart, the reduction in token use and the better score both stem from the fact that this method used 512 samples... That doesn't seem useful for locally running agents or anything with severe VRAM restrictions, such as the local models people can run at home. So this would only benefit enterprise-level systems, no?

nowittyusername commented on Why LLMs can't really build software zed.dev/blog/why-llms-can... · Posted by u/srid

nowittyusername · 12 days ago

Saying LLMs are not good at x or y is akin to saying a brain is useless without a body, which is obvious. The success of agentic coding solutions depends not just on the model but also on the system the developers build around it. The companies that succeed in this area will be the ones that focus on building sophisticated, capable systems that make use of those models. We are still in the very early days, and most organizations are only now coming to terms with this realization... Only a few of them exploit this concept to the fullest, Claude Code being the best example. The Claude models are specifically trained for tool calling and other capabilities, and the Claude Code CLI complements them and takes full advantage of them; things like context management, among other capabilities, are extremely important ...

nowittyusername commented on AI is propping up the US economy bloodinthemachine.com/p/t... · Posted by u/mempko

nowittyusername · 21 days ago

Correction: Nvidia is propping up the economy. It's something like 24% of the tech sector and the only source of GPUs for most companies. This is really, really bad. Talk about all eggs in one basket. If that company were to take a shit, the domino effect would cripple the whole sector and have unimaginable ramifications for the US economy.

nowittyusername commented on The Math Is Haunted overreacted.io/the-math-i... · Posted by u/danabramov

nowittyusername · a month ago

I wonder: suppose you are not relying on any tricks or funky shenanigans. Is it possible to just throw random stuff at Lean and find interesting observations based on whether it approves? Like, use an automated system or an LLM that tries all kinds of wild proofs/theories and sees if they work? Maybe I'm asking the wrong questions or not stating it well... this is above my understanding; I could barely get my head around Prolog.

nowittyusername commented on Hierarchical Reasoning Model arxiv.org/abs/2506.21734... · Posted by u/hansmayer

nowittyusername · a month ago

I've been keeping an eye on this one as well. Based on what the paper claims, this would be huge. But I think, like many here, we are waiting for third parties to either confirm or refute the claim. The concept behind it sounds legit, but I'd like to see it in practice.

nowittyusername commented on Ask HN: What are you working on? (July 2025) · Posted by u/david927

nowittyusername · a month ago

Working on a complex AI system that will eventually allow overseer AI subagents to create complex workflows internally, on the fly, for multi-step reasoning. It's a very complex system, but the easiest way I can describe it is as a metacognitive framework for self-organizing workflows that adapt to context and adjust dynamically to environmental signals. Lots of cool little subsystems that will do all kinds of fun stuff, like feeding logprobs to various AI subagents as extra bias signals, or having the LLMs gauge their own confidence in answering this or that query. Anyway, I could write a whole dissertation on all the goodies in it. At the moment, though, I'm starting small and developing an automated system that evaluates reasoning performance across hyperparameter settings. It's important to know the most effective hyperparameters for each model, and there's no better way to converge on those numbers than an automated system. After that, I'll use DSPy or my own home-brew system to do the same for converging on the "best" system prompts for various tasks. Then I'll set up the various MCP servers that expose these abilities to whatever LLM uses them. Lots of work, but I'm learning a lot in the process, plus I love R&D. I see potential in modern-day systems for recursive self-improvement; you just have to set up the system around the capability. That's the hard part; the vision is always easy...
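One way to picture "letting the LLMs gauge their own confidence" from logprobs: many inference servers can return a log probability for each generated token, and those can be collapsed into a single score an overseer agent could read. A minimal sketch, assuming you already have the per-token natural-log probabilities for one response; the function names and the 0.5 threshold are hypothetical illustrations, not part of the system described above.

```python
import math
from typing import Sequence


def mean_logprob(token_logprobs: Sequence[float]) -> float:
    """Average natural-log probability per generated token."""
    if not token_logprobs:
        raise ValueError("need at least one token logprob")
    return sum(token_logprobs) / len(token_logprobs)


def sequence_confidence(token_logprobs: Sequence[float]) -> float:
    """Geometric-mean token probability in (0, 1]; higher means more confident."""
    return math.exp(mean_logprob(token_logprobs))


def confidence_signal(token_logprobs: Sequence[float], threshold: float = 0.5) -> str:
    """Hypothetical bias signal an overseer subagent could consume.

    The threshold is an arbitrary illustration, not a tuned value.
    """
    return "confident" if sequence_confidence(token_logprobs) >= threshold else "uncertain"


if __name__ == "__main__":
    sure = [-0.01, -0.05, -0.02]   # tokens the model found very likely
    unsure = [-1.2, -2.3, -0.9]    # tokens the model found unlikely
    for name, lps in (("sure", sure), ("unsure", unsure)):
        print(name, round(sequence_confidence(lps), 3), confidence_signal(lps))
```

Using the geometric mean rather than the product keeps long and short answers comparable; a real setup would likely calibrate the threshold per model rather than hard-coding it.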

nowittyusername commented on So you think you've awoken ChatGPT lesswrong.com/posts/2pkNC... · Posted by u/firloop

qsort · a month ago

It's still tabula rasa -- you're just initializing the context slightly differently every time. The problem is the constant anthropomorphization of these models, the insistence they're "minds" even though they are neither minds nor particularly mind-like, the suggestion that their failure modes are similar to those of humans even though they're wildly different.

nowittyusername · a month ago

The main problem is ignorance of the technology. 99.99% of people out there simply have no clue how this tech works, but once someone sits down with them and shows them in an easy-to-digest way, the magic goes away. I did just that with a friend's girlfriend. She was really enamored with ChatGPT, talking to it as a friend, really believing the thing was conscious, all that jazz... I streamed my local LLM setup to her and showed her what goes on under the hood: how the model responds to context, what happens when you change the system prompt, the importance of that context. Within about 7 minutes all the magic was gone, as she fully understood what these systems really are.

nowittyusername commented on The Metamorphosis of Prime Intellect (1994) localroger.com/prime-inte... · Posted by u/lawrenceyan

nowittyusername · 3 months ago

I loved this thing when I read it. Still do. Especially the very interesting take on the "pain Olympics". Overall, the setting, tone, and characters just seemed creative at the time, the serial-killer friend and all that jazz...

nowittyusername commented on Positional preferences, order effects, prompt sensitivity undermine AI judgments cip.org/blog/llm-judges-a... · Posted by u/joalstein

leonidasv · 3 months ago

I somewhat agree, but I think that the language example is not a good one. As Anthropic have demonstrated[0], LLMs do have "conceptual neurons" that generalise an abstract concept which can later be translated to other languages.

The issue is that those concepts are encoded in intermediate layers during training, absorbing biases present in training data. It may produce a world model good enough to know that "green" and "verde" are different names for the same thing, but not robust enough to discard ordering bias or wording bias. Humans suffer from that too, albeit arguably less.

[0] https://transformer-circuits.pub/2025/attribution-graphs/bio...

nowittyusername · 3 months ago

I had read the paper before I made the statement, and I still made it because there are issues with their paper. The first problem is that the way Anthropic trains its models, and the architecture of those models, differ from most of the open-source models people use. They are still transformer-based, but they are not structurally put together the same way as most models, so you can't extrapolate findings on their models to other models. Their training methods also apply a lot more regularization to the data, trying to weed out targeted biases as much as possible, meaning the models are trained on more synthetic data that tries to normalize things as much as possible across languages, tone, etc. The same goes for their system prompt: it is treated differently than in open-source models, which internally prepend the system prompt to the user's query. Attention is applied differently, among other things.

Second, the way their models "internalize" the world is vastly different from what humans would think of as "building a world model" of reality. It's hard to put into words, but basically their models do have an underlying representational structure, yet it's not anything that would be of use in the domains humans care about, "true reasoning". Grokking the concept, if you will.

Honestly, I highly suggest folks take a lot of what Anthropic studies with a grain of salt. I feel that a lot of the information they present is purposely misinterpreted by their teams for media, PR/clout, or who knows what reasons. But the biggest reason is the one I stated at the beginning: most models are not of the same ilk as Anthropic's models. I would suggest folks focus on reading interpretability research on open-source models, as those are the most likely to be used by corporations for their cheap API costs. And those models have nowhere near the care and sophistication put into them that Anthropic's models do.

nowittyusername commented on Positional preferences, order effects, prompt sensitivity undermine AI judgments cip.org/blog/llm-judges-a... · Posted by u/joalstein

nowittyusername · 3 months ago

I've done experiments, and basically what I found was that LLMs are extremely sensitive to... language. Well, duh, but let me explain a bit. They give answers of different quality/accuracy depending on the system prompt's ordering, wording, length, how detailed the examples are, etc. Basically every variable you can think of either improves or degrades the output. And it makes sense once you really grok that LLMs "reason and think" in tokens. They have no internal world representation; tokens are the raw layer on which they operate. For example, if you ask a bilingual human what their favorite color is, the answer will be the same color regardless of which language they answer in. For an LLM, that answer might change depending on the language used, because it's the statistical distribution of tokens in the training data that conditions the response. Anyway, I don't want to make a long post here. The good news is that once you have found the best way of asking questions of your model, you can consistently get accurate responses; the trick is finding the best way to communicate with that particular LLM. That's why I am hard at work on an auto-calibration system that runs through a barrage of variations to find the best system prompts and other hyperparameters for a specific LLM. The process can be fully automated; I just need to set it all up.
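The auto-calibration idea boils down to a search loop: try every combination of system prompt and sampling settings against a small labeled test set, and keep whichever scores best. A minimal sketch, assuming a grid search with exact-match scoring; `ModelFn`, `toy_model`, and the prompts are hypothetical stand-ins (a real setup would call an actual LLM in place of `toy_model`), not the author's system.

```python
import itertools
from typing import Callable, Sequence, Tuple

# Hypothetical signature: (system_prompt, user_query, temperature) -> answer text.
ModelFn = Callable[[str, str, float], str]


def accuracy(model: ModelFn, system_prompt: str, temperature: float,
             dataset: Sequence[Tuple[str, str]]) -> float:
    """Fraction of (question, expected) pairs answered exactly (case-insensitive)."""
    if not dataset:
        raise ValueError("dataset must not be empty")
    hits = sum(
        model(system_prompt, q, temperature).strip().lower() == expected.strip().lower()
        for q, expected in dataset
    )
    return hits / len(dataset)


def calibrate(model: ModelFn, prompts: Sequence[str], temperatures: Sequence[float],
              dataset: Sequence[Tuple[str, str]]) -> Tuple[str, float, float]:
    """Grid-search every (prompt, temperature) pair; return (prompt, temp, score) of the best."""
    if not prompts or not temperatures:
        raise ValueError("need at least one prompt and one temperature")
    best = None
    for prompt, temp in itertools.product(prompts, temperatures):
        score = accuracy(model, prompt, temp, dataset)
        if best is None or score > best[2]:  # ties keep the earlier setting
            best = (prompt, temp, score)
    return best


def toy_model(system_prompt: str, query: str, temperature: float) -> str:
    """Stand-in for a real LLM: adds two integers correctly only when the
    prompt asks for careful reasoning and sampling is near-greedy."""
    a, b = (int(x) for x in query.split("+"))
    careful = "step by step" in system_prompt and temperature <= 0.3
    return str(a + b) if careful else str(a + b + 1)


if __name__ == "__main__":
    data = [("2+2", "4"), ("10+5", "15")]
    prompts = ["Answer briefly.", "Think step by step, then answer."]
    print(calibrate(toy_model, prompts, [0.0, 0.7, 1.0], data))
```

Grid search is the simplest choice; with a real model you would likely sample each setting several times to average out sampling noise, or swap in a smarter optimizer such as DSPy's prompt optimizers once the search space grows.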