Readit News
irthomasthomas commented on ThinkMesh: A Python lib for parallel thinking in LLMs   github.com/martianlantern... · Posted by u/martianlantern
irthomasthomas · 13 hours ago
Very cool. I'm going to try and play with this later. It looks like llm-consortium [0], but with some nice new features like confidence gating and pluggable verifiers. So if a response's confidence is below a threshold, it is eliminated entirely? Is that the gating? (Sketch of what I mean below.)

[0] https://x.com/karpathy/status/1870692546969735361
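If I'm reading the gating right, here's a minimal sketch of my mental model; the names and API are made up, not ThinkMesh's actual interface:

```python
# Hypothetical sketch of threshold-based confidence gating;
# names and API are invented, not ThinkMesh's real interface.
from dataclasses import dataclass

@dataclass
class Candidate:
    text: str
    confidence: float  # verifier-scored or self-reported, in [0, 1]

def gate(candidates: list[Candidate], threshold: float = 0.7) -> list[Candidate]:
    """Eliminate any candidate whose confidence falls below the threshold."""
    return [c for c in candidates if c.confidence >= threshold]

survivors = gate([Candidate("answer A", 0.91), Candidate("answer B", 0.42)])
# -> only "answer A" survives the gate
```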

irthomasthomas commented on AGI is an engineering problem, not a model training problem   vincirufus.com/posts/agi-... · Posted by u/vincirufus
black_knight · 16 hours ago
Off topic, but I remember as a child I would play around with that kind of recursive thinking. I would think about something, then think about the fact that I thought about it, then think about the fact that I thought about thinking about it. Then, after a few such repetitions, I would recognise that this could go on forever. Then I would think about the fact that I recognised that this could go on forever, then think about that… then realise that this meta-pattern could go on forever. Etc…

Later I connected this game with the ordinals: 0, 1, 2, …, ω, ω+1, ω+2, …, 2ω, 2ω+1, 2ω+2, …, 3ω, …, 4ω, …, ω*ω, …

irthomasthomas · 14 hours ago
Tesla used to experience visual hallucinations. Any time an object was mentioned in conversation, it would appear before him as if it were real. He started getting these hallucinations randomly and began to obsess over their origins. Over time he was able to trace the source of every hallucination to something he had heard or seen earlier. He then noticed that the same was true of all his thoughts: every one could be traced to some external stimulus. From this he concluded that he was an automaton controlled by remote input. This inspired him to invent the first remote-controlled vehicle.
irthomasthomas commented on The theory and practice of selling the Aga cooker (1935) [pdf]   comeadwithus.wordpress.co... · Posted by u/phpnode
shrubble · 2 days ago
Also the author of a well-known book, "Confessions of an Advertising Man".
irthomasthomas · a day ago
And 'Ogilvy on Advertising'. Both are brilliant and should be required reading, especially for those interested in business or practical psychology.
irthomasthomas commented on DeepSeek-v3.1   api-docs.deepseek.com/new... · Posted by u/wertyk
merelysounds · 2 days ago
Would the model owners be able to identify the benchmarking session among many other similar requests?
irthomasthomas · 2 days ago
Depends. Something like ARC-AGI might be easy, as it follows a defined format. I would also guess that the usage pattern of someone running a benchmark will be quite distinct from that of a normal user, unless they take specific measures to blend in.
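For illustration, a toy sketch of the kind of pattern matching a provider could run over session traffic; the prompt shape and threshold are made up:

```python
# Toy heuristic for spotting an ARC-style benchmark run in API traffic:
# many requests in one session that all match a rigid prompt shape.
# Entirely illustrative; real detection could also use account-level signals.
import re

ARC_LIKE = re.compile(r'"train":\s*\[.*"test":\s*\[', re.DOTALL)

def looks_like_benchmark(session_prompts: list[str], min_hits: int = 20) -> bool:
    hits = sum(bool(ARC_LIKE.search(p)) for p in session_prompts)
    return hits >= min_hits
```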
irthomasthomas commented on DeepSeek-v3.1   api-docs.deepseek.com/new... · Posted by u/wertyk
seunosewa · 3 days ago
Sometimes it will randomly generate something like this in the body of the text:

```
<tool_call>executeshell
<arg_key>command</arg_key>
<arg_value>echo "" >> novels/AI_Voodoo_Romance/chapter-1-a-new-dawn.txt</arg_value>
</tool_call>
```

or this:

```
<|toolcallsbegin|><|toolcallbegin|>executeshell<|toolsep|>{"command": "pwd && ls -la"}<|toolcallend|><|toolcallsend|>
```

Prompting it to use the right format doesn't seem to work. Claude, Gemini, GPT-5, and GLM 4.5 don't do that. To accommodate DeepSeek, the tiny agent I'm building will have to support all the weird formats.
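For reference, a rough sketch of normalising both stray formats into (tool_name, args) pairs; the patterns are inferred only from the two examples above and may miss other variants:

```python
# Rough sketch: normalise DeepSeek's two stray tool-call formats into
# (tool_name, arguments) pairs. Patterns inferred from the examples above;
# the XML-style pattern assumes a single key/value pair per call.
import json
import re

XML_STYLE = re.compile(
    r"<tool_call>(\w+)\s*<arg_key>(\w+)</arg_key>\s*"
    r"<arg_value>(.*?)</arg_value>\s*</tool_call>",
    re.DOTALL,
)
TOKEN_STYLE = re.compile(
    r"<\|toolcallbegin\|>(\w+)<\|toolsep\|>(\{.*?\})<\|toolcallend\|>",
    re.DOTALL,
)

def extract_tool_calls(text: str) -> list[tuple[str, dict]]:
    calls = [(name, {key: value}) for name, key, value in XML_STYLE.findall(text)]
    calls += [(name, json.loads(args)) for name, args in TOKEN_STYLE.findall(text)]
    return calls
```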

irthomasthomas · 2 days ago
Can't you use logit bias to help with this? It might depend on how they are tokenized.
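Something along these lines, assuming DeepSeek's API honours the OpenAI-style logit_bias parameter; the token IDs below are placeholders, not real ones:

```python
# Sketch: suppress the stray tool-call markers via an OpenAI-style
# logit_bias parameter. Token IDs are placeholders, not DeepSeek's real
# ones; this only helps if the markers tokenize to a small, stable set
# of IDs that the API lets you bias.
from openai import OpenAI

client = OpenAI(base_url="https://api.deepseek.com", api_key="...")

BANNED_TOKEN_IDS = [100042, 100043]  # hypothetical IDs for "<tool_call>" etc.

response = client.chat.completions.create(
    model="deepseek-chat",
    messages=[{"role": "user", "content": "Continue chapter 1."}],
    logit_bias={str(tid): -100 for tid in BANNED_TOKEN_IDS},  # -100 ≈ never sample
)
```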
irthomasthomas commented on DeepSeek-v3.1   api-docs.deepseek.com/new... · Posted by u/wertyk
dmos62 · 3 days ago
How do you propose that would work? A pipeline that goes through query-response pairs to deduce response quality and then uses the low-quality responses for further training? Wouldn't you need a model that's already smart enough to tell that previous model's responses weren't smart enough? Sounds like a chicken and egg problem.
irthomasthomas · 2 days ago
It just means that once you send your test questions to a model API, that company now has your test. So 'private' benchmarks take it on faith that the companies won't look at those requests and tune their models or prompts to beat them.
irthomasthomas commented on AGENTS.md – Open format for guiding coding agents   agents.md/... · Posted by u/ghuntley
CharlesW · 5 days ago
This should've been an .agents¹ directory with an index.md.

For tiny, throwaway projects, a monolithic .md file is fine. A folder allows more complex projects to use "just enough hierarchy" to provide structure, with index.md as the entry point. Along with top-level universal guidance, it can include an organization guide (easily maintained with the help of LLMs).

  .agents/
  ├── index.md
  ├── auth.md
  ├── performance.md
  ├── code_quality
  ├── data_layer
  ├── testing
  └── etc
In my experience, this works loads better than the "one giant file" method. It lets LLMs/agents pull in relevant context without wasting tokens on unrelated material, reduces noise and improves response accuracy, and is easier to maintain for humans and LLMs alike. (A loader sketch follows below.)

¹ Ideally with a better name than ".agents", like ".codebots" or ".context".
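As a sketch of how an agent might consume a layout like the one above: always read index.md, then pull in only the topic files the task mentions. The keyword matching here is illustrative, not a real relevance test:

```python
# Naive "just enough context" loader for a .agents/ directory.
# Assumes the layout sketched above; the relevance check is a toy
# keyword match, not a real retrieval step.
from pathlib import Path

AGENTS_DIR = Path(".agents")

def build_context(task: str) -> str:
    parts = [(AGENTS_DIR / "index.md").read_text()]  # always load the entry point
    for doc in AGENTS_DIR.rglob("*.md"):
        if doc.name != "index.md" and doc.stem in task.lower():
            parts.append(doc.read_text())  # add only topic files the task mentions
    return "\n\n".join(parts)

print(build_context("improve testing coverage for the auth module"))
```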

irthomasthomas · 5 days ago
This is what I do. Everywhere my agent works, it uses a .agent dir to store its logs and intermediary files. That way the main directories aren't constantly polluted with cruft.
irthomasthomas commented on GPT-5 API injects hidden instructions   twitter.com/xundecidabili... · Posted by u/irthomasthomas
NitpickLawyer · 9 days ago
Yeah, it's in the "GPT-5 system card", as they call it now [1]. Page 9 has the details about system > dev > user.

1 - https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb...

irthomasthomas · 9 days ago

  3.5 Instruction Hierarchy
  The deployment of these models in the API allows developers to specify a custom developer message that is included with every prompt from one of their end users. This could potentially allow developers to circumvent system message guardrails if not handled properly. Similarly, end users may try to circumvent system or developer message guidelines.
 
  Mitigations
  To mitigate this issue, we teach models to adhere to an Instruction Hierarchy[2]. At a high level, we have three classifications of messages sent to the models: system messages, developer messages, and user messages. We test that models follow the instructions in the system message over developer messages, and instructions in developer messages over user messages.
Is this what you meant? I can see that this is part of the mechanism, but I can't see where it states that OpenAI will inject their own instructions.
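For context, the three tiers map onto message roles in the Chat Completions API; a minimal sketch, assuming the newer "developer" role. The top "system" layer is OpenAI's own and never appears in the caller's request:

```python
# Sketch of the hierarchy as Chat Completions message roles. API callers
# supply "developer" and "user" messages; the "system" tier is applied
# server-side by OpenAI, which is the injection being discussed here.
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-5",
    messages=[
        {"role": "developer", "content": "Answer only in JSON."},      # your guardrails
        {"role": "user", "content": "What is the capital of France?"},  # end user
    ],
)
```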

irthomasthomas commented on GPT-5 API injects hidden instructions   twitter.com/xundecidabili... · Posted by u/irthomasthomas
NitpickLawyer · 9 days ago
> openai giving it instructions before me?

Uhhh, yes. It's in the dev blogs. They call it the prompt adherence hierarchy or something, where system instructions (OpenAI) > dev instructions (you) > user requests. They've been training specifically for this, and they test for it in their "safety" analysis. The same goes for their -oss versions, so tinkerers might look there for a tinker-friendly environment where they could probably observe the same kinds of behaviour.

irthomasthomas · 9 days ago
Please can you link me to the documentation on this?

u/irthomasthomas

Karma: 2864 · Cake day: December 29, 2019
About
undecidability.com

crispysky.com

x.com/xundecidability

github.com/irthomasthomas/llm-consortium
