irthomasthomas (u/irthomasthomas)

irthomasthomas commented on ThinkMesh: A Python lib for parallel thinking in LLMs github.com/martianlantern... · Posted by u/martianlantern

irthomasthomas · 13 hours ago

Very cool I'm going to try and play with this later. It looks like llm-consortium [0] but with some nice new features, like confidence gating and pluggable verifiers. So, if a response confidence is below a threshold it is eliminated entirely? Is that the gating?

[0] https://x.com/karpathy/status/1870692546969735361

irthomasthomas commented on AGI is an engineering problem, not a model training problem vincirufus.com/posts/agi-... · Posted by u/vincirufus

black_knight · 16 hours ago

Off topic, but I remember as a child I would play around with that kind of recursive thinking. I would think about something, then think about that I thought about it, then think about that I though about thinking about it. Then, after a few such repetitions I would recognise that this could go on forever. Then I would think about the fact that I recognise that this could go on forever, then think about that… then realise that this meta pattern could go on forever. Etc…

Later I connected this game with the ordinals. 0,1,2… ω, ω+1, ω+2,…,2ω,2ω+1,2ω+2,…,3ω,…,4ω,…,4ω,…, ω*ω,…

irthomasthomas · 14 hours ago

Tesla used to experience visual hallucinations. Any time an object was mentioned in conversation it would appear before him as if it where real. He started getting these hallucinations randomly and began to obsess about their origins. Over time he was able to trace the source of every hallucination to something which he had heard or seen earlier. He then noticed that the same was true for all of his thoughts, every one could be traced to some external stimuli. From this he concluded that he was an automata controlled by remote input. This inspired him to invent the first remote control vehicle.

irthomasthomas commented on The theory and practice of selling the Aga cooker (1935) [pdf] comeadwithus.wordpress.co... · Posted by u/phpnode

shrubble · 2 days ago

Also the author of a well known book, “Confessions of An Advertising Man”.

irthomasthomas · a day ago

And 'Ogilvy on Advertising'. Both are brilliant and should be required reading, especially for those interested in business or practical psychology.

irthomasthomas commented on DeepSeek-v3.1 api-docs.deepseek.com/new... · Posted by u/wertyk

merelysounds · 2 days ago

Would the model owners be able to identify the benchmarking session among many other similar requests?

irthomasthomas · 2 days ago

Depends. Something like arc-agi might be easy as it follows a defined format. I would also guess that the usage pattern for someone running a benchmark will be quite distinct from that of a normal user, unless they take specific measures to try to blend in.

irthomasthomas commented on DeepSeek-v3.1 api-docs.deepseek.com/new... · Posted by u/wertyk

seunosewa · 3 days ago

Sometimes it will randomly generate something like this in the body of the text: ``` <tool_call>executeshell <arg_key>command</arg_key> <arg_value>echo "" >> novels/AI_Voodoo_Romance/chapter-1-a-new-dawn.txt</arg_value> </tool_call> ```

or this: ``` <｜toolcallsbegin｜><｜toolcallbegin｜>executeshell<｜toolsep｜>{"command": "pwd && ls -la"}<｜toolcallend｜><｜toolcallsend｜> ```

Prompting it to use the right format doesn't seem to work. Claude, Gemini, GPT5, and GLM 4.5, don't do that. To accomodate DeepSeek, the tiny agent that I'm building will have to support all the weird formats.

irthomasthomas · 2 days ago

Can't you use logit bias to help with this? Might depend how they are tokenized.

irthomasthomas commented on DeepSeek-v3.1 api-docs.deepseek.com/new... · Posted by u/wertyk

dmos62 · 3 days ago

How do you propose that would work? A pipeline that goes through query-response pairs to deduce response quality and then uses the low-quality responses for further training? Wouldn't you need a model that's already smart enough to tell that previous model's responses weren't smart enough? Sounds like a chicken and egg problem.

irthomasthomas · 2 days ago

It just means that once you send your test questions to a model API, that company now has your test. So 'private' benchmarks take it on faith that the companies won't look at those requests and tune their models or prompts to beat them.

irthomasthomas commented on AGENTS.md – Open format for guiding coding agents agents.md/... · Posted by u/ghuntley

CharlesW · 5 days ago

This should've been an .agents¹ with an index.md.

For tiny, throwaway projects, a monolithic .md file is fine. A folder allows more complex projects to use "just enough hierarchy" to provide structure, with index.md as the entry point. Along with top-level universal guidance, it can include an organization guide (easily maintained with the help of LLMs).

  index.md
  ├── auth.md
  ├── performance.md
  ├── code_quality
  ├── data_layer
  ├── testing
  └── etc

In my experience, this works loads better than the "one giant file" method. It lets LLMs/agents add relevant context without wasting tokens on unrelated context, reduces noise/improves response accuracy, and is easier to maintain for both humans and LLMs alike.

¹ Ideally with a better name than ".agents", like ".codebots" or ".context".

irthomasthomas · 5 days ago

This is what I do. Everywhere my agent works it uses a .agent dir to store its logs and intermediary files. This way the main directories aren't polluted with cruft all the time.

irthomasthomas commented on GPT-5 API injects hidden instructions twitter.com/xundecidabili... · Posted by u/irthomasthomas

NitpickLawyer · 9 days ago

Yeah, it's in the "gpt5 system card" as they call it now [1]. Page 9 has the details about system > dev > user.

1 - https://cdn.openai.com/pdf/8124a3ce-ab78-4f06-96eb-49ea29ffb...

irthomasthomas · 9 days ago

  3.5 Instruction Hierarchy
  The deployment of these models in the API allows developers to specify a custom developer message that is included with every prompt from one of their end users. This could potentially allow developers to circumvent system message guardrails if not handled properly. Similarly, end users may try to circumvent system or developer message guidelines.
 
  Mitigations
  To mitigate this issue, we teach models to adhere to an Instruction Hierarchy[2]. At a high level, we have three classifications of messages sent to the models: system messages, developer messages, and user messages. We test that models follow the instructions in the system message over developer messages, and instructions in developer messages over user messages.

Is this what you meant? I can see that this is part of the mechanism, I can't see where it states that openai will inject their own instructions.

irthomasthomas commented on GPT-5 API injects hidden instructions twitter.com/xundecidabili... · Posted by u/irthomasthomas

NitpickLawyer · 9 days ago

> openai giving it instructions before me?

Uhhh, yes. It's in the devblogs. They call it prompt adherence hierarchy or something, where system instructions (oAI) > dev instructions (you) > user requests. They've been training this way specifically, and test for it in their "safety" analysis. Same for their -oss versions, so tinkerers might look there for a tinker friendly environment where they could probably observe the same kinds of behaviour.

irthomasthomas · 9 days ago

Please can you link me to the documentation on this.