kinduff (u/kinduff) - Readit News

kinduff commented on Training language models to be warm and empathetic makes them less reliable arxiv.org/abs/2507.21919... · Posted by u/Cynddl

kinduff · 17 days ago

We want an oracle, not a therapist or an assistant.

kinduff commented on Ollama's new app ollama.com/blog/new-app... · Posted by u/BUFU

mchiang · a month ago

We work closely with the majority of research labs / model creators directly. Most of the time we will support models on release day. Sometimes the release windows for major models are fairly close together, and we just have to elect to support the models we believe will better serve a majority of users.

Nothing out of spite, and purely limited by the amount of effort required to support these models.

We are hopeful too -- where users can technically add models to Ollama directly. Although there is definitely some learning curve.

kinduff · a month ago

Would love to add models directly. And don't worry, we will figure it out!

kinduff commented on Programming vehicles in games wassimulator.com/blog/pro... · Posted by u/Bogdanp

rightbyte · a month ago

Have you played Warthog Launch? It makes satire of the game physics.

kinduff · a month ago

Oh my, it's been a minute since I played this game. I remember playing it as a teen.

kinduff commented on Building MCP servers for ChatGPT and API integrations platform.openai.com/docs/... · Posted by u/kevinslin

kinduff · a month ago

I like how they plugged the MCP support into their existing tools. Pretty smart!

kinduff commented on Baba Is Eval fi-le.net/baba/... · Posted by u/fi-le

kadoban · 2 months ago

A human can easily struggle at solving a poorly communicated puzzle, especially if paper/pencil or something isn't available to convert to a better format. LLMs can look back at what they wrote, but it seems kind of like a poor format for working out a better representation to me.

kinduff · 2 months ago

I found some papers [1][2] about this. And I think the answer is yes, the format matters and hence the representation.

I wonder if the author would be willing to try with another representation.

[1]: Does Prompt Formatting Have Any Impact on LLM Performance? https://arxiv.org/html/2411.10541v1

[2]: Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding - A Survey https://arxiv.org/html/2402.17944v2

kinduff commented on Baba Is Eval fi-le.net/baba/... · Posted by u/fi-le

k2xl · 2 months ago

Baba Is You is a great game, part of a family of 2D grid puzzle games.

(Shameless plug: I am one of the developers of Thinky.gg (https://thinky.gg), which is a thinky puzzle game site for a 'shortest path style' game [Pathology] and a Sokoban variant [Sokopath].)

These games are typically NP-hard, so the typical techniques that solvers have employed for Sokoban (or Pathology) have been brute force with varying heuristics (like BFS, deadlock detection, and Zobrist hashing). However, once levels get beyond a certain size with enough movable blocks, you end up exhausting memory pretty quickly.
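To make the Zobrist hashing mentioned above concrete, here is a minimal sketch, not the Thinky.gg solver. It assumes the cell codes from the level legend in this comment (2 = movable block, 4 = player), and the helper names `make_zobrist_table`, `full_hash` and `move_piece` are hypothetical. Each (row, column, piece) triple gets a random 64-bit key, a state's hash is the XOR of the keys of its pieces, and a move updates the hash with two XORs instead of a full rescan.

```python
import random

MOVABLE = (2, 4)  # assumed codes: 2 = movable block, 4 = player


def make_zobrist_table(rows, cols, kinds=MOVABLE, seed=0):
    """One random 64-bit key per (row, col, piece kind)."""
    rng = random.Random(seed)
    return {
        (r, c, k): rng.getrandbits(64)
        for r in range(rows)
        for c in range(cols)
        for k in kinds
    }


def full_hash(grid, table):
    """XOR the keys of every movable piece on the grid."""
    h = 0
    for r, row in enumerate(grid):
        for c, kind in enumerate(row):
            if (r, c, kind) in table:
                h ^= table[(r, c, kind)]
    return h


def move_piece(h, table, kind, src, dst):
    """Incremental update: XOR the piece out of src and into dst."""
    return h ^ table[(src[0], src[1], kind)] ^ table[(dst[0], dst[1], kind)]


# Illustrative state only; the move below demonstrates the hash update
# and is not a claim about which moves the game's rules allow.
grid = [[0, 0, 0], [0, 2, 0], [0, 2, 3], [0, 4, 1]]
table = make_zobrist_table(rows=4, cols=3)
h = full_hash(grid, table)
h_after = move_piece(h, table, kind=4, src=(3, 1), dst=(3, 0))
```

A BFS would store these 64-bit hashes in its visited set rather than whole grids, which lowers memory per state but does not change how many states there are.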

These types of games are still "AI Proof" so far, in that LLMs are absolutely awful at solving them while humans are very good (so it seems reasonable to consider them for ARC-AGI benchmarks). Whenever a new reasoning model gets released I typically try it on some basic Pathology levels (like 'One at a Time' https://pathology.thinky.gg/level/ybbun/one-at-a-time) and they fail miserably.

Simple level code for the above level (1 is a wall, 2 is a movable block, 4 is the starting block, 3 is the exit):

000
020
023
041

Similar to OP, I've found Claude couldn’t manage rule dynamics, blocked paths, or game objectives well and spat out random results.

kinduff · 2 months ago

On page 3 of the Factorio paper [1], the agent receives a semantic representation with coordinates. Have you tried this data format?

[1]: https://arxiv.org/pdf/2503.09617

kinduff commented on Baba Is Eval fi-le.net/baba/... · Posted by u/fi-le

kinduff · 2 months ago

Do you think the performance can be improved if the representation of the level is different?

I've seen AI struggle with ASCII, but when presented as other data structures, it performs better.

edit:

e.g. JSON with structured coordinates, graph-based JSON, or a semantic representation with the coordinates
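As a rough illustration of the "JSON with structured coordinates" idea, here is a minimal sketch that converts a digit grid into a list of typed entities. The sample grid and legend (1 = wall, 2 = movable block, 3 = exit, 4 = start) come from k2xl's Pathology example elsewhere in this thread. Treating 0 as an empty cell, the field names, and the helper `grid_to_entities` are my assumptions, not a format from the article or any of the papers.

```python
import json

# Legend from k2xl's comment; 0 is assumed to mean "empty" and is skipped.
LEGEND = {1: "wall", 2: "movable_block", 3: "exit", 4: "start"}


def grid_to_entities(rows):
    """Turn rows of digit characters into a coordinate-based structure."""
    entities = []
    for y, row in enumerate(rows):
        for x, ch in enumerate(row):
            code = int(ch)
            if code in LEGEND:
                entities.append({"type": LEGEND[code], "x": x, "y": y})
    return {"width": len(rows[0]), "height": len(rows), "entities": entities}


level = ["000", "020", "023", "041"]
structured = grid_to_entities(level)
print(json.dumps(structured, indent=2))
```

The prompt would then carry this JSON instead of the raw grid, so the model reads explicit (x, y) positions rather than having to count characters. Whether that actually helps on these puzzles would need to be measured.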