Readit News
kinduff commented on Training language models to be warm and empathetic makes them less reliable   arxiv.org/abs/2507.21919... · Posted by u/Cynddl
kinduff · 17 days ago
We want an oracle, not a therapist or an assistant.
kinduff commented on Ollama's new app   ollama.com/blog/new-app... · Posted by u/BUFU
mchiang · a month ago
We work closely and directly with the majority of research labs and model creators. Most of the time we support models on release day. Sometimes the release windows for major models are very close together, and we have to choose to support the models we believe will best serve the majority of users.

Nothing is out of spite; it is purely limited by the amount of effort required to support these models.

We are hopeful too. Users can technically add models to Ollama directly, although there is definitely a learning curve.

kinduff · a month ago
Would love to add models directly. And don't worry, we will figure it out!
kinduff commented on Programming vehicles in games   wassimulator.com/blog/pro... · Posted by u/Bogdanp
rightbyte · a month ago
Have you played Warthog Launch? It satirizes game physics.
kinduff · a month ago
Oh my, it's been a minute since I played this game. I remember playing it as a teen.
kinduff commented on Building MCP servers for ChatGPT and API integrations   platform.openai.com/docs/... · Posted by u/kevinslin
kinduff · a month ago
I like how they plugged the MCP support into their existing tools. Pretty smart!

kinduff commented on Baba Is Eval   fi-le.net/baba/... · Posted by u/fi-le
kadoban · 2 months ago
A human can easily struggle to solve a poorly communicated puzzle, especially if pen and paper (or something similar) isn't available to convert it into a better format. LLMs can look back at what they wrote, but to me that seems like a poor medium for working out a better representation.
kinduff · 2 months ago
I found some papers [1][2] about this, and I think the answer is yes: the format matters, and hence so does the representation.

I wonder if the author would be willing to try with another representation.

[1]: Does Prompt Formatting Have Any Impact on LLM Performance? https://arxiv.org/html/2411.10541v1

[2]: Large Language Models(LLMs) on Tabular Data: Prediction, Generation, and Understanding - A Survey https://arxiv.org/html/2402.17944v2

kinduff commented on Baba Is Eval   fi-le.net/baba/... · Posted by u/fi-le
k2xl · 2 months ago
Baba Is You is a great game, part of a broader family of 2D grid puzzle games.

(Shameless plug: I am one of the developers of Thinky.gg (https://thinky.gg), a site for thinky puzzle games, including a 'shortest path' style game [Pathology] and a Sokoban variant [Sokopath].)

These games are typically NP-hard, so solvers for Sokoban (or Pathology) have typically relied on brute-force search with various heuristics (like BFS, deadlock detection, and Zobrist hashing). However, once levels grow beyond a certain size with enough movable blocks, you exhaust memory pretty quickly.

These types of games are still "AI proof" so far, in that LLMs are absolutely awful at solving them while humans are very good at it (so they seem reasonable to consider for ARC-AGI benchmarks). Whenever a new reasoning model is released, I typically try it on some basic Pathology levels (like 'One at a Time' https://pathology.thinky.gg/level/ybbun/one-at-a-time) and it fails miserably.

Simple level code for the above level (0 is empty floor, 1 is a wall, 2 is a movable block, 3 is the exit, 4 is the starting position):

000
020
023
041
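The brute-force approach described above can be sketched as a breadth-first search over (player position, block positions) states, using Zobrist hashing to deduplicate visited states. This is a minimal sketch, not Thinky.gg's actual solver: the push rules (the player pushes one block one cell, and only into empty floor, never onto the exit or another block) are assumptions about Pathology, and deadlock detection is omitted. With 64-bit keys a hash collision is very unlikely but could in principle prune a valid state.

```python
import random
from collections import deque

# Assumed encoding from the comment: 0 floor, 1 wall, 2 movable block, 3 exit, 4 start.
DIRS = {"U": (-1, 0), "D": (1, 0), "L": (0, -1), "R": (0, 1)}


def solve(level: str, seed: int = 0):
    """Return the shortest move string to the exit, or None if unsolvable."""
    rows = level.split()
    h, w = len(rows), len(rows[0])
    walls, blocks = set(), set()
    start = exit_ = None
    for r, row in enumerate(rows):
        for c, ch in enumerate(row):
            if ch == "1":
                walls.add((r, c))
            elif ch == "2":
                blocks.add((r, c))
            elif ch == "3":
                exit_ = (r, c)
            elif ch == "4":
                start = (r, c)

    # Zobrist table: one random 64-bit key per (cell, piece kind).
    rng = random.Random(seed)
    cells = [(r, c) for r in range(h) for c in range(w)]
    z_player = {cell: rng.getrandbits(64) for cell in cells}
    z_block = {cell: rng.getrandbits(64) for cell in cells}

    def in_bounds(cell):
        return 0 <= cell[0] < h and 0 <= cell[1] < w

    def free(cell, blks):
        return in_bounds(cell) and cell not in walls and cell not in blks

    # The state hash is the XOR of the keys of every occupied (cell, piece).
    key = z_player[start]
    for b in blocks:
        key ^= z_block[b]
    queue = deque([(start, frozenset(blocks), key, "")])
    seen = {key}

    while queue:
        pos, blks, key, path = queue.popleft()
        if pos == exit_:
            return path
        for name, (dr, dc) in DIRS.items():
            nxt = (pos[0] + dr, pos[1] + dc)
            if not in_bounds(nxt) or nxt in walls:
                continue
            # Moving the player only toggles its old and new cell keys.
            new_blks, new_key = blks, key ^ z_player[pos] ^ z_player[nxt]
            if nxt in blks:
                beyond = (nxt[0] + dr, nxt[1] + dc)
                # Assumed rule: a block moves only into empty floor, never onto the exit.
                if not free(beyond, blks) or beyond == exit_:
                    continue
                new_blks = (blks - {nxt}) | {beyond}
                new_key ^= z_block[nxt] ^ z_block[beyond]
            if new_key not in seen:
                seen.add(new_key)
                queue.append((nxt, new_blks, new_key, path + name))
    return None


print(solve("000 020 023 041"))
```

Under these assumed rules, the only 6-move solution is to walk around the blocks, push the top block right, then push the lower block down to clear the path to the exit. The memory problem mentioned above shows up in `seen` and `queue`, which grow with the number of reachable block arrangements.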

Similar to OP, I've found that Claude can't manage rule dynamics, blocked paths, or game objectives well, and it spits out random results.

kinduff · 2 months ago
In the Factorio paper [1] (page 3), the agent receives a semantic representation with coordinates. Have you tried this data format?
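As a rough illustration of the idea (not the paper's actual format), the level code from the comment above could be rewritten as a list of named objects with explicit coordinates. The tile names and the `(x=…, y=…)` phrasing are hypothetical choices; x is the column and y is the row, counted from the top-left.

```python
# Hypothetical names for the non-empty digits of the level code (0 = empty floor is skipped).
TILE_NAMES = {"1": "wall", "2": "movable block", "3": "exit", "4": "player start"}


def to_semantic(level: str) -> list[str]:
    """Describe every non-empty tile as '<name> at (x=<col>, y=<row>)'."""
    lines = []
    for y, row in enumerate(level.split()):
        for x, ch in enumerate(row):
            if ch in TILE_NAMES:
                lines.append(f"{TILE_NAMES[ch]} at (x={x}, y={y})")
    return lines


for line in to_semantic("000 020 023 041"):
    print(line)
```

Each fact about the level becomes one self-contained line, so a model does not have to count characters in an ASCII grid to recover positions.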

[1]: https://arxiv.org/pdf/2503.09617

kinduff commented on Baba Is Eval   fi-le.net/baba/... · Posted by u/fi-le
kinduff · 2 months ago
Do you think performance could be improved if the level were represented differently?

I've seen AI struggle with ASCII, but it performs better when the same information is presented as other data structures.

edit: e.g., JSON with structured coordinates, graph-based JSON, or a semantic representation with the coordinates.
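A minimal sketch of the first two representations, applied to the Pathology level code quoted earlier in the thread. The field names (`width`, `objects`, `nodes`, `edges`) and the `"x,y"` node ids are hypothetical choices, not a standard format. The graph version lists every non-wall cell as a node and connects orthogonally adjacent non-wall cells.

```python
import json

# Assumed meaning of the level-code digits, as described earlier in the thread.
TILE_NAMES = {"0": "floor", "1": "wall", "2": "block", "3": "exit", "4": "start"}


def to_json(level: str) -> dict:
    """Coordinate JSON: grid size plus every non-floor tile with its x (column) and y (row)."""
    rows = level.split()
    objects = [
        {"type": TILE_NAMES[ch], "x": x, "y": y}
        for y, row in enumerate(rows)
        for x, ch in enumerate(row)
        if ch != "0"
    ]
    return {"width": len(rows[0]), "height": len(rows), "objects": objects}


def to_graph(level: str) -> dict:
    """Graph JSON: non-wall cells as nodes, edges between orthogonal neighbours."""
    rows = level.split()
    h, w = len(rows), len(rows[0])
    nodes, edges = [], []
    for y in range(h):
        for x in range(w):
            if rows[y][x] == "1":
                continue
            nodes.append({"id": f"{x},{y}", "type": TILE_NAMES[rows[y][x]]})
            # Only look right and down so each undirected edge is listed once.
            for dx, dy in ((1, 0), (0, 1)):
                nx, ny = x + dx, y + dy
                if nx < w and ny < h and rows[ny][nx] != "1":
                    edges.append([f"{x},{y}", f"{nx},{ny}"])
    return {"nodes": nodes, "edges": edges}


level = "000 020 023 041"
print(json.dumps(to_json(level), indent=2))
print(json.dumps(to_graph(level)))
```

The coordinate form states positions explicitly, while the graph form also states which moves are possible, which is the part LLMs seem to get wrong when reading ASCII.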

u/kinduff

Karma: 1670 · Cake day: August 19, 2014
About
web dev, home @ https://kinduff.com