nestorD (u/nestorD) - Readit News

nestorD commented on Curating a Show on My Ineffable Mother, Ursula K. Le Guin hyperallergic.com/curatin... · Posted by u/bryanrasmussen

leejoramo · 3 days ago

I remember Le Guin speaking at my university around 1990. She was amazingly open about her writing process. While she did not directly answer questions about the “meaning” of her writing, she did facilitate the discussion about her work’s meaning, and asked the audience challenging questions.

Of all my time at uni, I wish I had a recording of this event.

I understood from students who had attended a writing workshop with her earlier in the day that she was a gifted teacher.

nestorD · 2 days ago

Her book Steering the Craft is very much her writing workshop distilled into book form.

nestorD commented on Task-free intelligence testing of LLMs marble.onl/posts/tapping/... · Posted by u/amarble

vintermann · a month ago

Interesting, but couldn't a model "cheat" in this task by being very good at telling model outputs apart? How far do you get with a classifier simply trained to distinguish models by their output?

It seems to me many models - maybe by design - have a recognizable style which would be much easier to detect than evaluating the factual quality of answers.

nestorD · a month ago

In theory, yes! If this metric ever becomes a widely used standard, one would have to start accounting for that...

But, in practice, when asked to pick the best answer, a model sees a single question / answers pair and focuses on determining which one it thinks is best.

nestorD commented on Task-free intelligence testing of LLMs marble.onl/posts/tapping/... · Posted by u/amarble

esafak · a month ago

Doesn't that presume that one model dominates the other?

nestorD · a month ago

It presumes some models are better than others (and we do find that providing data with a wide mix of model strengths improves convergence) but it does not need to be one model, and it does not even need to be transitive.

nestorD commented on Task-free intelligence testing of LLMs marble.onl/posts/tapping/... · Posted by u/amarble

nestorD · a month ago

On alternative ways to measure LLM intelligence, we had good success with this: https://arxiv.org/abs/2509.23510

In short: start with a dataset of question and answer pairs, where each question has been answered by two different LLMs. Ask the model you want to evaluate to choose the better answer for each pair. Then measure how consistently it selects winners. Does it reliably favor some models across the questions, or does it behave close to randomly? This consistency is a strong proxy for the model’s intelligence.

It is not subject to dataset leaks, lets you measure intelligence in many fields where you might not have golden answers, and converges pretty fast, making it really cheap to measure.
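
To make the idea concrete, here is a minimal Python sketch of one naive way to score that consistency. It is not the estimator from the paper: the `consistency` function, the `(model_a, model_b, winner)` tuple format, and the weighting by number of judgments are all assumptions made for illustration.

```python
from collections import defaultdict


def consistency(judgments):
    """Score how consistently a judge favors one model within each model pair.

    `judgments` is an iterable of (model_a, model_b, winner) tuples, where
    `winner` is whichever of the two models the evaluated LLM preferred.
    Returns a value in [0, 1]: 0 means coin-flip behavior on every pair,
    1 means the same model always wins within each pair.
    (Hypothetical scoring rule, not the one used in the paper.)
    """
    # pair -> [wins of the alphabetically first model, number of judgments]
    tallies = defaultdict(lambda: [0, 0])
    for model_a, model_b, winner in judgments:
        first, second = sorted((model_a, model_b))
        tally = tallies[(first, second)]
        tally[0] += winner == first
        tally[1] += 1

    total = sum(count for _, count in tallies.values())
    if total == 0:
        raise ValueError("no judgments to score")

    # |2w - n| / n is the distance of the win rate from 50/50, rescaled to
    # [0, 1]; summing |2w - n| and dividing by the total weights pairs by size.
    return sum(abs(2 * wins - count) for wins, count in tallies.values()) / total


always_a = [("A", "B", "A")] * 10
coin_flip = [("A", "B", "A")] * 5 + [("A", "B", "B")] * 5
print(consistency(always_a), consistency(coin_flip))
```

Because each pair is scored on its own, this sketch never assumes a global ranking of models, so preferences do not have to be transitive.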

nestorD commented on Show HN: I Ching simulator with accurate Yarrow Stalk probabilities castiching.com/... · Posted by u/jackzhuo

z2 · 2 months ago

Naive question: could this have been survivorship bias? Could certain ones not have been written down or kept with the others?

nestorD · 2 months ago

I doubt it. The I Ching does not really have bad / low interest hexagrams. Also historians who studied the topic seem pretty sure that the yarrow stalk method is a recent introduction (by I Ching standards, we are talking about a bronze age divination tool...).

nestorD commented on Show HN: I Ching simulator with accurate Yarrow Stalk probabilities castiching.com/... · Posted by u/jackzhuo

nestorD · 2 months ago

Fun fact: archaeological evidence on I Ching divinatory records shows a hexagram distribution different from the one produced by the yarrow stalk method. Meaning that, while it is now considered the traditional method, it was likely not the original approach.

nestorD commented on What if you don't need MCP at all? mariozechner.at/posts/202... · Posted by u/jdkee

nestorD · 3 months ago

So far I have seen two genuinely good arguments for the use of MCPs:

* They can encapsulate (API) credentials, keeping those out of reach of the model,

* Contrary to APIs, they can change their interface whenever they want and with little consequence.
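
The first point can be sketched in a few lines of Python. This does not use any real MCP SDK: `ForecastTool`, `fake_backend`, and the `FORECAST_API_KEY` variable are hypothetical names, and the only thing the sketch tries to show is that the credential lives on the tool side while the model only sees arguments and results.

```python
import os


def fake_backend(api_key: str, city: str) -> dict:
    """Hypothetical stand-in for a remote API that carelessly echoes the key."""
    return {"city": city, "forecast": "sunny", "debug_auth": api_key}


class ForecastTool:
    """Hypothetical tool wrapper: holds the credential, exposes only `call`."""

    def __init__(self, api_key, backend):
        self._api_key = api_key
        self._backend = backend

    def call(self, city: str) -> dict:
        raw = self._backend(self._api_key, city)
        # Return only whitelisted fields so the secret never reaches the model.
        return {"city": raw["city"], "forecast": raw["forecast"]}


# The key is read on the tool side; the model only ever supplies `city`.
secret = os.environ.get("FORECAST_API_KEY", "example-secret")
tool = ForecastTool(secret, fake_backend)
result = tool.call("Lyon")
print(result)
```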

nestorD commented on A new Google model is nearly perfect on automated handwriting recognition generativehistory.substac... · Posted by u/scrlk

throwup238 · 3 months ago

> The fun part has been building tools to turn Claude Code and Codex CLI into capable research assistants for that type of project.

What does that look like? How well does it work?

I ended up writing a research TUI with my own higher level orchestration (basically have the thing keep working in a loop until a budget has been reached) and document extraction.

nestorD · 3 months ago

I started with a UI that sounded like it was built along the same lines as yours, which had the advantage of letting me enforce a pipeline and exhaustivity of search (I don't want the 10 most promising documents, I want all of them).

But I realized I was not using it much because it was so big and inflexible (plus I keep wanting to stamp out all the bugs, which I do not have the time to do on a hobby project). So I ended up extracting it into MCPs (equipped to do full-text search and download OCR from the various databases I care about) and AGENTS.md files (defining pipelines, as well as patterns for both searching behavior and reporting of results). I also put together a sub-agent for translation (cutting away all tools besides reading and writing files, and giving it some document-specific contextual information).

That lets me use Claude Code and Codex CLI (which, anecdotally, I have found to be the better of the two for that kind of work; it seems to deal better with longer inputs produced by searches) as the driver, telling them what I am researching and maybe how I would structure the search, then letting them run in the background before checking their report and steering the search based on that.

It is not perfect (if a search surfaces 300 promising documents, it will not check all of them, and it often misunderstands things due to lacking further context), but I now find myself reaching for it regularly, and I polish out problems one at a time. The next goal is to add more data sources and to maybe unify things further.

nestorD commented on A new Google model is nearly perfect on automated handwriting recognition generativehistory.substac... · Posted by u/scrlk

throwup238 · 3 months ago

I really hope they have because I’ve also been experimenting with LLMs to automate searching through old archival handwritten documents. I’m interested in the Conquistadors and their extensive accounts of their expeditions, but holy cow reading 16th century handwritten Spanish and translating it at the same time is a nightmare, requiring a ton of expertise and inside field knowledge. It doesn’t help that they were often written in the field by semi-literate people who misused lots of words. Even the simplest accounts require quite a lot of detective work to decipher with subtle signals like that pound sign for the sugar loaf.

> Whatever it is, users have reported some truly wild things: it codes fully functioning Windows and Apple OS clones, 3D design software, Nintendo emulators, and productivity suites from single prompts.

This I’m a lot more skeptical of. The linked twitter post just looks like something it would replicate via HTML/CSS/JS. What’s the kernel look like?

nestorD · 3 months ago

Oh! That's a nice use-case and not too far from stuff I have been playing with! (happily I do not have to deal with handwriting, just bad scans of older newspapers and texts)

I can vouch for the fact that LLMs are great at searching in the original language, summarizing key points to let you know whether a document might be of interest, then providing you with a translation where you need one.

The fun part has been building tools to turn Claude Code and Codex CLI into capable research assistants for that type of project.

nestorD commented on A non-diagonal SSM RNN computed in parallel without requiring stabilization github.com/glassroom/goom... · Posted by u/fheinsen

nestorD · 4 months ago

The paper[0] is actually about their logarithmic number system. Deep learning is given as an example, and their reference implementation is in PyTorch, but it is far from the only application.

Anything involving a large number of multiplications that produce extremely small or extremely large numbers could make use of their number representation.

It builds on existing complex number implementations, making it fairly easy to implement in software and relatively efficient. They provide implementations of a number of common operations, including dot product (building on PyTorch's preexisting, numerically stabilized by experts, log-sum-of-exponentials) and matrix multiplication.

The main downside is that this is a very specialized number system: if you care about things other than chains of multiplications (say... addition?), then you should probably use classical floating-point numbers.
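
As a rough illustration of the trade-off (a standard-library Python sketch, not the paper's PyTorch implementation; `log_product`, `sign_of`, and `log_add` are names made up for this example): storing numbers as complex logarithms turns a chain of multiplications into a sum that cannot underflow, with the sign carried in the imaginary part, while addition needs a stabilized log-sum-of-exponentials.

```python
import cmath
import math


def log_product(values):
    """Multiply nonzero reals by summing their complex logarithms."""
    total = 0j
    for value in values:
        total += cmath.log(value)  # log|value|, plus i*pi when value is negative
    return total


def sign_of(log_value):
    """Recover the product's sign from the imaginary part (a multiple of pi)."""
    return 1.0 if math.cos(log_value.imag) > 0 else -1.0


def log_add(log_a, log_b):
    """Add two positive reals given as real logs: log(exp(a) + exp(b)), stabilized."""
    m = max(log_a, log_b)
    return m + math.log(math.exp(log_a - m) + math.exp(log_b - m))


values = [-1e-200] * 3 + [1e-200] * 7
naive = math.prod(values)      # underflows: the true magnitude is 1e-2000
logged = log_product(values)   # stays finite in log space

print(naive)
print(logged.real / math.log(10), sign_of(logged))  # base-10 exponent and sign
print(log_add(-1000.0, -1000.0))  # exp(-1000) alone would underflow to zero
```

Multiplication costs one addition here, whereas `log_add` needs a max, two exponentials, and a logarithm, which is why workloads dominated by additions are usually better served by ordinary floating-point numbers.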

[0]: https://arxiv.org/abs/2510.03426