unignorant commented on AI vs. Professional Authors Results   mark---lawrence.blogspot.... · Posted by u/biffles
breuleux · 4 months ago
The only one I was fairly sure was human was #6, and that was the only one I kinda enjoyed. In any case, as someone who reads a good deal, I agree. I didn't think any of the stories was particularly great (not enough to bother ranking them, beyond favourite) so I don't care all that much about the result.

> AI can do much better than what is posted above for flash if given more sophisticated prompting.

How sophisticated, compared to just writing the thing yourself?

unignorant · 4 months ago
In another reply I gave an example of something you can do: https://news.ycombinator.com/item?id=44937774

I enjoy writing so a system like this would never replace that for me. But for someone who doesn't enjoy writing (or maybe can't generate work that meets their bar in the Ira Glass sense of taste) I think this kind of setup works okay for generating flash even with today's models.

biffles · 4 months ago
Could you expand on your point re more sophisticated prompting?

I have found it hard to replicate high quality human-written prose and was a bit surprised by the results of this test. To me, AI fiction (and most AI writing in general) has a certain “smell” that becomes obvious after enough exposure to it. And yet I scored worse than you did on the test, so what do I know…

unignorant · 4 months ago
For flash you can get much better results by asking the system to first generate a detailed scaffold. Here's an example of some metadata you might try to generate before actually writing the story:

- genres the story should fit into
- POV of the story
- high-level structure of the story
- list of characters in the story along with significant details
- themes and topics present in the story
- detailed style notes

From there you have a second prompt to generate a story that follows those details. You can also generate many candidates and have another model instance rate the stories on both general literary criteria and how well they fit the prompt, then read only the best.
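A minimal sketch of that scaffold-then-write-then-judge pipeline, assuming a generic `complete(prompt) -> str` callable standing in for whatever LLM API you use (the prompt wording, `best_story` helper, and candidate count here are my own illustrative choices, not a known-good recipe):

```python
# Hypothetical two-stage flash-fiction pipeline: scaffold, candidates, judge.
SCAFFOLD_PROMPT = (
    "Produce a detailed scaffold for a flash fiction piece: genres, POV, "
    "high-level structure, characters with significant details, themes and "
    "topics, and detailed style notes.\n\nPremise: {premise}"
)

STORY_PROMPT = "Write a flash fiction story that follows this scaffold:\n\n{scaffold}"

JUDGE_PROMPT = (
    "Rate this story from 1-10 on general literary quality and how well it "
    "fits the scaffold. Reply with a single number.\n\n"
    "Scaffold:\n{scaffold}\n\nStory:\n{story}"
)

def best_story(premise, complete, n_candidates=8):
    """Generate a scaffold, then n candidate stories, and return the top-rated one."""
    scaffold = complete(SCAFFOLD_PROMPT.format(premise=premise))
    candidates = [complete(STORY_PROMPT.format(scaffold=scaffold))
                  for _ in range(n_candidates)]
    scores = [float(complete(JUDGE_PROMPT.format(scaffold=scaffold, story=s)))
              for s in candidates]
    # Pair each score with its story and keep the highest-scoring story.
    return max(zip(scores, candidates))[1]
```

Using a separate model instance as the judge (rather than the writer rating its own output) is the point of the second stage: you only ever read the winner.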

This has produced some work I've been reasonably impressed by, though it's not at the level of the best human flash writers.

Also, one easy way to get output that completely avoids the "smell" you're talking about is to give specific guidance on style and perspective (e.g., GPT-5 Thinking can do "literary stream-of-consciousness 1st-person teenage perspective" reasonably well and will not sound at all like typical model writing).

codechicago277 · 4 months ago
I had similar results, and story 4 is so trope heavy I wonder if it’s just an amalgamation of similar stories. The human stories all felt original, where none of the AI ones did.
unignorant · 4 months ago
I'm not sure I agree that the human stories felt original. I was pretty unimpressed with all of the stories except maybe 6, and even that one dealt in some common tropes. 5 had fewer tropes than 6 (and maybe as a result received the highest average scores from his readers), but I could tell from the style that it was AI.
unignorant · 4 months ago
Here are my notes and guesses on the stories in case people here find it interesting. Like some others in the blog post comments I got 6/8 right:

1.) probably human, low on style but a solid twist (CORRECT)
2.) interesting imagery but some continuity issues, maybe AI (INCORRECT)
3.) more a scene than a story, highly confident it's AI given the style (CORRECT)
4.) style could go either way, maybe human given some successful characterization (INCORRECT)
5.) I like the style but it's probably AI; the metaphors are too dense and there are very minor continuity errors (CORRECT)
6.) some genuinely funny stuff and good world building, almost certainly human (CORRECT)
7.) probably AI prompted to go for humor, some minor continuity issues (CORRECT)
8.) nicely subverted expectations, probably human (CORRECT)

My personal ranking for scores (again blind to author) was:

6 (human); 8 (human); 4 (AI); 1 (human) and 5 (AI) -- tied; 2 (human); 3 and 7 (AI) -- tied

So for me the two best stories were human and the two worst were AI. That said, I read a lot of flash fiction, and none of these stories really approached good flash imo. I've also done some of my own experiments, and AI can do much better than what is posted above for flash if given more sophisticated prompting.


unignorant commented on The cultural decline of literary fiction   oyyy.substack.com/p/the-c... · Posted by u/libraryofbabel
unignorant · 6 months ago
I really enjoyed this article, but the claim that no literary fiction has made the Publishers Weekly yearly top-10 lists since 2001 isn't really true:

https://en.wikipedia.org/wiki/Publishers_Weekly_list_of_best...

https://en.wikipedia.org/wiki/Publishers_Weekly_list_of_best...

It is true that there isn't that much literary stuff that breaks through, and the stuff that does is usually somewhat crossover (e.g., All the Light We Cannot See in 2015 or Song of Achilles in 2021) but it exists. These two books are shelved under literary codes (though also historical). Song of Achilles in particular is beautifully written and a personal favorite of mine, at least among books published in recent years.

Then there are other works like Little Fires Everywhere and The Midnight Library that I might not consider super literary but nonetheless are also often considered so by book shops or libraries (e.g., https://lightsailed.com/catalog/book/the-midnight-library-a-... the lit fic code is FIC019000).

I was really surprised that Ferrante's Neapolitan series, the best example (I would have thought) of recent work with both high literary acclaim and popular appeal, did not actually make the top 10 list for any year.

unignorant commented on Surprisingly fast AI-generated kernels we didn't mean to publish yet   crfm.stanford.edu/2025/05... · Posted by u/mfiguiere
ekelsen · 7 months ago
"the reference code is in the default FP32, and given a tolerance threshold (1e-02)"

that's a huge tolerance and allows them to use fp16 operations to replace the "fp32" kernel.

unignorant · 7 months ago
yeah, it seems likely the underlying task here (one reasoning step away) was: replace as many fp32 operations as possible in this kernel with fp16. i'm not sure exactly how challenging a port like that is, but intuitively seems a bit less impressive

maybe this intuition is wrong, but if so it would be great for the work to address it explicitly!
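The reason a 1e-2 tolerance is so permissive: fp16 carries a 10-bit mantissa, so its relative rounding error per value is at most about 2^-10 ≈ 9.8e-4, an order of magnitude inside the threshold. A quick check (illustrative, using NumPy's `float16`/`float32` types, not the benchmark's actual harness):

```python
import numpy as np

# Round-trip an fp32 value through fp16 and measure the relative error.
# fp16's 10-bit mantissa bounds the per-value relative error near 2**-10,
# which is well inside a 1e-2 tolerance.
x = np.float32(1.0) / np.float32(3.0)   # "reference" fp32 value
x_half = np.float32(np.float16(x))      # same value stored as fp16
rel_err = abs(x - x_half) / abs(x)
print(rel_err < 1e-2)                   # fp16 passes the "fp32" tolerance check
```

So any kernel that internally computes in fp16 (and only converts at the boundaries) can sail under a 1e-2 comparison against the fp32 reference, at least for well-conditioned operations.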

unignorant commented on I think it's time to give Nix a chance   maych.in/blog/its-time-to... · Posted by u/pacmansyyu
krapht · 7 months ago
I tried to give Nix a chance but I never figured out how to package dependencies that didn't already have a derivation written for them.

Particularly since I do a lot of ML work - I never figured out how to handle mixed Python/C++ code with dependencies on CUDA.

It's just way easier, even if not reproducible, to build an environment imperatively in Docker.

unignorant · 7 months ago
I do a lot of ML work too and recently gave NixOS a try. It's actually not too hard to use conda/miniconda/micromamba to manage python environments as you would on any other linux system, with just a few lines of configuration: pretty much just add micromamba to your configuration.nix, plus a few lines of config for nix-ld. Many other python/ML projects are set up to use docker, and that's another easy option.
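A minimal sketch of what those configuration.nix lines look like (assuming a recent NixOS release with the `programs.nix-ld` module; the exact library list is a guess and should be adjusted to whatever your binaries complain about):

```nix
# Fragment for the body of the usual `{ config, pkgs, ... }:` module.
{
  environment.systemPackages = with pkgs; [ micromamba ];

  # nix-ld lets dynamically linked binaries installed by conda/pip
  # find their shared libraries on NixOS.
  programs.nix-ld.enable = true;
  programs.nix-ld.libraries = with pkgs; [ stdenv.cc.cc zlib ];
}
```

With that in place, micromamba-managed environments behave much like they would on a conventional distro.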

I don't have the time or desire to switch all my python/ML work to more conventional Nix, and haven't really had any issues so far.

unignorant commented on AlphaEvolve: A Gemini-powered coding agent for designing advanced algorithms   deepmind.google/discover/... · Posted by u/Fysi
xianshou · 7 months ago
Calling it now - RL finally "just works" for any domain where answers are easily verifiable. Verifiability was always a prerequisite, but the difference from prior generations (not just AlphaGo, but any nontrivial RL process prior to roughly mid-2024) is that the reasoning traces and/or intermediate steps can be open-ended with potentially infinite branching, no clear notion of "steps" or nodes and edges in the game tree, and a wide range of equally valid solutions. As long as the quality of the end result can be evaluated cleanly, LLM-based RL is good to go.

As a corollary, once you add in self-play with random variation, the synthetic data problem is solved for coding, math, and some classes of scientific reasoning. No more modal collapse, no more massive teams of PhDs needed for human labeling, as long as you have a reliable metric for answer quality.

This isn't just neat, it's important - as we run out of useful human-generated data, RL scaling is the best candidate to take over where pretraining left off.

unignorant · 7 months ago
This technique doesn't actually use RL at all! There’s no policy-gradient training, value function, or self-play RL loop like in AlphaZero/AlphaTensor/AlphaDev.

As far as I can tell, the weights of the LLM are not modified. They do some kind of candidate selection via an evolutionary algorithm over the LLM prompt, which the LLM then remixes. This process then iterates like a typical evolutionary algorithm.
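The loop being described can be sketched in a few lines (illustrative only; AlphaEvolve's actual prompt construction and program database are far more involved). Here `mutate` stands in for the LLM remixing a candidate and `score` for the automated evaluator; note there are no gradients or weight updates anywhere:

```python
import random

def evolve(seed, mutate, score, population_size=10, generations=5):
    """Evolutionary search: generate-and-select, no RL training loop."""
    population = [seed]
    for _ in range(generations):
        # Keep the best half of the population as parents...
        parents = sorted(population, key=score, reverse=True)[: max(1, population_size // 2)]
        # ...and have the "LLM" remix randomly chosen parents into children.
        children = [mutate(random.choice(parents)) for _ in range(population_size)]
        population = parents + children
    return max(population, key=score)
```

The only learning signal is the evaluator's score steering which candidates get remixed next, which is exactly why it sidesteps the policy-gradient machinery of AlphaZero-style systems.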
