fpgaminer (u/fpgaminer)

fpgaminer commented on Porn censorship is going to destroy the internet mashable.com/article/age-... · Posted by u/Teever

a4isms · 5 days ago

Long before modern porn, there were laws that a man could not be charged with raping his wife. Society largely looked the other way when a man beat his wife. There was a time when underage women could be trafficked by their parents into marriage against their will. There was a time when a woman who accused a man of rape would basically end up on trial herself while his lawyer dragged out every single romantic or sexual relationship she had in graphic detail so that the jury would believe "she was asking for it."

I am under the impression that "unhealthy ideas about sex and the opposite sex" have been with us for a very, very long time. If we observe that porn addicts have such unhealthy ideas, are we confusing correlation with causation?

fpgaminer · 5 days ago

At least in the U.S. the equality of women in society (and in law) has slowly risen over the last 100 years. Over that same period the availability of pornographic images has also slowly risen (from magazines, to VHS, to the Internet, to streaming videos, to VR).

So if we're looking at correlation, doesn't the data imply that _more_ porn is associated with _more_ rights for women?

(Conversely, the vast majority of people calling for and enacting policies for more restrictions on pornography are also rolling back rights for women.)

fpgaminer commented on Training language models to be warm and empathetic makes them less reliable arxiv.org/abs/2507.21919... · Posted by u/Cynddl

nis0s · 12 days ago

An important and insightful study, but I’d caution against thinking that building pro-social aspects in language models is a damaging or useless endeavor. Just speaking from experience, people who give good advice or commentary can balance between being blunt and soft, like parents or advisors or mentors. Maybe language models need to learn about the concept of tough love.

fpgaminer · 12 days ago

"You don't have to be a nice person to be a good person."

fpgaminer commented on Gemini CLI blog.google/technology/de... · Posted by u/sync

joelm · 2 months ago

Been using Claude Code (4 Opus) fairly successfully in a large Rust codebase, but sometimes frustrated by it with complex tasks. Tried Gemini CLI today (easy to get working, which was nice) and it was pretty much a failure. It did a notably worse job than Claude at having the Rust code modifications compile successfully.

However, Gemini at one point output what will probably be the highlight of my day:

"I have made a complete mess of the code. I will now revert all changes I have made to the codebase and start over."

What great self-awareness and willingness to scrap the work! :)

fpgaminer · 2 months ago

Claude will do the same start over if things get too bad. At least I've seen it when its edits went haywire and trashed everything.

fpgaminer commented on I read all of Cloudflare's Claude-generated commits maxemitchell.com/writings... · Posted by u/maxemitchell

fpgaminer · 3 months ago

I used almost 100% AI to build a SCUMM-like parser, interpreter, and engine (https://github.com/fpgaminer/scumm-rust). It was a fun workflow; I could generally focus on my usual work and just pop in occasionally to check on and direct the AI.

I used a combination of OpenAI's online Codex, and Claude Sonnet 4 in VSCode agent mode. It was nice that Codex was more automated and had an environment it could work in, but its thought-logs are terrible. Iteration was also slow because it takes awhile for it to spin the environment up. And while you _can_ have multiple requests running at once, it usually doesn't make sense for a single, somewhat small project.

Sonnet 4's thoughts were much more coherent, and it was fun to watch it work and figure out problems. But there's something broken in VSCode right now that makes its ability to read console output inconsistent, which made things difficult.

The biggest issue I ran into is that both are set up to seek out and read only small parts of the code. While they're generally good at getting enough context, it does cause some degradation in quality. A frequent issue was replication of CSS styling between the Rust side of things (which creates all of the HTML elements) and the style.css side of things. Like it would be working on the Rust code and forget to check style.css, so it would just manually insert styles on the Rust side even though those elements were already styled on the style.css side.

Codex is also _terrible_ at formatting and will frequently muck things up, so it's mandatory to use it with an autoformatter and instructions to use it. Even with that, Codex will often say that it ran it, but didn't actually run it (or ran it somewhere in the middle instead of at the end) so its pull requests fail CI. Sonnet never seemed to have this issue and just used the prevailing style it saw in the files.

Now, when I say "almost 100% AI", it's maybe 99% because I did have to step in and do some edits myself for things that both failed at. In particular neither can see the actual game running, so they'd make weird mistakes with the design. (Yes, Sonnet in VS Code can see attached images, and potentially can see the DOM of vscode's built in browser, but the vision of all SOTA models is ass so it's effectively useless). I also stepped in once to do one major refactor. The AIs had decided on a very strange, messy, and buggy interpreter implementation at first.

fpgaminer commented on Crawlers impact the operations of the Wikimedia projects diff.wikimedia.org/2025/0... · Posted by u/edward

fpgaminer · 4 months ago

Maybe this is an insane idea, but ... how about a spider P2P network?

At least for local AIs it might not be a terrible idea. Basically a distributed cache of the most common sources our bots might pull from. That would mean only a few fetches from each website per day, and then the rest of the bandwidth load can be shared amongst the bots.

Probably lots of privacy issues to work around with such an implementation though.

fpgaminer commented on OpenAI looked at buying Cursor creator before turning to Windsurf cnbc.com/2025/04/17/opena... · Posted by u/mfiguiere

fpgaminer · 4 months ago

Usability/Performance/etc aside, I get such a sense of magic and wonder with the new Agent mode in VSCode. Watching a little AI actually wander around the code and making decisions on how to accomplish a task. It's so unfathomably cool.

fpgaminer commented on OpenAI o3 and o4-mini openai.com/index/introduc... · Posted by u/maheshrijal

fpgaminer · 4 months ago

On the vision side of things: I ran my torture test through it, and while it performed "well", about the same level as 4o and o1, it still fails to handle spatial relationships well, and did hallucinate some details. OCR is a little better it seems, but a more thorough OCR focused test would be needed to know for sure. My torture tests are more focused on accurately describing the content of images.

Both seem to be better at prompt following and have more up to date knowledge.

But honestly, if o3 was only at the same level as o1, it'd still be an upgrade since it's cheaper. o1 is difficult to justify in the API due to cost.

fpgaminer commented on Evidence suggesting Quasar Alpha is OpenAI's new model blog.kilocode.ai/p/quasar... · Posted by u/heymax054

andai · 4 months ago

Are you willing to share this code? I'm working on a project where I'm optimizing the prompt manually, I wonder if it could be automated. I guess I'd have to find a way to actually objectively measure the output quality.

fpgaminer · 4 months ago

https://gist.github.com/fpgaminer/8782dd205216ea2afcd3dda29d...

That's the model automation. To evaluate the prompts it suggests I have a sample of my dataset with 128 examples. For this particular run, all I cared about was optimizing a prompt for Llama 3.1 that would get it to write responses like those I'm finetuning for. That way the finetuning has a better starting point.

So to evaluate how effective a given prompt is, I go through each example and run <user>prompt</user><assistant>responses</assistant> (in the proper format, of course) through llama 3.1 and measure the NLL on the assistant portion. I then have a simple linear formula to convert the NLL to a score between 0 and 100, scaled based on typical NLL values. It should _probably_ be a non-linear formula, but I'm lazy.

Another approach to prompt optimization is to give the model something like:

  I have some texts along with their corresponding scores. The texts are arranged in ascending order based on their scores from worst (low score) to best (higher score).
  
  Text: {text0}
  Score: {score0}
  Text: {text1}
  Score: {score1}
  ...
  
  Thoroughly read all of the texts and their corresponding scores.
  Analyze the texts and their scores to understand what leads to a high score. Don't just look for literal patterns of words/tokens. Extensively research the data until you understand the underlying mechanisms that lead to high scores. The underlying, internal relationships. Much like how an LLM is able to predict the token not just from the literal text but also by understanding very complex relationships of the "tokens" between the tokens.
  Take all of the texts into consideration, not just the best.
  Solidify your understanding of how to optimize for a high score.
  Demonstrate your deep and complete understanding by writing a new text that maximizes the score and is better than all of the provided texts.
  Ideally the new text should be under 20 words.

Or some variation thereof. That's the "one off" approach where you don't keep a conversation with the model and instead just call it again with the updated scores. Supposedly that's "better" since the texts are in ascending order, letting the model easily track improvements, but I've had far better luck with the iterative, conversational approach.

Also the constraint on how long the "new text" can be is important, as all models have a tendency of writing longer and longer prompts with each iteration.