apwell23 (u/apwell23)

apwell23 commented on Three Years from GPT-3 to Gemini 3 oneusefulthing.org/p/thre... · Posted by u/JumpCrisscross

stavros · 18 days ago

> did you not run into this problem described by Ilya below?

I used to run into a related issue, where fixing a bug would add more bugs, to the point where it would not be able to progress past a given codebase complexity. However, Codex is much better at not doing that. There are some cases where the model kept going back and forth between two bugs, but I discovered that that was because I had misunderstood the constraints and was telling the model to do something impossible.

> how did you discover that and why did it slip out?

Sentry alerted me but I thought it was an edge case, and I didn't pay attention until hours later.

I use a spiral allocation algorithm to allocate plots, so new users are clustered around the center. Sometimes plots are emptied (when the user isn't active), so you can have gaps in the spiral, which the algorithm tries to fill, and it's meant to go to the next plot if the current one can't be assigned.
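A minimal sketch of that kind of allocation, assuming a square grid and a square spiral starting at (0, 0). The names `spiral`, `allocate` and `max_plots` are made up for illustration and are not taken from the actual site:

```python
from itertools import count, islice
from typing import Iterator, Set, Tuple

Plot = Tuple[int, int]


def spiral() -> Iterator[Plot]:
    """Yield grid coordinates in a square spiral starting at the center (0, 0)."""
    x = y = 0
    dx, dy = 1, 0  # first leg moves along +x
    yield (x, y)
    for step in count(1):
        # Leg lengths go 1, 1, 2, 2, 3, 3, ...: each length is used twice.
        for _ in range(2):
            for _ in range(step):
                x, y = x + dx, y + dy
                yield (x, y)
            dx, dy = -dy, dx  # turn 90 degrees


def allocate(occupied: Set[Plot], max_plots: int = 10_000) -> Plot:
    """Claim the first free plot along the spiral, so gaps near the center fill first."""
    for plot in islice(spiral(), max_plots):
        if plot not in occupied:
            occupied.add(plot)
            return plot
    raise RuntimeError("no free plot within the search limit")
```

Because the walk always restarts from the center, a plot that was emptied is handed out again before any plot further out.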

For one specific plot, however, conditions were such that the database was giving an integrity error. The exception handling code that was supposed to handle that didn't take into account that it needed to roll back before resuming, so the entire request failed, instead of resuming gracefully. Just adding an atomic() context manager fixed it.
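A rough sketch of the "roll back before resuming" idea, using explicit savepoints in the standard-library `sqlite3` module as a stand-in. This is not the site's code: the schema, the `audit` table and the `assign_plot` helper are hypothetical, and if the `atomic()` above is Django's `transaction.atomic`, wrapping each attempt in it would play the role of the SAVEPOINT / ROLLBACK TO pair here.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

Plot = Tuple[int, int]


def make_db() -> sqlite3.Connection:
    # isolation_level=None: no implicit transactions, so BEGIN/SAVEPOINT/COMMIT
    # are issued explicitly below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute(
        "CREATE TABLE plots (x INTEGER, y INTEGER, owner TEXT NOT NULL, "
        "PRIMARY KEY (x, y))"
    )
    conn.execute("CREATE TABLE audit (owner TEXT, x INTEGER, y INTEGER)")
    return conn


def assign_plot(
    conn: sqlite3.Connection, owner: str, candidates: Iterable[Plot]
) -> Optional[Plot]:
    """Try candidates in order; undo a failed attempt before trying the next one."""
    conn.execute("BEGIN")
    try:
        for x, y in candidates:
            conn.execute("SAVEPOINT attempt")
            try:
                conn.execute("INSERT INTO audit VALUES (?, ?, ?)", (owner, x, y))
                conn.execute("INSERT INTO plots VALUES (?, ?, ?)", (x, y, owner))
            except sqlite3.IntegrityError:
                # Plot already taken: discard this attempt's partial writes,
                # then resume with the next candidate.
                conn.execute("ROLLBACK TO attempt")
                conn.execute("RELEASE attempt")
                continue
            conn.execute("RELEASE attempt")
            conn.execute("COMMIT")
            return (x, y)
        conn.execute("COMMIT")  # no candidate could be assigned
        return None
    except Exception:
        conn.execute("ROLLBACK")
        raise
```

The point is only that the failed attempt is undone in isolation, so the rest of the request can carry on instead of failing as a whole.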

> looks like the site wasn't working at all when you posted that comment?

It was working for a few hundred (thousand?) visitors, then the allocation code hit the plot that caused the bug, and signup couldn't proceed after that.

apwell23 · 18 days ago

> Just adding an atomic() context manager fixed it.

ok, looks like you are intimately familiar with the code that is being produced and are using AI as a code generator rather than pure vibe coding. That makes sense to me.

Btw, did the AI add that line when you explained what the error was, or did you add it manually?

apwell23 commented on Three Years from GPT-3 to Gemini 3 oneusefulthing.org/p/thre... · Posted by u/JumpCrisscross

mirekrusin · 18 days ago

Have you also looked at the remaining 1h 36m, or just those out-of-context 30s?

apwell23 · 18 days ago

have you ever made a non-annoying comment?

apwell23 commented on Three Years from GPT-3 to Gemini 3 oneusefulthing.org/p/thre... · Posted by u/JumpCrisscross

mirekrusin · 18 days ago

We're living in such interesting times - you can talk to a computer and it works, in many cases at an extraordinary level - yet you still see intellectually constipated opinions arguing against basic facts established years ago - incredible.

apwell23 · 18 days ago

at least you are self-aware

apwell23 commented on CS234: Reinforcement Learning Winter 2025 web.stanford.edu/class/cs... · Posted by u/jonbaer

TNWin · 18 days ago

I didn't get the reference. Please elaborate.

apwell23 · 18 days ago

he said RL sucks because it narrowly optimizes to solve a certain set of problems under a certain set of conditions.

he compared it to students who win math competitions but can't do anything practical.

apwell23 commented on Guests ejected mid-stay from bankrupt hotel chain Sonder bbc.com/news/articles/c36... · Posted by u/onemoresoop

orwin · a month ago

Didn't Altman fail upward multiple times before working at YC?

apwell23 · 18 days ago

yea, Paul Graham knew within 10 mins of meeting teenage Sam that sama was going to be the next Bill Gates.

True story from Silicon Valley (the real one, not the HBO one)

apwell23 commented on Three Years from GPT-3 to Gemini 3 oneusefulthing.org/p/thre... · Posted by u/JumpCrisscross

mirekrusin · 19 days ago

The original post alone mentions multiple projects and links to https://pine.town as having no code directly written by the author.

From the perspective of personally using it daily and seeing what my team is using it for, it's quite shocking to still see those kinds of comments. It's like we're living on different planets - again, it gives off a flat-earther-like vibe.

apwell23 · 18 days ago

god you are so annoying. the site that you posted doesn't even work. so wtf are you even gloating about?

apwell23 commented on Three Years from GPT-3 to Gemini 3 oneusefulthing.org/p/thre... · Posted by u/JumpCrisscross

mirekrusin · 19 days ago

Stochastic parrot? Autocomplete on steroids? Fancy autocorrect? Bullshit generator? AI snake oil? Statistical mimicry?

You don't hear that anymore.

Feels like a whole generation of skeptics evaporated.

apwell23 · 18 days ago

> Feels like a whole generation of skeptics evaporated.

https://www.youtube.com/watch?v=aR20FWCCjAs&list=PLd7-bHaQwn...

Ilya Sutskever this week.

apwell23 commented on Three Years from GPT-3 to Gemini 3 oneusefulthing.org/p/thre... · Posted by u/JumpCrisscross

stavros · 19 days ago

> how many prompts did it take you to make this?

Probably hundreds, I'd say.

> how did you make sure that each new prompt didn't break some previous functionality?

For the backend, I reviewed the code and steered it to better solutions a few times (fewer than I thought I'd need to!). For the frontend, I only tested and steered, because I don't know much about React at all.

This was impossible with previous models; I was really surprised that Codex didn't seem to completely break down after a few iterations!

> did you have a precise vision

I had a fairly precise vision, but the LLM made some good contributions. The UI aesthetic is mostly the LLM, as I'm not very good at that. The UX and functionality are almost entirely me.

apwell23 · 18 days ago

did you not run into this problem described by Ilya below?

https://www.youtube.com/watch?v=aR20FWCCjAs&list=PLd7-bHaQwn...

this has been my experience with pure vibecoding. I am surprised it works well for others.

btw, about the current production bug: how did you discover it, and why did it slip out? looks like the site wasn't working at all when you posted that comment?

apwell23 commented on Three Years from GPT-3 to Gemini 3 oneusefulthing.org/p/thre... · Posted by u/JumpCrisscross

stavros · 20 days ago

It's gotten more and more shippable, especially with the latest generation (Codex 5.1, Sonnet 4.5, now Opus 4.5). My metric is "wtfs per line", and it's been decreasing rapidly.

My current preference is Codex 5.1 (Sonnet 4.5 as a close second, though it got really dumb today for "some reason"). It's been good to the point where I shipped multiple projects with it without a problem (e.g. https://pine.town is one I made without writing any code myself).

apwell23 · 19 days ago

> https://pine.town

how many prompts did it take you to make this?

how did you make sure that each new prompt didn't break some previous functionality?

did you have a precise vision for it when you started or did you just go with whatever was being given to you?