JoshPurtell (u/JoshPurtell)

JoshPurtell commented on Show HN: Smooth – Faster, cheaper browser agent API smooth.sh/... · Posted by u/liukidar

JoshPurtell · 2 days ago

Looks really good!

JoshPurtell commented on MAID in Canada nathansnelgrove.com/2025/... · Posted by u/surprisetalk

jmacd · 3 days ago

The median age for MAID recipients is over 77.

JoshPurtell · 3 days ago

50% is not a vast majority, so that's a red herring

JoshPurtell commented on MAID in Canada nathansnelgrove.com/2025/... · Posted by u/surprisetalk

14 · 3 days ago

The vast majority of people choosing MAID are those who enter hospice and wish to go peacefully before their terminal illness degrades their life to a state of pain and loss of bodily control.

JoshPurtell · 3 days ago

34% are between 18 and 65

JoshPurtell commented on Show HN: Async – Claude code and Linear and GitHub PRs in one opinionated tool github.com/bkdevs/async-s... · Posted by u/wjsekfghks

wjsekfghks · 4 days ago

Interesting. So, do you just start multiple instances of Claude Code and ask the same prompt on all of them? Manually cherry picking from 5 different worktrees sounds complicated. Will see what I can do :)

JoshPurtell · 3 days ago

Yeah, exactly, same prompt.

I agree, it's more complex. But I feel like the potential of a Claude Code wrapper is precisely in enabling workflows that are a pain to self-implement but are nonetheless incredibly powerful.

JoshPurtell commented on Show HN: Async – Claude code and Linear and GitHub PRs in one opinionated tool github.com/bkdevs/async-s... · Posted by u/wjsekfghks

JoshPurtell · 4 days ago

Something I'd consider a game-changer would be making it really easy to kick off multiple Claude instances to tackle a large research task, and then to view the results and collect them into a final research document.

IME no matter how well I prompt, a single claude/codex will never get a successful implementation of a significant feature single-shot. However, what does work is having 5 Claudes try it, reading the code and cherry picking the diff segments I like into one franken-spec I give to a final claude instance with essentially just "please implement something like this"

It's super manual and annoying with git worktrees for me, but it sounds like your setup could make it slick.
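A minimal sketch of the "same prompt, several worktrees" workflow described above. It only builds the commands rather than running them. The function name `plan_parallel_attempts`, the `attempt-N` branch and directory names, and the `agent_cmd` argument are all illustrative assumptions; `agent_cmd` stands in for whatever non-interactive invocation your coding agent supports.

```python
import shlex


def plan_parallel_attempts(prompt, agent_cmd, n=5, base="main"):
    """Build (but do not run) the commands for n independent attempts.

    agent_cmd is a placeholder argv prefix for a coding agent; the prompt
    is appended as its last argument. Nothing here is tool-specific.
    """
    plans = []
    for i in range(1, n + 1):
        branch = f"attempt-{i}"      # hypothetical naming scheme
        path = f"../attempt-{i}"     # one sibling directory per attempt
        plans.append({
            "branch": branch,
            "path": path,
            # Create a new branch from `base`, checked out in its own worktree.
            "setup": ["git", "worktree", "add", "-b", branch, path, base],
            # Same prompt for every attempt; run with cwd set to `path`.
            "run": [*agent_cmd, prompt],
            # Show what the attempt changed in tracked files, relative to `base`.
            "review": ["git", "-C", path, "diff", base],
        })
    return plans


if __name__ == "__main__":
    for plan in plan_parallel_attempts("Implement the feature", ["my-agent"], n=2):
        print(shlex.join(plan["setup"]))
        print(shlex.join(plan["run"]))
        print(shlex.join(plan["review"]))
```

The cherry-picking step stays manual here: you read each `review` diff, copy the segments you like into one spec, and hand that to a final instance.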

JoshPurtell commented on Launch HN: Skope (YC S25) – Outcome-based pricing for software products · Posted by u/benjsm

JoshPurtell · 8 days ago

The basic idea is this. The customer has some "curve" that represents how much he values different outcomes. Maybe he values good outcomes at $1, and great outcomes at $100. The supplier also has a cost curve - by definition, it will cost him more to supply a great outcome than a good outcome (otw he'd just always supply the great outcome).

Setting a fixed price is a simple way to help these two parties transact. But hypothetically, it may be more efficient - i.e. you will let more mutually-beneficial events happen - to ask both parties what their number is for a given event, and have them transact when the numbers are far enough apart (cost is $10, value is $100).

The problem is, you can't directly ask the parties, because they don't want to reveal how high/low they're willing to go for no reason. So, you should essentially structure your questions into a pre-defined algorithm so that everyone is incentivized to reveal at least the ballpark of where their cost/value is. The study of how to structure those questions is a subset of mechanism design / information design, which is a branch of Econ related to game theory.

JoshPurtell · 8 days ago

FWIW, if this sounds like arcane academic musing ... applied mechanism design for a while was essentially just the study of Google ad auctions, and Google invested very very heavily in researchers to figure out how to do this for them.

JoshPurtell commented on Launch HN: Skope (YC S25) – Outcome-based pricing for software products · Posted by u/benjsm

benjsm · 8 days ago

Thank you - yes! Would love to learn more about mechanism design. Mind diving deeper?

JoshPurtell · 8 days ago

The basic idea is this. The customer has some "curve" that represents how much he values different outcomes. Maybe he values good outcomes at $1, and great outcomes at $100. The supplier also has a cost curve - by definition, it will cost him more to supply a great outcome than a good outcome (otw he'd just always supply the great outcome).

Setting a fixed price is a simple way to help these two parties transact. But hypothetically, it may be more efficient - i.e. you will let more mutually-beneficial events happen - to ask both parties what their number is for a given event, and have them transact when the numbers are far enough apart (cost is $10, value is $100).

The problem is, you can't directly ask the parties, because they don't want to reveal how high/low they're willing to go for no reason. So, you should essentially structure your questions into a pre-defined algorithm so that everyone is incentivized to reveal at least the ballpark of where their cost/value is. The study of how to structure those questions is a subset of mechanism design / information design, which is a branch of Econ related to game theory.
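To make "structure your questions so that people are incentivized to reveal their number" concrete, here is a small sketch of one textbook mechanism for a single buyer and seller: a price is fixed in advance, and trade happens only if the reported cost is at or below it and the reported value is at or above it. This is an illustrative choice, not something from the thread or from Skope; all names and numbers are assumptions. Because a report only decides whether trade happens and never moves the price, misreporting cannot help either side.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Outcome:
    trade: bool
    price: float


def posted_price_trade(reported_value, reported_cost, price):
    """Trade at the pre-set price iff both reports accept it."""
    trade = reported_cost <= price <= reported_value
    return Outcome(trade, price if trade else 0.0)


def buyer_utility(true_value, outcome):
    return true_value - outcome.price if outcome.trade else 0.0


def seller_utility(true_cost, outcome):
    return outcome.price - true_cost if outcome.trade else 0.0


def buyer_cannot_gain_by_lying(true_value, cost, price, candidate_reports):
    """Brute-force check: no candidate misreport beats the honest report."""
    honest = buyer_utility(true_value, posted_price_trade(true_value, cost, price))
    return all(
        buyer_utility(true_value, posted_price_trade(r, cost, price)) <= honest
        for r in candidate_reports
    )


if __name__ == "__main__":
    # The example from the comment: cost is $10, value is $100; price set at $40.
    result = posted_price_trade(reported_value=100, reported_cost=10, price=40)
    print(result, buyer_utility(100, result), seller_utility(10, result))
    print(buyer_cannot_gain_by_lying(100, 10, 40, range(0, 201)))
```

The trade-off is that this simple rule skips some mutually beneficial deals: with the price at $40, a buyer who values the outcome at $30 and a seller whose cost is $10 never transact. Richer mechanisms try to recover more of those trades while keeping the incentive to report honestly.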

JoshPurtell commented on Launch HN: Skope (YC S25) – Outcome-based pricing for software products · Posted by u/benjsm

JoshPurtell · 8 days ago

This is a really great idea!

It sounds like your first approach is to verify that events met an agreed-upon threshold.

Have you looked into Mechanism Design and getting customers to e.g. pay more for great outcomes, and a little for ok outcomes?

JoshPurtell commented on Dispelling misconceptions about RLHF aerial-toothpaste-34a.not... · Posted by u/fpgaminer

williamtrask · 12 days ago

Nit: the author says that supervised fine tuning is a type of RL, but it is not. RL is about delayed reward. Supervised fine tuning is not in any way about delayed reward.

JoshPurtell · 12 days ago

RL is not about delayed reward. Multi-armed bandit problems have no credit assignment component, but are often the first RL problem taught.

In its most general form, RL is about learning a policy (a state -> action mapping), which often requires inferring value, etc.

But copying a strong reference policy ... is still learning a policy, whether by SFT or not.
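For readers unfamiliar with the bandit point above, a minimal epsilon-greedy sketch on Bernoulli arms. Every pull returns its reward immediately, so there is nothing to assign credit across time, yet the agent still learns which action to prefer. The function name, arm probabilities, and hyperparameters are illustrative assumptions.

```python
import random


def run_bandit(arm_probs, steps=5000, epsilon=0.1, seed=0):
    """Epsilon-greedy on Bernoulli arms; returns (value estimates, pull counts)."""
    rng = random.Random(seed)
    counts = [0] * len(arm_probs)
    values = [0.0] * len(arm_probs)  # running mean reward per arm
    for _ in range(steps):
        if rng.random() < epsilon:
            arm = rng.randrange(len(arm_probs))  # explore
        else:
            arm = max(range(len(arm_probs)), key=values.__getitem__)  # exploit
        # Reward arrives right after the action: no delayed credit assignment.
        reward = 1.0 if rng.random() < arm_probs[arm] else 0.0
        counts[arm] += 1
        values[arm] += (reward - values[arm]) / counts[arm]  # incremental mean
    return values, counts


if __name__ == "__main__":
    values, counts = run_bandit([0.1, 0.5, 0.9])
    print(values, counts)
```

With a single state, the learned "policy" is just the arm with the highest estimate, which is the degenerate case of a state -> action mapping.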