JoshPurtell commented on Show HN: Smooth – Faster, cheaper browser agent API   smooth.sh/... · Posted by u/liukidar
JoshPurtell · 2 days ago
Looks really good!


JoshPurtell commented on MAID in Canada   nathansnelgrove.com/2025/... · Posted by u/surprisetalk
jmacd · 3 days ago
The median age for MAID recipients is over 77.
JoshPurtell · 3 days ago
50% is not a vast majority, so that's a red herring
JoshPurtell commented on MAID in Canada   nathansnelgrove.com/2025/... · Posted by u/surprisetalk
14 · 3 days ago
The vast majority of people choosing MAID are those who enter hospice and wish to go peacefully before their terminal illness degrades their life to a state of pain and loss of bodily control.
JoshPurtell · 3 days ago
34% are between 18 and 65
JoshPurtell commented on Show HN: Async – Claude code and Linear and GitHub PRs in one opinionated tool   github.com/bkdevs/async-s... · Posted by u/wjsekfghks
wjsekfghks · 4 days ago
Interesting. So, do you just start multiple instances of Claude Code and ask the same prompt on all of them? Manually cherry picking from 5 different worktrees sounds complicated. Will see what I can do :)
JoshPurtell · 3 days ago
Yeah, exactly, same prompt.

I agree, it's more complex. But I feel like the potential of a Claude Code wrapper is precisely in enabling workflows that are a pain to self-implement but nonetheless incredibly powerful

JoshPurtell commented on Show HN: Async – Claude code and Linear and GitHub PRs in one opinionated tool   github.com/bkdevs/async-s... · Posted by u/wjsekfghks
JoshPurtell · 4 days ago
Something I'd consider a game-changer would be making it really easy to kick off multiple Claude instances to tackle a large research task, and then to view the results and collect them into a final research document.

IME no matter how well I prompt, a single Claude/Codex will never get a successful implementation of a significant feature single-shot. However, what does work is having 5 Claudes try it, reading the code, and cherry-picking the diff segments I like into one franken-spec I give to a final Claude instance with essentially just "please implement something like this"

It's super manual and annoying with git worktrees for me, but it sounds like your setup could make it slick
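The workflow above could be scripted roughly like this. This is a hypothetical sketch, not Async's actual interface: the worktree paths are made up, and `claude -p` (the Claude Code CLI's non-interactive print mode) is an assumption about your tooling.

```python
import subprocess


def plan_attempts(prompt, n=5, base_branch="main"):
    """Build worktree paths and commands for n parallel attempts.

    Hypothetical sketch: paths and the `claude -p` invocation are
    assumptions, adapt to your actual setup.
    """
    plans = []
    for i in range(n):
        path = f"../attempt-{i}"
        plans.append({
            "worktree": ["git", "worktree", "add", path, base_branch],
            "run": ["claude", "-p", prompt],
            "cwd": path,
        })
    return plans


def execute(plans):
    """Create each worktree, then launch all attempts concurrently."""
    procs = []
    for plan in plans:
        subprocess.run(plan["worktree"], check=True)
        procs.append(subprocess.Popen(plan["run"], cwd=plan["cwd"]))
    for p in procs:
        p.wait()
```

Separating planning from execution keeps the command construction testable without actually spawning five agents; the cherry-picking step stays manual.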

JoshPurtell commented on Launch HN: Skope (YC S25) – Outcome-based pricing for software products    · Posted by u/benjsm
JoshPurtell · 8 days ago
The basic idea is this. The customer has some "curve" that represents how much he values different outcomes. Maybe he values good outcomes at $1, and great outcomes at $100. The supplier also has a cost curve; by definition, it costs him more to supply a great outcome than a good one (otherwise he'd just always supply the great outcome).

Setting a fixed price is a simple way to help these two parties transact. But hypothetically, it may be more efficient (i.e., it lets more mutually beneficial trades happen) to ask both parties what their number is for a given event, and have them transact when the numbers are far enough apart (cost is $10, value is $100).

The problem is, you can't directly ask the parties, because they don't want to reveal how high or low they're willing to go for no reason. So you essentially structure your questions into a pre-defined algorithm so that everyone is incentivized to reveal at least the ballpark of where their cost or value is. The study of how to structure those questions is a subset of mechanism design / information design, a branch of economics related to game theory
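The naive "just ask both sides" version can be sketched as a toy double auction (illustrative numbers only; the pricing rule is deliberately simplistic):

```python
def trade(value_bid, cost_ask):
    """Toy double auction: trade iff the buyer's reported value exceeds
    the seller's reported cost; split the surplus at the midpoint price.

    Note this rule is NOT incentive-compatible: each side gains by
    shading its report (buyer bids low, seller asks high), which is
    exactly the problem mechanism design studies.
    """
    if value_bid <= cost_ask:
        return None  # no mutually beneficial trade exists
    return (value_bid + cost_ask) / 2


trade(100, 10)  # cost $10, value $100 -> trade at $55
trade(5, 10)    # reported value below cost -> no trade
```

A properly designed mechanism changes the pricing rule so that truthful (or at least ballpark-truthful) reporting is each party's best strategy.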

JoshPurtell · 8 days ago
FWIW, if this sounds like arcane academic musing: for a while, applied mechanism design was essentially just the study of Google ad auctions, and Google invested very heavily in researchers to figure out how to do this for them
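The core of that ad-auction work is the second-price (Vickrey) auction: the winner pays the runner-up's bid, which makes truthful bidding a dominant strategy. A minimal sketch (simplified; real ad auctions layer quality scores and per-click pricing on top):

```python
def second_price_auction(bids):
    """bids: {bidder: amount}. The highest bidder wins but pays only
    the second-highest bid, so no bidder can profit by misreporting
    their true value."""
    ranked = sorted(bids.items(), key=lambda kv: kv[1], reverse=True)
    winner = ranked[0][0]
    price = ranked[1][1] if len(ranked) > 1 else 0.0
    return winner, price


second_price_auction({"ad_a": 3.50, "ad_b": 2.00, "ad_c": 0.75})
# ad_a wins, but pays ad_b's bid of 2.00
```

The design insight is that decoupling what you pay from what you bid removes the incentive to shade your bid, the same idea an outcome-pricing mechanism would need.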
JoshPurtell commented on Launch HN: Skope (YC S25) – Outcome-based pricing for software products    · Posted by u/benjsm
JoshPurtell · 8 days ago
This is a really great idea!

It sounds like your first approach is to verify that events met an agreed-upon threshold.

Have you looked into Mechanism Design and getting customers to e.g. pay more for great outcomes, and a little for ok outcomes?

JoshPurtell commented on Dispelling misconceptions about RLHF   aerial-toothpaste-34a.not... · Posted by u/fpgaminer
williamtrask · 12 days ago
Nit: the author says that supervised fine tuning is a type of RL, but it is not. RL is about delayed reward. Supervised fine tuning is not in any way about delayed reward.
JoshPurtell · 12 days ago
RL is not about delayed reward. Multi-armed bandit problems have no credit assignment component, but are often the first RL problem taught.

In its most general form, RL is about learning a policy (a state -> action mapping), which often requires inferring value functions, etc.

But copying a strong reference policy is still learning a policy, whether via SFT or not
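The bandit point can be made concrete with a minimal epsilon-greedy agent: reward is immediate, there is no credit assignment over time, yet the agent is still learning a (state-free) policy, i.e., which arm to pull. A generic sketch, not tied to any particular library:

```python
import random


def epsilon_greedy_bandit(arm_means, steps=10_000, eps=0.1, seed=0):
    """Multi-armed bandit with epsilon-greedy action selection.

    Each pull yields an immediate Gaussian reward around the arm's
    true mean; no delayed reward or temporal credit assignment is
    involved. Returns the index of the arm the learned policy prefers.
    """
    rng = random.Random(seed)
    counts = [0] * len(arm_means)
    estimates = [0.0] * len(arm_means)
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.randrange(len(arm_means))   # explore
        else:
            a = max(range(len(arm_means)), key=lambda i: estimates[i])  # exploit
        reward = rng.gauss(arm_means[a], 1.0)   # immediate reward
        counts[a] += 1
        # incremental mean update of the arm's value estimate
        estimates[a] += (reward - estimates[a]) / counts[a]
    return max(range(len(arm_means)), key=lambda i: estimates[i])
```

With enough pulls the agent reliably identifies the best arm, which is exactly "learning a policy" with zero delayed-reward structure.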
