kibwen · 9 months ago
> Value increases with # total downloads and LTM downloads on PyPI.

While I applaud the OP for the initiative, if this ever takes off it will cause people to exploit the system in the following ways:

1. Hammer the package registries with fake downloads, which will increase the financial burden on the registries both in terms of increased resource usage and in employing countermeasures to stop bad actors.

2. People spamming the repositories of popular packages with PRs that just so happen to add a dependency on their own personal package, so that they can pick up transitive downloads. This increases the burden on package authors who will need to spend time rejecting all of these.

So this approach carries the risk of possibly making things even worse for OSS maintainers.

If a metric can be gamed, and there is a financial incentive to game it, it will be gamed. I coin this the "this is why we can't have nice things" law.

TuringTest · 9 months ago
> While I applaud the OP for the initiative, if this ever takes off it will cause people to exploit the system in the following ways

It's true that the metrics used in this story could be exploited. But the value of the initiative is not in the specific method used to donate, but in the idea of finding worthy yet non-obvious projects to donate to, and in leading by example.

If the initiative catches on, the community can find better, harder-to-exploit methods for identifying deserving targets, as has happened with NGOs, for example. This idea could create a healthy ecosystem that supports FLOSS software, just as the idea of a stock exchange supported the emergence of publicly traded corporations in the 18th and 19th centuries.

kvinogradov · 9 months ago
Exactly! The idea is to use available data to evaluate the value and risk of OSS and then allocate donations according to a broad, algorithm-based systemic index, rather than to a narrow set of manually picked projects (usually large or popular ones).

The current algorithm is far from perfect (it's an MVP) and never will be, but with more measurable inputs and multiple iterations with the help of the community, it could become an analogue of the "S&P 500" for OSS: something worth donating through to reduce the risk in the global OSS supply chain we all rely on.

As with publicly traded companies, having a decentralized set of private donors with skin in the game helps a lot to evolve the approach efficiently and make it harder to exploit in the future. By contrast, I would not trust an algorithm created and maintained by a state-owned or simply very large institution.
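
To make the idea concrete, here is a toy sketch of index-style allocation — the metric names, weights, and numbers are purely illustrative, not the MVP's actual formula:

    # Illustrative only: split a donation budget in proportion to a value
    # score boosted by a risk score (riskier, less-funded projects get more).
    def allocate(budget, projects):
        # projects: list of dicts with "name", "value" (e.g. download-based)
        # and "risk" in [0, 1] (higher = less sustainable)
        weights = {p["name"]: p["value"] * (1 + p["risk"]) for p in projects}
        total = sum(weights.values())
        return {name: round(budget * w / total, 2) for name, w in weights.items()}

    print(allocate(100, [
        {"name": "pkg-a", "value": 5_000_000, "risk": 0.8},
        {"name": "pkg-b", "value": 20_000_000, "risk": 0.1},
    ]))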

kiba · 9 months ago
If everyone uses their own idiosyncratic algorithm for choosing which OSS to donate to, it's going to be awfully hard to exploit.
ddulaney · 9 months ago
The mechanism is kinda like the Spotify fake songs case: https://www.justice.gov/usao-sdny/pr/north-carolina-musician...

In the same way, there was a fixed pot of money split up by popularity, so making thousands of songs and streaming them as much as possible with bot accounts was profitable, even though each bot account cost a few dollars per month.

Here, the bots you use to juice your numbers don’t even need a subscription fee!

reaperman · 9 months ago
Which is why Spotify should pay a percentage of MY subscription fee only to the artists that I listen to. My money shouldn’t go to Taylor Swift if I don’t listen to Taylor Swift.

That would eliminate direct financial payment from botting. But botting could still affect trending or “related” recommendations for indirect financial boost.

imtringued · 9 months ago
I find that court case very off-putting, since it was Spotify that stole the royalties; the same mechanic applies to simply being popular. When will someone sue Taylor Swift for stealing royalties?

Also, since they didn't change the economics, they have done nothing to prevent this from happening again. Any economist who sees that he can earn $12 from an $11 payment will keep doing it until the risk-adjusted return equals the interest rate. Ironically, this will remain profitable until the cross-subsidy is gone; i.e., there is an incentive to use bots to boost real musicians who lose out by not being recipients of the cross-subsidy.

tempodox · 9 months ago
Exactly, and Goodhart's law drives the nails in the coffin.

https://en.wikipedia.org/wiki/Goodhart%27s_law

Pooge · 9 months ago
Makes me think of the "cobra effect", like the Great Hanoi Rat Massacre.[1]

Set arbitrary metrics like download count -> bad actors make bots to download their package -> they profit while the registry suffers from very heavy load.

[1]: https://en.wikipedia.org/wiki/Great_Hanoi_Rat_Massacre

bee_rider · 9 months ago
Rather than trying to donate to the most popular packages, people could try to donate to the packages they use, and then to their dependencies (it would be nicer, though, if repos had a way for packages to list their dependencies and automatically propagate donations they receive downward—which would be usable by the top-level packages too, but, eh, you have to trust people at some point).
sesm · 9 months ago
Number 2 is already happening; I have seen this multiple times.
uncomplexity_ · 9 months ago
sweet sweet human nature.
kvinogradov · 9 months ago
Hey HN community, thanks a lot for your great feedback and actionable critique!

It was a simple MVP for personal OSS donations, and I have many considerations on how to evolve it and especially to prevent it from becoming a victim of Goodhart's Law at scale. Some of them:

1) Value and Risk scores should include more metrics: dependencies, known funding, time since the last dev activity, active contributors, etc. A wider set of connected but relatively independent metrics is harder to fake. It will also help to exclude edge cases — for instance, I auto-donated to Pydantic (it's a great OSS project), but such support is unlikely to be needed since they raised a $12.5M Series A from Sequoia this year.

2) Algorithmic does not mean automatic. While I consider a strict, measurable, and ideally fully transparent methodology crucial for donating, that does not mean all inputs must be automatically generated. For instance, in the stock ETF world, one can generally rely on metrics such as annual financials for trading because they are audited every year (although that does not prevent fraud in 100% of cases). In the OSS world, data from trusted ecosystem actors can also be part of the equation.

3) Many guardrails are possible: a limited budget per project, manual checks of the top repos with the most anomalous changes in metrics (a rough sketch follows below), and so on. Also, if we target the sustainable maintenance of the OSS the world relies on (I do!), then new projects (1-2 years old) are unlikely to get high scores — that adds another protection layer.
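
For instance, the per-project cap and the anomaly check from point 3 could look roughly like this — thresholds and data shapes here are assumed, purely illustrative:

    # Illustrative guardrails, not the actual algorithm.
    MAX_SHARE = 0.05  # assumed cap: no project gets more than 5% of the budget

    def cap_allocations(allocations, budget):
        return {name: min(amount, budget * MAX_SHARE)
                for name, amount in allocations.items()}

    def flag_anomalies(download_history, spike_factor=5.0):
        # download_history: {project: [monthly download counts]}
        # Flag projects whose latest month dwarfs everything seen before.
        return [name for name, counts in download_history.items()
                if len(counts) >= 2 and counts[-1] > spike_factor * max(counts[:-1])]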

Given the interest in this topic, I am going to continue developing this algorithm further and expand it to other ecosystems (e.g. JS/TS and Rust). Your feedback here is very valuable to me, and those who would like to help make the algo better or donate through it are invited to the newly created gist:

https://gist.github.com/vinogradovkonst/27921217d25390f1bf5e...

mudkipdev · 9 months ago
Great idea. I also think it'd be interesting to systematically donate to the projects with the lowest bus factor (or, as that one XKCD describes it, "the project that a random person in Nebraska has been maintaining since 2005").
kvinogradov · 9 months ago
Yes, that would be a very useful risk metric! Assuming access only to public APIs (GitHub, package managers, etc.), how would you define the bus factor in terms of data? I am thinking about # unique contributors over the last X years.
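
One rough way to express that with public data only — a sketch, assuming the GitHub contributors endpoint is a good-enough proxy for commit activity (unauthenticated requests are rate-limited):

    # Sketch: bus factor as the smallest number of contributors who together
    # account for ~80% of commits to the default branch.
    import requests

    def bus_factor(owner, repo, share=0.8):
        url = f"https://api.github.com/repos/{owner}/{repo}/contributors"
        contributors = requests.get(url, params={"per_page": 100}).json()
        counts = sorted((c["contributions"] for c in contributors), reverse=True)
        total = sum(counts)
        covered = 0
        for n, c in enumerate(counts, start=1):
            covered += c
            if covered >= share * total:
                return n
        return len(counts)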

It's funny that the experiment uncovered exactly such a case: a person from Nebraska received my donation as the first income from his open-source contributions in 18 years, and shared this on LinkedIn:

https://www.linkedin.com/feed/update/urn:li:activity:7269812...

leoc · 9 months ago
It's a great idea; I have some similar thoughts. The looming problem, though, is that Goodhart's Law is likely to strike if this ever gets scaled up significantly.
swatcoder · 9 months ago
Worse (and very unfortunately), any kind of unmonitored/low-attention payment scheme that starts funneling meaningful money to countless small libraries turns those libraries into revenue streams whose ownership can be valued and sold.

And when that happens, you'll quickly start to see professional schemers and hustlers coming in to buy them; the most aggressive of them will surreptitiously convert some of those projects into things like data siphons or bot shells, relicense and restructure them for further monetization, set up targeted lawsuits, etc.

It creates a whole new and dangerous incentive scheme, not unlike the one that's already corrupted the browser extension ecosystem.

To keep that from happening, people need to actually pay attention to what they're paying for, when they're paying for it, so that they can make sure they're still getting what they expect. Automated/algorithmic schemes like this specifically avoid that.

They're a great idea in theory, meant to address a real issue in how open-source developers are rewarded for popular work, but as you suggest, they open a very wide and inviting door for trouble at scale.

leoc · 9 months ago
I generally agree, though some of those specific risks already exist and some (like license changes) are if anything probably less likely if they mean giving up significant donation money.
airstrike · 9 months ago
I think you can solve that by funding the dependencies you rely on and having them fund their dependencies, and so on...

A related project I recently found out about is https://www.drips.network/ The more I think about it, the more I like it.

In fact, TFA says

> But how should one decide which users to sponsor and how much to donate to each one? It requires data on their importance, and I used PyPI to roughly estimate it for Python packages.

It's better to have one of the slabs in the XKCD comic fund the ones immediately below it than to have users look at the whole thing from the outside and try to arbitrarily allocate importance via some metric like PyPI downloads, GitHub stars, and whatnot.
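
A toy sketch of that slab-funds-the-slabs-below-it idea — the keep rate and dependency map are made up, and real tools like drips.network do this far more carefully:

    # Each package keeps part of what it receives and splits the rest evenly
    # among its direct dependencies; leaves keep everything they receive.
    def propagate(package, amount, deps, received=None, keep=0.5):
        received = received if received is not None else {}
        children = deps.get(package, [])
        kept = amount if not children else amount * keep
        received[package] = received.get(package, 0) + kept
        if children:
            share = (amount - kept) / len(children)
            for child in children:
                propagate(child, share, deps, received, keep)
        return received

    deps = {"my-app": ["requests", "pydantic"], "requests": ["urllib3"]}
    print(propagate("my-app", 100, deps))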

leoc · 9 months ago
It's a good start, but it's vulnerable to sticky fingers, patronage relationships and so on if the money becomes serious. For example, what if a project writes internal code instead of having a dependency on someone else's library? Do they get to keep the money which would have gone to an external contributor, creating an incentive to pull everything in-house? Or do they still have to push the same amount of money upstream? That creates the opposite incentive, a bag of free money which can be directed to third-party libraries which purely by coincidence happen to be staffed by members of the downstream project and/or their pals.

But as I said elsewhere, I'm not using this to dismiss the idea or assert that it can't improve things overall. The status quo seems to be pretty bad, so an alternative certainly doesn't have to be flawless to be better overall.

cbeach · 9 months ago
In case anyone else was wondering:

Goodhart's law is an adage often stated as, "When a measure becomes a target, it ceases to be a good measure"

https://en.wikipedia.org/wiki/Goodhart's_law

leoc · 9 months ago
In concrete terms, what you would or will see, if enough money starts going down this track, is open-source contributors changing their behaviour as they seek to make projects "donation-optimised" and to maximise their personal share of donations, and likely also "donation-bait" projects which exist simply to game the system. But even though all this could get quite bad, it's still quite likely to be less bad than the status quo. EDIT: If you're thinking of making such a contribution yourself I don't think the downsides should deter you yet, at least unless you're lucky enough to have control of a truly large bag.
Wilduck · 9 months ago
I think this is a really interesting model for providing funding to open source software. There's something about the "Index Fund" approach that is really appealing. I also think it's interesting that the author was balancing both "value" and "risk". I do wonder, if this became a more dominant strategy for funding open source, how you would deal with a couple of potentially adverse incentives:

1. Publishing the exact formula for funding is great for transparency, but it creates an incentive to game the system to capture funding. Things like breaking your project up into many small packages or reducing the number of maintainers are not good for the ecosystem but may lead to more funding. There is also an incentive to juice download numbers.

2. In general, rewarding "risk" with additional funding seems like it creates an adverse incentive. This seems like a really tricky problem, because lack of funding is a major source of risk. It seems like there could be a pendulum effect here if you're not careful. Is there a way to structure the funding so it encourages long term "de-risking"?

3abiton · 9 months ago
But the issue is, whatever the criteria are, they will become the main KPIs that open-source projects (at least the not-so-driven ones) will target for growth. Goodhart's law.
Wilduck · 9 months ago
I think Goodhart's law only applies if you have fixed, published criteria for funding. That's why I mentioned transparency explicitly. I wonder if you could avoid some of the worst of Goodhart's law by saying something like "the formula changes every year, and we will publish it only after 5 years, but the goal is to reward value provided, and de-risk the ecosystem". The idea being you're explicitly trying to incentivize broadly valuable work rather than specific metrics.

It's a bit like the SEO dance. Publishing the exact formula makes it much easier to game SEO, so instead search engines say stuff like "we're trying to gauge the overall quality of the site, we use metrics like [...] but not exclusively, focus on making good, responsive, accessible content above all else". Obviously it doesn't work perfectly and the more money there is, the more incentive to game the system, but it seems better than the alternative of publishing the exact ranking algorithm.

pessimizer · 9 months ago
> Publishing the exact formula for funding is great for transparency, but then leads to an incentive to game the system to capture funding.

I say think of it as an authorization protocol, and watch how people break it in order to figure out how to fix it.

ants_everywhere · 9 months ago
> There's something about the "Index Fund" approach that is really appealing

This is an approach I'm actively working on, although I haven't donated anywhere yet.

NelsonMinar · 9 months ago
This reminds me of when Red Hat went public in the late 90s and did a generous thing with the friends-and-family round for the IPO. They included every open source contributor they could find in the Red Hat sources, including me, a grad student at the time. I made a few thousand dollars flipping the stock, which probably doubled my salary for the year. (My contribution was an early HTML mode for emacs.) It was a really nice gesture.

Reddit did something similar last year in their IPO. I'd love to read an article on how people benefitted from it.

mrtksn · 9 months ago
Ha, maybe there should be a tool that calculates your "bill" based on the OSS you are using and helps you make a single payment that gets distributed to the rightful owners. That bill could be calculated from how much you actually used this stuff and how much in donations each project currently receives, and then you pick how much you feel like paying this month.

Distribution could favor projects that need funding to be sustained. Maybe you are using a niche library that only 20 other people use but you are getting great value out of it; then it could be reasonable to be billed $100 (or not a strict sum, but a high coefficient so that your donation goes mostly to this particular library).
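
A hypothetical sketch of how such a bill might be split — the usage numbers and weighting are placeholders, just to show the niche-library case getting most of the money:

    # Split a monthly budget across the packages you use, weighting toward
    # libraries with heavy personal usage but few other users.
    def split_bill(monthly_budget, usage):
        # usage: {package: {"my_calls": ..., "other_users": ...}} (made-up inputs)
        weights = {pkg: info["my_calls"] / max(info["other_users"], 1)
                   for pkg, info in usage.items()}
        total = sum(weights.values())
        return {pkg: round(monthly_budget * w / total, 2)
                for pkg, w in weights.items()}

    print(split_bill(100, {
        "niche-lib": {"my_calls": 500, "other_users": 20},
        "requests":  {"my_calls": 500, "other_users": 1_000_000},
    }))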

hinkley · 9 months ago
I still think it's much more efficient to let your teams vote for 8 or 10 libraries once or twice a year, send N dollars for every vote above <enough to make it worthwhile to track down donation information and cut a check>, and carry over remainders from one vote to the next, so that everything below the cutoff gets some love every couple of intervals.

A lot of people will vote for the obvious ones, a few people will vote for the underdogs and it'll come out in the wash.

That also fights the common complaint here of people gaming the system by splitting up their libraries too much. Sindre, for instance, would get some money for p-limit, p-retry, and maybe p-queue, but not much else for his astounding menagerie of micro-libraries.
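
As a rough sketch of that mechanic (the per-vote dollar amount and the cutoff are made up):

    # Each vote is worth a fixed amount; payouts below the cutoff are carried
    # over to the next round so small recipients still get paid eventually.
    def payout_round(votes, carryover, per_vote=25, minimum=100):
        # votes: {library: vote count}; carryover: {library: unpaid balance}
        paid, next_carryover = {}, {}
        for lib in set(votes) | set(carryover):
            owed = votes.get(lib, 0) * per_vote + carryover.get(lib, 0)
            if owed >= minimum:
                paid[lib] = owed
            else:
                next_carryover[lib] = owed
        return paid, next_carryover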

treve · 9 months ago
One thing that is a bit frustrating for me personally is that I have a few packages that get tons of downloads yet very few stars. This is because I solve a niche well: I'm usually deep in the dependency tree, rarely a direct dependency. I would definitely be ignored by this scheme.
kvinogradov · 9 months ago
There are some relevant tools for this, such as https://thanks.dev. While it doesn't work as usage-based billing, at least it provides a way to fund all dependencies.

However, the issue is that most organizations relying on OSS are not tech companies. Most of them (e.g., airports and hospitals) have no clue about OSS sustainability and are unlikely to ever fund their own software supply chains, unfortunately. That's why there should be a data-driven index that addresses the global OSS supply chain, not only particular slices of it.

teddyh · 9 months ago
Title is cut off; it is missing “… via GitHub Sponsors and PyPI data ”. My project, for instance, does not use GitHub, nor is it a Python library or even Python-only.

I suspect that a better measurement might be based on what software people actually have installed, perhaps using the Debian Popcon data.

iandanforth · 9 months ago
I love the idea! Anyone here from MS/GitHub who could integrate this into GitHub Sponsors? That way you could "Donate to Open Source" and see the allocation distribution without having to do all this work.