Readit News
stabbles · 3 months ago
For Python (or PyPI) this is easier, since their data is available on Google BigQuery [1], so you can just run

    SELECT * FROM `bigquery-public-data.pypi.distribution_metadata` ORDER BY length(version) DESC LIMIT 10
The winner is: https://pypi.org/project/elvisgogo/#history

The package with most versions still listed on PyPI is spanishconjugator [2], which consistently published ~240 releases per month between 2020 and 2024.

[1] https://console.cloud.google.com/bigquery?p=bigquery-public-...

[2] https://pypi.org/project/spanishconjugator/#history

Rygian · 3 months ago
Regarding spanishconjugator, commit ec4cb98 has description "Remove automatic bumping of version".

Prior to that commit, a cronjob would run the 'bumpVersion.yml' workflow four times a day, which in turn executes the bump2version python module to increase the patch level. [0]

Edit: discussed here: https://github.com/Benedict-Carling/spanish-conjugator/issue...

[0] https://github.com/Benedict-Carling/spanish-conjugator/commi...

dijksterhuis · 3 months ago
i love the package owner’s response in that issue xD
breakingcups · 3 months ago
Tangential, but I've only heard about BigQuery from people being surprised with gargantuan bills for running one query on a public dataset. Is there a "safe" way to use it with a cost limit, for example?
abxyz · 3 months ago
Yes you can set price caps. The cost of a query is understandable ahead of time with the default pricing model ($6 per TB of data processed in a query). People usually get caught out by running expensive queries recursively. BigQuery is very cost effective and can be used safely.
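To make the "understandable ahead of time" point concrete, here is a minimal sketch of the arithmetic. It assumes the $6 per TB figure quoted above and treats a TB as 2**40 bytes; both are assumptions, so check current pricing. The function name is hypothetical, and the byte count would come from a dry run of the query rather than from this code.

```python
def estimate_query_cost_usd(bytes_processed: int, usd_per_tb: float = 6.0) -> float:
    """Estimate on-demand cost from the number of bytes a query would scan."""
    bytes_per_tb = 2 ** 40  # assumption: "TB" is counted as a binary terabyte
    return bytes_processed / bytes_per_tb * usd_per_tb


# Example: a query that would scan 250 GiB at the assumed rate
print(round(estimate_query_cost_usd(250 * 2 ** 30), 2))
```

A SELECT * over a wide table scans every column, which is how a one-line query can end up processing far more data than expected.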
thesystemisbust · 3 months ago
You can also query for free at clickpy.clickhouse.com. If you click on any of the links on the visuals you can see the query used.

The underlying dataset is hosted at sql.clickhouse.com e.g. https://sql.clickhouse.com/?query=U0VMRUNUIGNvdW50KCkgICBGUk...

disclaimer: built this a while ago but we maintain this at clickhouse

oh and rubygems data is also there.

darkamaul · 3 months ago
Here [0] is the partial query on the ClickHouse dataset, with different results due to a quota error [1].

[0] https://sql.clickhouse.com?query=U0VMRUNUIHByb2plY3QsIE1BWCh...

[1] Quota read limit exceeded. Results may be incomplete.

passivegains · 3 months ago
I decided my life could not possibly go on until I knew what "elvisgogo" does, so I downloaded the tarball and poked around. it's a pretty ordinary numpy + pandas + matplotlib project that makes graphs from csv. one line jumped out at me:

    str_0 = ['refractive_index','Na','Mg','Al','Si','K','Ca','Ba','Fe','Type']

the university of st. andrews has a laser named "elvis" that goes on a remote controlled submarine: https://www.st-andrews.ac.uk/~bds2/elvislaser.htm

I was hoping it'd be about go-go dancing to elvis music, but physics experiments on light in seawater is pretty cool too.
n4r9 · 3 months ago
> spanishconjugator [2], which consistently published ~240 releases per month between 2020 and 2024

They also stopped updating major and minor versions after hitting 2.3 in Sept 2020. Would be interesting to hear the rationale behind the versioning strategy. Feels like you might as well use a datetimestamp for the version.

0x500x79 · 3 months ago
deps.dev has a similar bigquery dataset across a couple more languages if someone wanted to do analysis across the other ecosystems they support.
jonchurch_ · 3 months ago
The author has run into the same problem that anyone who wants to do analysis on the NPM registry runs into, there's just no good first party API for this stuff anymore.

It seems this was their first time going down this rabbit hole, so for them and anyone else, I'd urge you to use the deps.dev Google BigQuery dataset [0] for this kind of analysis. It does indeed include NPM and would have made the author's work trivial.

Here's a gist with the query and the results https://gist.github.com/jonchurch/9f9283e77b4937c8879448582b...

[0] - https://docs.deps.dev/bigquery/v1/

bapak · 3 months ago
> there are over 2800 legacy mixed-case packages, many of which have the same spelling as other existing lowercase packages

This is insane

BobbyTables2 · 3 months ago
Sounds like a typo-squatter’s paradise!
dotancohen · 3 months ago

  > This is insane
Not for the JavaScript world.

I hate to deride the entire community, but many of the collective community decisions are smells. I think that the low barrier to entry means that the community has many inexperienced influential people.

krapp · 3 months ago
A lot of these decisions were made after Javascript went "enterprise" to make it seem more like a "serious" programming language to SV entrepreneurs by a small number of corporations, not necessarily the community.

The bar for entry was always low with javascript, but it also used to be a lot more sane when it was a publicly-driven language.

sundarurfriend · 3 months ago
The Julia General registry is locally stored as a tar.gz and has version info for all registered packages, so I tried this out for Julia packages. The top 5 are:

    DiffEqBase                  6.189.1   
    LoopVectorization           0.12.172  
    Reactant                    0.2.161   
    Mooncake                    0.4.159   
    Distributions               0.25.120  
So, no crazy numbers or random unknown packages, all are major packages that have just had a lot of work and history to them. Out of the top 10, pretty much half were from the SciML ecosystem.

Caveats/constraints: Like the post, this ignores non-SemVer packages (which mostly used date-based versions) and also jll (binary wrapper) packages which just use their underlying C libraries' versions. Among jlls, the largest that isn't a date afaict is NEO_jll with 25.31.34666+0 as its version.

dotancohen · 3 months ago
You might want to try a different sorting strategy. 0.25 is above 0.4. These are, I believe, what are called in Unix flags "human numbers".
Savageman · 3 months ago
I understood the list is ordered by biggest number, aka 189 > 172 > 161 > 159 > 120
int_19h · 3 months ago
This would seem to imply that the vast majority of Julia packages are 0.x?
sundarurfriend · 3 months ago
There are many that are, but I feel like your comment is based on the same faulty assumption as your sibling comment - that this is an ordering of version numbers as a whole. It's not, the ordering is on the same basis as in the post, the largest single number within the MAJOR.Minor.patch trio.
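For anyone else tripped up by the ordering, a rough sketch of the ranking as described: take the largest single number in MAJOR.MINOR.PATCH and sort on that. This is only an illustration in Python using the five versions listed above, not the script that was actually run; `largest_component` is a hypothetical name, and it only handles plain X.Y.Z strings.

```python
def largest_component(version: str) -> int:
    """Return the largest of MAJOR, MINOR, PATCH for a plain X.Y.Z version."""
    major, minor, patch = (int(part) for part in version.split("."))
    return max(major, minor, patch)


versions = ["0.25.120", "0.4.159", "6.189.1", "0.2.161", "0.12.172"]

# Rank by the biggest single number, not by the version as a whole
ranked = sorted(versions, key=largest_component, reverse=True)
print(ranked)
```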
aragonite · 3 months ago
Incidentally I once ran into a mature package that had lived in the 0.0.x lane forever and treated every release as a patch, racking up a huge version number, and I had to remind the maintainer that users depending with caret ranges won't get those updates automatically. (In semver caret ranges never change the leftmost non-zero digit; in 0.0.x that digit is the patch version, so ^0.0.123 is just a hard pin to 0.0.123). There may occasionally be valid reasons to stay on 0.0.x though (e.g. @types/web).
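A small sketch of the caret rule described above, in case it helps: the upper bound is found by bumping the leftmost non-zero component. This is an illustration written in Python, not npm's own implementation; the function names are hypothetical, and pre-release tags and partial versions are deliberately ignored.

```python
def caret_upper_bound(version: str) -> tuple:
    """Exclusive upper bound of ^version for a plain X.Y.Z version."""
    major, minor, patch = (int(part) for part in version.split("."))
    if major > 0:
        return (major + 1, 0, 0)
    if minor > 0:
        return (0, minor + 1, 0)
    # 0.0.x: the patch is the leftmost non-zero digit, so the range is a pin
    return (0, 0, patch + 1)


def satisfies_caret(candidate: str, base: str) -> bool:
    """True if candidate falls inside the range ^base."""
    cand = tuple(int(part) for part in candidate.split("."))
    low = tuple(int(part) for part in base.split("."))
    return low <= cand < caret_upper_bound(base)


print(satisfies_caret("1.9.0", "1.2.3"))      # inside ^1.2.3
print(satisfies_caret("0.0.124", "0.0.123"))  # outside ^0.0.123
```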
robin_reala · 3 months ago
Presumably they’re following https://0ver.org/
BobbyTables2 · 3 months ago
Isn’t vim or bash kinda like that? One of them publishes something like a few hundred patches on top of the released tarball…
jve · 3 months ago
Maybe that is intentional? Which package is it?
aragonite · 3 months ago
It's the type definitions for developing chrome extensions. They'd been incrementing in the 0.0.x lane for almost a decade and bumped it to 0.1.0 after I raised the issue, so I doubt it was intentional:

https://www.npmjs.com/package/@types/chrome?activeTab=versio...

CITIZENDOT · 3 months ago
threejs ?
franky47 · 3 months ago
Anthony Fu’s epoch versioning scheme (to differentiate breaking change majors from "marketing" majors) could yield easy winners here, at least on the raw version number alone (not the number of sequential versions released):

https://antfu.me/posts/epoch-semver

bapak · 3 months ago
> People often assume that a zero-major version indicates that the software is not ready for production

I wonder why. Conventions that are being broken, maybe.

remedan · 3 months ago
I don't know if this is the origin, but the semver spec says 0.x.y is unstable. Sure, not everybody uses semver, but it is popular enough for people to make incorrect assumptions.

https://semver.org/#spec-item-4

dotancohen · 3 months ago
I agree with that sentiment.

If the guy writing and maintaining the software is stating "this software is not stable yet" then who am I to disagree?

nosefurhairdo · 3 months ago
The "winner" just had its 3000th release on GitHub, already a few patch versions past the version referenced in this article (which was published today): https://github.com/wppconnect-team/wa-version
genshii · 3 months ago
After double-checking some things, the real winner is actually: https://github.com/nice-registry/all-the-package-names

I made a fairly significant (dumb) mistake in the logic for extracting valid semver versions. I was doing a falsy check, so if any of major/minor/patch in the version was a 0, the whole package was ignored.

The post has been updated to reflect this.
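For the curious, this class of bug looks roughly like the sketch below. It is a Python analogue, not the actual code from the post (which is JavaScript); the function names and the regex are made up for illustration. The point is that 0 is falsy, so a truthiness check on the parsed numbers throws away any version with a zero in it.

```python
import re

SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")  # plain X.Y.Z only


def parse_buggy(version: str):
    match = SEMVER.match(version)
    if not match:
        return None
    major, minor, patch = (int(group) for group in match.groups())
    # Bug: 0 is falsy, so "1.0.5" or "0.3.2" is rejected as invalid
    if not (major and minor and patch):
        return None
    return (major, minor, patch)


def parse_fixed(version: str):
    match = SEMVER.match(version)
    # Only reject when the pattern did not match at all
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


print(parse_buggy("1.0.5"), parse_fixed("1.0.5"))
```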

TZubiri · 3 months ago
Brief reminder/clarification that these tools are used to circumvent WhatsApp ToS, and that they are used to:

1- Spam
2- Scam
3- Avoid paying for Whatsapp API (which is the only form of monetization)

And that the reason this thing gets so many updates is probably a cat and mouse game where Meta updates their software continuously to block these types of hacks and the maintainers respond in kind, whether in automated or manual fashion.

afiori · 3 months ago
Considering the 18 billion price tag and the current mixing of user data between Meta and WhatsApp, I believe that Meta now has more revenue streams in mind than just the API pricing.
oconnore · 3 months ago
This package also seems to just have a misbehaving github action that is in a loop.
genshii · 3 months ago
Hmm yeah, I decided that one counts because the new packages have (slightly) different content, although it might be the case that the changes are junk/pointless anyway.
whilenot-dev · 3 months ago
> Time to fetch version data for each one of those packages: ~12 hours (yikes)

The author could improve the batching in fetchAllPackageData by not waiting for all 50 (BATCH_SIZE) promises to resolve at once. I just published a package for proper promise batching last week: https://www.npmjs.com/package/promises-batched

winrid · 3 months ago
What's the benefit of promises like this here?

Just spin up a loop of 50 call chains. When one completes you just do the next on next tick. It's like 3 lines of code. No libraries needed. Then you're always doing 50 at a time. You can still use await.

    async function work() { await thing(); process.nextTick(work); }

    for (let i = 0; i < 50; i++) { work(); }

then maybe a separate timer to check how many tasks are active I guess.

whilenot-dev · 3 months ago
Promise.all waits for all 50 promises to resolve, so if one of these promises takes 3s while the other 49 take 0.5s, you're wasting 2.5s awaiting each batch.

The implementation is rather simple, but more than 3 LoC: https://github.com/whilenot-dev/promises-batched/blob/main/s...
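To illustrate the difference from fixed batches, here is a rough sketch of the "always N in flight" idea. It is a Python asyncio analogue, not the promises-batched implementation and not the post's fetchAllPackageData; `run_pool` and `fake_fetch` are hypothetical names, and the sleeps stand in for registry requests.

```python
import asyncio


async def run_pool(items, worker, limit=50):
    """Run worker(item) for every item, keeping at most `limit` in flight."""
    items = list(items)
    queue = asyncio.Queue()
    for index, item in enumerate(items):
        queue.put_nowait((index, item))
    results = [None] * len(items)

    async def lane():
        # Each lane takes the next item as soon as its previous one finishes,
        # so one slow item never holds up the other lanes.
        while True:
            try:
                index, item = queue.get_nowait()
            except asyncio.QueueEmpty:
                return
            results[index] = await worker(item)

    await asyncio.gather(*(lane() for _ in range(limit)))
    return results


async def fake_fetch(n):
    # Hypothetical stand-in for a request; every tenth item is slow
    await asyncio.sleep(0.05 if n % 10 == 0 else 0.01)
    return n * 2


print(asyncio.run(run_pool(range(20), fake_fetch, limit=5)))
```

With fixed batches, every batch takes as long as its slowest item; with lanes, a slow item only occupies one of the slots while the rest keep draining the queue. Rate limiting is a separate concern and would still need handling on top of this.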

1gn15 · 3 months ago
Worried about being rate limited or DoSing the server.
whilenot-dev · 3 months ago
Sure, the need for backpressure occurs anyway, regardless of batching optimization.

Couldn't find any specific rate limit numbers besides the one mentioned here[0] from 2019:

> Up to five million requests to the registry per month are considered acceptable at this time

[0]: https://blog.npmjs.org/post/187698412060/acceptible-use.html

genshii · 3 months ago
Ah this is cool, thanks!