[1] https://github.com/Benjamin-Lee/vdsearch/blob/2045b29928f7b4...
For the kind of tasks the author outlines, I'd use AI. It excels at this: these are really simple, well-defined tasks it won't screw up.
So what I would do is pick a faster language - I'd pick Rust - then ask AI to script it and then repeat for as many tasks as you need.
If you get a chance, please do publish this package in something like the Journal of Open Source Software so that I can cite it. Thanks for sharing this tool!
[1] https://www.cell.com/cell/fulltext/S0092-8674(22)01582-3#gr4
For reference, there’s a very popular variant of the Speedmaster Professional that uses a sapphire crystal rather than a hesalite (basically plastic) crystal. Despite never having been used in space (since the shattering crystal is a risk), the sapphire model is still considered a “professional” edition and is highly sought after. People who wear it enjoy the aesthetic of the spacefaring version with the earthly practicality of scratch-proof sapphire. The Swatch version’s desirability is just the same logic taken a few more steps. It’s made by Omega, has the same basic design, and evokes the imagery of the space race.
Alternatively, consider the popularity of the Tesla toy car for kids. It’s not anything like the real Tesla in terms of functionality but is still a cool electric vehicle, especially if you already have a Tesla.
I still follow the plan to this day, albeit with more sophisticated logging. I use a combinatorial optimizer web app that tells me what to eat every day so that it completely takes the element of choice (and ability to screw up) out of the equation. I've developed it for the last five years and am hoping to release it as a product and/or open source eventually. If anyone's interested, shoot me an email (link on my website) and I'll share access.
I can do `mypackage.mymethod`, but it will only be in my own code, because it's not the convention.
Is this the scientific version of "rich people problems"?
But I have a problem with talking about real-life applications and using that to claim
> it will be impossible for pure Python to beat pure Nim at raw performance
Because it may well be true. I didn't try it, but there are many things that could be optimized in the Python code example. Setting that aside, in a real-life scientific computing application I don't think anyone would go without numpy for this, which would make things much better. Also, the power of Python in data analysis and scientific computing is its ecosystem and community, which will be very hard to beat. And there are more mature alternatives like Julia.
Edit: The author's code for reading the data creates a new file object on each iteration. I would guess this would be a similar problem in Nim, but I am not sure how it actually works or whether it has the same effect. Either way, you don't do this in a real-life Python application. Also, it would be nice to use a list comprehension to count the occurrences of 'C' and 'G' in each line.
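To make that concrete, here is a rough sketch of what I mean, not the author's code. The function names and the FASTA-style input (header lines starting with `>`) are my assumptions, and I used a generator expression rather than a literal list comprehension so no intermediate list gets built:

```python
from typing import Iterable


def gc_fraction(lines: Iterable[str]) -> float:
    """Return the fraction of G and C bases across all sequence lines.

    Hypothetical helper: assumes uppercase bases and FASTA-style
    header lines starting with '>', which are skipped.
    """
    gc = 0
    total = 0
    for line in lines:
        seq = line.strip()
        if not seq or seq.startswith(">"):
            continue  # skip blank lines and headers
        # Count 'G' and 'C' in this line with a comprehension-style expression.
        gc += sum(1 for base in seq if base in "GC")
        total += len(seq)
    return gc / total if total else 0.0


def gc_fraction_of_file(path: str) -> float:
    """Open the file once and stream over its lines."""
    with open(path) as handle:
        return gc_fraction(handle)
```

The file is opened a single time and iterated lazily, so memory stays flat even for large inputs. I haven't benchmarked this against the original.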
> Is this the scientific version of "rich people problems"?
Author here. Yes, most certainly. In fact, it was one of the things that drew me towards the NIH for my PhD. My overall point in the post was to show that a somewhat naive Python implementation and a much faster Nim version have a small Levenshtein distance. For many people in bioinformatics who don't have a background in software engineering (that would be a significant fraction, if not a majority, of them), this could be a huge boon. Combined with the fact that most bioinformatics researchers don't have the privilege of the world's largest biomedical HPC cluster at their disposal, I still think Nim would be a great drop-in replacement for quick single-threaded line-oriented string processing. For numerical stuff, probably not.
However, I am mostly writing in Rust these days for longer term projects that require threading and good ecosystem support. Perhaps I'll write a follow-up retrospective on Rust versus Nim in this area.