benjamin-lee (u/benjamin-lee)

benjamin-lee commented on I use Nim instead of Python for data processing (2021) benjamindlee.com/posts/20... · Posted by u/archargelod

elashri · 2 years ago

> so it’s usually not worth spending a ton of time optimizing single-threaded performance for a single experiment when I can just perform a big MapReduce.

Is this the scientific version of "rich people problems"?

But I have a problem with talking about a real-life application and using that to claim

> it will be impossible for pure Python to beat pure Nim at raw performance

It may be true; I didn't try it. But there are many things that could be optimized in the Python code example. Beyond that, in real-life scientific computing I don't think anyone would skip NumPy for this, and that would make things much better. Also, the power of Python in data analysis and scientific computing is its ecosystem and community, which will be very hard to beat. And there are more mature alternatives, like Julia.

Edit: The author's code for reading the data creates a new file object on each iteration. I would guess this causes a similar problem in Nim, but I am not sure how it actually works there or whether it has the same effect. Either way, you wouldn't do this in a real-life Python application. Also, it would be nice to use a list comprehension to count the occurrences of 'C' and 'G' in each line.
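A minimal Python sketch of the suggestion above, assuming FASTA-style input where header lines start with `>` (the function name `gc_content` is hypothetical, not from the post). It works on any iterable of lines, so a file handle opened once with `open()` can be passed straight in, and it uses a list comprehension to collect the sequence lines before counting.

```python
import io
from typing import Iterable


def gc_content(lines: Iterable[str]) -> float:
    """Fraction of G and C characters across all sequence lines.

    Assumes FASTA-style input: lines starting with '>' are headers and
    are skipped. Pass a file handle opened once, e.g. open("x.fasta").
    """
    # List comprehension: strip trailing newlines and drop header lines.
    seqs = [line.rstrip() for line in lines if not line.startswith(">")]
    # str.count does the per-line counting in C rather than a Python loop.
    gc = sum(s.count("G") + s.count("C") for s in seqs)
    total = sum(len(s) for s in seqs)
    return gc / total if total else 0.0


# Demo on an in-memory "file" so the sketch runs standalone.
demo = io.StringIO(">seq1\nGGCCAATT\n>seq2\nGCGC\n")
print(gc_content(demo))
```

The list comprehension keeps every sequence line in memory; for very large files, a generator expression (or a running counter inside a single `for` loop over the handle) would avoid that.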

benjamin-lee · 2 years ago

> Is this the scientific version of "rich people problems"?

Author here. Yes, most certainly. In fact, it was one of the things that drew me to the NIH for my PhD. My overall point in the post was to show that a somewhat naive Python implementation and a much faster Nim version have a small Levenshtein distance. For many people in bioinformatics who don't have a background in software engineering (a significant fraction, if not a majority), this could be a huge boon. And given that most bioinformatics researchers don't have the privilege of the world's largest biomedical HPC cluster at their disposal, I still think Nim would be a great drop-in replacement for quick single-threaded, line-oriented string processing. For numerical stuff, probably not.

However, I am mostly writing in Rust these days for longer term projects that require threading and good ecosystem support. Perhaps I'll write a follow-up retrospective on Rust versus Nim in this area.

benjamin-lee commented on I use Nim instead of Python for data processing (2021) benjamindlee.com/posts/20... · Posted by u/archargelod

nick__m · 2 years ago

While unexplored in this article, the Nim type system is pretty nice too, particularly subrange types, enums, and object variants.

benjamin-lee · 2 years ago

Author here. I love the type system. By using distinct strings to represent DNA, RNA, and protein, I can avoid silly errors while still using the optimized implementations under the hood. This is what the `bioseq` library (about two hundred lines) does [1] and I find it incredibly elegant.

[1] https://github.com/Benjamin-Lee/vdsearch/blob/2045b29928f7b4...
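Nim's `distinct string` has no exact Python counterpart; the closest analogue is probably `typing.NewType`, which costs nothing at runtime but is enforced only by a static type checker such as mypy, not by the interpreter. The rough sketch below illustrates the idea; it is not the `bioseq` library's API, and the names `DNA`, `RNA`, and `transcribe` are hypothetical.

```python
from typing import NewType

# At runtime these are plain str values (zero overhead), but a static
# checker such as mypy treats DNA and RNA as incompatible types.
DNA = NewType("DNA", str)
RNA = NewType("RNA", str)


def transcribe(seq: DNA) -> RNA:
    """Toy transcription of a coding-strand DNA sequence: T -> U."""
    # str.replace is the ordinary, optimized str method "under the hood".
    return RNA(seq.replace("T", "U"))


rna = transcribe(DNA("ATGCGT"))
print(rna)  # AUGCGU
# transcribe(rna)  # mypy rejects this (RNA is not DNA); plain Python would still run it
```

The key difference from Nim is where the check happens: Nim's compiler rejects the mixed-up call, whereas in Python the protection exists only if you run a type checker.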

benjamin-lee commented on I use Nim instead of Python for data processing (2021) benjamindlee.com/posts/20... · Posted by u/archargelod

julianeon · 2 years ago

I don't think the advantage here is so much that Nim is fast as that Python is slow. If you're willing to dump Python, you have many compiled language options, but I'll pick two: C and Rust.

For the kind of tasks the author outlines, I'd use AI. It excels at this: these are really simple, well-defined tasks it won't screw up.

So what I would do is pick a faster language (I'd pick Rust), ask AI to script it, and repeat for as many tasks as you need.

benjamin-lee · 2 years ago

Author here. This is basically the approach I am using these days to get maximum multithreaded performance when it really counts (inner loops) [1]. I draft in Python and use Copilot to convert it to Rust, then optimize from there. However, in my opinion Nim is still better than Rust for simple scripts that I don't want to spend a bunch of time writing. Its only major downside is its relative lack of support for parallelism and bioinformatics (which is why I used Rust for a more serious project).

[1] https://github.com/Benjamin-Lee/circkit

benjamin-lee commented on What's an obelisk, anyway? science.org/content/blog-... · Posted by u/herodotus

frozenport · 2 years ago

Where are the TEM images of the proposed structures?

benjamin-lee · 2 years ago

Not the author of this paper, but I am a current PhD student focused on viroid discovery. There's no TEM, but there are good methods, such as RNAfold [0], for predicting their structures. For rod-shaped RNAs, the prediction methods are quite good, since it basically comes down to finding stretches of the circular RNA that are reverse complements of one another.

[0]: http://dx.doi.org/10.1186/1748-7188-6-26
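To make the "reverse-complement substrings" intuition concrete, here is a toy Python sketch. It is not how RNAfold works (RNAfold predicts structure by thermodynamic free-energy minimization); it only finds pairs of non-overlapping k-mers in a circular RNA where one is the exact reverse complement of the other. It assumes Watson-Crick pairs only (no G-U wobble) and exact matches, and the function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGU", "UGCA")


def reverse_complement(rna: str) -> str:
    """Reverse complement of an RNA string (Watson-Crick pairs only)."""
    return rna.translate(COMPLEMENT)[::-1]


def complementary_kmers(circ: str, k: int) -> list[tuple[int, int]]:
    """Start positions (i, j), i < j, of non-overlapping k-mers in a
    circular sequence where the k-mer at j is the reverse complement
    of the k-mer at i (i.e. the two stretches could pair as a stem)."""
    n = len(circ)
    if not 1 <= k <= n:
        raise ValueError("k must be between 1 and len(circ)")
    doubled = circ + circ[: k - 1]  # wrap so k-mers can span the junction
    positions: dict[str, list[int]] = {}
    for i in range(n):
        positions.setdefault(doubled[i : i + k], []).append(i)
    hits = []
    for i in range(n):
        rc = reverse_complement(doubled[i : i + k])
        for j in positions.get(rc, []):
            # i < j reports each pair once; the distance checks skip
            # windows that overlap each other anywhere on the circle.
            if i < j and j - i >= k and n - (j - i) >= k:
                hits.append((i, j))
    return hits


print(reverse_complement("AUGC"))               # GCAU
print(complementary_kmers("GGGAAACCC", 3))      # [(0, 6)]
```

Real rod-like viroid structures include bulges, loops, and G-U pairs, which is exactly why energy-based tools like RNAfold are used in practice instead of exact matching.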

benjamin-lee commented on Show HN: PyCirclize – Circular Visualization in Python github.com/moshi4/pyCircl... · Posted by u/moshi4

benjamin-lee · 3 years ago

This looks excellent for my use case, which is visualizing viroid and viroid-like circular genomes. Right now, I reluctantly use the R version but am eager to try this out. I spent countless hours making this figure [1] for my last paper and am quite confident this tool would have saved me a ton of time.

If you get a chance, please do publish this package in something like the Journal of Open Source Software so that I can cite it. Thanks for sharing this tool!

[1] https://www.cell.com/cell/fulltext/S0092-8674(22)01582-3#gr4

benjamin-lee commented on Omega Swatch Speedmaster swatch.com/pl-pl/bioceram... · Posted by u/kwikiel

jp0d · 4 years ago

Apparently there was a huge queue outside the Swatch store here in Melbourne and they sold out all of those "Omega" models in less than 7 minutes. As far as I can tell, this is just another Swatch with quartz movement. Does anyone know why this is so popular apart from the Omega branding?

benjamin-lee · 4 years ago

I’m a huge fan of the moonwatch both for its beauty and history with the space program. Ever since I was 10 I’ve wanted one. However, I’m a PhD student and there’s no way I can afford to get a “real” one. The Swatch release is really attractive to me since it captures the spirit of the watch while making it accessible.

For reference, there’s a very popular variant of the Speedmaster Professional that uses a sapphire crystal rather than a hesalite (basically plastic) crystal. Despite never having been used in space (a shattering crystal is a risk there), the sapphire model is still considered a “professional” edition and is highly sought after. People who wear it enjoy the aesthetic of the spacefaring version with the earthly practicality of scratch-resistant sapphire. The Swatch version’s desirability is the same logic taken a few steps further: it carries the Omega name, has the same basic design, and evokes the imagery of the space race.

Alternatively, consider the popularity of the Tesla toy car for kids. It’s not anything like the real Tesla in terms of functionality but is still a cool electric vehicle, especially if you already have a Tesla.

benjamin-lee commented on The Hacker's Diet fourmilab.ch/hackdiet/... · Posted by u/jarbus

benjamin-lee · 4 years ago

I followed this plan and lost over 100 lbs (45 kg) in about a year and have kept it off for the last five years. Even more impressively, I did it while eating the very same junk food that made me fat, albeit in different quantities. I tried every diet you could imagine before reading the book and failed every single time. As a programmer, the approach Walker uses just clicks with me. If you're overweight and have an engineering mindset, it's absolutely worth the read. It changed my life.

I still follow the plan to this day, albeit with more sophisticated logging. I use a combinatorial optimizer web app that tells me what to eat every day, so it completely takes the element of choice (and the ability to screw up) out of the equation. I've been developing it for the last five years and am hoping to release it as a product and/or open source eventually. If anyone's interested, shoot me an email (link on my website) and I'll share access.

benjamin-lee commented on Nim 1.6.2 nim-lang.org/blog/2021/12... · Posted by u/kindaAnIdiot

rakoo · 4 years ago

Thank you for the link but it doesn't address the issue I have. It's not about types, or about the compiler being "unsure". It's about me, as a developer, reading code someone else wrote, not knowing directly what package a call is from. I need to leave my current context to have the answer.

I can write `mypackage.mymethod`, but only in my own code, because it's not the convention.

benjamin-lee · 4 years ago

Ah, that makes sense. I agree with you; I’m not a huge fan of having to work out myself where things came from either when reading code on GitHub, since it doesn’t have the inference that my IDE does.

benjamin-lee commented on Nim 1.6.2 nim-lang.org/blog/2021/12... · Posted by u/kindaAnIdiot

rakoo · 4 years ago

My personal blocker is that identifiers are all imported globally by convention, so when you see a call to a method called "get", you have to go to the top of the file or mouse over the call to see which lib it is from. A "get" from the http lib is not the same as a "get" from the kv store lib.

benjamin-lee · 4 years ago

There is some logic to why that is. Here [1] is an explanation of why it makes sense, but the TL;DR is that you don't want to be manually importing functions such as `$` and `+`. In languages like Python, those are defined as methods on the object's class (e.g. `.__str__()`), so they come along for free. Not so in Nim. If there's a conflict (same name, same signature), the compiler will report an ambiguous call, but that's extremely rare.

[1] https://narimiran.github.io/2019/07/01/nim-import.html
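A minimal Python sketch of the "comes along for free" point: operators and `str()` dispatch through the object's class, so any module holding a value of this type can use `+` and `print` without importing anything extra. In Nim, `+` and `$` for a type are standalone procs, which is why they have to be in scope via an import. The `Vec2` type here is hypothetical, for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vec2:
    """Hypothetical 2-D integer vector used only for illustration."""
    x: int
    y: int

    # `a + b` looks up __add__ on type(a), so callers never import it.
    def __add__(self, other: "Vec2") -> "Vec2":
        return Vec2(self.x + other.x, self.y + other.y)

    # str() and print() look up __str__, the rough analogue of Nim's `$`.
    def __str__(self) -> str:
        return f"({self.x}, {self.y})"


v = Vec2(1, 2) + Vec2(3, 4)
print(v)  # (4, 6)
```

The trade-off is the one discussed in the thread: method dispatch keeps call sites unambiguous about where behavior lives, while Nim's free-standing procs (with uniform function call syntax) need those symbols imported into scope.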