Readit News
scoresmoke commented on Show HN: Improving search ranking with chess Elo scores   zeroentropy.dev/blog/impr... · Posted by u/ghita_
npip99 · 2 months ago
In our case, training and running inference on the models takes days, while calculating all of the Elo scores takes about a minute, haha. So we didn't need to optimize the calculation.

But, we did need to work on numeric stability!

I have our calculations here: https://hackmd.io/@-Gjw1zWMSH6lMPRlziQFEw/B15B4Rsleg

tl;dr: Wikipedia iterates on <e^elo>, but that can go to zero or infinity. Iterating on <elo> stays between -4 and 4 in all of our observed pairwise matrices, so it's very well-bounded.
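A minimal sketch of the idea, assuming a matrix of pairwise win counts (the function name and the gradient-ascent scheme are illustrative, not the linked write-up's exact method): fitting Bradley–Terry directly in log space keeps the iterates bounded, whereas iterating on e^elo can underflow or overflow.

```python
import math

def bradley_terry_logits(wins, iters=2000, lr=0.1):
    """Fit Bradley-Terry scores in log space (an Elo-like scale).

    wins[i][j] = number of times item i beat item j. Working on the
    log-scores directly (rather than on exp(score)) keeps the iterates
    bounded instead of drifting toward zero or infinity.
    """
    n = len(wins)
    s = [0.0] * n
    for _ in range(iters):
        for i in range(n):
            grad = 0.0
            for j in range(n):
                if i == j:
                    continue
                games = wins[i][j] + wins[j][i]
                if games == 0:
                    continue
                # Expected probability that i beats j under the current scores.
                p_win = 1.0 / (1.0 + math.exp(s[j] - s[i]))
                grad += wins[i][j] - games * p_win  # observed minus expected wins
            s[i] += lr * grad / max(1, n)
        mean = sum(s) / n
        s = [x - mean for x in s]  # anchor the scores at mean zero
    return s
```

With two items where A beats B 8 times out of 10, the fitted score gap converges to log(4), the log-odds of an 80% win rate.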

scoresmoke · 2 months ago
I mostly work on post-training and evaluation tasks, and I built Evalica as a convenient tool for my own use cases. The computation is fast enough not to bother the user, and the library does not stand in my way during analysis.
scoresmoke commented on Show HN: Improving search ranking with chess Elo scores   zeroentropy.dev/blog/impr... · Posted by u/ghita_
swyx · 2 months ago
Would you consider JS bindings? It should be easy to vibe-code given what you have. Bonus points if it runs in the browser (e.g., export the WASM binary). Thank you!
scoresmoke · 2 months ago
I have been thinking about this for a while, and I think I'll vibe-code them. Not sure about WASM, though, as the underlying libraries would all need to support it, and I am not sure they all do.
scoresmoke commented on Show HN: Improving search ranking with chess Elo scores   zeroentropy.dev/blog/impr... · Posted by u/ghita_
scoresmoke · 2 months ago
You might also consider a fast implementation of Elo and Bradley–Terry that I have been developing for some time: https://github.com/dustalov/evalica (Rust core, Python bindings, 100% test coverage, and a nice API).
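For context on what such a library computes, here is a minimal sketch of the classic sequential chess-style Elo update (the function name and K-factor are illustrative; this is not Evalica's API):

```python
def elo_update(r_a, r_b, score_a, k=32.0):
    """One chess-style Elo update for a single game.

    score_a is 1.0 if A wins, 0.0 if A loses, and 0.5 for a draw.
    The K-factor controls how far a single result moves the ratings.
    """
    # Expected score of A under the standard logistic curve (base 10, scale 400).
    expected_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    r_a_new = r_a + k * (score_a - expected_a)
    r_b_new = r_b + k * ((1.0 - score_a) - (1.0 - expected_a))
    return r_a_new, r_b_new
```

When two equally rated players meet, the winner gains exactly k/2 points and the loser gives up the same amount, so the total rating mass is conserved.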
scoresmoke commented on Pre-Trained Large Language Models Use Fourier Features for Addition (2024)   arxiv.org/abs/2406.03445... · Posted by u/Kye
wongarsu · 7 months ago
Curious that they chose to use GPT-2-XL, given the age of that model. I guess they wanted a small model (1.5B) and started work on this quite a while ago. Today there is a decent selection of much more capable 1.5B models (Qwen2-1.5B, DeepSeek-R1-Distill-Qwen-1.5B, TinySwallow-1.5B, Stella 1.5B, Qwen2.5-Math-1.5B, etc.). But they are all derived from the Qwen series of models, which wasn't available when they started this research.
scoresmoke · 7 months ago
GPT-2 follows the very well-studied Transformer-decoder architecture, so the outcomes of this study might be applicable to more complicated models.
scoresmoke commented on Rye: A Hassle-Free Python Experience   rye.astral.sh/... · Posted by u/jcbhmr
scoresmoke · a year ago
Ruff and uv are both excellent tools developed by a VC-backed company, Astral: https://astral.sh/about. I wonder what their pitch was.
scoresmoke commented on NumPy 2.0   numpy.org/devdocs/release... · Posted by u/scoresmoke
fbdab103 · a year ago
Any notable highlights for a consumer of Numpy who rarely interfaces directly with it? Most of my work is pandas+scipy, with occasionally dropping into the specific numpy algorithm when required.

I am much more of an "upgrade when there is an X.1 release" kind of guy, so my hat is off to those who will bravely be testing the version on my behalf.

scoresmoke · a year ago
The most important changes are deprecations of certain public APIs: https://numpy.org/devdocs/release/2.0.0-notes.html#deprecati...

One new interesting feature, though, is the support for string routines: https://numpy.org/devdocs/reference/routines.strings.html#mo...

u/scoresmoke · Karma: 225 · Cake day: June 15, 2018