alexchamberlain commented on Karpathy on DeepSeek-OCR paper: Are pixels better inputs to LLMs than text?   twitter.com/karpathy/stat... · Posted by u/JnBrymn
lyu07282 · 2 months ago
The way modern tokenizers are constructed is by iteratively doing frequency analysis of arbitrary-length sequences over a large corpus. So what you suggested is already the norm; tokens aren't fixed-length n-grams. Words, and really any sequence that is common enough, will already be a single token, and the less frequent a sequence is, the more tokens it needs. That's the byte-pair encoding (BPE) algorithm:

https://en.wikipedia.org/wiki/Byte-pair_encoding

It's also not lossy compression at all, it's lossless compression if anything, unlike what some people have claimed here.
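
To make the above concrete, here is a minimal character-level sketch of the BPE idea, not how any particular production tokenizer is implemented. Real tokenizers typically work on bytes and pre-split the text first; the function names (`train_bpe`, `apply_merge`, `encode`, `decode`) and the toy corpus are made up for illustration.

```python
from collections import Counter


def apply_merge(tokens, a, b):
    """Replace every adjacent (a, b) pair in tokens with the fused token a + b."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and tokens[i] == a and tokens[i + 1] == b:
            out.append(a + b)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return out


def train_bpe(corpus, num_merges):
    """Learn merge rules by repeatedly fusing the most frequent adjacent pair."""
    tokens = list(corpus)  # assumption: start from characters, not bytes
    merges = []
    for _ in range(num_merges):
        pairs = Counter(zip(tokens, tokens[1:]))
        if not pairs:
            break
        (a, b), count = pairs.most_common(1)[0]
        if count < 2:
            break  # nothing repeats any more, so further merges gain nothing
        merges.append((a, b))
        tokens = apply_merge(tokens, a, b)
    return merges


def encode(text, merges):
    """Tokenize text by replaying the learned merges in training order."""
    tokens = list(text)
    for a, b in merges:
        tokens = apply_merge(tokens, a, b)
    return tokens


def decode(tokens):
    """Concatenating the tokens gives back the exact input, so nothing is lost."""
    return "".join(tokens)


# Hypothetical toy corpus: "the" is frequent, "xyz" never appears.
corpus = "the cat and the dog and the bird " * 20
merges = train_bpe(corpus, 50)

print(encode("the", merges))  # frequent sequence: few tokens
print(encode("xyz", merges))  # unseen sequence: falls back to single characters
print(decode(encode("the bird", merges)) == "the bird")
```

Frequent sequences collapse into few tokens, unseen ones fall back to single characters, and `decode` always reproduces the input exactly, which is the sense in which the encoding is lossless.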

Shocking comments here, what happened to HN? People are so clueless it reads like reddit wtf

alexchamberlain · 2 months ago
Thanks, that's really interesting. Do they correct for spelling mistakes or internationalised spellings? For example, do `colour` and `color` end up in the same token stream?
alexchamberlain commented on Karpathy on DeepSeek-OCR paper: Are pixels better inputs to LLMs than text?   twitter.com/karpathy/stat... · Posted by u/JnBrymn
alexchamberlain · 2 months ago
I'm probably one of the least educated software engineers on LLMs, so apologies if this is a very naive question. Has anyone done any research into just using words as the tokens rather than (if I understand it correctly) 2-3 characters? I understand there would be limitations with this approach, but maybe the models would be smaller overall?
alexchamberlain commented on Show HN: SQLite Online – 11 years of solo development, 11K daily users   sqliteonline.com/... · Posted by u/sqliteonline
knowitnone3 · 2 months ago
As a rabbit, I'd rather carrots...

Isn't this comment a form of Brit defaultism?

alexchamberlain · 2 months ago
My point was more that the original comment is fine from the perspective of an American, but for the rest of the world, it doesn’t really matter if it is USD or rubles - it’s still a foreign transaction. I appreciate that a large percentage of the world's consumers can probably approximate a USD conversion in their head, but not a rubles one, and therefore USD may be more friendly. That being said, the sales page already gives the approximate price in USD anyway, which would be enough for me.
alexchamberlain commented on Show HN: SQLite Online – 11 years of solo development, 11K daily users   sqliteonline.com/... · Posted by u/sqliteonline
gregsadetsky · 2 months ago
hey, not to give you "armchair" advice, but I feel like a tool that's existed for 11 years and has 11k daily users is a super serious achievement.

I'd vicariously love for you to be able to make some/more revenue with this!

+1 on @redox99's comment that charging in rubles is most probably confusing, and that a flat $10 usd/month would be easier. I also would think that renewal should actually be on by default, not off - if people want the service and/or to support you, having auto renewal off is more of a hassle for them (the customers who want to pay you!) as they'd have to... re-enable their service? every 30-90 days?

and another point I wanted to bring up is that it feels to me like a small text-based advertisement from ethicalads.io (the folks behind the ads on Read the Docs sites) or carbonads.net (btw I have no affiliation to either) could definitely... bring in some not-bad revenue pretty much immediately?

again, huge congrats on your project and I truly wish you'll be able to find some path to monetization. cheers!

alexchamberlain · 2 months ago
> charging in rubles is most probably confusing, and that a flat $10 usd/month would be easier

As a Brit, I'd rather GBP...

Isn't this comment a form of US defaultism?

alexchamberlain commented on EU age verification app not planning desktop support   github.com/eu-digital-ide... · Posted by u/sschueller
bonoboTP · 3 months ago
Another recent news item about mandated app use: Ryanair now (from November) requires using their app for the boarding pass, no more printouts from the desktop. Also, they refuse to show the QR code for the boarding pass in a mobile browser via the website; you must use their app.

https://www.msn.com/en-ie/travel/news/ryanair-s-new-check-in...

alexchamberlain · 3 months ago
But what if my battery runs out?
alexchamberlain commented on Git: Introduce Rust and announce it will become mandatory in the build system   lore.kernel.org/git/20250... · Posted by u/WhyNotHugo
monkeyelite · 3 months ago
How does this help me as a user of git?
alexchamberlain · 3 months ago
The developers of git will continue to be motivated to contribute to it. (This isn’t specific to Rust; rather, the technical choices of OSS projects probably don’t generally put the user at the top of the priority list.)
alexchamberlain commented on Grapevine canes can be converted into plastic-like material that will decompose   sdstate.edu/news/2025/08/... · Posted by u/westurner
AlecSchueler · 3 months ago
CO2 usage, I get you, but what about the plastic waste?
alexchamberlain · 3 months ago
This is the problem though, right? It’s not one league table of environmental goodness - there are tradeoffs that are impossible to navigate, even as an educated consumer.
alexchamberlain commented on Grapevine canes can be converted into plastic-like material that will decompose   sdstate.edu/news/2025/08/... · Posted by u/westurner
mcv · 3 months ago
Yeah, I still don't understand why brown paper bags aren't more standard for everything.

I do see some manufacturers reducing plastic, fortunately. For example, my box of tea bags used to come wrapped in plastic, and now it suddenly doesn't, and I'm wondering why it ever needed plastic. But there's still so much stuff that comes wrapped in plastic, and often multiple layers of it.

Just ban it. There are excellent alternatives.

alexchamberlain · 3 months ago
I think banning plastic completely in packaging is a much harder ask, as whether it is needed is rather nuanced (if I understand it correctly). For example, it's perfectly possible to deliver cucumbers to an end customer without them being shrinkwrapped. However, to deliver enough cucumbers to enough customers for a supermarket scale, I understand from several documentaries that plastic is still required in that case. (For those outside the UK, plastic covered cucumber is the social barometer for plastic packaging.) Banning plastic bags was easy and simple, and our laws don't tend to deal with nuance very well...
alexchamberlain commented on Grapevine canes can be converted into plastic-like material that will decompose   sdstate.edu/news/2025/08/... · Posted by u/westurner
alexchamberlain · 3 months ago
The UK banned single-use plastic bags at major supermarkets. We all moaned about it for a few minutes, forgot our reusable bags a couple of times and then got on with it. Even the small plastic bags you put fruit or pastries in are now gone in a few supermarkets - initially, they replaced them with transparent paper-based windowed bags, but then I think people realised you really don't need to see inside the bag, and brown paper bags are back.
alexchamberlain commented on Where's the shovelware? Why AI coding claims don't add up   mikelovesrobots.substack.... · Posted by u/dbalatero
com2kid · 3 months ago
Multiple things can be true at the same time:

1. LLMs do not increase general developer productivity by 10x across the board for general purpose tasks selected at random.

2. LLMs dramatically increase productivity for a limited subset of tasks.

3. LLMs can be automated to do busy work and although they may take longer in terms of clock time than a human, the work is effectively done in the background.

LLMs can get me up to speed on new APIs and libraries far faster than I can myself, a gigantic speedup. If I need to write a small bit of glue code in a language I do not know, LLMs not only save me time, but they make it so I don't have to learn something that I'll likely never use again.

Fixing up existing large code bases? Productivity is at best a wash.

Setting up a scaffolding for a new website? LLMs are amazing at it.

Writing mocks for classes? LLMs know the details of using mock libraries really well and can get it done far faster than I can, especially since writing complex mocks is something I do a couple times a year and completely forget how to do in-between the rare times I am doing it.

Navigating a new code base? LLMs are ~70% great at this. If you've ever opened up an over-engineered WTF project, just finding where HTTP routes are defined at can be a problem. "Yo, Claude, where are the route endpoints in this project defined at? Where do the dependency injected functions for auth live?"

Right tool, right job. Stop using a hammer on nails.

alexchamberlain · 3 months ago
> Setting up a scaffolding for a new website? LLMs are amazing at it.

Weren't the code generators before this even better though? They generated consistent results and were dead quick at doing it.

u/alexchamberlain

Karma: 3942 · Cake day: October 20, 2011
About
Team Leader @ Bloomberg LP

Python, TS, JS, Rust & C++

https://twitter.com/alexchamberlain

Opinions are my own.

[ my public key: https://keybase.io/alexchamberlain; my proof: https://keybase.io/alexchamberlain/sigs/rp1yI38mNDhgcabgbUrlX2gPgPWEKDPQ-l71yKpWmuk ]

meet.hn/city/51.4893335,-0.14405508452768728/London
