int_19h commented on It’s not wrong that "\u{1F926}\u{1F3FC}\u200D\u2642\uFE0F".length == 7 (2019)   hsivonen.fi/string-length... · Posted by u/program
Ekaros · 2 days ago
Take Ä, say: it might be either a single precomposed character or a combination of A and ¨. Both are supported now, but if you can have more than two such things combined in one character, it makes a mess.
int_19h · 2 days ago
That quickly explodes if you need more than one diacritic per letter (e.g. Vietnamese often has two, and then there's https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...).
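A minimal Python sketch of the point about stacked diacritics, using only the standard `unicodedata` module. The sample characters are picked here for illustration and do not come from the thread: Ä, the Vietnamese letter ệ, and a nasalized IPA vowel that, to the best of this example's knowledge, has no precomposed code point.

```python
import unicodedata

def code_points(s: str) -> list[str]:
    """Return the code points of s as U+XXXX strings."""
    return [f"U+{ord(ch):04X}" for ch in s]

# Illustrative samples (assumptions for this sketch, not from the thread).
a_umlaut = "\u00C4"         # precomposed A with diaeresis
e_viet = "\u1EC7"           # precomposed e with circumflex and dot below
ipa_nasal = "\u025B\u0303"  # IPA open-mid e followed by a combining tilde

# NFD splits precomposed characters into a base letter plus combining marks.
a_nfd = unicodedata.normalize("NFD", a_umlaut)
e_nfd = unicodedata.normalize("NFD", e_viet)

# NFC recombines only where a precomposed code point exists.
e_nfc = unicodedata.normalize("NFC", e_nfd)
ipa_nfc = unicodedata.normalize("NFC", ipa_nasal)

print(code_points(a_nfd))        # base A, then one combining mark
print(code_points(e_nfd))        # base e, then two combining marks
print(len(e_nfc), len(ipa_nfc))  # the IPA sequence stays decomposed
```

The Vietnamese letter round-trips to a single code point because Unicode happens to define one; the IPA sequence stays as two code points, which is what every base-plus-marks combination without a dedicated code point looks like.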
int_19h commented on It’s not wrong that "\u{1F926}\u{1F3FC}\u200D\u2642\uFE0F".length == 7 (2019)   hsivonen.fi/string-length... · Posted by u/program
zwnow · 2 days ago
I actually want string length. Just give me the length of a word. My human brain wants a human way to think about problems. While programming I never think about bytes.
int_19h · 2 days ago
Humans speak many different languages. Not all of them are English, and not all of them have writing systems which make it meaningful to talk about "string length" without disambiguating further.
int_19h commented on It’s not wrong that "\u{1F926}\u{1F3FC}\u200D\u2642\uFE0F".length == 7 (2019)   hsivonen.fi/string-length... · Posted by u/program
guappa · 2 days ago
Eh, in Macedonian they have some letters that in Russian are just two separate letters.
int_19h · 2 days ago
That's not really any different than the distinction (or lack thereof) between "ae" and "æ". For that matter, in Russian there is a letter "ы" which is historically a digraph consisting of two separate letters, "ъ" and "i", that just happens to have been treated as a single letter for so long that few people would even recognize it as a digraph. This kind of stuff is all language-specific, which is why for Wordle etc. you always need to be aware of the context, and this context will then unambiguously decide what constitutes a single letter.
int_19h commented on It’s not wrong that "\u{1F926}\u{1F3FC}\u200D\u2642\uFE0F".length == 7 (2019)   hsivonen.fi/string-length... · Posted by u/program
PapaPalpatine · 2 days ago
I don’t know about advanced Unicode features… but I use them all the time as a backend developer to validate data input.

I want to make sure that the password length falls within a given range of characters. Same with phone numbers, email addresses, etc.

This seems to have always been known as the length of the string.

This thread sounds like a bunch of scientists trying to make a simple concept a lot harder to understand.

int_19h · 2 days ago
If you restrict the input to ASCII, then it makes sense to talk about "string length" in this manner. But we're not talking about Unicode strings at all then.

If you do allow Unicode characters in whatever it is you're validating, then your approach is almost certainly wrong for some valid input.
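A short Python sketch of why a single "length" breaks down once Unicode input is allowed. The helper name `length_report` is hypothetical; the sample string is the one in the linked article's title. It counts the same string three ways, using only built-in string encoding.

```python
def length_report(s: str) -> dict[str, int]:
    """Count the same string three different ways."""
    return {
        # Unicode code points: what Python's len() counts.
        "code_points": len(s),
        # UTF-16 code units: what JavaScript's .length counts.
        "utf16_units": len(s.encode("utf-16-le")) // 2,
        # UTF-8 bytes: what a byte-oriented length counts.
        "utf8_bytes": len(s.encode("utf-8")),
    }

# The string from the linked article's title: one facepalm emoji on screen.
facepalm = "\U0001F926\U0001F3FC\u200D\u2642\uFE0F"

print(length_report("password"))  # all three counts agree for ASCII
print(length_report(facepalm))    # three different answers for one symbol
```

None of the three counts is the number of user-perceived characters. That requires grapheme cluster segmentation, which the Python standard library does not provide, so a validation rule such as "8 to 64 characters" has to say which unit it means.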

int_19h commented on The US Department of Agriculture Bans Support for Renewables   insideclimatenews.org/new... · Posted by u/mooreds
jmyeet · 2 days ago
Oh I think the wealthiest will be the first with heads on pikes when it all comes tumbling down.

The wealthiest people aren’t descendants of Julius Caesar, the Medicis, the Hapsburgs, Rollo (who is an ancestor to every European monarch), the Astors, the Vanderbilts, the Morgans, etc.

Some of these are moderately wealthy now (e.g. the Rothschilds) but they don’t dominate the world’s wealth.

Part of this is that it can be hard to maintain a lineage over time. Also, foolish fail sons will squander family wealth.

But some wealthy people just go the French Revolution way.

I don’t believe the Gateses, Musks, Bezoses, etc will survive the upheaval, violence and revolution they are making inevitable.

int_19h · 2 days ago
The wealthiest live in gated communities with private security, and are the ones who can scramble to their private jets when SHTF. The ones whose heads mostly end up on the spikes are the richest proles (i.e. "top middle class") - those who have enough money that it is obvious they're rich, yet not enough to buy actual security or to be truly isolated from the rest of society if they so wish.
int_19h commented on The issue of anti-cheat on Linux (2024)   tulach.cc/the-issue-of-an... · Posted by u/todsacerdoti
int_19h · 2 days ago
TL;DR: the issue of anti-cheat on Linux is that Linux actually gives the user full control of their OS, which precludes all even remotely effective anti-cheat mechanisms by design.
int_19h commented on Why are anime catgirls blocking my access to the Linux kernel?   lock.cmpxchg8b.com/anubis... · Posted by u/taviso
WesolyKubeczek · 4 days ago
I disagree with the post author in their premise that things like Anubis are easy to bypass if you craft your bot well enough and throw the compute at it.

Thing is, the actual lived experience of webmasters tells us that the bots that scrape the internets for LLMs are nothing like crafted software. They are more like your neighborhood shit-for-brain meth junkies competing with one another over who commits more robberies in a day, no matter the profit.

Those bots are extremely stupid. They are worse than script kiddies’ exploit searching software. They keep banging the pages without regard to how often, if ever, they change. If they were 1/10th like many scraping companies’ software, they wouldn’t be a problem in the first place.

Since these bots are so dumb, anything that is going to slow them down or stop them in their tracks is a good thing. Short of drone strikes on data centers or accidents involving owners of those companies that provide networks of botware and residential proxies for LLM companies, it seems fairly effective, doesn’t it?

int_19h · 4 days ago
It is the way it is because there are easy pickings to be made even with this low effort, but the more sites adopt such measures, the less stupid your average bot will be.
int_19h commented on Visualizing GPT-OSS-20B embeddings   melonmars.github.io/Laten... · Posted by u/melonmars
numpad0 · 4 days ago
Is this handling Unicode correctly? Seems like a lot of even Latin alphabets are getting mangled.
int_19h · 4 days ago
It looks like it's not handling UTF-8 at all and displaying it as if it were Latin-1
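A minimal Python sketch of that failure mode, using a made-up sample string rather than anything from the visualization itself: text is encoded as UTF-8 and then decoded as if it were Latin-1, so each multi-byte character turns into two or more unrelated Latin-1 characters.

```python
# Sample text chosen for illustration; it is not taken from the visualization.
original = "café naïve"

# Encode correctly as UTF-8, then decode with the wrong codec.
mangled = original.encode("utf-8").decode("latin-1")

# Latin-1 maps every byte to exactly one character, so as long as no bytes
# were dropped along the way, reversing the two steps restores the text.
repaired = mangled.encode("latin-1").decode("utf-8")

print(mangled)
print(repaired)
```

Each accented letter here takes two bytes in UTF-8, so the mangled string is two characters longer than the original, and every accented letter shows up as a pair starting with "Ã".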
int_19h commented on Analysis of the GFW's Unconditional Port 443 Block on August 20, 2025   gfw.report/blog/gfw_uncon... · Posted by u/kotri
pas · 4 days ago
There's nothing inevitable about this. Civil society needs to organize, coordinate, and spend money on PR about this.

Right now liberal people mostly sit back and wait for things to get better; it's not enough. (Also, going and walking up and down is not really effective.)

int_19h · 4 days ago
It's inevitable because we've seen time and again that all it takes to get the public opinion behind this kind of thing is to talk about how it is needed to catch pedophiles and terrorists.

And if you talk back? Why, you must be a pedophile or a terrorist, otherwise why would you have anything to hide?

It's gotten bad enough that people here on HN - Hacker News! - non-ironically make more or less this argument.

int_19h commented on Analysis of the GFW's Unconditional Port 443 Block on August 20, 2025   gfw.report/blog/gfw_uncon... · Posted by u/kotri
hackernewsdhsu · 5 days ago
That's what's so great about LoRA. Decentralized txt msgs, ultra cheap radios people run at home or wherever. $10-35 USD on Amazon. At least txts get through.
int_19h · 4 days ago
That would be LoRa. LoRA is a different thing.

u/int_19h

Karma: 25396 · Cake day: September 20, 2012