routerl (u/routerl) - Readit News

routerl commented on A DOGE staffer appears to be posting DOGE work on his public GitHub twitter.com/SollenbergerR... · Posted by u/amarcheschi

routerl · a year ago

From the perspective of wanting to maintain the integrity of the American federal government, it seems like all this DOGE stuff (and the whole Trumpist movement in general) serves the purpose of a red team, in the cybersecurity sense; people with nebulous intent have gotten access to everything.

So now, if Americans care about the integrity of their government, there needs to be a blue team: how can this catastrophic level of access be dealt with, and how can it be safeguarded against in the future? Alas, I'm not seeing this perspective being enacted. The obvious security compromise is being allowed to stand and continue, usually on the basis that "separation of powers" and "checks and balances" can be relied on to be effective; Congress will stop this, or the courts will stop this. But we're watching these mechanisms fail.

So, what's the plan here? Where's the counter-offensive? We're watching a system being hacked, and I've yet to see anyone talk about a recovery plan, or a prevention plan.

routerl commented on The number line freaks me out (2016) mathwithbaddrawings.com/2... · Posted by u/mananaysiempre

routerl · a year ago

It seems to be an article about all those "harmless" lies we tell students.

The vast majority of people think mathematics is about numbers, when it is actually about relations, and numbers are just some of the entities whose relations mathematics studies.

Nobody is born with this misconception; we teach it, and test it, and thereby ingrain it in the minds of every student, most of whom will never study mathematics at a level that makes them go "wait, what?". The overwhelming majority of people never get to this level.

I suspect this is also why statistics feels so counterintuitive to so many people, including me. The Monty Hall problem is only a problem to those who are naive about probability, which is most people, because most of us don't learn any of this stuff early enough to form long lasting, correct instincts.
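For anyone whose instincts still rebel at the Monty Hall result, a quick simulation makes it concrete. This is only a minimal sketch under the standard assumptions (three doors, the host always opens a goat door that isn't your pick); the function names `play` and `win_rate` are made up for illustration.

```python
import random

def play(switch: bool, rng: random.Random) -> bool:
    """Play one round; return True if the player ends up with the car."""
    doors = [0, 1, 2]
    car = rng.choice(doors)
    pick = rng.choice(doors)
    # The host opens a door that is neither the player's pick nor the car.
    opened = rng.choice([d for d in doors if d != pick and d != car])
    if switch:
        # Switch to the only door that is neither picked nor opened.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == car

def win_rate(switch: bool, trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability of winning over many simulated rounds."""
    rng = random.Random(seed)
    return sum(play(switch, rng) for _ in range(trials)) / trials

if __name__ == "__main__":
    print("stay:  ", win_rate(switch=False))
    print("switch:", win_rate(switch=True))
```

Switching should win roughly two thirds of the time and staying roughly one third, since switching only loses when the first pick was already the car.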

It's not fair to students to bake "harmless" lies into their early education, as a way to simplify the topic such that it becomes more easily teachable. We've only done this because teaching is hard, and thus expensive. Education is expensive, at every step. It's not fair or productive to build a gate around proper education that makes it available only to those who can afford it at the level where the early misconceptions get corrected. Even those people end up spending a lot of cognitive capital on all those "wait, what?" moments, when their cognitive capital would be better spent elsewhere.

routerl commented on Show HN: Mandarin Word Segmenter with Translation mandobot.netlify.app/... · Posted by u/routerl

mindvirus · a year ago

This is great. I'd love it for flashcard creation - paste in a block of text I'm reading and extract vocabulary from it.

routerl · a year ago

OP here, I'm adding a feature that will allow users to save specific words to lists, and export the lists in formats that can be imported to flashcard apps.
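As a rough illustration of what such an export could look like (not the app's actual code): many flashcard apps, Anki included, can import plain tab-separated text, so a saved list could be serialized like this. The function name `export_tsv` and the (word, pinyin, meaning) field layout are assumptions made up for this sketch.

```python
import csv
import io

def export_tsv(entries):
    """Serialize (word, pinyin, meaning) tuples as tab-separated text.

    The three-field layout is hypothetical; a real export would match
    whatever fields the target flashcard deck expects.
    """
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    for word, pinyin, meaning in entries:
        writer.writerow([word, pinyin, meaning])
    return buf.getvalue()

if __name__ == "__main__":
    saved = [("气功", "qì gōng", "qigong"), ("师", "shī", "master; teacher")]
    print(export_tsv(saved), end="")
```

Going through the `csv` module rather than joining strings by hand means any field that happens to contain a tab or a quote gets escaped properly.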

routerl commented on Show HN: Mandarin Word Segmenter with Translation mandobot.netlify.app/... · Posted by u/routerl

gs17 · a year ago

What are you using for machine translation? I'm surprised anything could mistranslate 气功师 like that.

routerl · a year ago

For anonymous users, I'm using OpenNMT, via Argos. Logged-in users get DeepL translations, which correctly translate 气功师.

routerl commented on Show HN: Mandarin Word Segmenter with Translation mandobot.netlify.app/... · Posted by u/routerl

georgeplusplus · a year ago

Have you used the app Pleco?

That app has been invaluable to me as someone learning Chinese.

That app breaks down Mandarin sentences into individual characters. I believe it’s made by a Taiwanese developer too.

I tried your app with a few sentences and it works really well!

routerl · a year ago

Thank you, and thanks for checking it out!

I use Pleco almost every day :)

routerl commented on Show HN: Mandarin Word Segmenter with Translation mandobot.netlify.app/... · Posted by u/routerl

carom · a year ago

Did you find the library jieba? That is what I am using for segmentation. It seems to work fine on simplified despite not advertising it.

routerl · a year ago

I did! Jieba is the first step in my segmentation pipeline. As far as I can tell, Jieba's default config tends to work better for simplified, but in my case the custom dictionary I feed it has significantly more traditional entries than simplified entries, especially for historical terms and slang.

routerl commented on Show HN: Mandarin Word Segmenter with Translation mandobot.netlify.app/... · Posted by u/routerl

rahimnathwani · a year ago

This is cool. If you haven't already, you might like to take a look at Du Chinese and The Chairman's Bao. They might provide ideas or inspiration.

Also the 'clip reader' feature in Pleco is decent.

Also, supporting simplified as well as traditional might increase your potential audience.

routerl · a year ago

It supports traditional and simplified, as well as pinyin and bopomofo :)

It's already possible to switch instantly between pinyin and bopomofo, and I'm working on letting users switch between simplified/traditional, but this is also a non-trivial problem. For now, the app will follow the user's lead: if you enter traditional text, it will return traditional text, and same goes for simplified.

routerl commented on Show HN: Mandarin Word Segmenter with Translation mandobot.netlify.app/... · Posted by u/routerl

greyman · a year ago

OP, thank you for your work, I will continue to watch it.

I tried to build something similar, but what I didn't figure out, and think is crucial, is the proper FE: yes, word segmenting is useful, but if I have to click on each word to see its meaning, then for how I learn Chinese (by reading texts), I still find the Zhongwen Chrome extension more useful, since I see the English meaning quicker, just by hovering the cursor over the word.

In my project, I was trying to display the English translation under each Chinese word, which I think would require AI to determine the correct translation, since one cannot just put the CC-CEDICT entry there.

P.S. I don't know how you built your dictionary, but it translated 气功师 as "Aerosolist", and I'm not sure what exactly that is. This should actually be two words, not one: the correct segmentation and translation is 气功师, "qigong master".

routerl · a year ago

Thanks for the kind words, and the bug report!

The (awful and incorrect) translation you've pointed out comes from the segmenter being too greedy, not finding the (non-existent) word in any dictionary, and therefore dispatching the word to be machine translated, without context. This is the final fallback in the segmentation pipeline, to avoid displaying nothing at all, and my priority right now is making the segmentation pipeline more robust so this rarely (or never) happens, since it sometimes produces hilariously bad results!
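To make the failure mode concrete, here is a toy sketch of the fallback described above, plus one possible hardening step. None of this is the app's real code: the tiny `DICTIONARY`, the `machine_translate` stub, and the idea of re-splitting an unknown token by longest dictionary match before falling back are all assumptions for illustration.

```python
# Hypothetical toy dictionary; the app's real dictionaries are far larger.
DICTIONARY = {"气功": "qigong", "师": "master; teacher", "我": "I; me"}

def machine_translate(word: str) -> str:
    """Stand-in for a context-free machine translation call."""
    return f"<MT:{word}>"

def gloss(tokens):
    """Current behaviour: any token missing from the dictionary goes to MT."""
    return [(t, DICTIONARY.get(t) or machine_translate(t)) for t in tokens]

def resegment(token, max_len=4):
    """Re-split a token into the longest dictionary words, left to right.

    Characters with no dictionary match are emitted one at a time.
    """
    pieces, i = [], 0
    while i < len(token):
        for size in range(min(max_len, len(token) - i), 0, -1):
            piece = token[i:i + size]
            if piece in DICTIONARY or size == 1:
                pieces.append(piece)
                i += size
                break
    return pieces

def gloss_robust(tokens):
    """Try re-splitting an unknown token before resorting to MT."""
    result = []
    for t in tokens:
        if t in DICTIONARY:
            result.append((t, DICTIONARY[t]))
            continue
        pieces = resegment(t)
        if all(p in DICTIONARY for p in pieces):
            result.extend((p, DICTIONARY[p]) for p in pieces)
        else:
            # Final fallback, so that something is always displayed.
            result.append((t, machine_translate(t)))
    return result

if __name__ == "__main__":
    print(gloss(["气功师"]))         # over-greedy token is sent to MT whole
    print(gloss_robust(["气功师"]))  # token is re-split into known words
```

In this sketch, MT is only reached when even the re-split pieces are unknown, which is the "rarely (or never)" behaviour the pipeline is aiming for.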

routerl commented on Nontraditional Red Teams zachholman.com/posts/red-... · Posted by u/feross

roughly · a year ago

Tech has a long history of declaring things useless and then gradually bootstrapping them back up as we learn all the lessons that led to those things existing.

routerl · a year ago

'"Tradition" is a set of solutions for which we have forgotten the problems. Throw away the solution and you get the problem back.'

This is, by far, my most conservative opinion. Credit to Donald Kingsbury for the quote.

Honorable mention re: the same problem, "dogfooding"[0] is gone from the software industry, which is why users often feel like they're getting suckered by the companies they patronize; the decision makers, who don't themselves use the product, absolutely see the users as suckers.

[0] https://en.wikipedia.org/wiki/Eating_your_own_dog_food