hyeonwho22 commented on Language models and linguistic theories beyond words   nature.com/articles/s4225... · Posted by u/Anon84
og_kalu · 2 years ago
Paraphrasing and summarizing parts of this article, https://hedgehogreview.com/issues/markets-and-the-good/artic...

Some 72 years ago, in 1951, Claude Shannon published his paper "Prediction and Entropy of Printed English", still an extremely fascinating read today.

It begins with a game. Claude pulls a book down from the shelf (a Raymond Chandler detective story, as it turns out), concealing the title in the process. After selecting a passage at random, he challenges his wife, Mary, to guess its contents letter by letter. The space between words counts as a twenty-seventh symbol in the set. If Mary fails to guess a letter correctly, Claude promises to supply the right one so that the game can continue.

In some cases, a corrected mistake allows her to fill in the remainder of the word; elsewhere a few letters unlock a phrase. All in all, she guesses 89 of 129 possible letters correctly—69 percent accuracy.

Discovery 1: It illustrated, in the first place, that a proficient speaker of a language possesses an “enormous” but implicit knowledge of the statistics of that language. Shannon would have us see that we make similar calculations regularly in everyday life—such as when we “fill in missing or incorrect letters in proof-reading” or “complete an unfinished phrase in conversation.” As we speak, read, and write, we are regularly engaged in prediction games.

Discovery 2: Perhaps most striking of all, Claude argues that a complete text and the corresponding “reduced text” consisting of letters and dashes “actually…contain the same information” under certain conditions. How?? (Surely the complete text contains more information!) The answer depends on the peculiar notion of information that Shannon had hatched in his 1948 paper “A Mathematical Theory of Communication” (hereafter “MTC”), the founding charter of information theory.

He argues that the engineer's focus should be the transfer of a message's components, not its "meaning" (or "semantic aspects"). The message could be nonsense, and the engineer’s problem—to transfer its components faithfully—would be the same.

A highly predictable message contains less information than an unpredictable one. More information is at stake in “villapleach, vollapluck” than in “Twinkle, twinkle”.

Does "Flinkle, fli- - - -" really contain less information than "Flinkle, flinkle" ?

Shannon then concludes that the complete text and the "reduced text" are equivalent in information content under certain conditions, because predictable letters are redundant in information transfer.
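For the curious, the "less information" claim can be made concrete with the entropy formula from MTC, H = -Σ p(c) · log2 p(c). Here is a minimal Python sketch, using a crude unigram model fit to each string (Shannon's own estimates used far richer context, so the numbers are only illustrative):

    import math
    from collections import Counter

    def entropy_per_char(text):
        # Maximum-likelihood unigram model: estimate p(c) from the string
        # itself, then compute H = -sum p(c) * log2 p(c), in bits/character.
        counts = Counter(text)
        n = len(text)
        return -sum((k / n) * math.log2(k / n) for k in counts.values())

    print(entropy_per_char("twinkle, twinkle"))         # ~3.1 bits/char (repetitive)
    print(entropy_per_char("villapleach, vollapluck"))  # ~3.4 bits/char (less so)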

Fueled by this, Claude then proposes an illuminating thought experiment: Imagine that Mary has a truly identical twin (call her “Martha”). If we supply Martha with the “reduced text,” she should be able to recreate the entirety of Chandler’s passage, since she possesses the same statistical knowledge of English as Mary. Martha would make Mary’s guesses in reverse.

Of course, Shannon admitted, there are no “mathematically identical twins” to be found, but, and here's the reveal, “we do have mathematically identical computing machines.”

Those machines could be given a model for making informed predictions about letters, words, maybe larger phrases and messages. In one fell swoop, Shannon had demonstrated that language use has a statistical side, that languages are, in turn, predictable, and that computers too can play the prediction game.
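Here is a toy Python sketch of the twin experiment, with a shared first-order character model standing in for Mary and Martha's statistical knowledge of English (the corpus, the model, and the dash convention are all illustrative, and it assumes the text contains no literal dashes):

    from collections import Counter, defaultdict

    def train(corpus):
        # Stand-in for the twins' shared statistics: for each character,
        # remember which character most often follows it in a training text.
        follows = defaultdict(Counter)
        for a, b in zip(corpus, corpus[1:]):
            follows[a][b] += 1
        # min() over (-count, char) picks the most frequent follower and
        # breaks ties deterministically, which matters: encoder and decoder
        # must make *identical* guesses.
        return {a: min(c.items(), key=lambda kv: (-kv[1], kv[0]))[0]
                for a, c in follows.items()}

    def encode(text, model):
        # Keep a letter only where the model guesses wrong; '-' marks
        # a correct guess.
        out = [text[0]]
        for prev, ch in zip(text, text[1:]):
            out.append('-' if model.get(prev) == ch else ch)
        return ''.join(out)

    def decode(reduced, model):
        # Martha's move: replay the same guesses to restore the dashes.
        out = [reduced[0]]
        for ch in reduced[1:]:
            out.append(model[out[-1]] if ch == '-' else ch)
        return ''.join(out)

    text = "the thin thread held the theory together"
    model = train(text)
    reduced = encode(text, model)
    assert decode(reduced, model) == text   # same information, fewer letters
    print(reduced)

Because the encoder and decoder break ties identically, every dash is recoverable, so the reduced text carries the same information in fewer letters.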

hyeonwho22 · 2 years ago
There was a fun recent variant on this game using LLMs: asking GPT-3 (3.5?) to encode text in a way that it will later be able to decode the meaning. Some of the encodings are insane:

https://www.piratewires.com/p/compression-prompts-gpt-hidden...

hyeonwho22 commented on Software that supports your body should always respect your freedom   fsf.org/blogs/community/s... · Posted by u/jlpcsl
leghifla · 2 years ago
In the EU (and probably elsewhere), there are strict rules for the stability of power wheelchairs. One such rule is: "On an incline of x% (x chosen by the manufacturer), pushing for max speed from a stop should not lift the front wheels."

To achieve that, the max acceleration must be quite low (software controlled), and the whole experience is sluggish, like trying to steer a car by pulling on rubber bands attached to the wheel.

From the moment I found a way to overcome this, I never went back. I know that I can hurt myself if I do something stupid, but I prefer this hypothetical risk instead of cursing 100 times a day because I cannot move how I want. It has been 10 years and I never got hurt.

I understand that such a "high-risk" device cannot be sold, but forbidding someone from changing this is like inflicting a second handicap on them.

hyeonwho22 · 2 years ago
That is a very poor regulation. Why target momentary wheel lift? What matters is that the chair doesn't tip over - that the center of gravity stays within the footprint of the four wheels.
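For a rough sense of the trade-off, here is a minimal rigid-body sketch of the no-wheel-lift condition (pitching about the rear-wheel contact line; the geometry numbers are made up for illustration):

    import math

    # The front wheels just lift when the inertial torque about the
    # rear-wheel contact line balances gravity's restoring torque.
    # On a climb of angle theta this gives
    #   a_lift = g * (d/h * cos(theta) - sin(theta))
    # d = horizontal distance from rear-wheel contact to centre of gravity
    # h = height of the centre of gravity above the ground
    def front_lift_accel(d, h, incline_deg=0.0, g=9.81):
        th = math.radians(incline_deg)
        return g * (d / h * math.cos(th) - math.sin(th))

    # Hypothetical geometry: CoG 25 cm ahead of the rear axle, 60 cm high.
    print(front_lift_accel(0.25, 0.60))        # level ground: ~4.1 m/s^2
    print(front_lift_accel(0.25, 0.60, 10.0))  # 10 deg climb: ~2.3 m/s^2

Since the no-lift rule must hold on the steepest rated incline, a single software cap ends up limiting level-ground acceleration far below what stability actually requires, which matches the sluggishness described above.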
hyeonwho22 commented on Public funds being swallowed up by scientific journals with dubious articles   english.elpais.com/scienc... · Posted by u/belter
robwwilliams · 2 years ago
Sure, predatory publishing at its worst.

The best universities just want to know about the handful of important works (code, primary papers, reviews) you have contributed for annual evaluations. If you spend half your time in research you should be able to average one to two paper-equivalents per year. If publishing in top tier journals with small numbers of coauthors, then fewer is fine.

Large numbers of papers in Hindawi and MDPI journals make me roll my eyes. Frontiers is in the grey zone with PLoS One—lots of serious good work, but a spotty reviewing process.

The ugly truth—you have to read the papers to make an informed decision—and this applies to an even greater degree to papers in Nature, Science, and Cell as well. The reason “to an even greater degree” is because dubious science with very high impact is functionally worse than crap published in MDPI.

hyeonwho22 · 2 years ago
Hindawi is the worst. They solicit book chapters and pretend to be peer reviewed, but any peer review is cursory. You can tell because they have published books on homeopathy and on the theory of traditional Chinese medicine (qi).
hyeonwho22 commented on Joint Statement on AI Safety and Openness   open.mozilla.org/letter/... · Posted by u/DerekBickerton
api · 2 years ago
What happens if someone develops a highly effective distributed training algorithm permitting a bunch of people with gaming PCs and fast broadband to train foundation models in a manner akin to Folding@Home?

If that happened open efforts could marshal tens or hundreds of thousands of GPUs.

Right now the barrier is that training requires too much synchronization bandwidth between compute nodes, but I’m not aware of any hard mathematical reason there couldn’t be an algorithm that doesn't have to sync so much. Even if it were less efficient, this could be overcome by the sheer number of nodes you could marshal.
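There is in fact a known family of algorithms along these lines: local SGD / federated averaging, where each node takes many local steps and nodes only periodically average parameters, cutting synchronization by the number of local steps. A toy single-process sketch of the communication pattern (the objective, data, and hyperparameters are all illustrative):

    import random

    def grad(w, x, y):
        # d/dw of the squared error 0.5 * (w*x - y)**2
        return (w * x - y) * x

    def make_shard(true_w, n=256):
        # Each "worker" gets its own noisy data shard.
        shard = []
        for _ in range(n):
            x = random.uniform(-1, 1)
            shard.append((x, true_w * x + random.gauss(0, 0.1)))
        return shard

    def local_sgd(num_workers=8, rounds=50, local_steps=32, lr=0.01):
        shards = [make_shard(true_w=3.0) for _ in range(num_workers)]
        w = 0.0
        for _ in range(rounds):
            results = []
            for shard in shards:            # in reality: parallel machines
                wi = w                      # start from the shared average
                for _ in range(local_steps):
                    x, y = random.choice(shard)
                    wi -= lr * grad(wi, x, y)
                results.append(wi)
            w = sum(results) / num_workers  # the only communication step
        return w

    print(local_sgd())  # converges near 3.0 with 1/32 the synchronizations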

hyeonwho22 · 2 years ago
Is that a serious argument against an AI pause? There are potential scenarios in which regulating AI is challenging, so it isn't worth doing? Why don't we stop regulating nuclear material while we're at it?

In my mind the existential risks make regulation of large training runs worth it. Should distributed training runs become an issue, we can figure out a way to inspect them, too.

To respond to the specific hypothetical: if that scenario happens, it will presumably be via either a botnet, a large group of wealthy hobbyists, or a corporation or nation state intent on circumventing the pause. Botnets have been dismantled before, and large groups of wealthy hobbyists tend to be interested in self-preservation (at least more so than individuals). Corporate and state actors defecting on international treaties can be penalized via standard mechanisms.

hyeonwho22 commented on Joint Statement on AI Safety and Openness   open.mozilla.org/letter/... · Posted by u/DerekBickerton
visarga · 2 years ago
> Next in the pipeline is obviously some religious nut (who would not otherwise have the capability)

So you're saying that:

1. the religious nut would not find the same information on Google or in books

2. if someone is motivated enough to commit such an act, the ease of use of AI vs. web search would make a difference

Has anyone checked how many biology students can prepare dangerous substances with just what they learned in school?

Have we removed the sites disseminating dangerous information off the internet first? What is to stop someone from training a model on such data anytime they want?

hyeonwho22 · 2 years ago
1. The religious nut doesn't have the knowledge or the skill sets right now, but AI might enable them.

2. Accessibility of information makes a huge difference. Prior to 2020, people rarely stole Kias or catalytic converters. When knowledge of how to do this (and, for catalytic converters, of their resale value) became easily available (i.e. trending on TikTok), thefts became frequent. The only barrier that disappeared between 2019 and 2021 was informational.

Your last two questions are not counterarguments, since AIs are already outperforming the median biology student, and obviously removing sites from the internet is not feasible. Easier to stop foundation model development than to censor the internet.

> What is to stop someone from training a model on such data anytime they want?

Present proposals are to limit GPU access and compute for training runs. Data centers are kind of like nuclear enrichment facilities: they are hard to hide, they require large numbers of dual-use components that can be regulated (centrifuges vs. GPUs), and they have large power requirements that make them show up on aerial imaging.

hyeonwho22 commented on Joint Statement on AI Safety and Openness   open.mozilla.org/letter/... · Posted by u/DerekBickerton
tsurba · 2 years ago
Essentially you are advocating against information being more efficiently available. Come on.

It’s true we are fucked if bioweapons become easy to make, but that is not a question of ”AI”.

hyeonwho22 · 2 years ago
The only thing keeping bioweapons from being easy to make is that the information is not easily available.

> Essentially you are advocating against information being more efficiently available.

Yes. Some kinds of information should be kept obscure, even if it is theoretically possible for an intelligent individual with access to the world's scientific literature to rediscover them. The really obvious case is the proliferation of WMDs.

For nuclear weapons, information is not the barrier to manufacture: we can regulate and track uranium, and enrichment is thought to require industrial-scale processes. But the precursors for biological weapons are unregulated and widely available, so we need to gatekeep the relevant skills and knowledge.

I'm sure you will agree with me that if information on how to make a WMD ever becomes even within a few orders of magnitude as accessible as information on how to steal a Kia or a catalytic converter, then we will have lost.

My argument is that a truly intelligent AI without safeguards or ethics would make bioweapons accessible to the public, and we would be fucked.

hyeonwho22 commented on Joint Statement on AI Safety and Openness   open.mozilla.org/letter/... · Posted by u/DerekBickerton
CamperBob2 · 2 years ago
> AI is no different than proliferating nuclear weapons

I mean, once the discussion goes THIS far off the rails of reality, where do we go from here?

hyeonwho22 · 2 years ago
Why not? It has already been shown that AI can be (mis)used to identify good candidates for chemical weapons. [1] Next in the pipeline is obviously some religious nut (who would not otherwise have the capability) using it to design a virus which doesn't set off alarms at the gene synthesis / custom construct companies, and then learning to transfect it.

More banally, state actors can already use open source models to efficiently create misinformation. It took what, 60,000 votes to swing the US election in 2016? Imagine what astroturfing can be done with 100x the labor thanks to LLMs.

[1] dx.doi.org/10.1038/s42256-022-00465-9

hyeonwho22 commented on Gmail, Yahoo announce new 2024 authentication requirements for bulk senders   blog.google/products/gmai... · Posted by u/ilamont
Wojtkie · 2 years ago
Nextdoor is the absolute worst about this. Selecting unsubscribe only lets you unsubscribe from the "type" of email they're sending you. After unsubscribing 7 or 8 times I just reported the whole domain as spam and blocked it.
hyeonwho22 · 2 years ago
You have much more patience than I. After the second email type I deleted my account.
hyeonwho22 commented on 'It's quite soul-destroying': how we fell out of love with dating apps   theguardian.com/lifeandst... · Posted by u/mindracer
warner25 · 2 years ago
This was my question elsewhere, but I'll put it here too because I'm really curious: Are most women really able to look at a guy in-person and know that he's 5'10 vs. 6'0? Or is it mostly abstract and only becomes an issue because height is explicitly listed in online profiles? I suspect the latter, and I think you're saying the same.
hyeonwho22 · 2 years ago
When I was dating in Korea, women would guess my height as 2 cm taller than I actually am. That was very eerie, because I correct my gait with 1.5 cm high insoles. I would estimate most women there have an accuracy of +/- 1 cm, and the worst were +/- 2.5 cm.
hyeonwho22 commented on 'It's quite soul-destroying': how we fell out of love with dating apps   theguardian.com/lifeandst... · Posted by u/mindracer
red-iron-pine · 2 years ago
there is a huge DEMAND, but whether you can take that to market and make it viable is a different story.

actually kinda shocked China or Korea haven't made this a state thing in order to help boost marriages and childbirths

hyeonwho22 · 2 years ago
The problem in Korea, at least, isn't meeting people (well, for 80% of the population); it is young couples feeling they can't meet the financial bar set by social norms, and thus deciding to delay the marriage-kids sequence indefinitely.

Several of my coworkers had been dating for 5+ years, but they were only making $50k annually (early career engineers). The socially expected family-sized condo costs $500k to $1M, and the young couple is expected to buy and furnish it before their wedding.
