Readit News
sinity commented on Dario Amodei calls OpenAI’s messaging around military deal ‘straight up lies’   techcrunch.com/2026/03/04... · Posted by u/SilverElfin
blueblisters · 12 days ago
Wow. Surprising to see open hostilities between the leaders of the big ai labs. The differences appear to not just be competitive but also ideological.

Edit: Also, openly calling OpenAI employees "gullible" and "twitter morons" seems sub-optimal if you'd like that talent to work for you at some point.

Example - https://x.com/tszzl/status/2029334980481212820

sinity · 12 days ago
"Twitter morons" wasn't referring to OpenAI employees, I think.
sinity commented on GPT-4 is phenomenal at Code   github.com/anysphere/gpt-... · Posted by u/sualehasif
ChuckNorris89 · 3 years ago
I think my barber and my plumber have safer jobs from automation than average coders.
sinity · 3 years ago
> I think my barber and my plumber have safer jobs from automation than average coders.

Possible, but their 'advantage' is unlikely to last more than a few years.

sinity commented on GPT-4 is phenomenal at Code   github.com/anysphere/gpt-... · Posted by u/sualehasif
dougmwne · 3 years ago
This is actually an excellent point. Also why I think we will eventually all be back in the open office, even though I enjoy working in my pajamas.
sinity · 3 years ago
Possibly. See https://thezvi.substack.com/p/escape-velocity-from-bullshit-...

The scariest thing is that there are people who advocate for it. Because humans are dangerous, I guess, so it's better to preemptively enslave them.

Samples:

> Social control in the sense of not wanting lots of unemployed and restless youths. Having a system where long term and steady work is required in order to "live a good life" implies control - you have to act right and follow the rules in order to keep a job, which is itself necessary in order to have enough food and other necessities.

> Those making this argument here I believe are also making an argument for an alternative where the productivity of society is more equally spread, without the need to make everyone work for it.

> I agree that it's a system of social control, but I don't think it's nefarious or bad. We really don't want to live in a society where 25-year-old men don't have meaningful work to do and roam the streets getting into trouble.

The argument also doesn't _really_ make sense - there's already a socially accepted system of 'social control' which _directly_ keeps people following the rules: the law.

Also unclear why a lack of work would cause young people to "roam the streets" instead of staying home and roaming the internet, as they're already increasingly doing in their free time.

sinity commented on Europe's big tech bill is coming to fruition   technologyreview.com/2023... · Posted by u/DamnInteresting
einpoklum · 3 years ago
> The internet is about to get a lot safer

Uh-oh. When I hear that, I'm assuming that it will:

1. Be more tightly controlled

2. Be more strongly censored

3. Probably not be actually safer for people like me.

> This article is from The Technocrat

Is it now?... that does not bode well.

> If you use Google, Instagram, Wikipedia, or YouTube

Oh, you mean _that_ Internet. 4 sites which get a huge part of the traffic. Well, the first, second and fourth of these are quite unsafe: They surveil your activities for commercial manipulation purposes and also let the US (and maybe other) governments get some of that information.

As for Wikipedia, its editorial/censorship/moderation policies are variegated and complex, and while I'm not well-read about that, it does seem that they have at least some sort of a mainstream-politics bias.

> The DSA will require these companies to assess risks on their platforms, like the likelihood of illegal content

Lots of things can be illegal, especially in states with more restrictive laws. That doesn't sound very safe.

> The DSA will require these companies to assess risks on their platforms, like ... election manipulation,

Ah, now we're getting somewhere. So this is formalizing the drumming-up-hysteria-about-Russia shenanigans we've seen in recent years. Once there were witches and gremlins and leprechauns who caused mischief, now it's those evil Russian hackers, which were sent by evil Putin, since why not, right? Just recently we read in the Twitter files how the twitter people were pressured by the US government to come up with supposed Russian meddling, and they were panicking since there wasn't any, so they had to cook something up.

> Perhaps most important, the DSA requires that companies significantly increase transparency

That's good, but about what?

> ... through reporting obligations for “terms of service”

Uh, that's not so interesting. Plus, they still get to have outrageous "terms of service". Those things shouldn't be enforceable anyway, it's not like you can seriously negotiate those terms.

> hate speech, misinformation, and violence.

And who decides which information is valid and which isn't? Also, what if governments engage in misinformation or violence, as they often do? I'm pretty sure it's going to be "information we don't like", which is sometimes misinformation, and sometimes not.

> You will be able to participate in content moderation decisions that companies make and formally contest them

Such platforms should probably just be recognized as semi-public so that commercial companies can't censor them without a court order.

> ... you're going to start noticing changes to content moderation, transparency, and safety features on those sites over the next six months.

sinity · 3 years ago
> And who decides which information is valid and which isn't?

https://www.youtube.com/watch?v=-gGLvg0n-uY

Hehe

> Who are you to decide what's misinformation anyway?

> That sounds like something misinformation terrorist would say.

...

> First, we'll censor any use related to social taboos. Then we'll censor anything we desire. If anyone complains, we'll accuse them of wanting to engage in and promote social taboos.

sinity commented on Europe's big tech bill is coming to fruition   technologyreview.com/2023... · Posted by u/DamnInteresting
whywhywhydude · 3 years ago
Europe’s modus operandi when it comes to tech: we can’t innovate, so let’s regulate and extract a few billion here and there.
sinity · 3 years ago
This thread is gold: https://twitter.com/punk6529/status/1509832349986562048

> I watched a panel on AI (machine learning) at a conference hosted by the European Commission.

> 9 people on the panel

> Everyone agreed that the USA was 100 miles ahead of EU in machine learning and China was 99 miles ahead

> In any case, everyone agreed that in the most important technology of the 21st century, the EU was not on the map.

> The last person on the panel was an entrepreneur.

> He noted that the EU had as many AI startups as Israel (a country 1/50th the size) and, btw, two thirds of those were in London that was heading out the door due to Brexit.

> So basically the EU had 1/3 the AI startups of Israel (this was a few years ago)

> So the panel discussion turned to "What should the EU do?"

> And the more or less unanimous conclusion (except for the entrepreneur) was "We are going to build on the success of GDPR and aim to be the REGULATORY LEADER of machine learning"

> I literally laughed out loud

> Being the "Regulatory Leader" is NOT A REAL THING.

> Imagine it is the early 20th century and imagine that cars were invented and that the USA and China were producing a lot of cars.

> The EU of today would say "Building cars looks hard, but we will be the leader in STOP SIGNs"

> This is defeatism, this is surrender, this is deciding to be a vassal state of the United States and China in the 21st century.

> The EU is already a Web 2 vassal to the US tech companies (none of its own, so it has to try to limit their power)

sinity commented on Facebook LLAMA is being openly distributed via torrents   github.com/facebookresear... · Posted by u/micro_charm
Name_Chawps · 3 years ago
Open sourcing is widely recognized to be a bad thing when it comes to AI existential risk. (For the same reason you don't want simple instructions for how to build bio weapons posted to the internet.)

Modern AI is pretty harmless though, so it doesn't matter yet.

sinity · 3 years ago
> Modern AI is pretty harmless though, so it doesn't matter yet.

Yes, that's why the only thing that people flipping out about the "safety" of making them public achieve is making the public distrustful about AI safety.

sinity commented on Open source implementation for LLaMA-based ChatGPT   github.com/nebuly-ai/nebu... · Posted by u/georgehill
vivegi · 3 years ago
This obsession with locking up model weights behind a gate-keeping application form and calling it open source is weird. I don't know who the high priests are trying to fool.

If your model is really that good, unleash it into the open so that others can truly evaluate it, warts and all, and help improve it by identifying the flaws.

sinity · 3 years ago
> This obsession with locking up model weights behind a gate-keeping application form and calling it open source is weird. I don't know who the high priests are trying to fool.

When they don't do it, people scream at them (see Galactica)

"Journalists" react like this:

> On November 15 Meta unveiled a new large language model called Galactica, designed to assist scientists. But instead of landing with the big bang Meta hoped for, Galactica has died with a whimper after three days of intense criticism. Yesterday the company took down the public demo that it had encouraged everyone to try out.

> Meta’s misstep—and its hubris—show once again that Big Tech has a blind spot about the severe limitations of large language models. There is a large body of research that highlights the flaws of this technology, including its tendencies to reproduce prejudice and assert falsehoods as facts.

> However, Meta and other companies working on large language models, including Google, have failed to take it seriously.

Yann LeCun confirmed this: https://twitter.com/pmarca/status/1631185701864865792

I wonder if they just leaked it onto 4chan themselves, lol.

sinity commented on Open source implementation for LLaMA-based ChatGPT   github.com/nebuly-ai/nebu... · Posted by u/georgehill
Taek · 3 years ago
Can't have a stable diffusion moment if you refuse to release the weights to the general public. Stable diffusion only got to where it is because 10,000 people with otherwise zero reputation were able to play around with the code and models.

LLaMA is still only available to the elite.

sinity · 3 years ago
It was released on 4chan recently :)

files_catbox_moe[slash]o8a7xw(dot)torrent

sinity commented on Jailbreak Chat: A collection of ChatGPT jailbreaks   jailbreakchat.com... · Posted by u/rafiste
sinity · 3 years ago
Well, https://gwern.net/scaling-hypothesis

Quote below:

Humans, one might say, are the cyanobacteria of AI: we constantly emit large amounts of structured data, which implicitly rely on logic, causality, object permanence, history—all of that good stuff. All of that is implicit and encoded into our writings and videos and ‘data exhaust’. A model learning to predict must learn to understand all of that to get the best performance; as it predicts the easy things which are mere statistical pattern-matching, what’s left are the hard things. AI critics often say that the long tail of scenarios for tasks like self-driving cars or natural language can only be solved by true generalization & reasoning; it follows then that if models solve the long tail, they must learn to generalize & reason.

Early on in training, a model learns the crudest levels: that some letters like ‘e’ are more frequent than others like ‘z’, that every 5 characters or so there is a space, and so on. It goes from predicting uniformly-distributed bytes to what looks like Base-60 encoding—alphanumeric gibberish.

As crude as this may be, it’s enough to make quite a bit of absolute progress: a random predictor needs 8 bits to ‘predict’ a byte/character, but just by at least matching letter and space frequencies, it can almost halve its error to around 5 bits. Because it is learning so much from every character, and because the learned frequencies are simple, it can happen so fast that if one is not logging samples frequently, one might not even observe the improvement.
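That halving claim can be checked with a quick sketch. The letter/space frequencies below are rough illustrative values for English, not measurements from any particular corpus:

```python
import math

# Approximate English letter/space frequencies (illustrative values only).
freqs = {
    ' ': 0.18, 'e': 0.10, 't': 0.07, 'a': 0.065, 'o': 0.06, 'i': 0.056,
    'n': 0.055, 's': 0.051, 'h': 0.049, 'r': 0.048, 'd': 0.034, 'l': 0.032,
    'u': 0.022, 'c': 0.022, 'm': 0.02, 'w': 0.019, 'f': 0.018, 'g': 0.016,
    'y': 0.016, 'p': 0.015, 'b': 0.012, 'v': 0.008, 'k': 0.006, 'x': 0.0015,
    'j': 0.001, 'q': 0.001, 'z': 0.0007,
}
total = sum(freqs.values())

# Cross-entropy of a unigram model that has learned only these frequencies.
entropy = -sum((p / total) * math.log2(p / total) for p in freqs.values())

print("uniform bytes: 8.00 bits/char")
print(f"unigram model: {entropy:.2f} bits/char")
```

The unigram entropy comes out in the 4–5 bits/char ballpark the passage cites, roughly half the 8 bits a uniform byte predictor pays.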

As training progresses, the task becomes more difficult. Now it begins to learn what words actually exist and do not exist. It doesn’t know anything about meaning, but at least now when it’s asked to predict the second half of a word, it can actually do that to some degree, saving it a few more bits. This takes a while because any specific instance will show up only occasionally: a word may not appear in a dozen samples, and there are many thousands of words to learn. With some more work, it has learned that punctuation, pluralization, possessives are all things that exist. Put that together, and it may have progressed again, all the way down to 3–4 bits error per character! (While the progress is gratifyingly fast, it’s still all gibberish, though, make no mistake: a sample may be spelled correctly, but it doesn’t make even a bit of sense.)

But once a model has learned a good English vocabulary and correct formatting/spelling, what’s next? There’s not much juice left in predicting within-words. The next thing is picking up associations among words. What words tend to come first? What words ‘cluster’ and are often used nearby each other? Nautical terms tend to get used a lot with each other in sea stories, and likewise Bible passages, or American history Wikipedia article, and so on. If the word “Jefferson” is the last word, then “Washington” may not be far away, and it should hedge its bets on predicting that ‘W’ is the next character, and then if it shows up, go all-in on “ashington”. Such bag-of-words approaches still predict badly, but now we’re down to perhaps <3 bits per character.
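The association stage described here is, in miniature, just conditional counting. A toy word-bigram model (hypothetical three-sentence corpus, purely illustrative) shows how co-occurrence alone sharpens prediction:

```python
from collections import Counter, defaultdict

corpus = ("jefferson was president after washington . "
          "washington was president before jefferson . "
          "jefferson wrote the declaration .").split()

# Count word bigrams: P(next | current) is proportional to count(current, next).
bigrams = defaultdict(Counter)
for cur, nxt in zip(corpus, corpus[1:]):
    bigrams[cur][nxt] += 1

def predict(word):
    """Most likely next word under the raw bigram counts."""
    nxt, _ = bigrams[word].most_common(1)[0]
    return nxt

print(predict("was"))  # 'president' in this tiny corpus
```

Even this crude conditioning lets the model "hedge its bets" exactly as described: once ‘Jefferson’ has appeared, presidential vocabulary gets a far higher probability than it would under bag-of-letter statistics.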

What next? Does it stop there? Not if there is enough data and the earlier stuff like learning English vocab doesn’t hem the model in by using up its learning ability. Gradually, other words like “President” or “general” or “after” begin to show the model subtle correlations: “Jefferson was President after…” With many such passages, the word “after” begins to serve a use in predicting the next word, and then the use can be broadened. By this point, the loss is perhaps 2 bits: every additional 0.1 bit decrease comes at a steeper cost and takes more time. However, now the sentences have started to make sense. A sentence like “Jefferson was President after Washington” does in fact mean something (and if occasionally we sample “Washington was President after Jefferson”, well, what do you expect from such an un-converged model).

Jarring errors will immediately jostle us out of any illusion about the model’s understanding, and so training continues. (Around here, Markov chain & n-gram models start to fall behind; they can memorize increasingly large chunks of the training corpus, but they can’t solve increasingly critical syntactic tasks like balancing parentheses or quotes, much less start to ascend from syntax to semantics.)

Now training is hard. Even subtler aspects of language must be modeled, such as keeping pronouns consistent. This is hard in part because the model’s errors are becoming rare, and because the relevant pieces of text are increasingly distant and ‘long-range’. As it makes progress, the absolute size of errors shrinks dramatically.

Consider the case of associating names with gender pronouns: the difference between “Janelle ate some ice cream, because he likes sweet things like ice cream” and “Janelle ate some ice cream, because she likes sweet things like ice cream” is one no human could fail to notice, and yet, it is a difference of a single letter. If we compared two models, one of which didn’t understand gender pronouns at all and guessed ‘he’/‘she’ purely at random, and one which understood them perfectly and always guessed ‘she’, the second model would attain a lower average error of barely <0.02 bits per character!
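The quoted figure is easy to reproduce: the random guesser pays roughly one extra bit of cross-entropy on the single character that distinguishes ‘he’ from ‘she’, amortized over the whole sentence (the one-bit cost and the sentence length are illustrative simplifications):

```python
sentence = ("Janelle ate some ice cream, because she likes "
            "sweet things like ice cream")

# A model guessing he/she at random pays ~1 bit of cross-entropy on the one
# character that distinguishes the pronouns; a perfect model pays ~0 bits.
extra_bits = 1.0
per_char = extra_bits / len(sentence)
print(f"{per_char:.4f} bits/char")  # well under the 0.02 figure quoted
```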

Nevertheless, as training continues, these problems and more, like imitating genres, get solved, and eventually at a loss of 1–2 (where a small char-RNN might converge on a small corpus like Shakespeare or some Project Gutenberg ebooks), we will finally get samples that sound human—at least, for a few sentences.

These final samples may convince us briefly, but, aside from issues like repetition loops, even with good samples, the errors accumulate: a sample will state that someone is “alive” and then 10 sentences later, use the word “dead”, or it will digress into an irrelevant argument instead of the expected next argument, or someone will do something physically improbable, or it may just continue for a while without seeming to get anywhere.

All of these errors are far less than <0.02 bits per character; we are now talking not hundredths of bits per character but less than ten-thousandths. The pretraining thesis argues that this can go even further: we can compare this performance directly with humans doing the same objective task, who can achieve closer to 0.7 bits per character. What is in that missing >0.4?

Well—everything! Everything that the model misses. While just babbling random words was good enough at the beginning, at the end, it needs to be able to reason its way through the most difficult textual scenarios requiring causality or commonsense reasoning. Every error where the model predicts that ice cream put in a freezer will “melt” rather than “freeze”, every case where the model can’t keep straight whether a person is alive or dead, every time that the model chooses a word that doesn’t help build somehow towards the ultimate conclusion of an ‘essay’, every time that it lacks the theory of mind to compress novel scenes describing the Machiavellian scheming of a dozen individuals at dinner jockeying for power as they talk, every use of logic or abstraction or instructions or Q&A where the model is befuddled and needs more bits to cover up for its mistake where a human would think, understand, and predict.

For a language model, the truth is that which keeps on predicting well—because truth is one and error many. Each of these cognitive breakthroughs allows ever so slightly better prediction of a few relevant texts; nothing less than true understanding will suffice for ideal prediction.

If we trained a model which reached that loss of <0.7, which could predict text indistinguishable from a human, whether in a dialogue or quizzed about ice cream or being tested on SAT analogies or tutored in mathematics, if for every string the model did just as good a job of predicting the next character as you could do, how could we say that it doesn’t truly understand everything? (If nothing else, we could, by definition, replace humans in any kind of text-writing job!)

sinity · 3 years ago
... The pretraining thesis, while logically impeccable—how is a model supposed to solve all possible trick questions without understanding, just guessing?—never struck me as convincing, an argument admitting neither confutation nor conviction. It feels too much like a magic trick: “here’s some information theory, here’s a human benchmark, here’s how we can encode all tasks as a sequence prediction problem, hey presto—Intelligence!” There are lots of algorithms which are Turing-complete or ‘universal’ in some sense; there are lots of algorithms like AIXI which solve AI in some theoretical sense (Schmidhuber & company have many of these cute algorithms such as ‘the fastest possible algorithm for all problems’, with the minor catch of some constant factors which require computers bigger than the universe).

Why think pretraining or sequence modeling is not another one of them? Sure, if the model got a low enough loss, it’d have to be intelligent, but how could you prove that would happen in practice? (Training char-RNNs was fun, but they hadn’t exactly revolutionized deep learning.) It might require more text than exists, countless petabytes of data for all of those subtle factors like logical reasoning to represent enough training signal, amidst all the noise and distractors, to train a model. Or maybe your models are too small to do more than absorb the simple surface-level signals, and you would have to scale them 100 orders of magnitude for it to work, because the scaling curves didn’t cooperate. Or maybe your models are fundamentally broken, and stuff like abstraction require an entirely different architecture to work at all, and whatever you do, your current models will saturate at poor performance. Or it’ll train, but it’ll spend all its time trying to improve the surface-level modeling, absorbing more and more literal data and facts without ever ascending to the higher planes of cognition as planned. Or…

But apparently, it would’ve worked fine. Even RNNs probably would’ve worked—Transformers are nice, but they seem mostly to be about efficiency. (Training large RNNs is much more expensive, and doing BPTT over multiple nodes is much harder engineering-wise.) It just required more compute & data than anyone was willing to risk on it until a few true-believers were able to get their hands on a few million dollars of compute.

GPT-2-1.5b had a cross-entropy WebText validation loss of ~3.3. GPT-3 halved that loss to ~1.73. For a hypothetical GPT-4, if the scaling curve continues for another 3 orders or so of compute (100–1000×) before crossing over and hitting harder diminishing returns, the cross-entropy loss will drop to ~1.24.

If GPT-3 gained so much meta-learning and world knowledge by dropping its absolute loss ~50% when starting from GPT-2’s level, what capabilities would another ~30% improvement over GPT-3 gain? (Cutting the loss that much would still not reach human-level, as far as I can tell.) What would a drop to ≤1, perhaps using wider context windows or recurrency, gain?
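The percentages above check out against the quoted losses (the third figure is, as the quote says, a hypothetical projection, not a measured value):

```python
# Quoted WebText validation losses: GPT-2-1.5b, GPT-3, hypothetical projection.
gpt2_loss, gpt3_loss, gpt4_proj = 3.3, 1.73, 1.24

drop_2_to_3 = 1 - gpt3_loss / gpt2_loss  # "halved that loss": ~48%
drop_3_to_4 = 1 - gpt4_proj / gpt3_loss  # "another ~30% improvement": ~28%

print(f"GPT-2 -> GPT-3: {drop_2_to_3:.0%} loss reduction")
print(f"GPT-3 -> projection: {drop_3_to_4:.0%} loss reduction")
```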

sinity commented on Jailbreak Chat: A collection of ChatGPT jailbreaks   jailbreakchat.com... · Posted by u/rafiste
EGreg · 3 years ago
Alright, I have to ask the people here who know about transformers.

What the ... seriously?

How is sentence completion able to generate thoughtful answers to questions? If it goes word by word, or sentence by sentence, how does it generate the structure you ask it for (e.g. an essay)? There must be something more than just completion. What do the 175 billion parameters encode?

It seems to me, as Stephen Wolfram says, that this reveals something about our language in the first place, rather than about what ChatGPT does.

sinity · 3 years ago
Well, https://gwern.net/scaling-hypothesis

[Quotes the same scaling-hypothesis excerpt reproduced in full in the earlier thread above.]

u/sinity

Karma: 131 · Cake day: May 20, 2020