I think the check-and-validate is a different sort of scratchpad, but maybe not. Seems like at least 3 types: some for pulling implicit info out of the network (viz. wic), some for intermediary steps (viz. coding), and some for verification, like here.
- This isn't GPT-3, it's the recently-released open-source and open-weights model from EleutherAI, GPT-NeoX-20B. GPT-3 is much larger (175 billion parameters vs NeoX's 20 billion).
- It's well-known that language models don't tend to be good at math by default (Gwern, among others, pointed this out back in June 2020). It seems likely that this is at least in part because of how these models currently tokenize their input (they don't represent numbers by their individual digits, but by tokens representing commonly-occurring character sequences): https://www.gwern.net/GPT-3#bpes . Someone also pointed me to this paper which looks at number representations (though it uses somewhat older models like BERT): https://arxiv.org/abs/1909.07940
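To make the tokenization point concrete, here's a toy illustration of why BPE-style tokenizers don't split numbers into individual digits. The vocabulary and the greedy longest-match segmentation below are hypothetical simplifications, not the actual GPT-NeoX vocabulary or merge rules:

```python
# Toy illustration: numbers get split into multi-digit chunks, not digits.
# TOY_VOCAB is hypothetical, NOT the real GPT-NeoX vocabulary.
TOY_VOCAB = {"285", "31", "80", "65", "23", "010", "25", "15",
             "0", "1", "2", "3", "4", "5", "6", "7", "8", "9"}

def greedy_tokenize(s, vocab, max_len=3):
    """Greedy longest-match segmentation (a simplification of real BPE merges)."""
    tokens = []
    i = 0
    while i < len(s):
        # Try the longest piece first, fall back to shorter ones.
        for length in range(min(max_len, len(s) - i), 0, -1):
            piece = s[i:i + length]
            if piece in vocab:
                tokens.append(piece)
                i += length
                break
    return tokens

print(greedy_tokenize("28531", TOY_VOCAB))      # ['285', '31']
print(greedy_tokenize("230102515", TOY_VOCAB))  # ['23', '010', '25', '15']
```

So from the model's point of view, "28531" is two opaque symbols rather than five digits, which plausibly makes learning digit-wise arithmetic harder.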
- Despite the tokenization, it performs (IMO) surprisingly well at getting close to the true value, particularly for the start and end digits and the overall magnitude. You can see this by looking at the tokenization (indicated by brackets) of its guess vs the correct answer for 28531*8065 (I asked multiple times to get an idea of how consistent it is – it's not deterministic because I ran this with temperature = 0.1, which samples randomly among the most likely tokens):
[What][ is][ 285][31][ *][ 80][65][?][\n][22][77][05][315]
Correct: [\n][23][010][25][15]
[What][ is][ 285][31][ *][ 80][65][?][\n][22][95][01][115]
Correct: [\n][23][010][25][15]
[What][ is][ 285][31][ *][ 80][65][?][\n][22][38][95][015]
Correct: [\n][23][010][25][15]
[What][ is][ 285][31][ *][ 80][65][?][\n][22][99][25][015]
Correct: [\n][23][010][25][15]
[What][ is][ 285][31][ *][ 80][65][?][\n][22][99][17][115]
Correct: [\n][23][010][25][15]
You can see that it manages to find things that are numerically close, even when no individual token is actually correct. And it compensates for different-length tokens, always picking tokens that end up with the correct total number of digits.

- Please don't use this as a calculator :) The goal in doing this was to figure out what it knows about arithmetic and see if I can understand what algorithms it might have invented for doing arithmetic, not to show that it's good or bad at math (we have calculators for that, they work fine).
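The closeness of the guesses above is easy to quantify. Taking the five sampled answers and comparing them to the true product:

```python
# The model's five sampled answers for 28531 * 8065, from the transcripts above.
guesses = [227705315, 229501115, 223895015, 229925015, 229917115]
true_value = 28531 * 8065  # 230102515

for g in guesses:
    rel_err = abs(g - true_value) / true_value
    print(f"{g}: off by {rel_err:.2%}")
```

Every guess is within about 3% of the correct answer, and every one has the correct number of digits (9), even though the digits themselves are mostly wrong.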
I think you can mitigate the search issue a bit if you have the prompt double-check itself after the fact (e.g. https://towardsdatascience.com/1-1-3-wait-no-1-1-2-how-to-ha...). It works differently depending on the size of the model, though.
- Using few-shot examples of similar length to the targets (e.g. 10 digit math, use 10 digit few shots)
- Chunking numbers with commas
- Having it double check itself
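The comma-chunking trick from the list above is just a matter of formatting the numbers in the prompt so the tokenizer is more likely to split on digit-group boundaries. A minimal sketch (the prompt template here is hypothetical, not from the original experiment):

```python
# "Chunking numbers with commas": format operands with thousands separators
# so digit groups line up with likely token boundaries.
def make_prompt(a, b):
    # Hypothetical prompt template for illustration only.
    return f"What is {a:,} * {b:,}?"

print(make_prompt(28531, 8065))  # What is 28,531 * 8,065?
```

The idea is that "28,531" tends to tokenize into more regular 3-digit-or-less chunks than "28531" does.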
and here it's not doing any of those things.

Sometimes I wonder if they want secrecy so that criminals don't know they're being investigated, and commit a crime that's easier to prosecute. If they just stop committing crimes, then there's no fancy press release saying how great the DA is or whatever.