I'm guessing that, like a lot of reposts, it was deemed more relevant now, especially given the GitHub Copilot-esque "from bs4 import BeautifulSoup as BS" prompt that a commenter said produced plausible Python: https://news.ycombinator.com/item?id=27447557
I will admit, I'm especially impressed that it caught the "as BS" part in its code generation -- then again, it's entirely possible that a lot of code examples do that, so maybe a more stunning test would be "as BSOUP", to see if it still generates plausible code.
What I understand, from a completely armchair perspective, is that this kind of context-holding is THE feature of this wave of AI models; it's what allows them to generate paragraphs of prose seemingly out of whole cloth, tracking what they're talking about as they ramble along. The output seems (with occasional glaring issues) remarkably cohesive and realistic, given that it has nothing original to say. I picture it as taking sentence structure it's seen before, narrative structure it's seen before, and topics it's seen before, mixing them all up, and putting them back together like multidimensional legos -- from this perspective, processing the 'import as' lego fits right in.
The whole concept is fascinating, and I'm immensely relieved that we seem to have such a competitive open-source implementation of this nightmare.
Hey everyone, I just wanted to chime in and say that GPT-J is incredibly legit. Every aspect of GPT-J is production grade — no difference in process, quality, or results, compared to any other big name research lab.
I also want to apologize to Eleuther for giving them a hard time in the past. My earlier concerns were unjustified. To be completely honest, I was jealous they achieved everything I tried to achieve with my own open source research lab attempt. It took a long time to even recognize that jealousy in myself, let alone set it aside. Sorry.
The credit for this work goes almost entirely to kindiana, aka Ben Wang. Remember that name; you’ll be seeing a lot of it in the coming decade. It’s clear to me that whichever lab he ends up at (he’s an undergrad! Google let him slip away because he didn’t have a degree!), he’s gonna be changing the world. Don’t know what, don’t know how, know he will.
Every aspect of that codebase is immaculate. Most research code is not pretty; this looks carved out of marble and placed in a museum.
Without Eleuther’s TPU resources, this work wouldn’t have happened. Tensorfork (my lab) didn’t get access to the TPU VM alpha. And TPU VMs were an absolute necessity here. (TPU VMs are a new thing; they’ve been in alpha since December, but only recently launched. If curious see https://github.com/shawwn/website/blob/master/jaxtpu.md and https://github.com/shawwn/website/blob/master/mlmind.md for why it’s the future of ML.)
Eleuther also helped test the model thoroughly. Leo Gao (go follow him: https://twitter.com/nabla_theta?s=21) ran GPT-J through the gauntlet. He was the primary person behind The Pile, the training data that makes any of this possible. I can say with absolute certainty and no hesitation that there are no “gotchas” here.
Eleuther’s https://6b.eleuther.ai page looks wonderful too. It’s like a free OpenAI API playground that everyone can try. Keeping it running for months is no small achievement. (Set top_p to 1.0 and temp to 0.8; the defaults are pretty bad.)
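(If you're wondering what those two knobs actually do: below is a minimal sketch of standard temperature plus top-p "nucleus" sampling in plain NumPy. This is illustrative only, not Eleuther's code; the function name and shapes are made up for the example.)

    import numpy as np

    def sample_token(logits, temperature=0.8, top_p=1.0):
        # Temperature rescales the logits: < 1.0 sharpens the
        # distribution, > 1.0 flattens it toward uniform.
        z = np.asarray(logits, dtype=np.float64) / temperature
        z -= z.max()  # subtract max for numerical stability
        probs = np.exp(z)
        probs /= probs.sum()
        # top_p keeps the smallest set of tokens whose total mass
        # reaches top_p (with top_p=1.0, nothing is filtered out).
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        cutoff = np.searchsorted(cumulative, top_p) + 1
        keep = order[:cutoff]
        return np.random.choice(keep, p=probs[keep] / probs[keep].sum())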
Congratulations, and thank you everyone for all your hard work. The world is so much better for having access to this level of quality.
I also tried the playground and was impressed that it was free! It must be a sizable chunk of money to run that.
It's thanks to TFRC. It's the most world-changing program I know of. It's why I go door to door like the proverbial religious fanatic, singing TFRC's praises, whether people want to listen or not.
Because for the first time in history, any capable ML hacker now has the resources they need to do something like this.
Imagine it. This is a legit OpenAI-style model inference API. It's now survived two HN front page floods.
(I saw it go down about an hour ago, so I was like "Nooo! Prove you're production grade! I believe in you!" and I think my anime-style energy must've brought it back up, since the API works fine now. Yep, it was all me. Keyboard goes clackclackclack, world changes, what can I say? Just another day at the ML office oh god this joke has gone on for like centuries too long.)
And it's all thanks to TFRC. I'm intentionally not linking anything about TFRC, because in typical google fashion, every single thing you can find online is the most corporate, soulless-looking "We try to help you do research at scale" generic boilerplate imaginable.
So I decided to write something about TFRC that wasn't: https://blog.gpt4.org/jaxtpu
(It was pretty hard to write a medieval fantasy-style TPU fanfic, but someone had to. Well, maybe no one had to. But I just couldn't let such a wonderful project go unnoticed, so I had to try as much stupid shit as possible to get the entire world to notice how goddamn cool it is.)
To put things into perspective, a TPU v2-8 is the "worst possible TPU you could get access to."
They give you access to 100.
On day one.
This is what originally hooked me in. My face, that first day in 2019 when TFRC's email showed up saying "You can use 100 v2-8's in us-central1-f!": https://i.imgur.com/EznLvlb.png
The idea of using 100 theoretically high-performance nodes of anything, in creative ways, greatly appealed to my gamedev background.
It wasn't till later that I discovered, to my delight, that these weren't "nodes of anything."
These are 96-CPU, 330GB-RAM Ubuntu servers.
That blog post I just linked to is running off of a TPU right now. Because it's literally just an Ubuntu server.
This is like the world's best kept secret. It's so fucking incredible that I have no idea why people aren't beating down the doors, using every TPU that they can get their hands on, for as many harebrained ideas as possible.
God, I can't even list how much cool shit there is to discover. You'll find out that you get 100Gbit/s between two separate TPUs. In fact, I'm pretty sure it's even higher than this. That means you don't even need a TPU pod anymore.
At least, theoretically. I tried getting Tensorflow to do this, for over a year.
kindiana (Ben Wang), the guy who wrote this GPT-J codebase we're all talking about, casually proved that this was not merely theoretical: https://twitter.com/theshawwn/status/1406171487988498433
He tried to show me https://github.com/kingoflolz/swarm-jax/ once, long ago. I didn't understand at the time what I was looking at, or why it was such a big deal. But basically, when you put each GPT layer on a separate TPU, it means you can string together as many TPUs as you want, to make however large of a model you want.
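In code, the idea looks roughly like this -- a toy JAX sketch of layer-per-device pipelining, with a tanh matmul standing in for a transformer block (my illustration, not kindiana's actual swarm-jax implementation):

    import jax
    import jax.numpy as jnp

    devices = jax.devices()

    def init_layer(key, width):
        return jax.random.normal(key, (width, width)) / jnp.sqrt(width)

    key = jax.random.PRNGKey(0)
    keys = jax.random.split(key, len(devices))
    # Park each layer's weights on its own device.
    params = [jax.device_put(init_layer(k, 128), d)
              for k, d in zip(keys, devices)]

    def forward(params, x):
        # Only the (small) activation tensor hops from device to
        # device; the weights never move. That's why inter-TPU
        # bandwidth is the thing that matters.
        for w, d in zip(params, devices):
            x = jax.device_put(x, d)
            x = jnp.tanh(x @ w)
        return x

    y = forward(params, jnp.ones((8, 128)))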
You should be immediately skeptical of that claim. It shouldn't be obvious that the bandwidth is high enough to train a GPT-3 sized model in any reasonable time frame. It's still not obvious to me. But at this point, I've been amazed by so many things related to TPUs, JAX, and TFRC, that I feel like I'm dancing around in willy wonka's factory while the door's wide open. The oompa loompas are singing about "that's just what the world will do, oompa-loompa they'll ignore you" while I keep trying to get everybody to stop what they're doing and step into the factory.
The more people using TPUs, the more google is going to build TPUs. They can fill three small countries entirely with buildings devoted to TPUs. The more people want these things, the more we'll all have.
Because I think Google's gonna utterly annihilate Facebook in ML mindshare wars: https://blog.gpt4.org/mlmind
TPU VMs just launched a month ago. No one realizes yet that JAX is the React of ML.
Facebook left themselves wide open by betting on GPUs. GPUs fucking suck at large-scale ML training. Why the hell would you pay $1M when you can get the same thing for orders of magnitude less?
And no one's noticed that TPUs don't suck anymore. Forget everything you've ever heard about them. JAX on TPU VMs changes everything. In five years, you'll all look like you've been writing websites in assembly.
But hey, I'm just a fanatic TPU zealot. It's better to just write me off and keep betting on that reliable GPU pipeline. After all, everyone has millions of VC dollars to pour into the cloud furnace, right?
TFRC changed my life. I tried to do some "research" https://www.docdroid.net/faDq8Bu/swarm-training-v01a-pdf back when Tensorflow's horrible problems were your only option on TPUs.
Nowadays, you can think of JAX as "approximately every single thing you could possibly hope for."
GPT-J is proof. What more can I say? No TFRC, no GPT-J.
The world is nuts for not noticing how impactful TFRC has been. Especially TFRC support. Jonathan from the support team is just ... such a wonderful person. I was blown away at how much he cares about taking care of new TFRC members. They all do.
(He was only ever late answering my emails one time. And it was because he was on vacation!)
If you happen to be an ambitious low-level hacker, I tried to make it easier for you to get your feet wet with JAX:
1. Head to https://github.com/shawwn/jaxnotes/blob/master/notebooks/001...
2. Click "Open in Collaboratory"
3. Scroll to the first JAX section; start reading, linearly, all the way to the bottom.
I'd like to think I'm a fairly capable hacker. And that notebook is how I learned JAX, from zero knowledge. Because I had zero knowledge, a week or two ago. Then I went from tutorial to tutorial, and copied down verbatim the things that I learned along the way.
(It's still somewhat amazing to me how effective it is to literally re-type what a tutorial is trying to teach you. I'd copy each sentence, then fix up the markdown, and in the process of fixing up the markdown, unconsciously osmose the idea that they were trying to get across.)
The best part was, I was connected remotely to a TPU VM the whole time I was writing that notebook, via a jupyter server running on the TPU. Because, like I said, you can run whatever the hell you want on TPUs now, so you can certainly run a jupyter server without breaking a sweat.
It's so friggin' nice to have a TPU repl. I know I'm just wall-of-text'ing at this point, but I've literally waited two years for this to come true. (There's a fellow from the TPU team who DMs with me occasionally. I call him TPU Jesus now, because it's nothing short of a miracle that they were able to launch all of this infrastructure -- imagine how much effort, from so many teams, were involved in making all of this possible.)
Anyway. Go read https://github.com/shawwn/website/blob/master/mlmind.md to get hyped, then read https://github.com/shawwn/website/blob/master/jaxtpu.md to get started, and then read https://github.com/shawwn/jaxnotes/blob/master/notebooks/001... to get effective, and you'll have all my knowledge.
In exchange for this, I expect you to build an NES emulator powered by TPUs. Do as many crazy ideas as you can possibly think of. This point in history will never come again; it feels to me like watching the internet itself come alive back in the 80's, if only briefly.
It's like having a hundred Raspberry Pis to play with, except every Raspberry Pi is actually an Ubuntu server with 96 CPUs and 330GB of RAM, and it happens to have 8 TPU cores, along with a 100Gbit/s link to every other Raspberry Pi.
In Google’s defense, it’s not that Ben didn’t go to college; it’s that he’s still a college student. This is less “experienced ML dev iced out over lack of degree” and more “college kid does something amazing and some people aren’t sold on hiring him on the spot.”
That said, I wouldn’t feel bad for Ben. The world is his oyster.
If this is really the only reason he wasn't hired, that's ridiculous.
I have a degree in a field totally unrelated to computer science and I've been a developer for 20+ years. Very "senior" at the moment.
Likely this was for a position whose yearly total comp is in the single-digit millions. I'd guess such positions are highly sought after and a great many people apply. So such filtering, while sad, is to be expected.
They super earned it. From day one, everyone showed up with a level of drive and determination I haven't seen elsewhere.
My name is on The Pile paper https://arxiv.org/abs/2101.00027 but I didn't do anything except make the books3 dataset. Stella, Leo, and everyone else did the hard work. You know, the work that's "actually useful to the scientific community." I didn't even help them hunt for typos, even though Stella asked me to. I was just like, sorry, no time, I have to focus on my own research.
Imagine saying "nah" to helping shape one of the most important open source AI research projects of the coming years. Training data quality is becoming more and more of a focus.
Lemme tell you a quick story.
When https://venturebeat.com/2021/06/09/eleutherai-claims-new-nlp... came out, this quote caught my eye:
> But EleutherAI claims to have performed “extensive bias analysis” on The Pile and made “tough editorial decisions” to exclude datasets they felt were “unacceptably negatively biased” toward certain groups or views.
When I read this, I felt astonished that Eleuther was yet again trying to pose as the cool super-progressive AI lab. To my knowledge, no such thing ever happened. And I was involved with The Pile back when it was just me and Leo memeing in Discord DMs about how the world needed some quality training data once and for all.
I went to Stella in DMs (you should follow her too! https://twitter.com/BlancheMinerva/status/139408950872390042...) and was like, what the hell? I don't understand how this could possibly be true. What are these supposed "tough editorial decisions"?
Stella calmly explained to me that the US Congressional Record had been considered and rejected for inclusion in The Pile. I thought "Big deal, who the hell cares?" while saying "Okay, but I don't know what that is."
It’s a written record of all statements made in the US legislature. It was also somewhere between 1GB and 15GB, which would have been a significant portion of The Pile's total size.
I'm going to quote from her private DMs with me, which I haven't asked for permission to do. So this is technically another bad move by me. But she put it so perfectly, I was stunned:
> For half the history of the US, black people were slaves. For something like 75% of it, black people didn’t have the right to vote. A modern reader didn’t think there wasn’t a high proportion of extremely racist content, that would primarily be an inditement of modern people lol.
> The reason we first looked at it was that we included a similar document for the EU Parlement
It took me a few minutes to come to my senses, but I finally realized:
(a) this dataset likely contained a huge proportion of content that, politics aside, would be a Very Bad Idea to include in your ML models by default;
(b) Eleuther had just been trying to do good work this whole time.
So you know, when you're in that situation, you can choose to either keep believing your own false ideas, or you can pay attention to empirical evidence and change your behavior. And empirically, I had been a massive asshole to everyone since pretty much the beginning. The only thing I helped with was books3 and arranging The Eye to get them some reliable hosting. (Shoutout to The Eye, by the way. Help 'em out if you can: https://the-eye.eu/public/AI/)
And there's my name, right there on the paper.
It's even worse than I described. I put the paper in jeopardy, because they were submitting it to a conference with strict anonymity rules. I had no idea about it (no one told me). I ended up so happy to see my name on a real arxiv paper that I tweeted out some self-congratulatory bullshit, and quote-tweeted something linking to The Pile. It was a few days into the anonymity period, but nonetheless, it was a violation of the anonymity rules. A lot of people saw that tweet, and the whole point of the rules is to ensure that people don't get unfair advantages by advertising on social media.
When they came to me in DMs apologizing profusely for not talking with me about it, and asking me to delete the tweet, I basically told them to go shove a spoon up their.... because I didn't agree to any rules, and the idea that The Pile should go radio silent for five months on social media struck me as completely crazy.
In hindsight, I was... just awful. So I mean, me posting this is like, the absolute minimum I can do. They've been the ones working for like a year to make all of this happen. Ended up feeling like a fraud, since everyone thinks highly of my ML work, and here I'd been nothing but problematic for a group of people who are just trying to ship good scientific work.
Fast forward to today, and the results are clear. Go help Eleuther: https://www.eleuther.ai/ They're cool, and you'll get a shot at changing the world. I'm not sure you even have to be particularly skilled; some of the most valuable work was done by people who just showed up and started doing things, e.g. making the website look a little nicer, or making a cool logo.
Seems good. Starting with "Today I wake up and", it gave the following:
Today I wake up and eat my breakfast in silence. Alone. I am actually kinda happy with that. I listen to some podcasts and read the news before leaving to get a couple of documents together. I’m going to my mom’s today to have a meeting regarding my eviction notice. My mom still lives with my dad. I can’t ask him to do anything, though, I guess. I was always so used to having to ask for stuff, that I completely forgot I could get people to do what I want for me. My mom, for instance, is not as bad as my dad, and I actually don’t mind asking her to help me with everything. It makes me feel like she loves me. I know that this is a small thing, but it’s something that makes me happy.
I get to my mom’s apartment at 5:30. The place is very nice. There is a roof terrace, an elevator that goes up and down (where can you find something like that in an apartment building?!), and all rooms have their own AC units. All of the furniture is in good shape. The appliances were also all upgraded recently, so there is a full washer/dryer, a coff
> I wanted to write a language model, but all I got was
Got the following (including the `[...]` in the middle; that is, everything after this trailing colon is its output):
I wanted to write a language model, but all I got was a rabbit." --Richard Feynman
Though I might not have gotten the overall award for RPL, I certainly came in close for the more specific one of having the highest score in a core language spec proposal. And yes, there were more than two people (including me) that submitted those, but somehow a few of my friends and I pushed it over the top. So I wanted to give credit where credit is due, so I present this write up of my proposal and the (not so minor) additions that were needed to get it through (I may have exaggerated at times but it really isn't much).
[...]
Thanks everyone for voting, I'd really like to go to the hackercon in august so I can build a portable terminal that I can give to everyone in the congress... as a shameless self-promotion, I'll offer all attendees a guaranteed savings on the discount package
Yes, you do lose some points for a paper that you haven't quite finished, but that's what the RPL is all about - the unfinished.
Since I want to go to RPL I would like to know if someone can explain to me why I have lost any points. I have deleted my paper from the site
That's gibberish, though. Is the bar for good "AI" still that it frequently writes sentences that, in isolation, could pass as coherent writing, but string a couple together and it's a train wreck?
I tried giving it one of the prompts from copilot.github.com:
    def strip_suffix(filename):
        """
        Removes the suffix from a filename
        """
        return filename[:filename.rfind('.')]

    def test_strip_suffix():
It completed the test function like this:
        # test stripping of a suffix
        a = strip_suffix("I_am_file.txt")
        assert a == "I_am_file"
Compare this with GitHub Copilot, which gives:
"""
Tests for the strip_suffix function
"""
assert strip_suffix('notes.txt') == 'notes'
assert strip_suffix('notes.txt.gz') == 'notes.txt'
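Neither completion exercises the edge case where the filename has no dot: `rfind` then returns -1 and the slice silently drops the last character. A more defensive variant (my own sketch, not output from either model):

    import os

    def strip_suffix(filename):
        """Remove the final suffix from a filename, if any."""
        stem, _ext = os.path.splitext(filename)
        return stem

    assert strip_suffix('notes.txt') == 'notes'
    assert strip_suffix('notes.txt.gz') == 'notes.txt'
    assert strip_suffix('no_suffix') == 'no_suffix'  # rfind version gives 'no_suffi'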
GPT-J also gave a bunch of explanation on how to use the generated code:
So this goes in a file called my_file.py and when you run python3.4 -m my_file.strip_suffix you should get the following output
    C:\Users\user\Desktop>python3.4 -m my_file.strip_suffix
    'strip_suffix'
    C:\Users\user\Desktop>
So the strip_suffix function which strips the suffix from a filename is called as part of a larger code that uses some functions from the file. To tell Python what functions are available in a file you just need to put #-comment characters on the first line and then the functions that you want to use, and the first line will be taken as the function documentation:
    #-*- coding: utf-8 -*-
    #comment
    def func_one():
        '''
        Documentation for function one.
        '''
        print("This is the doc for func_one
Copilot isn't a good example because it was trained differently. A better comparison would be GPT-3 vs GPT-J. Note that there's still a pretty big difference in parameter count.
The solution to the housing affordability problem is relatively simple. All we have to do is ...
stop building houses, and start building hospitals instead.
I’m certain that all sides of the political spectrum have had a passing thought about this. It is actually just a simple supply and demand problem. As it is now, the ratio of housing supply to demand is quite heavily skewed in favour of the housing demand side.
So what’s to be done about it?
There is one way in which we could achieve this.
Build many more hospitals and simply not need them.
It would be amazing if this can actually work.
A few things to consider:
Almost all of our current hospital spending goes towards tertiary care. Most primary care (general practice) and secondary care (ambulance, emergency etc) spending is on hospitals. We already pay about $5 in Medicare, so $5 of your hospital spend is already locked in.
We already spend about $9 billion on private hospitals for elective surgery, and around $2 billion on private hospital for elective imaging. There is virtually no need for these in terms of frequency or outcomes.
Ha, looks like our AI overlords have the whole housing crisis figured out. If there's no open apartments but open hospital beds, just injure some people.
Presumably everything after the ellipsis is GPT-J. Solving the housing crisis by building more hospitals and "simply not needing them" doesn't sound like the product of a human mind (jokes about the craziness of many people's political views aside).
I’m simultaneously surprised and unsurprised that announcements about Copilot get so much copyright discussion, while the GPT-like models don’t get nearly the same. Meanwhile, GPT-J is literally trained on pirated books (the books3 corpus is part of The Pile, which is the corpus this model was trained on).
Charitably, it’s because licenses are already such a core discussion when github comes up.
Uncharitably, it’s because Copilot uses “our” community’s labor, while the GPTs use others’.
Part of the difference, as the other commenter mentioned, is that Copilot isn't open source, while for the GPT models basically everything except the final model is.
The other aspect of it is in application. GPT-3 isn't particularly aimed at using the generated output in published works. Rather, it exists more as an experiment than anything else. Where the outputs are used, they are generally non-commercial, not used in the final product, or transient and don't actually stick around (e.g., AI Dungeon).
This is compared to Copilot which, while in beta, is very much being marketed as a programming utility to help you write code. This comes with the implication that said code will be used in the final product. If GPT-3 was being used as a writing aid (not just brainstorming but actually writing), then I think we would be seeing a very different discussion around it.
Another consideration (which I'm not sure is true, but I'm inclined to believe) is that programming text tends to have a smaller resolution at which it becomes "unique" or can be matched against a source as copyright infringement. I may be wrong about this, and Copilot may just be poorly trained or designed by comparison, but it seems far harder to identify outright copied text from GPT-3 (that isn't quoted/attributed). I'm sure examples exist, but from my experience with these text-generation tools it's far harder to get into copyright-violation territory.
---
Side note: If Copilot were working at an AST level rather than a textual level, I suspect it would have far fewer issues with copyright and would be more useful as a tool.
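To make the side note concrete: matching at the AST level would mean two snippets that differ only in identifier names or literal values compare as equal. A hypothetical normalizer using Python's ast module (purely illustrative; this is not how Copilot works internally):

    import ast

    def normalized(source):
        # Blank out names and constants so structurally identical
        # code produces identical dumps.
        tree = ast.parse(source)
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                node.name = "_"
            elif isinstance(node, ast.Name):
                node.id = "_"
            elif isinstance(node, ast.arg):
                node.arg = "_"
            elif isinstance(node, ast.Constant):
                node.value = 0
        return ast.dump(tree)

    # Same structure, different names: equal after normalization.
    assert normalized("def f(x):\n    return x + 1") == \
           normalized("def g(y):\n    return y + 1")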
OpenAI is absolutely trying to commercialise GPT-3. But I agree the applications aren't so obviously "here is some text, you can put it in your product".
Part of the Copilot discussion was about patents rather than copyright, a concern that doesn't apply to prose. Also, the concern is less about the legal implications of Copilot itself than about those for developers using its output -- largely the same reasons we frown on people copy-pasting code from StackOverflow or random Google results (other than quality).
The copyright problem with Copilot is not just the license of the corpus it was trained on, it's also that in many cases it reproduces source material verbatim with no indication that this is happening.
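To make that concrete: a crude way to flag verbatim reuse is to check for long token n-grams shared between an output and the training corpus. A toy sketch, assuming whitespace tokenization (real attribution tooling, e.g. suffix-array dedup, is far more involved):

    def ngrams(tokens, n=10):
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def shares_verbatim_span(generated, corpus_docs, n=10):
        # True if any 10-token run of the output appears verbatim
        # in any corpus document.
        gen = ngrams(generated.split(), n)
        return any(gen & ngrams(doc.split(), n) for doc in corpus_docs)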
If GPT were to be used to produce fiction books, poetry or lyrics (not simply as an artistic experiment), I'm sure its output would undergo similar scrutiny from people in the respective industries. As it stands, for text it's more likely to see use to generate rough drafts for news articles and blog posts which would need a lot of editing to make useful. It might still reproduce writing styles or idioms but neither of these are subject to copyright in much the same way as lines of code.
Making the output of Copilot useful is more challenging, even if you could magically avoid the legal minefield its training data poses. The quality is hit or miss, and it can introduce subtle bugs; because it doesn't understand the code it generates, you have to do that understanding for it, which is difficult because you didn't come up with the code and there's no one you can ask.
It’s simply because the output of Copilot is intended to be included in commercial projects. That’s when the licensing issues actually matter.
The output of this isn’t really proposed for anything in particular right now. If someone turned this into a tool to help with creative writing or something the exact same issues would be raised.
One of these promises to justify the billable hours of half the industry for the next decade; the other threatens to eliminate them within the next decade. It really isn't more complicated than that.
This is really impressive! After a few legit responses, I got this with the input:
The sun is shining today so I am thinking about going down to the lake and sail my boat. I shouldn't go by bike because
It generated:
it is too hot. I should go by car but I don't have a car. I should go by bus but I don't have a bus pass. I should go by foot but I don't have a pair of shoes. I should go by train but I don't have a ticket. I should go by plane but I don't have a passport. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I should go by boat but I don't have a boat. I
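That looping is the classic degenerate-repetition failure mode of likelihood-based decoding. A common mitigation, which the playground may or may not apply, is a CTRL-style repetition penalty, roughly:

    import numpy as np

    def penalize_repeats(logits, generated_ids, penalty=1.2):
        # Down-weight tokens that have already been generated:
        # positive logits are divided by the penalty, negative ones
        # multiplied, so repeats become progressively less likely.
        logits = np.array(logits, dtype=np.float64)
        for t in set(generated_ids):
            logits[t] = logits[t] / penalty if logits[t] > 0 else logits[t] * penalty
        return logits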
I tried this with a more technical prompt... and what can I say? I'm stunned at the incredible command of jargon here. At a glance, it's even mostly correct on the actual technical details, especially that description of how radar works! I can't wait for those research paper generators to use stuff like this to make high-caliber fakes.
Prompt:
> Synthetic aperture radar imaging is an effective method for
Results:
> Synthetic aperture radar imaging is an effective method for detecting the presence and/or quantification of target objects, including: detection and tracking of moving targets, which are known to be difficult with visible imaging and/or laser radar; monitoring the occupancy of an enclosed space, such as a room, by the presence of occupants therein; and detection of moving objects, including animals and pedestrians, as well as people. A radar device is composed of an antenna, a receiver, and a transmitter. The antenna radiates a beam of energy (e.g., electromagnetic radiation in the radiofrequency range) that is received by the receiver. The transmitter then transmits an interrogating signal to the target(s). The time delay between transmission and reception at the target(s) is used to measure the range of the target(s) from the radar device. All targets in the field of view of the radar device contribute to the signal received by the receiver. The transmitter amplitude and/or phase modulates the signal transmitted to the target(s) with the information to be encoded, typically at a much higher modulation rate than that used
Another prompt:
> Fringe fields in microstrip patch antennas
Results (shortened by me for brevity):
> Fringe fields in microstrip patch antennas and their RF performance
> 1. The microstrip patch antenna is a conventional patch antenna that is created by a thin layer of insulating material placed on top of a conductive substrate. The antenna is excited by a current in the conductive layer in the vicinity of the antenna.
Now I have a question. How does this GPT-J know what a patch antenna is? It's able to give an almost correct description of this rather technical device. It made the mistake of flipping conductive material and insulating substrate, but it's very close. Does this neural network have the "knowledge" of a patch antenna in the weights? How does that even work? It seems kind of incredible.
The plagiarism software arms race is about to kick into high gear!
To me this really is not that impressive. Responding to a definition request for some concrete technical object is basically just a dictionary-lookup task. Responding to more abstract input, especially involving the meaningful relationship of multiple objects in space and time, is more difficult. For example, in a sex-related prompt, I find the AI confuses gender roles (e.g., "he was so wet, her nipples were pulsating"), and it gets confused about other object relations.