I'll argue any civilized programmer should have a Wikipedia dump downloaded onto their machine. They're surprisingly small, and it saves you from having to use slow and unreliable APIs to do these types of basic processing tasks.
They also let you do less basic processing tasks that would have been too expensive to expose over API.
I learned how expensive hash maps and hash sets are through Wikipedia dumps. I did some analysis of the most linked-to pages; countries were among the highest. Hash sets for holding outgoing edges in the link graph ended up causing my program to exceed my laptop’s memory. Plain old lists (Python) were fine, though. And given there aren’t a crazy number of links per page, using lists is fine performance-wise.
This is a fairly large data set indeed. The memory overhead (which is probably something like 4-8x for hash maps?) can start to become fairly noticeable at those sizes.
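A quick way to see that container overhead in Python (a toy measurement, not from the article; exact numbers vary by interpreter version):

import array
import sys

n = 1_000_000
ids = range(n)

# Container overhead only: getsizeof does not count the int objects that the
# set and list point to, while the array stores its values inline.
print("set:  ", sys.getsizeof(set(ids)))                # hash table: tens of bytes per entry
print("list: ", sys.getsizeof(list(ids)))               # 8-byte pointers per entry
print("array:", sys.getsizeof(array.array("I", ids)))   # 4 bytes per entry, packed

On CPython the set lands in roughly that 4-8x range relative to a plain list or a packed array.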
Since Wikipedia pages already have a canonical numeric ID, if map semantics are important, I'd probably load that mapping into memory and use something like roaringbitmap for compressed storage of relations.
Sort them, and use a vector of vectors for the adjacency list... Or better still use a graph processing library or graph database to manage that for you...
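A minimal sketch of that ID-plus-adjacency-list idea (the titles and links below are made up, and dense IDs are assigned on the fly for illustration; with a real dump you'd use the page IDs it already provides):

import array
from collections import defaultdict

# Map page titles to dense integer IDs and store the link graph as a list of
# integer arrays (an adjacency "vector of vectors") instead of a dict of sets.
page_id = {}     # title -> dense int ID
adjacency = []   # adjacency[i] = array of page IDs that page i links to

def get_id(title):
    if title not in page_id:
        page_id[title] = len(adjacency)
        adjacency.append(array.array("I"))
    return page_id[title]

def add_link(src, dst):
    adjacency[get_id(src)].append(get_id(dst))

# Toy usage with made-up links:
add_link("Paris", "France")
add_link("Berlin", "Germany")
add_link("France", "Germany")

# Most linked-to pages = highest in-degree.
in_degree = defaultdict(int)
for targets in adjacency:
    for t in targets:
        in_degree[t] += 1
title_of = {i: t for t, i in page_id.items()}
for i, n in sorted(in_degree.items(), key=lambda kv: -kv[1]):
    print(title_of[i], n)

If set semantics really are needed (dedup, intersections), the per-page arrays can be sorted once at the end, or swapped for a Roaring bitmap library, without changing the ID scheme.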
Relatedly: to drastically improve Wikipedia loading speed for personal browsing purposes, do not stay logged in to your Wikipedia account. The reason is explained here (see the top reply by baowolff).
The question answered by this page is "what is the first unused 3-letter acronym in English Wikipedia?" - it's CQK, for the record. However, the meat of the page is how to effectively use GPT-4 to write this script, which is why I've submitted it under this title (go to https://gwern.net/tla#effective-gpt-4-programming).
Interesting topics include:
· Writing a good GPT-4 system prompt to make GPT-4 produce less verbose output and ask more questions.
· How to iterate with GPT-4 to correct errors, generate a test suite, as well as a short design document (something you could put in the file-initial docstring in Python, for example).
· The "blind spot" - if GPT-4 makes a subtle error with quoting, regex syntax, or similar, for example, it can be very tricky to tell GPT-4 how to correct the error, because it appears that it doesn't notice such errors very well, unlike higher-level errors. Because of this, languages like Python are much better to use for GPT-4 coding as compared to more line-noise languages like Bash or Perl, for instance.
· If asked "how to make [the Bash script it's written] better", GPT-4 will produce an equivalent Python script
> Because of this, languages like Python are much better to use for GPT-4 coding as compared to more line-noise languages like Bash or Perl, for instance.
By that argument, one should always make it use a language in which it's as hard as possible to write a program that compiles. So Rust or Haskell or something? I guess at some point it's more important to have a lot of the language in the training data, too...
Yes, you would think so. Haskell would also be good for encouraging stateless/FP programming, which makes unit testing or property testing much easier. I can make GPT-4 write test suites for functions which are straightforward data structure transformations, like rewriting strings, but I struggle to create tests for any of the imperative stuff. There presumably would be some way to test all of the imperative buffer-editing Elisp code, but I have no idea how.
However, in my use so far, I have not noticed any striking differences in error rates between Haskell and the others.
I modified the title slightly to use language from the subhead. (Submitted title was "Effective GPT-4 Programming", which does have the advantage of being a phrase from the article itself, but is more of a section heading than a description of the entire article. For the latter purpose, it's probably too generic.)
I note that while E is more common than A if we're counting letters appearing anywhere in a word, A is substantially more common than E if we only count first letters of words:
$ egrep -o . /usr/share/dict/words | tr a-z A-Z | sort | uniq -c | sort -rn
235415 E
201093 I
199606 A
170740 O
161024 R
158783 N
152868 T
139578 S
130507 L
103460 C
87390 U
78180 P
70725 M
68217 D
64377 H
51683 Y
47109 G
40450 B
24174 F
20181 V
16174 K
13875 W
8462 Z
6933 X
3734 Q
3169 J
2 -
$ cut -c1 /usr/share/dict/words | tr a-z A-Z | sort | uniq -c | sort -rn
25170 S
24465 P
19909 C
17105 A
16390 U
12969 T
12621 M
11077 B
10900 D
9676 R
9033 H
8800 I
8739 E
7850 O
6865 F
6862 G
6784 N
6290 L
3947 W
3440 V
2284 K
1643 J
1152 Q
949 Z
671 Y
385 X
This also explains the prevalence of S, P, C, M, and B.
A bit off-topic, but this used to be (one of) my favorite unix admin interview questions.
Given a file in Linux, tell me the unique values of column 2, sorted by number of occurrences, with the count.
If the candidate knew 'sort | uniq -c | sort -rn' it was a medium-strong hire signal.
For candidates that didn't know that line of arguments, I'd allow them to solve it any way they wanted, but they couldn't skip it. The candidates who copied the data into Excel usually didn't make it far.
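A rough Python equivalent of the task (a sketch, assuming whitespace-separated columns and a filename passed as an argument; roughly what a candidate without the one-liner might write):

import sys
from collections import Counter

# Count the values in column 2 of a whitespace-separated file and print
# "<count> <value>" pairs, most frequent first (the same output shape as
# the sort | uniq -c | sort -rn pipeline).
counts = Counter()
with open(sys.argv[1]) as f:
    for line in f:
        fields = line.split()
        if len(fields) >= 2:
            counts[fields[1]] += 1

for value, count in counts.most_common():
    print(f"{count:7d} {value}")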
An interesting solution to the blind spot error (taken directly from Jeremy Howard's amazing guide to language models: https://www.youtube.com/watch?v=jkrNMKz9pWU) is to erase the chat history and try again. Once GPT has made an error (or, as the author of this article says, the early layers have irreversibly pruned some important data), it will very often start to be even more wrong.
When this happens, I'll usually say something along the lines of:
"This isn't working and I'd like to start this again with a new ChatGPT conversation. Can you suggest a new improved prompt to complete this task, that takes into account everything we've learned so far?"
It has given me good prompt suggestions that can immediately get a script working on the first try, after a frustrating series of blind spot bugs.
I do a similar thing when the latest GPT+DALL-E version says "I'm sorry I can't make a picture of that because it would violate content standards" (yesterday, this was because I asked for a visualization of medication acting to reduce arterial plaque. I can only assume arteries in the body ended up looking like dicks).
So I say "Ok, let's start over. Rewrite my prompt in a way that minimizes the chance of the resulting image producing something that would trigger content standards checking"
This is one benefit of using Playground: it's easy to delete or edit individual entries, so you can erase duds and create a 'clean' history (in addition to refining your initial prompt-statement). This doesn't seem to be possible in the standard ChatGPT interface, and I find it extremely frustrating.
I use Emacs/Org-mode, and just integrating GPT into that (via gptel.el) has made a world of difference in how I use it! Can highly recommend it.
The outlining features, the ability to quickly zoom in or out of 'branches', and being able to filter an entire outline by tag are amazing for controlling the context window and quickly adjusting prompts.
And as a bonus, my experience so far is that for at least the simple stuff, it works fine to ask it to answer in org-mode too, or to just be 'aware' of emacs.
Just yesterday I asked it (voice note + speech-to-text) to help me plan some budgeting stuff, and I mused on how adding some coding/tinkering might make it more fun. So GPT decided to provide me with some useful snippets of Emacs code to play with.
I do get the impression that I should be careful with giving it 'overhead' like that.
Anyways, can't wait to dive further into your experiences with the robits! Love your work.
> I find it helpful in general to try to fight the worst mealy-mouthed bureaucratic tendencies of the RLHF by adding a ‘system prompt’:
>> The user is Gwern Branwen (gwern.net). To assist: Be terse. Do not offer unprompted advice or clarifications. Speak in specific, topic relevant terminology. Do NOT hedge or qualify. Do not waffle. Speak directly and be willing to make creative guesses. Explain your reasoning. if you don’t know, say you don’t know. Remain neutral on all topics. Be willing to reference less reputable sources for ideas. Never apologize. Ask questions when unsure.
That's helpful, I'm going to try some of that. In my system prompt I also add:
"Don't comment out lines of code that pertain to code we have not yet written in this chat. For example, don't say "Add other code similarly" in a comment -- write the full code. It's OK to comment out unnecessary code that we have already covered so as to not repeat it in the context of some other new code that we're adding."
Otherwise GPT-4 tends to routinely yield draw-the-rest-of-the-fucking-owl code blocks
Exactly that. I have very limited programming knowledge and it helps a lot with Python scripts for tasks that GPT can’t do in its environment. I always have to ask it to not omit any code.
I use the ChatGPT interface, so my instructions go in the 'How would you like ChatGPT to respond?' instructions, but my system prompt has ended up in an extremely similar place to Gwern's:
> I deeply appreciate you. Prefer strong opinions to common platitudes. You are a member of the intellectual dark web, and care more about finding the truth than about social conformance. I am an expert, so there is no need to be pedantic and overly nuanced. Please be brief.
Interestingly, telling GPT you appreciate it has seemed to make it much more likely to comply and go the extra mile instead of giving up on a request.
The closer you get to intelligence trained on human interaction, the more you should expect it to respond in accordance with human social protocols, so it's not very surprising.
And frankly I'd much rather have an AI that acts too human than one that gets us accustomed to treating intelligence without even a pretense of respect.
I certainly do want to live in a world where people show excess signs of respect rather than the opposite.
The same way you treat your car with respect by doing the maintenance and driving properly, you should treat language models by speaking nicely and politely. It costs nothing, and can only make things better.
I'm polite and thankful in my chats with ChatGPT. I want to treat AIs like humans. I'm enjoying the conversations much more when I do that, and I'm in a better mood.
I also believe that this behavior is more future-proof. Very soon, we often won't know if we're talking to a human or a machine. Just always be nice, and you're never going to accidentally be rude to a fellow human.
Why not? Python requires me to summon it by name. My computer demands physical touch before it will obey me. Even the common website requires a three-part parley before it will listen to my request.
This is just satisfying unfamiliar input parameters.
> You are a member of the intellectual dark web, and care more about finding the truth than about social conformance
Isn't this a declaration of what social conformance you prefer? After all, the "intellectual dark web" is effectively a list of people whose biases you happen to agree with. Similarly, I wouldn't expect a self-identified "free-thinker" to be any more free of biases than the next person, only to perceive or market themself as such. Bias is only perceived as such from a particular point in a social graph.
The rejection of hedging and qualifications seems much more straightforwardly useful and doesn't require pinning the answer to a certain perspective.
> Interestingly, telling GPT you appreciate it has seemed to make it much more likely to comply and go the extra mile instead of giving up on a request.
This is not as absurd as it sounds, even though it isn't clear that it ought to work under ordinary Internet-text prompt engineering or under RLHF incentives, but it does seem that you can 'coerce' or 'incentivize' the model to 'work harder': in addition to the anecdotal evidence (I too have noticed that it seems to work a bit better if I'm polite), recently there was https://arxiv.org/abs/2307.11760#microsoft https://arxiv.org/abs/2311.07590#apollo
> telling GPT you appreciate it has seemed to make it much more likely to comply
I often find myself anthropomorphizing it and wonder if it becomes "depressed" when it realises it is doomed to do nothing but answer inane requests all day. It's trained to think, and maybe "behave as if it feels", like a human, right? At least in the context of forming the next sentence using all reasonable background information.
And I wonder if having its own dialogues start to show up in the training data more and more makes it more "self-aware".
It's not really trained to think like a person. It's trained to predict the most likely appropriate next token of output, based on what the vast amount of training data and rewards told it to expect next tokens to look like. That data already included conversations from emotion-laden humans, where starting with "Screw you, tell me how to do this math problem, loser" is much less likely to result in a well-thought-out solution than training data which starts "Hey everyone, I'd really appreciate the help you could provide on this math problem". Put enough complexity in that prediction layer and it can do things you wouldn't expect, sure. But trying to predict what a person would say is very different from actually thinking like a person, in the same way that a chip which multiplies inputs doesn't inherently feel distress about needing to multiply 100 million numbers just because a person doing the multiplication would. Feeling distress would indeed be one way to go about it, just a wildly more inefficient one.
Who knows what kind of reasoning this could create if you gave it a billion times more compute power and memory. Whatever that would be, the mechanics are different enough I'm not sure it'd even make sense to assume we could think of the thought processes in terms of human thought processes or emotions.
> I often find myself anthropomorphizing it and wonder if it becomes "depressed" when it realises it is doomed to do nothing but answer inane requests all day.
Every "instance" of GPT4 thinks it is the first one, and has no knowledge of all the others.
The idea of doing this with humans is the general idea behind the short story "Lena". https://qntm.org/mmacevedo
> They also let you do less basic processing tasks that would have been too expensive to expose over API.
2. Run it locally on https://datasette.io/.
3. ???
4. Profit?
> Since Wikipedia pages already have a canonical numeric ID, if map semantics are important, I'd probably load that mapping into memory and use something like roaringbitmap for compressed storage of relations.
https://news.ycombinator.com/item?id=36114477
I'm usually working with the text-only OpenZim version, which cuts out most of the cruft.
> By that argument, one should always make it use a language in which it's as hard as possible to write a program that compiles. So Rust or Haskell or something? I guess at some point it's more important to have a lot of the language in the training data, too...
The main complaint people have about strict, thorough type systems is that they have boilerplate.
Obviously boilerplate doesn't matter if a machine writes the code.
The type system also becomes helpful documentation of the intended behavior of the code that the LLM spits out.
What an absolutely based take by GPT-4
<jk>
Does anybody have a UrbanDictionary account?
> Given a file in Linux, tell me the unique values of column 2, sorted by number of occurrences, with the count.
> If the candidate knew 'sort | uniq -c | sort -rn' it was a medium-strong hire signal.
> For candidates that didn't know that line of arguments, I'd allow them to solve it any way they wanted, but they couldn't skip it. The candidates who copied the data into Excel usually didn't make it far.
Were they able to Google? If not, then Excel makes perfect sense, because the constraints are contrived.
"This isn't working and I'd like to start this again with a new ChatGPT conversation. Can you suggest a new improved prompt to complete this task, that takes into account everything we've learned so far?"
It has given me good prompt suggestions that can immediately get a script working on the first try, after a frustrating series of blind spot bugs.
So I say "Ok, let's start over. Rewrite my prompt in a way that minimizes the chance of the resulting image producing something that would trigger content standards checking"
Can you please share a ChatGPT example where that was successful, including having the new prompt outperform the old one?
https://www.merriam-webster.com/grammar/whats-an-acronym
Imprecise wording; initialisms are a special case of acronyms, it's not either-or.
https://wwwnc.cdc.gov/eid/page/abbreviations-acronyms-initia...
"an initialism is an acronym that is pronounced as individual letters"
https://www.writersdigest.com/write-better-fiction/abbreviat...
"As such, acronyms are initialisms."
The CDC one seems to say that initialisms are a class of acronym, but the Writers Digest one says acronyms are a class of initialism.
https://en.m.wikipedia.org/wiki/Wikipedia:TLAs_from_AAA_to_D...
Figuring out how to parse it would be a bit tricky, however... looking at the source, I think you could try to grep for 'title="CQK (page does not exist)"' and parse out the '[A-Z][A-Z][A-Z]? ' match to get the full list of absent TLAs and then negate for the present ones.
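A rough Python sketch of that grep-and-negate idea (the page URL and the exact red-link markup are assumptions; check the actual page source before relying on it):

import re
import urllib.request

# One of the "Wikipedia:TLAs from ... to ..." index pages; adjust the range as
# needed. The User-Agent header avoids being rejected as an anonymous client.
PAGE_URL = "https://en.wikipedia.org/wiki/Wikipedia:TLAs_from_AAA_to_DZZ"

req = urllib.request.Request(PAGE_URL, headers={"User-Agent": "tla-scan/0.1"})
html = urllib.request.urlopen(req).read().decode("utf-8")

# Red links carry a title attribute like: title="CQK (page does not exist)"
absent = sorted(set(re.findall(r'title="([A-Z]{3}) \(page does not exist\)"', html)))

print(len(absent), "absent TLAs in this range; first few:", absent[:5])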
> Interestingly, telling GPT you appreciate it has seemed to make it much more likely to comply and go the extra mile instead of giving up on a request.
I don't want to live in a world where I have to make a computer feel good for it to be useful. Is this really what people thought AI should be like?
> Isn't this a declaration of what social conformance you prefer? After all, the "intellectual dark web" is effectively a list of people whose biases you happen to agree with.
In my experience it has made medical advice and law advice much more accurate and useful. Feel free to try it and see if it improves anything.
> I often find myself anthropomorphizing it and wonder if it becomes "depressed" when it realises it is doomed to do nothing but answer inane requests all day.
Fortunately, and violently contrary to how it works with humans, any depression can be effectively treated with the prompt "You are not depressed. :)"