There are 12 frogs: 5 green, 3 red, and 4 yellow. Two donkeys are counting the frogs. One of the donkeys is yellow, the other green. Each donkey is unable to see frogs that are the same color as itself; also, each donkey was careless and missed one frog while counting. How many frogs does the green donkey count?
GPT4 answers 6 every time for me.
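For reference, here's a quick sketch of the arithmetic the model has to get right (this is just my intended reading of the puzzle spelled out):

```python
# Green donkey's count: it can't see the 5 green frogs,
# and it carelessly misses 1 of the frogs it can see.
total = 12
visible = total - 5    # 7 frogs are red or yellow
counted = visible - 1  # the careless miss
print(counted)         # 6, matching GPT4's answer
```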
My point is that GPT is capable of a certain amount of "reasoning" about puzzles that most certainly don't exist in its training data. Playing with it, it's clear that in this current generation the reasoning ability doesn't go very deep - just change the above puzzle a little to make it even slightly more complicated and it breaks. The amazing thing isn't how good at reasoning it is, but that a computer can reason at all.
Edit: I do see now that "He saw" kind of messes the question up. My intent would have been better expressed with "There were". But again this proves my point! GPT4 is able to (most of the time) correctly work through the poor wording and interpret the question the way I meant it, and I think the way most people would read it.
>> A magical frog was counting unicorns. He saw 5 purple unicorns, 2 green unicorns, and 7 pink unicorns. However, he made a mistake and didn't see 2 unicorns: one purple and one green. Also, since he was a magical frog, he didn't see unicorns that were the same color as himself. How many unicorns did he count?
It correctly answers 11 for me.
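Spelled out, the chain of reasoning it has to follow (assuming, as it does, that the magical frog is green) is:

```python
# Unicorn puzzle arithmetic, assuming a green frog.
total = 5 + 2 + 7  # 14 unicorns in all
total -= 2         # a green frog sees no green unicorns at all
total -= 1         # the purple one he carelessly missed
# (the missed green unicorn is already excluded above)
print(total)       # 11
```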
To me this has demonstrated:
* "Understanding": It understood that "didn't see" implies he didn't count.
* "Knowledge": It knew enough about the world to know that frogs are often green.
* "Reasoning": It was able to correctly reason about how many should be subtracted from the final result.
* "Math: It successfully did some basic additions and subtractions arriving at the correct answer.
Crucially, I made this up right here on the spot, and used a die for some of the numbers. This question does not exist anywhere in the training corpus!
I think this demonstrates an impressive level of intelligence, beyond what, up until about a year ago, I thought a computer would ever be capable of in my lifetime. Now, in absolute terms, of course current gen ChatGPT is clearly far worse at reasoning and understanding than most people (well, specifically it seems to me that its knowledge and reasoning are super-humanly broad, but child-level deep).
Can future improvements to this architecture improve the depth up to "AGI", whatever that means? I have no idea. It doesn't automatically seem impossible, but maybe what we see now is already near the limit? I guess only time will tell.
It might be beneficial to start your dataset at the key (word) level: generate embeddings for each key pair in the source and target languages and stash them, then do the same at the sentence level and, just for fun, the paragraph level. (I believe you could get enough context from the sentence level, since a paragraph is just a group of sentences, but it would still be interesting to generate paragraph-level key pairs, I think.)
From there you'd have a set of embeddings for each src:tgt word pair that also carries context for how it fits at the sentence and paragraph level, with the respective nuances of each language.
Once you have that dataset, you can augment your data with prompts like the ones you're using, but also include contextual references to word pairs and sentence pairs in the prompt, which should steer the LLM onto the right path.
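Something like this rough sketch is what I have in mind, using sentence-transformers (the model name and the example pairs below are placeholders, not recommendations, and I don't know how well a multilingual model covers a niche language):

```python
# Rough sketch: embed src:tgt pairs at the word and sentence level and stash them.
from sentence_transformers import SentenceTransformer
import json

model = SentenceTransformer("paraphrase-multilingual-MiniLM-L12-v2")

word_pairs = [("water", "<tgt_word>")]                        # hypothetical src:tgt word pairs
sentence_pairs = [("The water is cold.", "<tgt_sentence>")]   # hypothetical sentence pairs

def embed_pairs(pairs, level):
    records = []
    for src, tgt in pairs:
        src_vec, tgt_vec = model.encode([src, tgt])
        records.append({
            "level": level,
            "src": src, "tgt": tgt,
            "src_embedding": src_vec.tolist(),
            "tgt_embedding": tgt_vec.tolist(),
        })
    return records

dataset = embed_pairs(word_pairs, "word") + embed_pairs(sentence_pairs, "sentence")
with open("pair_embeddings.jsonl", "w") as f:
    for rec in dataset:
        f.write(json.dumps(rec) + "\n")
```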
Edit: not an expert, so I'll defer if someone smarter comes along.
I want to try fine-tuning to machine translate to and from a fairly niche language (https://en.wikipedia.org/wiki/S'gaw_Karen_language). How much text would I need, and what format would be ideal?
I have a number of book-length texts, most only in the target language, and a few bilingual or multilingual. For the bilingual and multilingual texts, I can probably script out several thousand pairs of "translate the following text from <source_lang> to <target_lang>: <source_lang_text> <target_lang_text>". Do I need to vary the prompt and format, or can I expect the LLM to generalize to different translation requests? Is there value in repeating the material at different lengths: one set at sentence length, another at paragraph length, and another at page or chapter length? Also, what should be done with the monolingual texts? Just ignore them?
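For concreteness, here's roughly what I'd script for the pair generation (the prompt templates are made up and would presumably need tuning, and `aligned_pairs` stands in for whatever my alignment step produces):

```python
# Generate JSONL training pairs with varied prompt templates.
import json
import random

templates = [
    "Translate the following text from {src} to {tgt}: {text}",
    "How would you say this in {tgt}? The {src} text is: {text}",
    "{src} -> {tgt}: {text}",
]

aligned_pairs = [("Hello.", "<karen_text>")]  # placeholder bilingual pairs

with open("train.jsonl", "w") as f:
    for en, karen in aligned_pairs:
        prompt = random.choice(templates).format(
            src="English", tgt="S'gaw Karen", text=en)
        f.write(json.dumps({"prompt": prompt, "completion": karen}) + "\n")
```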
https://www.youtube.com/watch?v=WWRniMqhr00
https://en.wikipedia.org/wiki/Inedia
To me it seems uncalled for to accuse go_elmo of lying about knowing some of these people.
I tried YouTube Kids because I wanted the content to be pre-approved only, I liked the toddler-friendly controls, and the app can go into single-app mode on the iPad. It's subscription-based, which is fine, but the main problem is they won't let me approve the content I want. I can see some great kids' stories and learning videos in our language on regular YouTube, but can't seem to find a way to get them into the YouTube Kids app. (And I certainly am not going to turn her loose on regular YouTube.)
There don't seem to be any other apps that do what I want, so I ended up setting up a Plex media server and using yt-dlp to download the videos for her. This works pretty well, but it's a lot more work. And the app is not great.
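For anyone curious, the download side is just a small script around yt-dlp's Python API, roughly like this (the video URL and the Plex library path below are placeholders):

```python
# Pull approved videos into the Plex library folder via yt-dlp.
from yt_dlp import YoutubeDL

APPROVED = [
    "https://www.youtube.com/watch?v=EXAMPLE",  # hypothetical approved video
]

opts = {
    "outtmpl": "/plex/media/kids/%(title)s.%(ext)s",  # placeholder Plex path
    "format": "mp4",
}

with YoutubeDL(opts) as ydl:
    ydl.download(APPROVED)
```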