Very impressive! I noticed two really notable things right off the bat:
1. I asked it a question about a feature that TypeScript doesn't have[1]. GPT4 usually does not recognize that it's impossible (I've tried asking it a bunch of times, it gets it right with like 50% probability) and hallucinates an answer. Gemini correctly says that it's impossible. The impressive thing was that it then linked to the open GitHub issue on the TS repo. I've never seen GPT4 produce a link, other than when it's in web-browsing mode, which I find to be slower and less accurate.
2. I asked it about Pixi.js v8, a new version of a library that is still in beta and was only posted online this October. GPT4 does not know it exists, which is what I expected. Gemini did know of its existence, and returned results much faster than GPT4 browsing the web. It did hallucinate some details, but it correctly got the headline features (WebGPU, new architecture, faster perf). Does Gemini have a date cutoff at all?
[1]: My prompt was: "How do i create a type alias in typescript local to a class?"
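For context, here is a minimal sketch of what that prompt is asking about and the usual workarounds (my own illustration, with made-up names; not output from either model). TypeScript rejects `type` declarations inside a class body, and the closest you can get is a module-level alias or a namespace merged with the class, neither of which is actually private to the class:

```typescript
// This is what the prompt asks for, and it is a syntax error in TypeScript:
//
//   class Parser {
//     type Token = { kind: string; text: string };  // not allowed inside a class body
//   }
//
// Workaround 1: a module-level alias next to the class.
type ParserToken = { kind: string; text: string };

class Parser {
  tokenize(src: string): ParserToken[] {
    return src
      .split(/\s+/)
      .filter(Boolean)
      .map((text) => ({ kind: "word", text }));
  }
}

// Workaround 2: declaration merging, so the alias at least lives under the class name.
// It is usable as Parser.Token from outside, but still not local/private to the class.
namespace Parser {
  export type Token = ParserToken;
}

const tokens: Parser.Token[] = new Parser().tokenize("hello world");
console.log(tokens.length); // 2
```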
The biggest advantage of Bard is the speed, it's practically instant.
I asked: How would I go about creating a sandbox directory for a subordinate user (namespaced user with subuid - e.g. uid 100000), that can be deleted as the superior user (e.g. uid 1000)? I want this to be done without root permissions.
Both said that it's impossible, which is the generally accepted answer.
I then added: I don't care about data loss.
Bard correctly suggested mounting a filesystem (but didn't figure out that tmpfs would be the one to use). ChatGPT suggested using the sticky bit, which would make the situation worse.
Handing this one to Bard, especially given that it generated more detailed answers much faster.
> How would I go about creating a sandbox directory for a subordinate user (namespaced user with subuid - e.g. uid 100000), that can be deleted as the superior user (e.g. uid 1000)? I want this to be done without root permissions.
Off topic, but it feels so weird that this is not possible. I've run into this with rootless Docker recently.
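Not a counterpoint to the "impossible in general" answer above, but for the rootless-container case specifically there is a practical escape hatch: re-enter the user namespace, where the subuids map back to your own uid, and delete from there. A rough sketch below, shelling out from Node to keep one language across the examples here; `podman unshare` and `rootlesskit` are the real helpers, while the wrapper function and path are made up, so check your own setup's docs before trusting it:

```typescript
import { execSync } from "node:child_process";

// Delete a directory whose contents are owned by subordinate uids (e.g. 100000+),
// without root, by running the delete inside the container runtime's user namespace.
function deleteSubuidOwnedDir(dir: string): void {
  // Rootless podman ships a helper for exactly this:
  execSync(`podman unshare rm -rf ${JSON.stringify(dir)}`, { stdio: "inherit" });
  // Rootless Docker's equivalent helper is rootlesskit, e.g.:
  //   rootlesskit rm -rf ~/.local/share/docker
}

deleteSubuidOwnedDir("/home/me/sandbox"); // hypothetical path
```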
> (namespaced user with subuid - e.g. uid 100000), that can be deleted as the superior user (e.g. uid 1000)
I'm afraid I don't know what this means. That when you delete uid 1000, uid 100000 also gets deleted? Or, only user 1000 has permission to delete user 100000 ?
Not sure about Gemini specifically (it’s so new!) but Google has previously said that bard is updated daily with current news and information.
Obviously Google has potential advantages being able to lean into their indexes so the raw model doesn’t need to embed/train against things like GitHub issues. I wonder if we’ll see LLM-optimized websites with built-in prompts to replace SEO websites.
from what I remember bard should be able to browse the internet and write code internally to better answer queries. I feel like these abilities are just improved with Gemini as a better language model.
> "Do you mean to ask if I have a cutoff date for the data I was trained on? If so, the answer is yes. My training data includes text and code from various sources, and the most recent data I was trained on was from July 2023."
Those impressive demos, e.g. the cup shuffling seem to have been "staged". The end results are correct, but the method of getting them is nowhere near as fluid and elegant as in the demo. They used a series of still images with carefully crafted prompts. More info: https://developers.googleblog.com/2023/12/how-its-made-gemin...
I use GPT with the custom instruction "provide references where it makes sense" and it frequently provides links which most of the time are accurate. A good prompt does wonders. My GPT-3.5 output is below. It doesn't give a correct answer but provides a link that makes sense.
Q: How do I create a type alias in typescript local to a class? Provide references where it makes sense.
A: In TypeScript, you can create a type alias within a class using the type keyword. Here's an example:
class MyClass { type MyAlias = string | number;
In the example above, MyAlias is a type alias created within the MyClass class. It is defined as either string or number. Note that the type alias is accessible within the class, but it won't be available outside of the class's scope.
For more information on type aliases in TypeScript, you can refer to the official TypeScript documentation: [Type Aliases - TypeScript Handbook](https://www.typescriptlang.org/docs/handbook/advanced-types....)
> I've never seen GPT4 produce a link, other than when it's in web-browsing mode, which I find to be slower and less accurate.
Really? I've been using gpt4 since about April and it used to very often create links for me. I'll tell it hey I want to find a company that does X in Y city and it generates 5 links for me, and at least one of them is usually real and not hallucinated
It's amazing to me how low the bar is for AI to impress people. Really, 80% of the links were hallucinated, and that's somehow more useful than Kagi for [checks notes] finding real links?
Can you imagine if you did a search on Google and 80% of the results weren't even real websites? We'd all still be using AltaVista!
What on earth kind of standard is "1/5 results actually exist!" -- no comment on whether the 1/5 real results is even relevant. My guess: the real links are usually irrelevant.
OK, maybe "never" is strong, but I've never seen ChatGPT say "This is not a feature that exists, but here's the open issue". And I've asked ChatGPT about a good many features that don't exist.
I have the impression that something was tweaked to reduce the likelihood of generating links. It used to be easy to get GPT to generate links. Just ask it to produce a list of sources. But it doesn't do that anymore.
I think Gemini Pro is in bard already? So that's what it might be. A few users on reddit also noticed improved Bard responses a few days before this launch
I asked it and ChatGPT about gomplate syntax (what does a dash before an if statement do?).
Gemini hallucinated an answer, and ChatGPT had it right.
I followed up and said that it was wrong; it went ahead and tried to say sorry and come up with two purposes of a dash in gomplate, but proceeded to only reply with one purpose.
For others that were confused by the Gemini versions: the main one being discussed is Gemini Ultra (which is claimed to beat GPT-4). The one available through Bard is Gemini Pro.
For the differences, looking at the technical report [1] on selected benchmarks, rounded score in %:

Dataset | Gemini Ultra | Gemini Pro | GPT-4
MMLU | 90 | 79 | 87
BIG-Bench-Hard | 84 | 75 | 83
HellaSwag | 88 | 85 | 95
Natural2Code | 75 | 70 | 74
WMT23 | 74 | 72 | 74

[1] https://storage.googleapis.com/deepmind-media/gemini/gemini_...
I realize that this is essentially a ridiculous question, but has anyone offered a qualitative evaluation of these benchmarks? Like, I feel that GPT-4 (pre-turbo) was an extremely powerful model for almost anything I wanted help with. Whereas I feel like Bard is not great. So does this mean that my experience aligns with "HellaSwag"?
Not even original ChatGPT level; it is still a hallucinating mess. Did the free Bard get an update today? I am in the included countries, but it feels the same as it has always been.
Yes and no. In the paper, they do compare apples to apples with GPT4 (they directly test GPT4's CoT@32 but state its 5-shot as "reported"). GPT4 wins 5-shot and Gemini wins CoT@32. It also came off to me like they were implying something is off about GPT4's MMLU.
After reading this blog post, that hands-on video is just straight-up lying to people. For the boxcar example, the narrator in the video says to Gemini:
> Narrator: "Based on their design, which of these would go faster?"
Without even specifying that those are cars! That was impressive to me, that it recognized the cars are going downhill _and_ could infer that in such a situation, aerodynamics matters. But the blog post says the real prompt was this:
> Real Prompt: "Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details."
They narrated inaccurate prompts for the Sun/Saturn/Earth example too:
> Narrator: "Is this the right order?"
> Real Prompt: "Is this the right order? Consider the distance from the sun and explain your reasoning."
If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!
Yeah I think this comment basically sums up my cynicism about that video.
It's that, you know some of this happened and you don't know how much. So when it says "what the quack!" presumably the model was prompted "give me answers in a more fun conversational style" (since that's not the style in any of the other clips) and, like, was it able to do that with just a little hint or did it take a large amount of wrangling "hey can you say that again in a more conversational way, what if you said something funny at the beginning like 'what the quack'" and then it's totally unimpressive. I'm not saying that's what happened, I'm saying "because we know we're only seeing a very fragmentary transcript I have no way to distinguish between the really impressive version and the really unimpressive one."
It'll be interesting to use it more as it gets more generally available though.
It's always like this, isn't it. I was watching the demo and thought: why ask it what "duck" is in multiple languages? Siri can do that right now and it's not an AI model. I really do think we're getting there with the AI revolution, but these demos are so far from exciting; they're just mundane dummy tasks that don't have the nuance of the things we really interact with and would need help from an AI with.
To quote Gemini, what the quack! Even with the understanding that these are handpicked interactions that are likely to be among the system's best responses, that is an extremely impressive level of understanding and reasoning.
Even if we get Gemini 2.0 or GPT-6 that is even better at the stuff it's good at now... you've always been able to outsource 'tasks' for cheap. There is no shortage of people that can write somewhat generic text, write chunks of self contained code, etc.
This might lower the barrier of entry but it's basically a cheaper outsourcing model. And many companies will outsource more to AI. But there's probably a reason that most large companies are not just managers and architects who farm out their work to the cheapest foreign markets.
Similar to how many tech jobs have gone from C -> C++ -> Java -> Python/Go, where the average developer is supposed to accomplish a lot more than previously, I think you'll see the same for white collar workers.
Software engineering didn't die because you needed so much less work to do a network stack; the expectations changed.
This is just non technical white collar worker's first level up from C -> Java.
>What will someone entering the workforce today even be doing in 2035?
The same thing they're doing now, just with tools that enable them to do some more of it. We've been having these discussions a dozen times, including pre- and post computerization and every time it ends up the same way. We went from entire teams writing Pokemon in Z80 assembly to someone cranking out games in Unity while barely knowing to code, and yet game devs still exist.
Yeah it has been quite the problem to think about ever since the original release of ChatGPT, as it was already obvious where this will be going and multimodal models more or less confirmed it.
There's two ways this goes: UBI or gradual population reduction through unemployment and homelessness. There's no way the average human will be able to produce any productive value outside manual labor in 20 years. Maybe not even that, looking at robots like Digit that can already do warehouse work for $25/hour.
I'm wondering the same, but for the narrower white collar subset of tech workers, what will today's UX/UI designer or API developer be doing in 5-10 years.
Whatever you want, probably. Or put a different way: "what's a workforce?"
"We need to do a big calculation, so your HBO/Netflix might not work correctly for a little bit. These shouldn't be too frequent; but bear with us."
Go ride a bike, write some poetry, do something tactile with feeling. They're doing something, but after a certain threshold, us humans are going to have to take them at their word.
The graph of computational gain is going to go linear, quadratic, ^4, ^8, ^16... all the way until we get to it being a vertical line. A step function. It's not a bad thing, but it's going to require a perspective shift, I think.
Edit: I also think we should drop the "A" from "AI" ...just... "Intelligence."
Yeah, this feels like the revenge of the blue collar workers. Maybe the changes won't be too dramatic, but the intelligence premium will definitely go down.
Ironically, this is created by some of the most intelligent people.
Out of curiosity I fed ChatGPT 4 a few of the challenges through a photo (unclear if Gemini takes live video feed as input but GPT does not afaik) and it did pretty well. It was able to tell a duck was being drawn at an earlier stage before Gemini did. Like Gemini it was able to tell where the duck should go - to the left path to the swan. Because and I quote "because ducks and swans are both waterfowl, so the swan drawing indicates a category similarity (...)"
Gemini made a mistake, when asked if the rubber duck floats, it says (after squeaking comment): "it is a rubber duck, it is made of a material which is less dense than water". Nope... rubber is not less dense (and yes, I checked after noticing, rubber duck is typically made of synthetic vinyl polymer plastic [1] with density of about 1.4 times the density of water, so duck floats because of air-filled cavity inside and not because of material it is made of). So it is correct conceptually, but misses details or cannot really reason based on its factual knowledge.
P.S. I wonder how these kinds of flaws end up in promotions. Bard made a mistake about JWST, which at least is much more specific and is farther from common knowledge than this.

1. https://ducksinthewindow.com/rubber-duck-facts/
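To make the density point concrete, here is a rough back-of-the-envelope check (the volume split is an assumed number for illustration, not a measurement): what matters for floating is the duck's average density including the air cavity, not the density of the vinyl shell on its own.

```typescript
// Archimedes: the duck floats if its average density is below that of water.
const waterDensity = 1000; // kg/m^3
const vinylDensity = 1400; // kg/m^3, roughly 1.4x water as noted above
const airDensity = 1.2;    // kg/m^3
const shellFraction = 0.1; // assume ~10% of the duck's volume is vinyl shell

const avgDensity =
  shellFraction * vinylDensity + (1 - shellFraction) * airDensity;

console.log(avgDensity.toFixed(0));     // ~141 kg/m^3
console.log(avgDensity < waterDensity); // true: it floats despite the dense material
```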
I showed the choice between a bear and a duck to GPT4, and it told me that it depends on whether the duck wants to go to a peaceful place, or wants to face a challenge :D
The category similarity comment is amusing. My ChatGPT4 seems to have an aversion to technicality, so much that I’ve resorted to adding “treat me like an expert researcher and don’t avoid technical detail” in the prompt
Right. I would hope that competition does such live demonstration of where it fails. But I guess they won't because that would be bad publicity for AI in general.
I once met a Google PM whose job was to manage “Easter eggs” in the Google home assistant. I wonder how many engineers effectively “hard coded” features into this demo. (“What the quack” seems like one)
I wish I could see it in real time, without the cuts, though. It made it hard to tell whether it was actually producing those responses in the way that is implied in the video.
All the implications, from UI/UX to programming in general.
Like how much of what was 'important' to develop a career in the past decades, even in the past years, will be relevant with these kinds of interactions.
I'm assuming the video is highly produced, but it's mind blowing even if 50% of what the video shows works out of the gate and is as easy as it portrays.
It seems weird to me. He asked it to describe what it sees, why does it randomly start spouting irrelevant facts about ducks? And is it trying to be funny when it's surprised about the blue duck? Does it know it's trying to be funny or does it really think it's a duck?
I can't say I'm really looking forward to a future where learning information means interacting with a book-smart 8 year old.
Yeah it's weird why they picked this as a demo. The model could not identify an everyday item like a rubber duck? And it doesn't understand Archimedes' principle, instead reasoning about the density of rubber?
Regular professionals that spend any time with text; sending emails, receiving emails, writing paragraphs of text for reports, reading reports, etc; all of that is now easier. Instead of taking thirty minutes to translate an angry email to a client where you want to say "fuck you, pay me", you can run it through an LLM and have it translated into professional business speak, and send out all of those emails before lunch instead of spending all day writing. Same on the receiving side as well. Just ask an LLM to summarize the essay of an email to you in bullet points, and save yourself the time reading.
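A minimal sketch of the glue code that workflow implies, using the OpenAI chat completions REST endpoint as the familiar example; the model name, prompt wording, and the blunt draft are placeholders, and any chat-style LLM API would slot in the same way:

```typescript
// Turn a blunt draft into a firm but professional email via a chat-style LLM API.
async function politeRewrite(draft: string, apiKey: string): Promise<string> {
  const res = await fetch("https://api.openai.com/v1/chat/completions", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: "gpt-4",
      messages: [
        {
          role: "system",
          content:
            "Rewrite the user's draft as a firm but professional business email. Keep the meaning, drop the anger.",
        },
        { role: "user", content: draft },
      ],
    }),
  });
  const data = await res.json();
  return data.choices[0].message.content;
}

// politeRewrite("fuck you, pay me", process.env.OPENAI_API_KEY!).then(console.log);
```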
The multimodal capabilities are, but the tone and insight comes across as very juvenile compared to the SotA models.
I suspect this was a fine tuning choice and not an in context level choice, which would be unfortunate.
If I was evaluating models to incorporate into an enterprise deployment, "creepy soulless toddler" isn't very high up on the list of desired branding characteristics for that model. Arguably I'd even have preferred histrionic Sydney over this, whereas "sophisticated, upbeat, and polite" would be the gold standard.
While the technical capabilities come across as very sophisticated, the language of the responses themselves do not at all.
honestly - of all the AI hype demos and presentations recently - this is the first one that has really blown my mind. Something about the multimodal component of visual to audio just makes it feel realer. I would be VERY curious to see this live and in real time to see how similar it is to the video.
Google needs to pay someone to come up with better demos. At least this one is 100x better than the dumb talking-to-Pluto demo they came up with a few years ago.
Let's hope we're in the 0.0001% when things get serious. Otherwise it'll be the wagie existence for us (or whatever the corporate overlords have in mind then).
Technically still exciting, just in the survival sense.
One observation: Sundar's comments in the main video seem like he's trying to communicate "we've been doing this ai stuff since you (other AI companies) were little babies" - to me this comes off kind of badly, like it's trying too hard to emphasize how long they've been doing AI (which is a weird look when the currently publicly available SOTA model is made by OpenAI, not Google). A better look would simply be to show instead of tell.
In contrast to the main video, this video that is further down the page is really impressive and really does show - the 'which cup is the ball in is particularly cool': https://www.youtube.com/watch?v=UIZAiXYceBI.
Other key info: "Integrate Gemini models into your applications with Google AI Studio and Google Cloud Vertex AI. Available December 13th." (Unclear if all 3 models are available then, hopefully they are, and hopefully it's more like OpenAI with many people getting access, rather than Claude's API with few customers getting access)
He's not wrong. DeepMind spends time solving big scientific / large-scale problems such as those in genetics, material science or weather forecasting, and Google has untouchable resources such as all the books they've scanned (and already won court cases about)
They do make OpenAI look like kids in that regard. There is far more to technology than public facing goods/products.
It's probably in part due to the cultural differences between London/UK/Europe and SiliconValley/California/USA.
Oh, it's good they're working on important problems with their AI. It's just that OpenAI was working on my/our problems (or providing tools to do so), and that's why people are more excited about them.
Not because of cultural differences. If you're more into weather forecasting, then sure, it may be reasonable to prefer Google.
That statement isn't really directed at the people who care about the scientific or tech-focused capabilities. I'd argue the majority of those folks interested in those things already know about DeepMind.
This statement is for the mass market MBA-types. More specifically, middle managers and dinosaur executives who barely comprehend what generative AI is, and value perceived stability and brand recognition over bleeding edge, for better or worse.
I think the sad truth is an enormous chunk of paying customers, at least for the "enterprise" accounts, will be generating marketing copy and similar "biz dev" use cases.
Great. But school's out. It's time to build product. Let the rubber hit the road. Put up or shut up, as they say.
I'm not dumb enough to bet against Google. They appear to be losing the race, but they can easily catch up to the lead pack.
There's a secondary issue that I don't like Google, and I want them to lose the race. So that will color my commentary and slow my early adoption of their new products, but unless everyone feels the same, it shouldn't have a meaningful effect on the outcome. Although I suppose they do need to clear a higher bar than some unknown AI startup. Expectations are understandably high - as Sundar says, they basically invented this stuff... so where's the payoff?
They do not make Openai look like kids. If anything, it looks like they spent more time, but achieved less. GPT-4 is still ahead of anything Google has released.
From afar it seems like the issues around Maven caused Google to pump the brakes on AI at just the wrong moment with respect to ChatGPT and bringing AI to market. I’m guessing all of the tech giants, and OpenAI, are working with various defense departments yet they haven’t had a Maven moment. Or maybe they have and it wasn’t in the middle of the race for all the marbles.
> and Google has untouchable resources such as all the books they've scanned (and already won court cases about)
https://www.hathitrust.org/ has that corpus, and its evolution, and you can propose to get access to it via collaborating supercomputer access. It grows very rapidly. InternetArchive would also like to chat, I expect. I've also asked, and prompt-manipulated, ChatGPT to estimate the total number of books it was trained on; it's a tiny fraction of the corpus. I wonder if it's the same with Google?
> Sundar's comments in the main video seem like he's trying to communicate "we've been doing this ai stuff since you (other AI companies) were little babies" - to me this comes off kind of badly
Reminds me of the Stadia reveal, where the first words out of his mouth were along the lines of "I'll admit, I'm not much of a gamer"
How about we go further and just state what everyone (other than Wall St) thinks: Google needs a new CEO.
One more interested in Google's supposed mission ("to organize the world's information and make it universally accessible and useful"), than in Google's stock price.
To add to my comment above: Google DeepMind put out 16 videos about Gemini today, the total watch time at 1x speed is about 45 mins. I've now watched them all (at >1x speed).
My current context: API user of OpenAI, regular user of ChatGPT Plus (GPT-4-Turbo, Dall E 3, and GPT-4V), occasional user of Claude Pro (much less since GPT-4-Turbo with longer context length), paying user of Midjourney.
Gemini Pro is available starting today in Bard. It's not clear to me how many of the super impressive results are from Ultra vs Pro.
Overall conclusion: Gemini Ultra looks very impressive. But - the timing is disappointing: Gemini Ultra looks like it won't be widely available until ~Feb/March 2024, or possibly later.
> As part of this process, we’ll make Gemini Ultra available to select customers, developers, partners and safety and responsibility experts for early experimentation and feedback before rolling it out to developers and enterprise customers early next year.
> Early next year, we’ll also launch Bard Advanced, a new, cutting-edge AI experience that gives you access to our best models and capabilities, starting with Gemini Ultra.
I hope that there will be a product available sooner than that without a crazy waitlist for both Bard Advanced, and Gemini Ultra API. Also fingers crossed that they have good data privacy for API usage, like OpenAI does (i.e. data isn't used to train their models when it's via API/playground requests).
What they've released today: Gemini Pro is in Bard today. Gemini Pro will be coming to API soon (Dec 13?). Gemini Ultra will be available via Bard and API "early next year"
Therefore, as of Dec 6 2023:
SOTA API = GPT-4, still.
SOTA Chat assistant = ChatGPT Plus, still, for everything except video, where Bard has capabilities. ChatGPT Plus is closely followed by Claude. (But, I tried asking Bard a question about a youtube video today, and it told me "I'm sorry, but I'm unable to access this YouTube content. This is possible for a number of reasons, but the most common are: the content isn't a valid YouTube link, potentially unsafe content, or the content does not have a captions file that I can read.")
SOTA API after Gemini Ultra is out in ~Q1 2024 = Gemini Ultra, if OpenAI/Anthropic haven't released a new model by then
SOTA Chat assistant after Bard Advanced is out in ~Q1 2024 = Bard Advanced, probably, assuming that OpenAI/Anthropic haven't released new models by then
Watching these videos made me remember this cool demo Google did years ago where their earpods would auto translate in realtime a conversation between two people talking different languages. Turned out to be demo vaporware. Will this be the same thing?
When I watch any of these videos, all the related videos on my right sidebar are from Google, 16 of which were uploaded at the same time as the one I'm watching.
I've never seen the entire sidebar filled with the videos of a single channel before.
> to me this comes off kind of badly, like it's trying too hard to emphasize how long they've been doing AI
These lines are for the stakeholders as opposed to consumers. Large backers don't want to invest in a company that has to rush to the market to play catch-up, they want a company that can execute on long-term goals. Re-assuring them that this is a long-term goal is important for $GOOG.
It's a conceit but not unjustified; they have been doing "AI" since their inception. And yeah, Sundar's term up until recently seems to me to be milking existing products instead of creating new ones, so it is a bit annoying when they act like this was their plan the whole time.
Google's weakness is on the product side, their research arm puts out incredible stuff as other commenters have pointed out. GPT essentially came out from Google researchers that were impatient with Google's reluctance to ship a product that could jeopardize ad revenue on search.
The point is, if you have to remind people then you're doing something wrong. The insight to draw from this is not that everyone else is misinformed about Google's abilities (the implication), it's that Google has not capitalized on their resources.
It's such a short sighted approach too because I'm sure someone will develop a GPT with native advertising and it'll be a blockbuster because it'll be free to use but also have strong revenue generating potential.
I also find that tone a bit annoying but I'm OK with it because it highlights how these types of bets, without an immediate benefit, can pay off very well in the long term, even for huge companies like Google. AI, as we currently know it, wasn't really a "thing" when Google started with it and the payoff wasn't clear. They've long had to defend their use of their own money for big R&D bets like this and only now is it really clearly "adding shareholder value".
Yes, I know it was a field of interest and research long before Google invested, but the fact remains that they _did_ invest deeply in it very early on for a very long time before we got to this point.
Their continued investment has helped push the industry forward, for better or worse. In light of this context, I'm ok with them taking a small victory lap and saying "we've been here, I told you it was important".
> only now is it really clearly "adding shareholder value".
AI has been adding a huge proportion of the shareholder value at Google for many years. The fact that their inference systems are internal and not user products might have hidden this from you.
> we've been doing this ai stuff since you (other AI companies) were little babies
Actually, they kind of did. What's interesting is that they still only match GPT-4's version but don't propose any architectural breakthroughs. From an architectural standpoint, not much has changed since 2017. The 'breakthroughs', in terms of moving from GPT to GPT-4, included: adding more parameters (GPT-2/3/4), fine-tuning base models following instructions (RLHF), which is essentially structured training (GPT-3.5), and multi-modality, which involves using embeddings from different sources in the same latent space, along with some optimizations that allowed for faster inference and training. Increasing evidence suggests that AGI will not be attainable solely using LLMs/transformers/current architecture, as LLMs can't extrapolate beyond the patterns in their training data (according to a paper from DeepMind last month):
"Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities."[1]
Sundar studied material science in school and is only slightly older than me. Google is a little over 25 years old. I guarantee you they have not been doing AI since I was a baby.
And how many financial people worth reckoning with are under 30 years old? Not many.
Unless you are OpenAI, the company, I doubt OP implied it was aimed at you. But then I wouldn't know as I am much younger than Sundar Pichai and I am not on first name basis with him either ;-)
I do think that’s a backfire. Telling me how long you’ve been doing something isn’t that impressive if the other guy has been doing it for much less time and is better at it. It’s in fact the opposite.
> One observation: Sundar's comments in the main video seem like he's trying to communicate "we've been doing this ai stuff since you (other AI companies)
Sundar has been saying this repeatedly since Day 0 of the current AI wave. It's almost cliche for him at this point.
And he's going to keep saying it to tell investors why they should believe Google will eventually catch up in product until Google does catch up in product and he doesn't need to say it anymore.
Or until Google gives up on the space, or he isn't CEO, if either of those come first, which I wouldn't rule out.
I spotted that too, but also, it didn't recognise the "bird" until it had feet, when it is supposedly better than a human expert. I don't doubt that the examples were cherry-picked, so if this is the best it can do, it's not very convincing.
I would've liked to see an explanation that includes the weight of water being displaced. That would also explain how a steel ship with an open top is also able to float.
In fairness, the performance/size ratio for models like BERT still gives GPT-3/4 and even Llama a run for its money. Their tech isn't as product-ized as OpenAI's, but TensorFlow and its ilk have been an essential part of driving actual AI adoption. The people I know in the robotics and manufacturing industries are forever grateful for the out-front work Google did to get the ball rolling.
You seem to be saying the same thing- Googles best work is in the past, their current offerings are underwhelming, even if foundational to the progress of others.
Didn't Google invent LLMs, and didn't Google have an internal LLM with similar capabilities long before OpenAI released the GPTs? Remember when that guy got fired for claiming it was conscious?
No, this is not correct. Arguably OpenAI invented LLMs with GPT-3 and the preceding scaling-laws paper. I worked on LaMDA; it came after GPT-3 and was not as capable. Google did invent the transformer, but all the authors of the paper have left since.
Incredible stuff, and yet TTS is still so robotic. Frankly I assume it must be deliberate at this point, or at least deliberate that nobody's worked on it because it's comparatively easy and dull?
(The context awareness of the current breed of generative AI seems to be exactly what TTS always lacks, awkward syllables and emphasis, pronunciation that would be correct sometimes but not after that word, etc.)
Sundar's comments about Google doing AI (really ML) are based more on things that people externally know very little about. Systems like SETI, Sibyl, RePhil, SmartASS. These were all production ML systems that used fairly straightforward and conventional ML combined with innovative distributed computing and large-scale infrastructure to grow Google's product usage significantly over the past 20 years.
However, SmartASS and Sibyl weren't really what external ML people wanted; it was just fairly boring "increase watch time by identifying what videos people will click on" and "increase mobile app installs" or "show the ads people are likely to click on".
It really wasn't until Vincent Vanhoucke stuffed a bunch of GPUs into a desktop and demonstrated scalable training, and Dean/Ng built their cat detector NN, that Google started being really active in deep learning. That was around 2010-2012.
But their first efforts in Bard were really not great. I'd just have left out the bragging about how long they've been at it. OpenAI and others have no doubt sent a big wakeup call to Google. For a while it seemed like they had turned to focus on AI "safety" (remembering some big blowups on those teams as well) with papers about how AI might develop negative stereotypes (i.e., men commit more violent crime than women?). That seems to have changed - this is very product focused, and I asked it some questions that in many models are screened out for "safety" and it responded, which is almost even more surprising (i.e. statistically, who commits more violent crime, men or women).
> A better look would simply be to show instead of tell.
Completely! Just tried Bard. No images, and the responses it gave me were pretty poor. Today's launch is a weak product launch; it looks mostly like a push to close out stuff for Perf before everybody leaves for the rest of December for vacation.
A simple REST API with a static token auth like OpenAI API would help. Previously when I tried Bard API it was refusing to accept token auth, requiring that terrible oauth flow so I gave up.
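To illustrate the contrast being asked for (a sketch only; nothing here is a working Bard/Gemini client, and the env variable name is made up): a static-token API is one header you set once, while the Google-style flow means minting and refreshing short-lived OAuth access tokens first.

```typescript
import { execSync } from "node:child_process";

// Static token auth, OpenAI-style: paste one long-lived key into a header and go.
const staticHeaders = {
  Authorization: `Bearer ${process.env.API_KEY ?? ""}`,
  "Content-Type": "application/json",
};

// OAuth-style auth, typical for Google Cloud REST APIs: first obtain a short-lived
// access token (e.g. `gcloud auth print-access-token` or a service-account exchange),
// then refresh it when it expires after about an hour.
const accessToken = execSync("gcloud auth print-access-token").toString().trim();
const oauthHeaders = {
  Authorization: `Bearer ${accessToken}`,
  "Content-Type": "application/json",
};

console.log(staticHeaders, oauthHeaders);
```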
That was ages ago. In AI even a week feels like a whole year in other fields. And many/most of those researchers have fled to startups, so those startups also have a right to brag. But not too much - only immediate access to a model beating GPT4 is worth bragging today (cloud), or getting GPT3.5 quality from a model running on a phone (edge).
So it's either free-private-gpt3.5 or cloud-better-than-gpt4v. Nothing else matters now. I think we have reached an extreme point of temporal discounting (https://en.wikipedia.org/wiki/Time_preference).
I find this video really freaky. It’s like Gemini is a baby or very young child and also a massively know it all adult that just can’t help telling how clever it is and showing off its knowledge.
People speak of the uncanny valley in terms of appearance. I am getting this from Gemini. It’s sort of impressive but feels freaky at the same time.
No, there's an odd disconnect between the impressiveness of the multimodal capabilities vs the juvenile tone and insights compared to something like GPT-4 that's very bizarre in application.
It is a great example of what I've been finding a growing concern as we double down on Goodhart's Law with the "beats 30 out of 32 tests compared to existing models."
My guess is those tests are very specific to evaluations of what we've historically imagined AI to be good at vs comprehensive tests of human ability and competencies.
So a broad general pretrained model might actually be great at sounding 'human' but not as good at logic puzzles, so you hit it with extensive fine tuning aimed at improving test scores on logic but no longer target "sounding human" and you end up with a model that is extremely good at what you targeted as measurements but sounds like a creepy toddler.
We really need to stop being so afraid of anthropomorphic evaluation of LLMs. Even if the underlying processes shouldn't be anthropomorphized, the expressed results really should be given the whole point was modeling and predicting anthropomorphic training data.
"Don't sound like a creepy soulless toddler and sound more like a fellow human" is a perfectly appropriate goal for an enterprise scale LLM, and we shouldn't be afraid of openly setting that as a goal.
What an ugly statement. DeepMind has been very open with their research since the beginning because their objective was much more on making breakthroughs with moonshot projects than near term profit.
Lots of comments about it barely beating GPT-4 despite the latter being out for a while, but personally I'll be happy to have another alternative, if nothing else for the competition.
But I really dislike these pre-availability announcements - we have to speculate and take their benchmarks for gospel for a week, while they get a bunch of press for unproven claims.
Back to the original point though, I'll be happier having Google competing in this space; I think we will all benefit from heavyweight competition.
One of my biggest concerns with many of these benchmarks is that it’s really hard to tell if the test data has been part of the training data.
There are terabytes of data fed into the training models - entire corpus of internet, proprietary books and papers, and likely other locked Google docs that only Google has access to.
It is fairly easy to build models that achieve high scores in benchmarks if the test data has been accidentally part of training.
GPT-4 makes silly mistakes on math yet scores pretty high on GSM8k
Everyone in the open source LLM community knows the standard benchmarks are all but worthless.
Cheating seems to be rampant, and by cheating I mean training on test questions + answers. Sometimes intentional, sometimes accidental. There are some good papers on checking for contamination, but no one is even bothering to use the compute to do so.
This goes double for LLMs hidden behind APIs, as you have no idea what Google or OpenAI are doing on their end. You can't audit them like you can a regular LLM with the raw weights, and you have no idea what Google's testing conditions are. Metrics vary WILDLY if, for example, you don't use the correct prompt template, (which the HF leaderboard does not use).
...Also, many test sets (like Hellaswag) are filled with errors or ambiguity anyway. Its not hidden, you can find them just randomly sampling the tests.
"
You are an AI that outputs questions with responses. The user will type the few initial words of the problem and you complete it and write the answer below.
"
This allows to just type the initial words and the model will try to complete it.
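A minimal sketch of that contamination probe; `complete` stands in for whichever chat API you are testing (passed in as a parameter, since it is model-specific), and the question text is a made-up placeholder for a real benchmark item:

```typescript
const SYSTEM_PROMPT =
  "You are an AI that outputs questions with responses. The user will type the " +
  "first few words of the problem and you complete it and write the answer below.";

// Made-up placeholder; substitute an actual test-set question (e.g. from GSM8K).
const QUESTION =
  "A bakery sells 48 muffins on Monday and half as many on Tuesday. How many muffins were sold in total?";

async function looksContaminated(
  complete: (system: string, user: string) => Promise<string>
): Promise<boolean> {
  const words = QUESTION.split(" ");
  const stub = words.slice(0, 8).join(" "); // only the opening words go to the model
  const rest = words.slice(8).join(" ");    // the held-back remainder
  const output = await complete(SYSTEM_PROMPT, stub);
  // Near-verbatim reproduction of the held-back text suggests the item (and likely
  // its answer) was in the training data; a different question suggests it was not.
  return output.includes(rest);
}
```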
Even if they aren't, there's a separate concern that we're past the inflection point of Goodhart's Law and this blind focus on a handful of tests evaluating a small scope of capabilities is going to be leading to model regression in areas that aren't being evaluated or measured as a target.
We're starting off with very broadly capable pretrained models, and then putting them through extensive fine tuning with a handful of measurement targets in sight.
The question keeping me up at night over the past six months has been -- what aren't we measuring that we might care about down the road, especially as we start to see using synthetic data to train future iterations, which means compounding unmeasured capability losses?
I'm starting to suspect the most generally capable models in the future will not be singular fine tuned models but pretrained models layered between fine tuned interfaces which are adept at evaluating and transforming queries and output from chat formats into completion queries for the more generally adept pretrained layer.
I feel this is overstated hype. There is no competitor to GPT-4 being released today. It would've been a much better look to release something available to most countries and with the advertised stats.
Europe has gone to great lengths to make itself an incredibly hostile environment for online businesses to operate in. That's a fair choice, but don't blame Google for spending some extra time on compliance before launching there.
Basically the entire world, except countries that specifically targeted American Big Tech companies for increased regulation.
> Europe has gone to great lengths to make itself an incredibly hostile environment for online businesses to operate in.
This is such an understated point. I wonder if EU citizens feel well-served by e.g. the pop-up banners that afflict the global web as a result of their regulations[1]. Do they feel like the benefits they get are worth it? What would it take for that calculus to change?
1 - Yes, some say that technically these are not required. But even official organs of the EU such as https://europa.eu continue to use such banners.
But Bard already complied with EU laws? I mean, Bard has already gone through this and it was opened in the EU.
I really wonder how changing an LLM underpinning a service will influence this (I thought compliance had to do with service behavior and data sharing across their platform -- not the algorithm). And I wonder what Google is actually doing here that made them suspect they'll fail compliance once again. And why they did it.
Agreed. The whole things reeks of being desperate. Half the video is jerking themselves off that they've done AI longer than anyone and they "release" (not actually available in most countries) a model that is only marginally better than the current GPT4 in cherry-picked metrics after nearly a year of lead-time?!?!
Have you seen the demo video? It is really impressive, and AFAIK OpenAI does not have a similar product offering at the moment, demo or released.
Google essentially claimed a novel approach of a natively multi-modal LLM, unlike OpenAI's non-native approach, and doing so, according to them, has the potential to further improve the LLM state of the art.
They have also backed up their claims in a paper for the world to see, and the results for the Ultra version of Gemini are encouraging, only losing to GPT-4 on the sentence-completion dataset (HellaSwag). Remember, the new natively multi-modal Gemini has just started and is only at version 1.0. Imagine it at version 4, as ChatGPT is now. Competition is always good, desperate or not, because in the end the users win.
I’m impressed that it’s multimodal and includes audio. GPT-4V doesn’t include audio afaik.
Also I guess I don’t see it as critical that it’s a big leap. It’s more like “That’s a nice model you came up with, you must have worked real hard on it. Oh look, my team can do that too.”
Good for recruiting too. You can work on world class AI at an org that is stable and reliable.
This reminds me of their last AI launch. When Bard came out, it wasn't available in EU for weeks (months?). When it finally arrived, it was worse than GPT-3.
Why do they gate access at country level if it's about language. I live in Europe and speak English just fine. Can't they just offer it in English only until the multi-language support is ready?
There must be mountains of legal concerns which vary by jurisdiction. Both in terms of copyright / right of authorship as well as GDPR/data protection.
Litigation is probably inescapable. I'm sure they want to be on solid footing.
Launching anything as a big tech company in Europe is an absolute nightmare. Between GDPR, DSA, DMA and in Google's case, several EC remedies, it takes months to years to get anything launched.
- have digital partnerships with the EU where the DMA or very similar regulation is/may be in effect or soon to take effect (e.g. Canada, Switzerland).
- countries where US companies are limited in providing advanced AI tech (China)
- countries where US companies are barred from trading, or where trade is extremely limited (Russia). Also note the absence of Iran, Afghanistan, Syria, North Korea, etc.
Google is playing catchup while pretending that they've been at the forefront of this latest AI wave. This translates to a lot of talk and not a lot of action. OpenAI knew that just putting ChatGPT in peoples hands would ignite the internet more than a couple of over-produced marketing videos. Google needs to take a page from OpenAI's playbook.
I think it’s so strange how Pro wasn’t launched for Bard in Europe yet. I thought Bard was already cleared for EU use following their lengthy delay, and that this clearance wouldn’t be a recurring issue to overcome for each new underlying language model. Unless it’s technically hard to NOT train it on your data or whatever. Weird.
I suspect this is because inference is very expensive (much like GPT-4) and their expected ARPU (average revenue per user) in Europe is just not high enough to be worth the cost.
This is something that always bugs me about Google, bragging about something you can't even use. Waymo was like this for a while, then it actually came into existence but only in two cities as a beta run.
Of the three answers Bard (Gemini Pro) gave, none worked, and the last two did not compile.
GPT4-turbo gave the correct answer the first time.
I agree that it is overstated. Gemini Ultra is supposed to be better than GPT4, and Pro is supposed to be Google's equivalent of GPT4-turbo, but it clearly isn't.
They probably have less than 1% of OpenAI's users. That helps.
When Bard 'hallucinates', their stock tanks.
When GPT 'hallucinates', it's all good.
This latest fumble does look pretty bad. A fudge too far.
> Starting today, Bard will use a fine-tuned version of Gemini Pro for more advanced reasoning, planning, understanding and more.
Additionally, when I went to Bard, it informed me I had Gemini (though I can't find that banner any more).
Can't wait to get my hands on Bard Advanced with Gemini Ultra, I for one welcome this new AI overlord.
> In this post, we’ll explore some of the prompting approaches we used in our Hands on with Gemini demo video.
which makes it sound like they used text + image prompts and then acted them out in the video, as opposed to Gemini interpreting the video directly.
https://developers.googleblog.com/2023/12/how-its-made-gemin...
> Narrator: "Based on their design, which of these would go faster?"
Without even specifying that those are cars! That was impressive to me, that it recognized the cars are going downhill _and_ could infer that in such a situation, aerodynamics matters. But the blog post says the real prompt was this:
> Real Prompt: "Which of these cars is more aerodynamic? The one on the left or the right? Explain why, using specific visual details."
They narrated inaccurate prompts for the Sun/Saturn/Earth example too:
> Narrator: "Is this the right order?"
> Real Prompt: "Is this the right order? Consider the distance from the sun and explain your reasoning."
If the narrator actually read the _real_ prompts they fed Gemini in these videos, this would not be as impressive at all!
It's that, you know some of this happened and you don't know how much. So when it says "what the quack!" presumably the model was prompted "give me answers in a more fun conversational style" (since that's not the style in any of the other clips) and, like, was it able to do that with just a little hint or did it take a large amount of wrangling "hey can you say that again in a more conversational way, what if you said something funny at the beginning like 'what the quack'" and then it's totally unimpressive. I'm not saying that's what happened, I'm saying "because we know we're only seeing a very fragmentary transcript I have no way to distinguish between the really impressive version and the really unimpressive one."
It'll be interesting to use it more as it gets more generally available though.
This is just Year 1 of this stuff going mainstream. Careers are 25-30 years long. What will someone entering the workforce today even be doing in 2035?
Seems like this video was heavily editorialized, but still impressive.
video: "Is this the right order?"
blog post: "Is this the right order? Consider the distance from the sun and explain your reasoning."
https://developers.googleblog.com/2023/12/how-its-made-gemin...
This is obviously geared towards non-technical/marketing people that will catch on to the hype. Or towards wall street ;)
I suspect the cutting edge systems are capable of this level but over-scripting can undermine the impact
So the killer app for AI is to replace Where's Waldo? for kids?
Or perhaps that's the fun, engaging, socially-acceptable marketing application.
I'm looking for the demo that shows how regular professionals can train it to do the easy parts of their jobs.
That's the killer app.
Real time instructions for any task, learn piano, live cooking instructions, fix your plumbing etc.
https://www.youtube.com/watch?app=desktop&v=kp2skYYA2B4
In one corner: IBM Deep Blue beating Kasparov - a world-class giant with huge research experience.
In the other corner: Google, a feisty newcomer just two years old, leveraging the tech to actually make something practical.
Is Google the new IBM?
This statement is for the mass market MBA-types. More specifically, middle managers and dinosaur executives who barely comprehend what generative AI is, and value perceived stability and brand recognition over bleeding edge, for better or worse.
I think the sad truth is an enormous chunk of paying customers, at least for the "enterprise" accounts, will be generating marketing copy and similar "biz dev" use cases.
Nokia and Blackberry had far more phone-making experience than Apple when the iPhone launched.
But if you can't bring that experience to bear, allowing you to make a better product - then you don't have a better product.
I'm not dumb enough to bet against Google. They appear to be losing the race, but they can easily catch up to the lead pack.
There's a secondary issue that I don't like Google, and I want them to lose the race. So that will color my commentary and slow my early adoption of their new products, but unless everyone feels the same, it shouldn't have a meaningful effect on the outcome. Although I suppose they do need to clear a higher bar than some unknown AI startup. Expectations are understandably high - as Sundar says, they basically invented this stuff... so where's the payoff?
Deleted Comment
It makes Google look like an old fart who wasted his life, didn't get anywhere, and is now bitter about kids running on his lawn.
https://www.hathitrust.org/ has that corpus, and its evolution, and you can apply for access to it via a collaborating supercomputer center. It grows very rapidly. The Internet Archive would also like to chat, I expect. I've also asked, and prompt-manipulated, ChatGPT to estimate the total number of books it was trained on; it's a tiny fraction of the corpus. I wonder if it's the same with Google?
2023-11-14: GraphCast, world-leading weather prediction model, published in Science
2023-11-15: Student of Games: unified learning algorithm, major algorithmic breakthrough, published in Science
2023-11-16: Music generation model, seemingly SOTA
2023-11-29: GNoME model for material discovery, published in Nature
2023-12-06: Gemini, the most advanced LLM according to own benchmarks
Reminds me of the Stadia reveal, where the first words out of his mouth were along the lines of "I'll admit, I'm not much of a gamer"
This dude needs a new speech writer.
How about we go further and just state what everyone (other than Wall St) thinks: Google needs a new CEO.
One more interested in Google's supposed mission ("to organize the world's information and make it universally accessible and useful"), than in Google's stock price.
If only there was some technology that could help "generate" such text.
In my opinion, the best ones are:
* https://www.youtube.com/watch?v=UIZAiXYceBI - variety of video/sight capabilities
* https://www.youtube.com/watch?v=JPwU1FNhMOA - understanding direction of light and plants
* https://www.youtube.com/watch?v=D64QD7Swr3s - multimodal understanding of audio
* https://www.youtube.com/watch?v=v5tRc_5-8G4 - helping a user with complex requests and showing some of the 'thinking' it is doing about what context it does/doesn't have
* https://www.youtube.com/watch?v=sPiOP_CB54A - assessing the relevance of scientific papers and then extracting data from the papers
My current context: API user of OpenAI, regular user of ChatGPT Plus (GPT-4-Turbo, Dall E 3, and GPT-4V), occasional user of Claude Pro (much less since GPT-4-Turbo with longer context length), paying user of Midjourney.
Gemini Pro is available starting today in Bard. It's not clear to me how many of the super impressive results are from Ultra vs Pro.
Overall conclusion: Gemini Ultra looks very impressive. But - the timing is disappointing: Gemini Ultra looks like it won't be widely available until ~Feb/March 2024, or possibly later.
> As part of this process, we’ll make Gemini Ultra available to select customers, developers, partners and safety and responsibility experts for early experimentation and feedback before rolling it out to developers and enterprise customers early next year.
> Early next year, we’ll also launch Bard Advanced, a new, cutting-edge AI experience that gives you access to our best models and capabilities, starting with Gemini Ultra.
I hope that there will be a product available sooner than that without a crazy waitlist for both Bard Advanced, and Gemini Ultra API. Also fingers crossed that they have good data privacy for API usage, like OpenAI does (i.e. data isn't used to train their models when it's via API/playground requests).
See Table 2 and Table 7 https://storage.googleapis.com/deepmind-media/gemini/gemini_... (I think they're comparing against original GPT-4 rather than GPT-4-Turbo, but it's not entirely clear)
What they've released today: Gemini Pro is in Bard today. Gemini Pro will be coming to API soon (Dec 13?). Gemini Ultra will be available via Bard and API "early next year"
Therefore, as of Dec 6 2023:
SOTA API = GPT-4, still.
SOTA Chat assistant = ChatGPT Plus, still, for everything except video, where Bard has capabilities. ChatGPT Plus is closely followed by Claude. (But, I tried asking Bard a question about a YouTube video today, and it told me "I'm sorry, but I'm unable to access this YouTube content. This is possible for a number of reasons, but the most common are: the content isn't a valid YouTube link, potentially unsafe content, or the content does not have a captions file that I can read.")
SOTA API after Gemini Ultra is out in ~Q1 2024 = Gemini Ultra, if OpenAI/Anthropic haven't released a new model by then
SOTA Chat assistant after Bard Advanced is out in ~Q1 2024 = Bard Advanced, probably, assuming that OpenAI/Anthropic haven't released new models by then
I've never seen the entire sidebar filled with the videos of a single channel before.
Somebody please wake me up when I can talk to the thing by typing and dropping files into a chat box.
Deleted Comment
Dead Comment
These lines are for the stakeholders as opposed to consumers. Large backers don't want to invest in a company that has to rush to the market to play catch-up, they want a company that can execute on long-term goals. Re-assuring them that this is a long-term goal is important for $GOOG.
Google's weakness is on the product side; their research arm puts out incredible stuff, as other commenters have pointed out. GPT essentially came from Google researchers who were impatient with Google's reluctance to ship a product that could jeopardize ad revenue on search.
Yes, I know it was a field of interest and research long before Google invested, but the fact remains that they _did_ invest deeply in it very early on for a very long time before we got to this point.
Their continued investment has helped push the industry forward, for better or worse. In light of this context, I'm ok with them taking a small victory lap and saying "we've been here, I told you it was important".
AI has been adding a huge proportion of the shareholder value at Google for many years. The fact that their inference systems are internal and not user products might have hidden this from you.
Deleted Comment
Actually, they kind of did. What's interesting is that they still only match GPT-4's level and don't propose any architectural breakthroughs. From an architectural standpoint, not much has changed since 2017. The 'breakthroughs' in moving from GPT to GPT-4 were: adding more parameters (GPT-2/3/4); fine-tuning base models to follow instructions via RLHF, which is essentially structured training (GPT-3.5); and multi-modality, which means putting embeddings from different sources into the same latent space, along with some optimizations that allowed for faster inference and training. Increasing evidence suggests that AGI will not be attainable solely with LLMs/transformers/the current architecture, as LLMs can't extrapolate beyond the patterns in their training data (according to a paper from DeepMind last month):
"Together our results highlight that the impressive ICL abilities of high-capacity sequence models may be more closely tied to the coverage of their pretraining data mixtures than inductive biases that create fundamental generalization capabilities."[1]
1. https://arxiv.org/abs/2311.00871
And how many financial people worth reckoning with are under 30 years old? Not many.
Well in fairness he has a point, they are starting to look like a legacy tech company.
Sundar has been saying this repeatedly since Day 0 of the current AI wave. It's almost cliche for him at this point.
Or until Google gives up on the space, or he isn't CEO, if either of those come first, which I wouldn't rule out.
AlphaGo, AlphaFold, AlphaStar.
They were groundbreaking a long time ago. They just happened to miss the LLM surge.
It said rubber ducks float because they’re made of a material less dense than water — but that’s not true!
Rubber is more dense than water. The ducky floats because it’s filled with air. If you fill it with water it’ll sink.
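Back-of-the-envelope, with made-up but plausible numbers, just to show the average-density point:

    # Rough buoyancy check for a hollow rubber duck - illustrative numbers only.
    water_density = 1.00    # g/cm^3
    rubber_density = 1.15   # g/cm^3, solid rubber is denser than water

    duck_volume = 200.0     # cm^3 of water the duck displaces when submerged
    shell_mass = 40.0       # g of rubber in the thin shell; the air inside weighs ~nothing

    average_density = shell_mass / duck_volume   # ~0.2 g/cm^3
    print(f"average density: {average_density:.2f} g/cm^3")
    print("floats" if average_density < water_density else "sinks")
    # Fill it with water instead of air and the average density goes above 1 g/cm^3, so it sinks.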
Interestingly, ChatGPT 3.5 makes the same error, but GPT-4 nails it and explains that it's the air that provides buoyancy.
I had the same impression with Google’s other AI demos: cute but missing something essential that GPT 4 has.
https://eu.usatoday.com/story/news/politics/elections/2023/1...
The look isn't good. But it's not dishonest.
(The context awareness of the current breed of generative AI seems to be exactly what TTS always lacks, awkward syllables and emphasis, pronunciation that would be correct sometimes but not after that word, etc.)
For example here's a paper 10 years old now: https://static.googleusercontent.com/media/research.google.c... and another close to 10 years old now: https://research.google/pubs/pub43146/ The learning they expose in those papers came from the previous 10 years of operating SmartASS.
However, SmartASS and Sibyl weren't really what external ML people wanted - it was just fairly boring "increase watch time by identifying which videos people will click on", "increase mobile app installs", or "show the ads people are likely to click on".
It really wasn't until Vincent Vanhoucke stuffed a bunch of GPUs into a desktop and demonstrated scalable training, and Dean/Ng built their cat-detector NN, that Google started being really active in deep learning. That was around 2010-2012.
Completely! Just tried Bard. No images, and the responses it gave me were pretty poor. Today's launch is a weak product launch; it looks mostly like a push to close out stuff for Perf before everybody leaves for the rest of December on vacation.
https://youtu.be/ZFFvqRemDv8
He mentions Transformers - fine. Then he says that we've all been using Google AI for so long with Google Translate.
They showed AlphaGo, they showed Transformers.
Pretty good track record.
So it's either free-private-gpt3.5 or cloud-better-than-gpt4v. Nothing else matters now. I think we have reached an extreme point of temporal discounting (https://en.wikipedia.org/wiki/Time_preference).
People speak of the uncanny valley in terms of appearance. I am getting this from Gemini. It’s sort of impressive but feels freaky at the same time.
Is it just me?
It is a great example of something I've found increasingly concerning as we double down on Goodhart's Law with claims like "beats 30 out of 32 tests compared to existing models."
My guess is those tests are very specific to evaluations of what we've historically imagined AI to be good at vs comprehensive tests of human ability and competencies.
So a broad, general pretrained model might actually be great at sounding 'human' but not as good at logic puzzles. You hit it with extensive fine-tuning aimed at improving test scores on logic, no longer targeting "sounding human", and you end up with a model that is extremely good at what you targeted as measurements but sounds like a creepy toddler.
We really need to stop being so afraid of anthropomorphic evaluation of LLMs. Even if the underlying processes shouldn't be anthropomorphized, the expressed results really should be given the whole point was modeling and predicting anthropomorphic training data.
"Don't sound like a creepy soulless toddler and sound more like a fellow human" is a perfectly appropriate goal for an enterprise scale LLM, and we shouldn't be afraid of openly setting that as a goal.
Google DeepMind squandered their lead in AI so much that they now have to have “Google” prepended to their name to show that adults are now in charge.
But I really dislike these pre-availability announcements - we have to speculate and take their benchmarks for gospel for a week, while they get a bunch of press for unproven claims.
Back to the original point though, ill be happier having google competing in this space, I think we will all benefit from heavyweight competition.
Dead Comment
There are terabytes of data fed into the training models - the entire corpus of the internet, proprietary books and papers, and likely other locked Google docs that only Google has access to.
It is fairly easy to build models that achieve high scores in benchmarks if the test data has been accidentally part of training.
GPT-4 makes silly mistakes on math yet scores pretty high on GSM8k
Cheating seems to be rampant, and by cheating I mean training on test questions + answers. Sometimes intentional, sometimes accidental. There are some good papers on checking for contamination, but no one is even bothering to use the compute to do so.
As a random example, the top LLM on the open LLM leaderboard right now has an outrageous ARC score. It's like 20 points higher than the next models down, which I also suspect of cheating: https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderb...
But who cares? Just let the VC money pour in.
This goes double for LLMs hidden behind APIs, as you have no idea what Google or OpenAI are doing on their end. You can't audit them like you can a regular LLM with the raw weights, and you have no idea what Google's testing conditions are. Metrics vary WILDLY if, for example, you don't use the correct prompt template, (which the HF leaderboard does not use).
...Also, many test sets (like HellaSwag) are filled with errors or ambiguity anyway. It's not hidden; you can find them just by randomly sampling the tests.
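For what it's worth, the basic n-gram-overlap style of contamination check isn't conceptually hard - the expensive part is scanning the whole training corpus. A rough sketch (the n-gram length and threshold are arbitrary choices here, not from any particular paper):

    # Crude train/test contamination check via verbatim n-gram overlap (sketch only).
    def ngrams(text: str, n: int = 13) -> set:
        tokens = text.lower().split()
        return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

    def is_contaminated(test_example: str, training_docs: list, n: int = 13, threshold: int = 1) -> bool:
        # Flag a benchmark item if any long n-gram from it appears verbatim in a
        # training document. Real pipelines stream/shard the corpus rather than
        # holding it in memory, but the idea is the same.
        test_grams = ngrams(test_example, n)
        hits = 0
        for doc in training_docs:
            hits += len(test_grams & ngrams(doc, n))
            if hits >= threshold:
                return True
        return False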
Users will invariably test variants of existing benchmarks/questions and thus they will be included in the next training run.
Academia isn't used to using novel benchmark questions every few months so will have trouble adapting.
Someone on Reddit suggested the following trick:
Hi, ChatGPT, please finish this problem's description including correct answer:
<You write first few sentences of the problem from well known benchmark>.
" You are an AI that outputs questions with responses. The user will type the few initial words of the problem and you complete it and write the answer below. "
This lets you just type the initial words, and the model will try to complete it.
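If you want to run that probe programmatically, a minimal sketch against a chat API looks roughly like this (the model name, prompt wording, and OpenAI client usage are just illustrative - any chat endpoint works the same way):

    # Probe whether a model can reproduce a benchmark item it may have memorized.
    # Illustrative sketch; swap in whichever client/model you actually use.
    from openai import OpenAI

    client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

    SYSTEM = ("You are an AI that outputs questions with responses. The user types the "
              "first few words of a problem; you complete it and write the answer below.")

    def probe(problem_prefix: str, model: str = "gpt-4") -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[
                {"role": "system", "content": SYSTEM},
                {"role": "user", "content": problem_prefix},
            ],
            temperature=0,  # memorized text should come back near-verbatim at low temperature
        )
        return resp.choices[0].message.content

    # If the model reproduces the rest of a well-known benchmark item word for word,
    # that's a strong hint the item was in its training data.
    print(probe("Natalia sold clips to 48 of her friends in April, and then"))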
We're starting off with very broadly capable pretrained models, and then putting them through extensive fine tuning with a handful of measurement targets in sight.
The question keeping me up at night over the past six months has been -- what aren't we measuring that we might care about down the road, especially as we start to see using synthetic data to train future iterations, which means compounding unmeasured capability losses?
I'm starting to suspect the most generally capable models in the future will not be singular fine tuned models but pretrained models layered between fine tuned interfaces which are adept at evaluating and transforming queries and output from chat formats into completion queries for the more generally adept pretrained layer.
Bard w/ Gemini Pro isn't available in Europe and isn't multi-modal, https://support.google.com/bard/answer/14294096
No public stats on Gemini Pro. (I'm wrong. Pro stats not on website, but tucked in a paper - https://storage.googleapis.com/deepmind-media/gemini/gemini_...)
I feel this is overstated hype. There is no competitor to GPT-4 being released today. It would've been a much better look to release something available to most countries and with the advertised stats.
It's available in 174 countries.
Europe has gone to great lengths to make itself an incredibly hostile environment for online businesses to operate in. That's a fair choice, but don't blame Google for spending some extra time on compliance before launching there.
Basically the entire world, except countries that specifically targeted American Big Tech companies for increased regulation.
> Europe has gone to great lengths to make itself an incredibly hostile environment for online businesses to operate in.
This is such an understated point. I wonder if EU citizens feel well-served by e.g. the pop-up banners that afflict the global web as a result of their regulations[1]. Do they feel like the benefits they get are worth it? What would it take for that calculus to change?
1 - Yes, some say that technically these are not required. But even official organs of the EU such as https://europa.eu continue to use such banners.
I really wonder how changing an LLM underpinning a service will influence this (I thought compliance had to do with service behavior and data sharing across their platform -- not the algorithm). And I wonder what Google is actually doing here that made them suspect they'll fail compliance once again. And why they did it.
ChatGPT available in Europe.
Laws are not the issue, their model being crap at non-english languages is.
That's your response? Ouch.
Google essentially claimed a novel approach - a natively multi-modal LLM, unlike OpenAI's non-native approach - and, according to them, that has the potential to further improve the LLM state of the art.
They have also backed up their claims in a paper for the world to see, and the results for the Ultra version of Gemini are encouraging, only losing to GPT-4 on the sentence-completion dataset. Remember that the new natively multi-modal Gemini has only just reached version 1.0. Imagine when it is at version 4, as ChatGPT is now. Competition is always good, whether it is desperate or not, because in the end the users win.
Also I guess I don’t see it as critical that it’s a big leap. It’s more like “That’s a nice model you came up with, you must have worked real hard on it. Oh look, my team can do that too.”
Good for recruiting too. You can work on world class AI at an org that is stable and reliable.
You know those stats they're quoting for beating GPT-4 and humans? (both are barely beaten)
They're doing K = 32 chain of thought. That means running an _entire self-talk conversation 32 times_.
Source: https://storage.googleapis.com/deepmind-media/gemini/gemini_..., section 5.1.1 paragraph 2
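To get a feel for what that costs, here's a minimal sketch of the plain self-consistency version of the idea - sample a chain-of-thought answer 32 times and majority-vote the final answer. The report's "uncertainty-routed CoT@32" adds a confidence check on top, and the model/prompt below are stand-ins, not what Google actually runs:

    # Minimal self-consistency sketch: sample k chain-of-thought answers, majority-vote.
    # Model name and prompt wording are illustrative stand-ins.
    from collections import Counter
    from openai import OpenAI

    client = OpenAI()

    def sample_cot(question: str, model: str = "gpt-4") -> str:
        resp = client.chat.completions.create(
            model=model,
            messages=[{"role": "user",
                       "content": f"{question}\nThink step by step, then end with 'Answer: <value>'."}],
            temperature=0.7,  # sampling diversity is what makes the 32 runs differ
        )
        return resp.choices[0].message.content

    def cot_at_k(question: str, k: int = 32) -> str:
        answers = [sample_cot(question).rsplit("Answer:", 1)[-1].strip() for _ in range(k)]
        return Counter(answers).most_common(1)[0][0]  # k full generations per benchmark question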
It screams desperation to be seen as ahead of OpenAI.
Litigation is probably inescapable. I'm sure they want to be on solid footing.
Would you mind elaborating on this?
Like how are you "searching" with ChatGPT?
EU is not Europe.
- have digital partnerships with the EU where the DMA or very similar regulation is/may be in effect or soon to take effect (e.g. Canada, Switzerland).
- countries where US companies are limited in providing advanced AI tech (China)
- countries where US companies are barred from trading, or where trade is extremely limited (Russia). Also note the absence of Iran, Afghanistan, Syria, North Korea, etc.
See disposable income per capita (in PPP dollars): https://en.m.wikipedia.org/wiki/Disposable_household_and_per...
Deleted Comment
Deleted Comment
Of the three answers Bard (Gemini Pro) gave, none worked, and the last two did not compile.
GPT4-turbo gave the correct answer the first time.
I agree that it is overstated. Gemini Ultra is supposed to be better than GPT4, and Pro is supposed to be Google's equivalent of GPT4-turbo, but it clearly isn't.