https://aistudio.google.com/live is by far the coolest thing here. You can just go there and share your screen or camera and have a running live voice conversation with Gemini about anything you're looking at. As much as you want, for free.
I just tried having it teach me how to use Blender. It seems like it could actually be super helpful for beginners, as it has decent knowledge of the toolbars and keyboard shortcuts and can give you advice based on what it sees you doing on your screen. It also watched me play Indiana Jones and the Great Circle, and it successfully identified some of the characters and told me some information about them.
You can enable "Grounding" in the sidebar to let it use Google Search even in voice mode. The video streaming and integrated search make it far more useful than ChatGPT Advanced Voice mode is currently.
I got so hopeful from your comment that I showed it my current bug that I'm working on. I even prepared everything first: my GitHub issue, the relevant code, the terminal with the failing tests. I pasted it the full contents of the file and explained carefully what I wanted to achieve. As I was doing this, it repeated back to me everything I said, saying things like "if I understand correctly you're showing me a file called foo dot see see pee" and "I see you have a github issue open called extraneous spaces in frobnicator issue number sixty six" and "I see you have shared some extensive code", and after some more of this "validation"-speak it started reciting the full contents of the file like "backquote backquote backquote forward slash star .. import ess tee dee colon colon .." and so on.
Not quite up to excited junior-level programmer standards yet. But maybe good for other things, who knows.
Not sure this is an AI limitation. I think you'd be better off here with the Gemini Code Assist plugin in VS Code rather than the live screen share. It sounds like the AI is being given unstructured information rather than an actual code base.
I'm someone that becomes about 5x more productive when I have a person watching or just checking in on me (even if they're just hovering there).
Having AI to basically be that "parent" to kick me into gear would be so helpful. 90% of the time my problems are because I need someone to help keep the gears turning for me, but there isn't always someone available. This has the potential to be a person that's always available.
Just as an FYI: I recently learned (here on HN) that this is called Body Doubling[0]. There are some services around (at least one by someone who hangs around here) that can do this too.
Interesting! I see how this could work for inattentive procrastinators. By "inattentive procrastinators", I mean people who are easily distracted and forget that they need to work on their tasks. Once reminded, they return to their tasks without much fuss.
However, I doubt it would work for hedonistic procrastinators. When body doubling, hedonistic procrastinators rely on social pressure to be productive. Using AI likely won't work unless the person perceives the AI as a human.
So why not fire 3 of your colleagues and have another whose new job is watching over/checking in on you? By your own account productivity would be about the same. Save your company some money; it will be appreciated!
On an unrelated note, I believe people need to start quantifying their outrageous ai productivity claims or shut up.
I'm intrigued to know whether that actually ends up working. I am something like that myself, but I don't know whether it is an effect of getting feedback or of having a person behind the feedback.
Your "parent" kicked you into gear because you have an emotional bond with them. A stranger might cause your guard to go up if you do not regard them as wise. So too may go an AI.
At least you can theoretically stop sharing with this one. Microsoft was essentially trying to do this, but doing it for everything on your PC, with zero transparency.
Here's Google doing essentially the same thing, even more so in that it's explicitly shipping your activity to the cloud, and yet the response is completely different from the "we're sticking this on your machine and you can't turn it off" version Microsoft was attempting to land. This is what Microsoft should have done.
This is great! I viscerally dislike the "we're going to do art for you so you don't have to... even if you want to..." side of AI, but learning to use the tools to get the satisfaction of making it yourself is not easy! After 2 decades of working with 2D art and code separately, learning 3D stuff (if you include things like the complex and counterintuitive data flow of simulations in Houdini and the like) was as or more difficult than learning to code. Beyond that, taking classes is f'ing expensive, and more of that money goes to educational institutions than the teachers themselves. Obviously, getting beyond the basics for things that require experienced critique are just going to need human understanding, but for the base technical stuff, this is fantastic.
This isn't entirely surprising, as Google has been artificially breaking things on Firefox for years now (Google Maps and YouTube at least). Maybe try spoofing Chrome's user agent.
I tried this, shared a terminal, asked it to talk about it, and it guessed that it was Google Chrome with some webUI stuff. Immediately closed the window and bailed.
I don't know what's not working, but I get "As a large language model, I don't have the capability to see your screen or any other visual input. My interactions are purely based on the text that you provide."
This'll be so fantastic once local models can do it, because nobody in the right mind would stream their voice and everything they do on their machine to Google right? Right?
Oh who am I kidding, people upload literally everything to drive lmao.
It can't make outbound network calls though, so this fails:
llm -m gemini-2.0-flash-exp -o code_execution 1 \
'write python code to retrieve https://simonwillison.net/ and use a regex to extract the title, run that code'
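Out of curiosity, here's roughly what that prompt is asking the model to write, runnable locally (just a sketch: the regex approach is what the prompt specifies, not a recommendation over a real HTML parser):

```python
import re
import urllib.request

def extract_title(html: str):
    # Naive regex extraction, as the prompt requests; a proper parser
    # (html.parser, BeautifulSoup) would be more robust against odd markup.
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

def fetch_title(url: str):
    # This outbound request is exactly what the sandbox blocks.
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_title(html)

if __name__ == "__main__":
    print(fetch_title("https://simonwillison.net/"))
```

Run locally this works fine; inside the code-execution sandbox the `urlopen` call is what fails.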
Code execution is okay, but soon runs into the problem of missing packages that it can't install.
Practically, sandboxing hasn't been super important for me. Running Claude with MCP-based shell access has been working fine for me, as long as you instruct it to use a venv, a temporary directory, etc.
Brown Pelican (Pelecanus occidentalis) heads are white in the breeding season. Birds start breeding aged three to five. So technically the statement is correct but I wonder if Gemini didn't get its pelicans and cormorants in a muddle. The mainland European Great Cormorant (Phalacrocorax carbo sinensis) has a head that gets progressively whiter as birds age.
Big companies can be slow to pivot, and Google has been famously bad at getting people aligned and driving in one direction.
But once they do get moving in the right direction, they can achieve things that smaller companies can't. Google has an insane amount of talent in this space, and seems to be getting the right results from that now.
Remains to be seen how well they will be able to productize and market, but it's hard to deny that their LLM models are really, really good.
> Remains to be seen how well they will be able to productize and market
The challenge is trust.
Google is one of the leaders in AI and are home to incredibly talented developers. But they also have an incredibly bad track record of supporting their products.
It's hard to justify committing developers and money to a product when there's a good chance you'll just have to pivot again once they get bored. Say what you will about Microsoft, but at least I can rely on their obsession with supporting outdated products.
> they also have an incredibly bad track record of supporting their products
Incredibly bad track record of supporting products that don't grow. I'm not saying this to defend Google, I'm still (perhaps unreasonably) angry because of Reader, it's just that there is a pattern and AI isn't likely to fit that for a long while.
Yes. Imagine Google banning your entire Google account / Gmail because you violated their gray-area AI terms ([1] or [2]). Or because one of your users did, via an app you built using an API key and their models.
With that being said, I am extremely bullish on Google AI for a long time. I imagine they land at being the best and cheapest for the foreseeable future.
> But they also have an incredibly bad track record of supporting their products.
I don't know about that: my wife built her first SME on Google Workspace / GSuite / Google Apps for domain (this thing changed names so many times I lost track). She's now running her second company on Google tools, again.
All she needs is a browser. At one point I switched her from Windows to OS X. Then from OS X to Ubuntu.
Now I just installed Debian GNU/Linux on her desktop: she fires up a browser and opens up Google's GMail / GSuite / spreadsheets and does everything from there.
She's been a happy paying customer of Google products for a great many years, and there's actually phone support for paying customers.
I honestly don't have many bad things to say. It works fine. 2FA is top notch.
It's a much better experience than being stuck in the Windows "Updating... 35%", "here's an ad on your taskbar", "your computer is now slow for no reason" world.
I don't think they'll pull the plug on GSuite: it's powering millions and millions of paying SMEs around the world.
> Google is one of the leaders in AI and are home to incredibly talented developers. But they also have an incredibly bad track record of supporting their products.
This is why we've stayed with Anthropic. Every single person I work with on my current project is sore at Google for discontinuing one product or another - and not a single one of them mentioned Reader.
We do run some non-customer facing assets in Google Cloud. But the website and API are on AWS.
>Say what you will about Microsoft, but at least I can rely on their obsession with supporting outdated products.
Eh... I don't know about that. Their tech graveyard isn't as populous as Google's, but it's hardly empty. A few that come to mind: ATL, MFC, Silverlight, UWP.
They have to not get blindsided by Sora, while at the same time fighting the cloud war against MS/Amazon.
Weirdly Google is THE AI play. If AI is not set to change everything and truly is a hype cycle, then Google stock withstands and grows. If AI is the real deal, then Google still withstands due to how much bigger the pie will get.
> and Google has been famously bad at getting people aligned and driving in one direction.
To be fair, it's not that they're bad at it -- it's that they generally have an explicit philosophy against it. It's a choice.
Google management doesn't want to "pick winners". It prefers to let multiple products (like messaging apps, famously) compete and let the market decide. According to this way of thinking, you come out ahead in the long run because you increase your chances of having the winning product.
Gemini is a great example of when they do choose to focus on a single strategy, however. Cloud was another great example.
I definitely agree that multiple competing products is a deliberate choice, but it was foolish to pursue it for so long in a space like messaging apps that has network effects.
BERT and Gemma 2B were both some of the highest-performing edge models of their time. Google does really well - in terms of pushing efficiency in the community they're second to none. They also don't need to rely on inordinate amounts of compute because Google's differentiating factor is the products they own and how they integrate it. OpenAI is API-minded, Google is laser-focused on the big-picture experience.
For example, those little AI-generated YouTube summaries that have been rolling out are wonderful. They don't require heavyweight LLMs to generate, and can create pretty effective summaries using nothing but a transcript. Not only are they more useful than the other AI "features" I interact with regularly, they don't demand AGI or chain-of-thought.
> but hard to deny that their LLM models are really, really good though
Although I do still pay for ChatGPT, I find it dog slow. ChatGPT is simply way too slow to generate answers. It feels like --even though of course it's not doing the same thing-- I'm back in the 80s with my 8-bit computer printing things line by line.
Gemini OTOH doesn't feel like that: answers are super fast.
To me low latency is going to be the killer feature. People won't keep paying for models that are dog slow to answer.
I'll probably be cancelling my ChatGPT subscription soon.
>> hard to deny that their LLM models are really, really good though.
The context window of Gemini 1.5 pro is incredibly large and it retains the memory of things in the middle of the window well. It is quite a game changer for RAG applications.
Bear in mind that a "1 million token" context window isn't actually that. You're being sold a sparse attention model, which is guaranteed to drop critical context. Google TPUs aren't running inference on a TERABYTE of fp8 query-key inputs, let alone TWO of fp16.
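The back-of-envelope arithmetic behind that claim is easy to check under hypothetical dense-model dimensions (the layer count and hidden size below are assumptions for illustration, not Gemini's actual architecture):

```python
def kv_cache_bytes(tokens: int, layers: int, hidden_dim: int, bytes_per_value: int) -> int:
    # Keys and values are both cached per layer per token, hence the factor of 2.
    return 2 * layers * hidden_dim * bytes_per_value * tokens

# Hypothetical dense-transformer dimensions for a frontier-scale model:
TOKENS, LAYERS, HIDDEN = 1_000_000, 80, 8192

fp8 = kv_cache_bytes(TOKENS, LAYERS, HIDDEN, 1)
fp16 = kv_cache_bytes(TOKENS, LAYERS, HIDDEN, 2)
print(fp8 / 1e12, "TB at fp8;", fp16 / 1e12, "TB at fp16")
```

Under those assumptions a full dense KV cache for 1M tokens lands at roughly 1.3 TB (fp8) or 2.6 TB (fp16), which is the commenter's point: grouped-query attention, sliding windows, and other sparsity tricks must be cutting that down dramatically for the window to be servable at all.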
Maybe I've just hit a streak of good outputs, but I've also been noticing the automatic gemini search when doing google searches to have been significantly more useful than it was previously.
About a year ago, I was saying that Google was potentially walking toward its own grave due to not having any pivots that rivaled OpenAI. Now, I'm starting to think they've found the first few steps toward an incredible stride.
Yet Google continues to show it'll deprecate its APIs, services, and functionality to the detriment of your own business. I'm not sure enterprises will trust Google's LLM over the alternatives. Too many have been burned throughout the years, including GCP customers.
> hard to deny that their LLM models are really, really good though
I'm so scarred by how much their first Gemini releases sucked that the thought of trying it again doesn't even cross my mind.
Are you telling us you're buying this press release wholesale, or you've tried the tech they're talking about and love it, or you have some additional knowledge not immediately evident here? Because it's not clear from your comment where you are getting that their LLM models are really good.
Buried in the announcement is the real gem — they’re releasing a new SDK that actually looks like it follows modern best practices. Could be a game-changer for usability.
They’ve had OpenAI-compatible endpoints for a while, but it’s never been clear how serious they were about supporting them long-term. Nice to see another option showing up. For reference, their main repo (not kidding) recommends setting up a Kubernetes cluster and a GCP bucket to submit batch requests.
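For anyone who wants to try the compatible endpoint without the Kubernetes ceremony, a minimal sketch using the standard openai client (the base URL and model name are as documented at launch; treat both as subject to change):

```python
import os

# Google's OpenAI-compatibility endpoint, as documented at launch.
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
MODEL = "gemini-2.0-flash-exp"

def make_client(api_key: str):
    # Imported lazily so this module still loads without the openai package.
    from openai import OpenAI  # pip install openai
    return OpenAI(api_key=api_key, base_url=GEMINI_BASE_URL)

if __name__ == "__main__":
    client = make_client(os.environ["GEMINI_API_KEY"])
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(response.choices[0].message.content)
```

The only Gemini-specific parts are the base URL and the model name; everything else is the stock openai client.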
It's interesting that just as the LLM hype appears to be simmering down, DeepMind is making big strides. I'm more excited by this than by any of OpenAI's announcements.
Beats Gemini 1.5 Pro at all but two of the listed benchmarks. Google DeepMind is starting to get their bearings in the LLM era. These are the minds behind AlphaGo/Zero/Fold. They control their own hardware destiny with TPUs. Bullish.
No, and they haven't been for at least half a year. Utterly optimized for by the providers. Nowadays if a model would be SotA for general use but not #1 on any of these benchmarks, I doubt they'd even release it.
I've started keeping an eye out for original brainteasers, just for that reason. GCHQ's Christmas puzzle just came out [1], and o1-pro got 6 out of 7 of them right. It took about 20 minutes in total.
I wasn't going to bother trying those because I was pretty sure it wouldn't get any of them, but decided to give it an easy one (#4) and was impressed at the CoT.
Meanwhile, Google's newest 2.0 Flash model went 0 for 7.
Regarding TPUs: sure, for the stuff that's running in the cloud.
However, their on-device TPUs lag behind the competition, and Google still seems to struggle to move significant parts of Gemini to run on device as a result.
Of course, Gemini is provided as a subscription service as well so perhaps they’re not incentivized to move things locally.
I am curious if they’ll introduce something like Apple’s private cloud compute.
I don't think they need to win the on-device market.
We need to separate inference and training: the real winners are those who have the training compute. You can always have other companies help with inference.
Yeah they've been slow to release end-user facing stuff but it's obvious that they're just grinding away internally.
They've ceded the fast mover advantage, but with a massive installed base of Android devices, a team of experts who basically created the entire field, a huge hardware presence (that THEY own), massive legal expertise, existing content deals, and a suite of vertically integrated services, I feel like the game is theirs to lose at this point.
The only caution is regulation / anti-trust action, but with a Trump administration that seems far less likely.
OT: I’m not entirely sure why, but "agentic" sets my teeth on edge. I don't mind the concept, but the word itself has that hollow, buzzwordy flavor I associate with overblown LinkedIn jargon, particularly as it is not actually in the dictionary...unlike perfectly serviceable entries such as "versatile", "multifaceted" or "autonomous"
To play devil's advocate, the correct use of the word would be when multiple AIs are coordinating and handing off tasks to each other with limited context, such that the handoffs are dynamically decided at runtime by the AI, not by any routine code. I have yet to see a single example where this is required. Most problems can be solved with static workflows and simple rule based code. As such, I do believe that >95% of the usage of the word is marketing nonsense.
I actually have built such a tool (two AIs, each with different capabilities), but still cringe at calling it agentic. Might just be an instinctive reflex.
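For what it's worth, the distinction being drawn here fits in a few lines. In this sketch (all names hypothetical, with a stub standing in for the LLM call), the defining "agentic" property is that the next handoff is chosen by the model's output at runtime rather than by fixed routing code:

```python
AGENTS = {
    "researcher": lambda task: f"notes on: {task}",
    "writer": lambda task: f"draft based on: {task}",
}

def stub_model_decide(history):
    # Stand-in for an LLM call: a real agentic system would ask the model
    # which agent (if any) should act next, given the shared history.
    if not any(h.startswith("notes") for h in history):
        return "researcher"
    if not any(h.startswith("draft") for h in history):
        return "writer"
    return "done"

def run(task: str):
    history = []
    while (choice := stub_model_decide(history)) != "done":
        # The handoff target is decided at runtime by the (stubbed) model,
        # not hard-coded into a static workflow.
        history.append(AGENTS[choice](history[-1] if history else task))
    return history

print(run("summarize Gemini 2.0 launch"))
```

Replace `stub_model_decide` with deterministic `if`/`else` routing and you have the static workflow that, per the comment above, solves most problems just as well.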
You nailed an interesting nuance there about agents needing to make their own decisions!
I'm getting fairly excited about "agentic" solutions to the point that I even went out of my way to build "AgentOfCode" (https://github.com/JasonSteving99/agent-of-code) to automate solving Advent of Code puzzles by iteratively debugging executions of generated unit tests (intentionally not competing on the global leaderboard).
And even for this, there's actually only a SINGLE place in the whole "agent" where the models themselves actually make a "decision" on what step to take next, and that's simply deciding whether to refactor the generated unit tests or the generated solution based on the given error message from a prior failure.
I think this sort of usage is already happening, but perhaps in the internal details or uninteresting parts, such as content moderation. Most good LLM products are in fact using many LLM calls under the hood, and I would expect that results from one are influencing which others get used.
I'm personally very glad that the word has adhered itself to a bunch of AI stuff, because people had started talking about "living more agentically" which I found much more aggravating. Now if anyone states that out loud you immediately picture them walking into doors and misunderstanding simple questions, so it will hopefully die out.
No, we need a scientific understanding of autonomous intelligent decision-making. The problem with “agentic AI” is the same old “Artificial Intelligence, Natural Stupidity” problem: we have no clue what “reasoning” or “intelligence” or “autonomous” actually means in animals, and trying to apply these terms to AI without understanding them (or inventing a new term without nailing down the underlying concept) is doomed to fail.
This is what other replies are missing - I've been following AI closely since GPT 2 and it's not immediately clear what agentic means, so to other people, the term must be even less clear. Using the word autonomous can't be worse than agentic imo.
Anyway, I'm glad that this Google release is actually available right away! I pay for Gemini Advanced and I see "Gemini Flash 2.0" as an option in the model selector.
I've been going through Advent of Code this year, and testing each problem with each model (GPT-4o, o1, o1 Pro, Claude Sonnet, Opus, Gemini Pro 1.5). Gemini has done decent, but is probably the weakest of the bunch. It failed (unexpectedly to me) on Day 10, but when I tried Flash 2.0 it got it! So at least in that one benchmark, the new Flash 2.0 edged out Pro 1.5.
I look forward to seeing how it handles upcoming problems!
I should say: Gemini Flash didn't quite get it out of the box. It actually had a syntax error in the for loop, which caused it to fail to compile, which is an unusual failure mode for these models. Maybe it was a different version of Java or something (I'm also trying to learn Java with AoC this year...). But when I gave Flash 2.0 the compilation error, it did fix it.
For the more Java proficient, can someone explain why it may have provided this code:
for (int[] current = queue.remove(0)) {
which was a compilation error for me? The corrected code it gave me afterwards was just
for (int[] current : queue) {
and with that one change the class ran and gave the right solution.
I use Claude and Gemini a lot for coding, and I've realized there is no single best model. Every model has its upsides and downsides. I was trying to get authentication working according to the newer Manifest V3 guidelines for browser extensions, and every model is terrible at it. It's one use case where there's not much information or good documentation, so every model makes stuff up. But this is my experience and I don't speak for everyone.
Relatedly, I'm starting to think more and more that AI is great for mediocre stuff. If you just need to build the 1000th website, it can do that. Do you want to build a new framework? Then there will probably be fewer useful suggestions. (Still not useless though. I do like it a lot for refactoring while building xrcf.)
EDIT: One thing that led me to think it's better for mediocre stuff was seeing the Sora model generate videos. Yes, it can create semi-novel stuff through combinations of existing stuff, but it can't stick to a coherent "vision" throughout the video. It's not like a movie by a great director like Tarantino, where every detail is right and all details point to the same vision. Instead, Sora is just flailing around. I see the same in software. Sometimes the suggestions go toward one style, and the next moment toward another. I guess current AI just has a far shorter effective context. Tarantino has been refining his style for 30 years now, always tuning his model toward his vision. AI in comparison seems to take everything and turn it into one mediocre blob. It's not useless, but it's good to keep in mind that currently it mostly generates mediocre stuff.
That's when having a huge context is valuable. Dump all of the new documentation into the model along with your query and the chances of success hugely increase.
This is true for all newish code bases. You need to provide the context it needs to get the problem right. It has been my experience that one or two examples with new functions or new requirements will suffice for a correction.
I can't comment on why the model gave you that code, but I can tell you why it was not correct.
The for-each syntax requires a `:`, so `for (int[] current = queue.remove(0))` is parsed as a basic `for` statement that is missing its condition and update clauses, hence the compilation error. Logically it's also off: `queue.remove(0)` gives you a single `int[]`, not a sequence to iterate over. If you had wanted to iterate over every int in every array, it would need to be:
```
for (int[] current : queue) {
for (int c : current) {
// ...do stuff...
}
}
```
Alternatively, if you wanted to iterate over each element in the queue and treat the int array as a single element, the revised solution is the correct one.
The Gemini 2 models support native audio and image generation, but the latter won't be generally available till January. Really excited for that, as well as 4o's image generation (whenever that comes out). Steerability has lagged behind aesthetics in image generation for a while now, and it'd be great to see a big advance in that.
Also a whole lot of computer vision tasks (via LLMs) could be unlocked with this. Think Inpainting, Style Transfer, Text Editing in the wild, Segmentation, Edge detection etc
I asked Gemini 2.0 Flash (with my voice) whether it natively understands audio or is converting my voice to text. It replied:
"That's an insightful question. My understanding of your speech involves a pipeline first. Your voice is converted to text and then I process the text to understand what you're saying. So I don't understand your voice directly but rather through a text representation of it."
Unsure if this is a hallucination, but is disappointing if true.
Edit: Looking at the video you linked, they say "native audio output", so I assume this means the input isn't native? :(
Maybe some of these tasks are arguably not aligned with the traditional applications of CV, but Segmentation and Edge detection are definitely computer vision in every definition I've come across - before and after NNs took over.
Well there's your problem!
[0] https://en.m.wikipedia.org/wiki/Body_doubling
Quick research suggests this is part of Firefox's anti-fingerprinting functionality.
Worth noting that the Gemini models have the ability to write and then execute Python code. I tried that like this:
Here's the result: https://gist.github.com/simonw/0d8225d62e8d87ce843fde471d143...
Amusingly Gemini itself doesn't know that it can't make network calls, so it tries several different approaches before giving up: https://gist.github.com/simonw/2ccfdc68290b5ced24e5e0909563c...
The new model seems very good at vision:
I got back a solid description, see here: https://gist.github.com/simonw/32172b6f8bcf8e55e489f10979f8f...
https://ipython.readthedocs.io/en/stable/interactive/magics....
GitHub: https://github.com/ErikBjare/gptme
Alternately, if I wanted to pipe a bunch of screencaps into it and get one grand response, how would I do that?
e.g. "Does the user perform a thumbs up gesture in any of these stills?"
[edit: also, do you know the vision pricing? I couldn't find it easily]
Interesting theory!
[1] https://policies.google.com/terms/generative-ai
[2] https://policies.google.com/terms/generative-ai/use-policy
They haven't wielded this advantage as powerfully as possible, but changes here could signal how committed they are to slaying the search cash cow.
Nadella deservedly earned acclaim for transitioning Microsoft from the Windows era to cloud and mobile.
It will be far more impressive if Google can defy the odds and conquer the innovator's dilemma with search.
Regardless, congratulations to Google on an amazing release and pushing the frontiers of innovation.
You mean by shifting away from Windows for mobile and focusing on iOS and Android?
As a user, I still always wish there were fewer apps with the best features of both. Google's 2(!) apps for AI podcasts are a recent example: https://notebooklm.google.com/ and https://illuminate.google.com/home
For example, those little AI-generated YouTube summaries that have been rolling out are wonderful. They don't require heavyweight LLMs to generate and can create pretty effective summaries using nothing but a transcript. Not only are they more useful than the other AI "features" I interact with regularly, they don't demand AGI or chain-of-thought.
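As a rough illustration of how far you can get with nothing but a transcript, here's a toy extractive summarizer. This is just a frequency-scoring baseline I'm sketching for the sake of argument, not how YouTube's summaries actually work:

```java
import java.util.*;
import java.util.stream.*;

public class TranscriptSummary {
    // Score each sentence by the summed corpus frequency of its words,
    // then keep the k highest-scoring sentences as the "summary".
    static List<String> summarize(String transcript, int k) {
        String[] sentences = transcript.split("(?<=[.!?])\\s+");
        Map<String, Integer> freq = new HashMap<>();
        for (String w : transcript.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) freq.merge(w, 1, Integer::sum);
        }
        return Arrays.stream(sentences)
                .sorted(Comparator.comparingInt((String s) ->
                        Arrays.stream(s.toLowerCase().split("\\W+"))
                              .mapToInt(w -> freq.getOrDefault(w, 0)).sum())
                        .reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String t = "Gemini is fast. The weather is nice today. "
                 + "Gemini handles long context. Gemini is free.";
        System.out.println(summarize(t, 2));
    }
}
```

A real system would do much better with even a small LLM over the transcript, but the point stands that the input signal is just text.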
This doesn't match my experience of any Google product.
Although I do still pay for ChatGPT, I find it dog slow. ChatGPT is simply way too slow to generate answers. It feels like (even though of course it's not doing the same thing) I'm back in the 80s with my 8-bit computer printing things line by line.
Gemini OTOH doesn't feel like that: answers are super fast.
To me low latency is going to be the killer feature. People won't keep paying for models that are dog slow to answer.
I'll probably be cancelling my ChatGPT subscription soon.
The context window of Gemini 1.5 pro is incredibly large and it retains the memory of things in the middle of the window well. It is quite a game changer for RAG applications.
Google's marketing wins again, I guess.
About a year ago, I was saying that Google was potentially walking toward its own grave due to not having any pivots that rivaled OpenAI. Now, I'm starting to think they've found the first few steps toward an incredible stride.
The fact GCP needs to have this page, and these lists are not 100% comprehensive, is telling enough.
https://cloud.google.com/compute/docs/deprecations
https://cloud.google.com/chronicle/docs/deprecations
https://developers.google.com/maps/deprecations
Steve Yegge rightfully called this out, and yet no change has been made. https://medium.com/@steve.yegge/dear-google-cloud-your-depre...
Some guy had to do the same for Azure; then he went to work for Microsoft, and the page is now deprecated itself:
https://blog.tomkerkhove.be/2023/03/29/sunsetting-azure-depr...
> hard to deny that their LLM models aren't really, really good though
I'm so scarred by how much their first Gemini releases sucked that the thought of trying it again doesn't even cross my mind.
Are you telling us you're buying this press release wholesale, or you've tried the tech they're talking about and love it, or you have some additional knowledge not immediately evident here? Because it's not clear from your comment where you are getting that their LLM models are really good.
Deleted Comment
They’ve had OpenAI-compatible endpoints for a while, but it’s never been clear how serious they were about supporting them long-term. Nice to see another option showing up. For reference, their main repo (not kidding) recommends setting up a Kubernetes cluster and a GCP bucket to submit batch requests.
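For comparison, an OpenAI-compatible chat request is just a plain HTTPS POST, no cluster or bucket required. Here's a sketch using only the JDK's `HttpRequest` types; the endpoint path and model name reflect my understanding of the current docs and may change, so verify before relying on them (it builds the request but deliberately doesn't send it):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OpenAICompatDemo {
    // Base URL of Gemini's OpenAI-compatible surface, as documented at the
    // time of writing; double-check against the official docs.
    static final String BASE = "https://generativelanguage.googleapis.com/v1beta/openai";

    // Minimal OpenAI-style chat.completions payload, assembled by hand to keep
    // the sketch dependency-free (a real client should use a JSON library).
    static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + model + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\"" + userMessage + "\"}]}";
    }

    static HttpRequest request(String apiKey, String body) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/chat/completions"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        String body = chatBody("gemini-2.0-flash-exp", "Say hello");
        System.out.println(request("YOUR_API_KEY", body).uri());
    }
}
```

Sending it with `java.net.http.HttpClient` and a valid API key is all the "batch infrastructure" a small workload needs.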
[1]https://github.com/googleapis/python-genai
https://github.com/googleapis/python-genai?tab=readme-ov-fil...
Deleted Comment
I wasn't going to bother trying those because I was pretty sure it wouldn't get any of them, but decided to give it an easy one (#4) and was impressed at the CoT.
Meanwhile, Google's newest 2.0 Flash model went 0 for 7.
1: https://metro.co.uk/2024/12/11/gchq-christmas-puzzle-2024-re...
However their on device TPUs lag behind the competition and Google still seem to struggle to move significant parts of Gemini to run on device as a result.
Of course, Gemini is provided as a subscription service as well so perhaps they’re not incentivized to move things locally.
I am curious if they’ll introduce something like Apple’s private cloud compute.
We need to separate inference from training: the real winners are those who have the training compute. You can always have other companies help with inference.
They've ceded the fast mover advantage, but with a massive installed base of Android devices, a team of experts who basically created the entire field, a huge hardware presence (that THEY own), massive legal expertise, existing content deals, and a suite of vertically integrated services, I feel like the game is theirs to lose at this point.
The only caution is regulation / anti-trust action, but with a Trump administration that seems far less likely.
I'm getting fairly excited about "agentic" solutions to the point that I even went out of my way to build "AgentOfCode" (https://github.com/JasonSteving99/agent-of-code) to automate solving Advent of Code puzzles by iteratively debugging executions of generated unit tests (intentionally not competing on the global leaderboard).
And even for this, there's actually only a SINGLE place in the whole "agent" where the models themselves actually make a "decision" on what step to take next, and that's simply deciding whether to refactor the generated unit tests or the generated solution based on the given error message from a prior failure.
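For what it's worth, that single decision point can be sketched roughly like below. The names and the stand-in heuristic are mine, not from the AgentOfCode repo, and the real agent delegates this choice to the model rather than a string match:

```java
public class AgentStep {
    enum Next { REFACTOR_TESTS, REFACTOR_SOLUTION }

    // Stand-in for the one model-made decision: given the error output from a
    // failed run, pick which artifact to regenerate. Here, a crude heuristic
    // (does the traceback point into the test files?) keeps the sketch runnable.
    static Next decide(String errorMessage) {
        return errorMessage.contains("tests/")
                ? Next.REFACTOR_TESTS
                : Next.REFACTOR_SOLUTION;
    }

    public static void main(String[] args) {
        System.out.println(decide("File \"tests/test_day1.py\", line 3"));
        System.out.println(decide("IndexError in solution.py, line 12"));
    }
}
```

Everything else in the loop (generate solution, generate tests, run, collect errors) is a fixed pipeline; this is the only branch where the model exercises any "agency".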
Agentic to me means that it acts somewhat under its own authority rather than a single call to an LLM. It has a small degree of agency.
Deleted Comment
I don't think these are necessarily buzzwords if the product really does what they imply.
Deleted Comment
agentic == not people.
Quite sensible, really.
Deleted Comment
Dead Comment
Anyway, I'm glad that this Google release is actually available right away! I pay for Gemini Advanced and I see "Gemini Flash 2.0" as an option in the model selector.
I've been going through Advent of Code this year, and testing each problem with each model (GPT-4o, o1, o1 Pro, Claude Sonnet, Opus, Gemini Pro 1.5). Gemini has done decent, but is probably the weakest of the bunch. It failed (unexpectedly to me) on Day 10, but when I tried Flash 2.0 it got it! So at least in that one benchmark, the new Flash 2.0 edged out Pro 1.5.
I look forward to seeing how it handles upcoming problems!
I should say: Gemini Flash didn't quite get it out of the box. It actually had a syntax error in the for loop, which caused it to fail to compile, which is an unusual failure mode for these models. Maybe it was a different version of Java or something (I'm also trying to learn Java with AoC this year...). But when I gave Flash 2.0 the compilation error, it did fix it.
For the more Java proficient, can someone explain why it may have provided this code:
which was a compilation error for me? The corrected code it gave me afterwards was just and with that one change the class ran and gave the right solution.

EDIT: One reason that led me to think it's better for mediocre stuff was seeing the Sora model generate videos. Yes, it can create semi-novel stuff through combinations of existing stuff, but it can't stick to a coherent "vision" throughout the video. It's not like a movie by a great director like Tarantino, where every detail is right and all details point to the same vision. Instead, Sora is just flailing around. I see the same in software: sometimes the suggestions go toward one style and the next moment toward another. I guess current AI just has a much shorter effective context. Tarantino has been refining his style for 30 years now, always tuning his model toward his vision. AI, in comparison, seems to just take everything and turn it into one mediocre blob. It's not useless, but it's currently good to keep in mind, I think, that you can only use it to generate mediocre stuff.
True to a point, but is anyone using GPT2 for anything still? Sometimes the better model completely supplants others.
To me that reads like it was trying to accomplish something like
`queue.remove(0)` gives you an `int[]`, which is also what you were assigning to `current`. So logically it's a single element, not an iterable. If you had wanted to iterate over each item in the array, it would need to be:
```
for (int[] current : queue) {
    for (int c : current) {
        // ...do stuff...
    }
}
```
Alternatively, if you wanted to iterate over each element in the queue and treat the int array as a single element, the revised solution is the correct one.
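Putting that side by side, a minimal runnable sketch of the distinction (class and variable names are mine, not from the original AoC solution):

```java
import java.util.ArrayList;
import java.util.List;

public class QueueDemo {
    // remove(0) hands back a single int[] element; assigning it to an int[]
    // variable is the correct usage.
    static int[] front(List<int[]> queue) {
        return queue.remove(0);
    }

    public static void main(String[] args) {
        List<int[]> queue = new ArrayList<>();
        queue.add(new int[] {1, 2});
        queue.add(new int[] {3, 4});

        int[] current = front(queue); // one element off the front: {1, 2}
        // for (int[] c : front(queue)) {}  // would NOT compile:
        // an int[] iterates as ints, not as int[]s.

        // To walk every element of the remaining queue instead:
        for (int[] pair : queue) {
            for (int c : pair) {
                System.out.println(c);
            }
        }
    }
}
```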
Also, a whole lot of computer vision tasks (via LLMs) could be unlocked with this. Think inpainting, style transfer, text editing in the wild, segmentation, edge detection, etc.
They have a demo: https://www.youtube.com/watch?v=7RqFLp0TqV0
"That's an insightful question. My understanding of your speech involves a pipeline first. Your voice is converted to text and then I process the text to understand what you're saying. So I don't understand your voice directly but rather through a text representation of it."
Unsure if this is a hallucination, but it's disappointing if true.
Edit: Looking at the video you linked, they say "native audio output", so I assume this means the input isn't native? :(
If you're using Gemini in AI Studio (not sure about the real-time API, but everything else), then it has native audio input.