https://aistudio.google.com/live is by far the coolest thing here. You can just go there and share your screen or camera and have a running live voice conversation with Gemini about anything you're looking at. As much as you want, for free.
I just tried having it teach me how to use Blender. It seems like it could actually be super helpful for beginners, as it has decent knowledge of the toolbars and keyboard shortcuts and can give you advice based on what it sees you doing on your screen. It also watched me play Indiana Jones and the Great Circle, and it successfully identified some of the characters and told me some information about them.
You can enable "Grounding" in the sidebar to let it use Google Search even in voice mode. The video streaming and integrated search make it far more useful than ChatGPT Advanced Voice mode is currently.
I got so hopeful from your comment that I showed it my current bug that I'm working on. I even prepared everything first: my GitHub issue, the relevant code, the terminal with the failing tests. I pasted it the full contents of the file and explained carefully what I wanted to achieve. As I was doing this, it repeated back to me everything I said, saying things like "if I understand correctly you're showing me a file called foo dot see see pee" and "I see you have a github issue open called extraneous spaces in frobnicator issue number sixty six" and "I see you have shared some extensive code", and after some more of this "validation"-speak it started reciting the full contents of the file like "backquote backquote backquote forward slash star .. import ess tee dee colon colon .." and so on.
Not quite up to excited junior-level programmer standards yet. But maybe good for other things, who knows.
Not sure this is an AI limitation. I think you'd be better off here with the Gemini Code Assist plugin in VS Code rather than the live screen share. It sounds like the AI is being given unstructured information rather than an actual code base.
I'm someone that becomes about 5x more productive when I have a person watching or just checking in on me (even if they're just hovering there).
Having AI to basically be that "parent" to kick me into gear would be so helpful. 90% of the time my problems are because I need someone to help keep the gears turning for me, but there isn't always someone available. This has the potential to be a person that's always available.
Just as an FYI: I recently learned (here on HN) that this is called Body Doubling[0]. There are some services around (at least one by someone who hangs around here) that can do this too.
Interesting! I see how this could work for inattentive procrastinators. By "inattentive procrastinators", I mean people who are easily distracted and forget that they need to work on their tasks. Once reminded, they return to their tasks without much fuss.
However, I doubt it would work for hedonistic procrastinators. When body doubling, hedonistic procrastinators rely on social pressure to be productive. Using AI likely won't work unless the person perceives the AI as a human.
So why not fire 3 of your colleagues and have another whose new job is watching over/checking in on you? By your own account productivity would be about the same. Save your company some money; it will be appreciated!
On an unrelated note, I believe people need to start quantifying their outrageous ai productivity claims or shut up.
I'm intrigued to know whether that actually ends up working. I am something like that myself, but I don't know whether it is an effect of getting feedback or of having a person behind the feedback.
Your "parent" kicked you into gear because you have an emotional bond with them. A stranger might cause your guard to go up if you do not regard them as wise. So too may go an AI.
At least you can theoretically stop sharing with this one. Microsoft was essentially trying to do this, but doing it for everything on your PC, with zero transparency.
Here's Google doing essentially the same thing, even more so in that it's explicitly shipping your activity to the cloud, and yet the response is completely different from the "we're sticking this on your machine and you can't turn it off" version Microsoft was attempting to land. This is what Microsoft should have done.
This is great! I viscerally dislike the "we're going to do art for you so you don't have to... even if you want to..." side of AI, but learning to use the tools to get the satisfaction of making it yourself is not easy! After 2 decades of working with 2D art and code separately, learning 3D stuff (if you include things like the complex and counterintuitive data flow of simulations in Houdini and the like) was as or more difficult than learning to code. Beyond that, taking classes is f'ing expensive, and more of that money goes to educational institutions than the teachers themselves. Obviously, getting beyond the basics for things that require experienced critique are just going to need human understanding, but for the base technical stuff, this is fantastic.
This isn't entirely surprising, as Google has been artificially breaking things on Firefox for years now (Google Maps and YouTube at least). Maybe try spoofing Chrome's user agent.
I tried this, shared a terminal, asked it to talk about it, and it guessed that it was Google Chrome with some webUI stuff. Immediately closed the window and bailed.
I don't know what's not working, but I get "As a large language model, I don't have the capability to see your screen or any other visual input. My interactions are purely based on the text that you provide."
This'll be so fantastic once local models can do it, because nobody in the right mind would stream their voice and everything they do on their machine to Google right? Right?
Oh who am I kidding, people upload literally everything to drive lmao.
It can't make outbound network calls though, so this fails:
llm -m gemini-2.0-flash-exp -o code_execution 1 \
'write python code to retrieve https://simonwillison.net/ and use a regex to extract the title, run that code'
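Out of curiosity, here's roughly what that prompt is asking the model to write, runnable locally (just a sketch: the regex approach is what the prompt specifies, not a recommendation over a real HTML parser):

```python
import re
import urllib.request

def extract_title(html: str):
    # Naive regex extraction, as the prompt requests; a proper parser
    # (html.parser, BeautifulSoup) would be more robust against odd markup.
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None

def fetch_title(url: str):
    # This outbound request is exactly what the sandbox blocks.
    with urllib.request.urlopen(url) as resp:
        html = resp.read().decode("utf-8", errors="replace")
    return extract_title(html)

if __name__ == "__main__":
    print(fetch_title("https://simonwillison.net/"))
```

Run locally this works fine; inside the code-execution sandbox the `urlopen` call is what fails.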
Code execution is okay, but soon runs into the problem of missing packages that it can't install.
Practically, sandboxing hasn't been super important for me. Running Claude with MCP-based shell access has been working fine for me, as long as you instruct it to use a venv, a temporary directory, etc.
Brown Pelican (Pelecanus occidentalis) heads are white in the breeding season. Birds start breeding aged three to five. So technically the statement is correct but I wonder if Gemini didn't get its pelicans and cormorants in a muddle. The mainland European Great Cormorant (Phalacrocorax carbo sinensis) has a head that gets progressively whiter as birds age.
Big companies can be slow to pivot, and Google has been famously bad at getting people aligned and driving in one direction.
But once they do get moving in the right direction, they can achieve things that smaller companies can't. Google has an insane amount of talent in this space, and seems to be getting the right results from that now.
Remains to be seen how well they will be able to productize and market, but it's hard to deny that their LLM models are really, really good.
> Remains to be seen how well they will be able to productize and market
The challenge is trust.
Google is one of the leaders in AI and are home to incredibly talented developers. But they also have an incredibly bad track record of supporting their products.
It's hard to justify committing developers and money to a product when there's a good chance you'll just have to pivot again once they get bored. Say what you will about Microsoft, but at least I can rely on their obsession with supporting outdated products.
> they also have an incredibly bad track record of supporting their products
Incredibly bad track record of supporting products that don't grow. I'm not saying this to defend Google, I'm still (perhaps unreasonably) angry because of Reader, it's just that there is a pattern and AI isn't likely to fit that for a long while.
Yes. Imagine Google banning your entire Google account / Gmail because you violated their gray-area AI terms ([1] or [2]). Or because one of your users did, via an app you built using an API key and their models.
With that being said, I am extremely bullish on Google AI for a long time. I imagine they land at being the best and cheapest for the foreseeable future.
> But they also have an incredibly bad track record of supporting their products.
I don't know about that: my wife built her first SME on Google Workspace / GSuite / Google Apps for domain (this thing changed names so many times I lost track). She's now running her second company on Google tools, again.
All she needs is a browser. At one point I switched her from Windows to OS X. Then from OS X to Ubuntu.
Now I just installed Debian GNU/Linux on her desktop: she fires up a browser and opens up Google's GMail / GSuite / spreadsheets and does everything from there.
She's been a happy paying customer of Google products for a great many years, and there's actually phone support for paying customers.
I honestly don't have many bad things to say. It works fine. 2FA is top notch.
It's a much better experience than being stuck in the Windows "Updating... 35%", "here's an ad on your taskbar", "your computer is now slow for no reason" world.
I don't think they'll pull the plug on GSuite: it's powering millions and millions of paying SMEs around the world.
> Google is one of the leaders in AI and are home to incredibly talented developers. But they also have an incredibly bad track record of supporting their products.
This is why we've stayed with Anthropic. Every single person I work with on my current project is sore at Google for discontinuing one product or another - and not a single one of them mentioned Reader.
We do run some non-customer facing assets in Google Cloud. But the website and API are on AWS.
>Say what you will about Microsoft, but at least I can rely on their obsession with supporting outdated products.
Eh... I don't know about that. Their tech graveyard isn't as populous as Google's, but it's hardly empty. A few that come to mind: ATL, MFC, Silverlight, UWP.
They have to not get blindsided by Sora, while at the same time fighting the cloud war against MS/Amazon.
Weirdly Google is THE AI play. If AI is not set to change everything and truly is a hype cycle, then Google stock withstands and grows. If AI is the real deal, then Google still withstands due to how much bigger the pie will get.
> and Google has been famously bad at getting people aligned and driving in one direction.
To be fair, it's not that they're bad at it -- it's that they generally have an explicit philosophy against it. It's a choice.
Google management doesn't want to "pick winners". It prefers to let multiple products (like messaging apps, famously) compete and let the market decide. According to this way of thinking, you come out ahead in the long run because you increase your chances of having the winning product.
Gemini is a great example of when they do choose to focus on a single strategy, however. Cloud was another great example.
I definitely agree that multiple competing products is a deliberate choice, but it was foolish to pursue it for so long in a space like messaging apps that has network effects.
BERT and Gemma 2B were both some of the highest-performing edge models of their time. Google does really well - in terms of pushing efficiency in the community they're second to none. They also don't need to rely on inordinate amounts of compute because Google's differentiating factor is the products they own and how they integrate it. OpenAI is API-minded, Google is laser-focused on the big-picture experience.
For example, those little AI-generated YouTube summaries that have been rolling out are wonderful. They don't require heavyweight LLMs to generate, and can create pretty effective summaries using nothing but a transcript. Not only are they more useful than the other AI "features" I interact with regularly, they don't demand AGI or chain-of-thought.
> but hard to deny that their LLM models are really, really good though
Although I do still pay for ChatGPT, I find it dog slow. ChatGPT is simply way too slow to generate answers. It feels like --even though of course it's not doing the same thing-- I'm back in the 80s with my 8-bit computer printing things line by line.
Gemini OTOH doesn't feel like that: answers are super fast.
To me low latency is going to be the killer feature. People won't keep paying for models that are dog slow to answer.
I'll probably be cancelling my ChatGPT subscription soon.
>> hard to deny that their LLM models are really, really good though.
The context window of Gemini 1.5 pro is incredibly large and it retains the memory of things in the middle of the window well. It is quite a game changer for RAG applications.
Bear in mind that a "1 million token" context window isn't actually that. You're being sold a sparse attention model, which is guaranteed to drop critical context. Google TPUs aren't running inference on a TERABYTE of fp8 query-key inputs, let alone TWO of fp16.
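The back-of-envelope arithmetic behind that claim is easy to check under hypothetical dense-model dimensions (the layer count and hidden size below are assumptions for illustration, not Gemini's actual architecture):

```python
def kv_cache_bytes(tokens: int, layers: int, hidden_dim: int, bytes_per_value: int) -> int:
    # Keys and values are both cached per layer per token, hence the factor of 2.
    return 2 * layers * hidden_dim * bytes_per_value * tokens

# Hypothetical dense-transformer dimensions for a frontier-scale model:
TOKENS, LAYERS, HIDDEN = 1_000_000, 80, 8192

fp8 = kv_cache_bytes(TOKENS, LAYERS, HIDDEN, 1)
fp16 = kv_cache_bytes(TOKENS, LAYERS, HIDDEN, 2)
print(fp8 / 1e12, "TB at fp8;", fp16 / 1e12, "TB at fp16")
```

Under those assumptions a full dense KV cache for 1M tokens lands at roughly 1.3 TB (fp8) or 2.6 TB (fp16), which is the commenter's point: grouped-query attention, sliding windows, and other sparsity tricks must be cutting that down dramatically for the window to be servable at all.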
Maybe I've just hit a streak of good outputs, but I've also been noticing the automatic gemini search when doing google searches to have been significantly more useful than it was previously.
About a year ago, I was saying that Google was potentially walking toward its own grave due to not having any pivots that rivaled OpenAI. Now, I'm starting to think they've found the first few steps toward an incredible stride.
Yet Google continues to show it'll deprecate its APIs, services, and functionality to the detriment of your own business. I'm not sure enterprises will trust Google's LLM over the alternatives. Too many have been burned throughout the years, including GCP customers.
> hard to deny that their LLM models are really, really good though
I'm so scarred by how much their first Gemini releases sucked that the thought of trying it again doesn't even cross my mind.
Are you telling us you're buying this press release wholesale, or you've tried the tech they're talking about and love it, or you have some additional knowledge not immediately evident here? Because it's not clear from your comment where you are getting that their LLM models are really good.
Buried in the announcement is the real gem — they’re releasing a new SDK that actually looks like it follows modern best practices. Could be a game-changer for usability.
They’ve had OpenAI-compatible endpoints for a while, but it’s never been clear how serious they were about supporting them long-term. Nice to see another option showing up. For reference, their main repo (not kidding) recommends setting up a Kubernetes cluster and a GCP bucket to submit batch requests.
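For anyone who wants to try the compatible endpoint without the Kubernetes ceremony, a minimal sketch using the standard openai client (the base URL and model name are as documented at launch; treat both as subject to change):

```python
import os

# Google's OpenAI-compatibility endpoint, as documented at launch.
GEMINI_BASE_URL = "https://generativelanguage.googleapis.com/v1beta/openai/"
MODEL = "gemini-2.0-flash-exp"

def make_client(api_key: str):
    # Imported lazily so this module still loads without the openai package.
    from openai import OpenAI  # pip install openai
    return OpenAI(api_key=api_key, base_url=GEMINI_BASE_URL)

if __name__ == "__main__":
    client = make_client(os.environ["GEMINI_API_KEY"])
    response = client.chat.completions.create(
        model=MODEL,
        messages=[{"role": "user", "content": "Say hello in one word."}],
    )
    print(response.choices[0].message.content)
```

The only Gemini-specific parts are the base URL and the model name; everything else is the stock openai client.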
It's interesting that just as the LLM hype appears to be simmering down, DeepMind is making big strides. I'm more excited by this than by any of OpenAI's announcements.
Beats Gemini 1.5 Pro at all but two of the listed benchmarks. Google DeepMind is starting to get their bearings in the LLM era. These are the minds behind AlphaGo/Zero/Fold. They control their own hardware destiny with TPUs. Bullish.
No, and they haven't been for at least half a year. Utterly optimized for by the providers. Nowadays if a model would be SotA for general use but not #1 on any of these benchmarks, I doubt they'd even release it.
I've started keeping an eye out for original brainteasers, just for that reason. GCHQ's Christmas puzzle just came out [1], and o1-pro got 6 out of 7 of them right. It took about 20 minutes in total.
I wasn't going to bother trying those because I was pretty sure it wouldn't get any of them, but decided to give it an easy one (#4) and was impressed at the CoT.
Meanwhile, Google's newest 2.0 Flash model went 0 for 7.
Regarding TPUs: sure, for the stuff that's running in the cloud.
However, their on-device TPUs lag behind the competition, and Google still seems to struggle to move significant parts of Gemini to run on device as a result.
Of course, Gemini is provided as a subscription service as well so perhaps they’re not incentivized to move things locally.
I am curious if they’ll introduce something like Apple’s private cloud compute.
I don't think they need to win the on-device market.
We need to separate inference and training: the real winners are those who have the training compute. You can always have other companies help with inference.
Yeah they've been slow to release end-user facing stuff but it's obvious that they're just grinding away internally.
They've ceded the fast mover advantage, but with a massive installed base of Android devices, a team of experts who basically created the entire field, a huge hardware presence (that THEY own), massive legal expertise, existing content deals, and a suite of vertically integrated services, I feel like the game is theirs to lose at this point.
The only caution is regulation / anti-trust action, but with a Trump administration that seems far less likely.
OT: I’m not entirely sure why, but "agentic" sets my teeth on edge. I don't mind the concept, but the word itself has that hollow, buzzwordy flavor I associate with overblown LinkedIn jargon, particularly as it is not actually in the dictionary...unlike perfectly serviceable entries such as "versatile", "multifaceted" or "autonomous"
To play devil's advocate, the correct use of the word would be when multiple AIs are coordinating and handing off tasks to each other with limited context, such that the handoffs are dynamically decided at runtime by the AI, not by any routine code. I have yet to see a single example where this is required. Most problems can be solved with static workflows and simple rule based code. As such, I do believe that >95% of the usage of the word is marketing nonsense.
I actually have built such a tool (two AIs, each with different capabilities), but still cringe at calling it agentic. Might just be an instinctive reflex.
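For what it's worth, the distinction being drawn here fits in a few lines. In this sketch (all names hypothetical, with a stub standing in for the LLM call), the defining "agentic" property is that the next handoff is chosen by the model's output at runtime rather than by fixed routing code:

```python
AGENTS = {
    "researcher": lambda task: f"notes on: {task}",
    "writer": lambda task: f"draft based on: {task}",
}

def stub_model_decide(history):
    # Stand-in for an LLM call: a real agentic system would ask the model
    # which agent (if any) should act next, given the shared history.
    if not any(h.startswith("notes") for h in history):
        return "researcher"
    if not any(h.startswith("draft") for h in history):
        return "writer"
    return "done"

def run(task: str):
    history = []
    while (choice := stub_model_decide(history)) != "done":
        # The handoff target is decided at runtime by the (stubbed) model,
        # not hard-coded into a static workflow.
        history.append(AGENTS[choice](history[-1] if history else task))
    return history

print(run("summarize Gemini 2.0 launch"))
```

Replace `stub_model_decide` with deterministic `if`/`else` routing and you have the static workflow that, per the comment above, solves most problems just as well.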
You nailed an interesting nuance there about agents needing to make their own decisions!
I'm getting fairly excited about "agentic" solutions to the point that I even went out of my way to build "AgentOfCode" (https://github.com/JasonSteving99/agent-of-code) to automate solving Advent of Code puzzles by iteratively debugging executions of generated unit tests (intentionally not competing on the global leaderboard).
And even for this, there's actually only a SINGLE place in the whole "agent" where the models themselves actually make a "decision" on what step to take next, and that's simply deciding whether to refactor the generated unit tests or the generated solution based on the given error message from a prior failure.
I think this sort of usage is already happening, but perhaps in the internal details or uninteresting parts, such as content moderation. Most good LLM products are in fact using many LLM calls under the hood, and I would expect that results from one are influencing which others get used.
I'm personally very glad that the word has adhered itself to a bunch of AI stuff, because people had started talking about "living more agentically" which I found much more aggravating. Now if anyone states that out loud you immediately picture them walking into doors and misunderstanding simple questions, so it will hopefully die out.
No, we need a scientific understanding of autonomous intelligent decision-making. The problem with “agentic AI” is the same old “Artificial Intelligence, Natural Stupidity” problem: we have no clue what “reasoning” or “intelligence” or “autonomous” actually means in animals, and trying to apply these terms to AI without understanding them (or inventing a new term without nailing down the underlying concept) is doomed to fail.
This is what other replies are missing - I've been following AI closely since GPT 2 and it's not immediately clear what agentic means, so to other people, the term must be even less clear. Using the word autonomous can't be worse than agentic imo.
Anyway, I'm glad that this Google release is actually available right away! I pay for Gemini Advanced and I see "Gemini Flash 2.0" as an option in the model selector.
I've been going through Advent of Code this year, and testing each problem with each model (GPT-4o, o1, o1 Pro, Claude Sonnet, Opus, Gemini Pro 1.5). Gemini has done decent, but is probably the weakest of the bunch. It failed (unexpectedly to me) on Day 10, but when I tried Flash 2.0 it got it! So at least in that one benchmark, the new Flash 2.0 edged out Pro 1.5.
I look forward to seeing how it handles upcoming problems!
I should say: Gemini Flash didn't quite get it out of the box. It actually had a syntax error in the for loop, which caused it to fail to compile, which is an unusual failure mode for these models. Maybe it was a different version of Java or something (I'm also trying to learn Java with AoC this year...). But when I gave Flash 2.0 the compilation error, it did fix it.
For the more Java proficient, can someone explain why it may have provided this code:
for (int[] current = queue.remove(0)) {
which was a compilation error for me? The corrected code it gave me afterwards was just
for (int[] current : queue) {
and with that one change the class ran and gave the right solution.
I use Claude and Gemini a lot for coding, and I've realized there is no single best model. Every model has its upsides and downsides. I was trying to get authentication working according to the newer Manifest V3 guidelines for browser extensions, and every model is terrible at it. It's one use case where there's not much information or good documentation, so every model makes stuff up. But this is my experience and I don't speak for everyone.
Relatedly, I'm starting to think more and more that AI is great for mediocre stuff. If you just need to build the 1000th website, it can do that. Do you want to build a new framework? Then there will probably be fewer useful suggestions. (Still not useless though. I do like it a lot for refactoring while building xrcf.)
EDIT: One thing that led me to think it's better for mediocre stuff was seeing the Sora model generate videos. Yes, it can create semi-novel stuff through combinations of existing stuff, but it can't stick to a coherent "vision" throughout the video. It's not like a movie by a great director like Tarantino, where every detail is right and all details point to the same vision. Instead, Sora is just flailing around. I see the same in software. Sometimes the suggestions go toward one style, and the next moment toward another. I guess current AI just has a far shorter effective context. Tarantino has been refining his style for 30 years now, always tuning his model toward his vision. AI in comparison seems to take everything and turn it into one mediocre blob. It's not useless, but it's good to keep in mind that currently it mostly generates mediocre stuff.
That's when having a huge context is valuable. Dump all of the new documentation into the model along with your query and the chances of success hugely increase.
This is true for all newish code bases. You need to provide the context it needs to get the problem right. It has been my experience that one or two examples with new functions or new requirements will suffice for a correction.
I can't comment on why the model gave you that code, but I can tell you why it was not correct.
The for-each syntax requires a `:`, so `for (int[] current = queue.remove(0))` is parsed as a basic `for` statement that is missing its condition and update clauses, hence the compilation error. Logically it's also off: `queue.remove(0)` gives you a single `int[]`, not a sequence to iterate over. If you had wanted to iterate over every int in every array, it would need to be:
```
for (int[] current : queue) {
for (int c : current) {
// ...do stuff...
}
}
```
Alternatively, if you wanted to iterate over each element in the queue and treat the int array as a single element, the revised solution is the correct one.
The Gemini 2 models support native audio and image generation, but the latter won't be generally available till January. Really excited for that, as well as 4o's image generation (whenever that comes out). Steerability has lagged behind aesthetics in image generation for a while now, and it'd be great to see a big advance in that.
Also a whole lot of computer vision tasks (via LLMs) could be unlocked with this. Think Inpainting, Style Transfer, Text Editing in the wild, Segmentation, Edge detection etc
I asked Gemini 2.0 Flash (with my voice) whether it natively understands audio or is converting my voice to text. It replied:
"That's an insightful question. My understanding of your speech involves a pipeline first. Your voice is converted to text and then I process the text to understand what you're saying. So I don't understand your voice directly but rather through a text representation of it."
Unsure if this is a hallucination, but is disappointing if true.
Edit: Looking at the video you linked, they say "native audio output", so I assume this means the input isn't native? :(
Maybe some of these tasks are arguably not aligned with the traditional applications of CV, but Segmentation and Edge detection are definitely computer vision in every definition I've come across - before and after NNs took over.
Well there's your problem!
[0] https://en.m.wikipedia.org/wiki/Body_doubling
Quick research suggests this is part of Firefox's anti-fingerprinting functionality.
Worth noting that the Gemini models have the ability to write and then execute Python code. I tried that like this:
Here's the result: https://gist.github.com/simonw/0d8225d62e8d87ce843fde471d143...
Amusingly Gemini itself doesn't know that it can't make network calls, so it tries several different approaches before giving up: https://gist.github.com/simonw/2ccfdc68290b5ced24e5e0909563c...
The new model seems very good at vision:
I got back a solid description, see here: https://gist.github.com/simonw/32172b6f8bcf8e55e489f10979f8f...
https://ipython.readthedocs.io/en/stable/interactive/magics....
GitHub: https://github.com/ErikBjare/gptme
Alternately, if I wanted to pipe a bunch of screencaps into it and get one grand response, how would I do that?
e.g. "Does the user perform a thumbs up gesture in any of these stills?"
[edit: also, do you know the vision pricing? I couldn't find it easily]
Interesting theory!
[1] https://policies.google.com/terms/generative-ai
[2] https://policies.google.com/terms/generative-ai/use-policy
They haven't wielded this advantage as powerfully as possible, but changes here could signal how committed they are to slaying the search cash cow.
Nadella deservedly earned acclaim for transitioning Microsoft from the Windows era to cloud and mobile.
It will be far more impressive if Google can defy the odds and conquer the innovator's dilemma with search.
Regardless, congratulations to Google on an amazing release and pushing the frontiers of innovation.
You mean by shifting away from Windows for mobile and focusing on iOS and Android?
As a user, I still always wish there were fewer apps with the best features of both. Google's 2(!) apps for AI podcasts are a recent example: https://notebooklm.google.com/ and https://illuminate.google.com/home
For example, those little AI-generated YouTube summaries that have been rolling out are wonderful. They don't require heavyweight LLMs to generate and can create pretty effective summaries using nothing but a transcript. Not only are they more useful than the other AI "features" I interact with regularly, they don't demand AGI or chain-of-thought.
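As a rough illustration of how far you can get with nothing but a transcript, here's a toy extractive summarizer. This is just a frequency-scoring baseline I'm sketching for the sake of argument, not how YouTube's summaries actually work:

```java
import java.util.*;
import java.util.stream.*;

public class TranscriptSummary {
    // Score each sentence by the summed corpus frequency of its words,
    // then keep the k highest-scoring sentences as the "summary".
    static List<String> summarize(String transcript, int k) {
        String[] sentences = transcript.split("(?<=[.!?])\\s+");
        Map<String, Integer> freq = new HashMap<>();
        for (String w : transcript.toLowerCase().split("\\W+")) {
            if (!w.isEmpty()) freq.merge(w, 1, Integer::sum);
        }
        return Arrays.stream(sentences)
                .sorted(Comparator.comparingInt((String s) ->
                        Arrays.stream(s.toLowerCase().split("\\W+"))
                              .mapToInt(w -> freq.getOrDefault(w, 0)).sum())
                        .reversed())
                .limit(k)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String t = "Gemini is fast. The weather is nice today. "
                 + "Gemini handles long context. Gemini is free.";
        System.out.println(summarize(t, 2));
    }
}
```

A real system would do much better with even a small LLM over the transcript, but the point stands that the input signal is just text.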
This doesn't match my experience of any Google product.
Although I do still pay for ChatGPT, I find it dog slow. ChatGPT is simply way too slow to generate answers. It feels like (even though of course it's not doing the same thing) I'm back in the 80s with my 8-bit computer printing things line by line.
Gemini OTOH doesn't feel like that: answers are super fast.
To me low latency is going to be the killer feature. People won't keep paying for models that are dog slow to answer.
I'll probably be cancelling my ChatGPT subscription soon.
The context window of Gemini 1.5 pro is incredibly large and it retains the memory of things in the middle of the window well. It is quite a game changer for RAG applications.
Google's marketing wins again, I guess.
About a year ago, I was saying that Google was potentially walking toward its own grave due to not having any pivots that rivaled OpenAI. Now, I'm starting to think they've found the first few steps toward an incredible stride.
The fact GCP needs to have this page, and these lists are not 100% comprehensive, is telling enough.
https://cloud.google.com/compute/docs/deprecations
https://cloud.google.com/chronicle/docs/deprecations
https://developers.google.com/maps/deprecations
Steve Yegge rightfully called this out, and yet no change has been made. https://medium.com/@steve.yegge/dear-google-cloud-your-depre...
Some guy had to do the same for Azure; then he went to work for Microsoft, and the page is now deprecated itself:
https://blog.tomkerkhove.be/2023/03/29/sunsetting-azure-depr...
> hard to deny that their LLM models aren't really, really good though
I'm so scarred by how much their first Gemini releases sucked that the thought of trying it again doesn't even cross my mind.
Are you telling us you're buying this press release wholesale, or you've tried the tech they're talking about and love it, or you have some additional knowledge not immediately evident here? Because it's not clear from your comment where you are getting that their LLM models are really good.
Deleted Comment
They’ve had OpenAI-compatible endpoints for a while, but it’s never been clear how serious they were about supporting them long-term. Nice to see another option showing up. For reference, their main repo (not kidding) recommends setting up a Kubernetes cluster and a GCP bucket to submit batch requests.
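For comparison, an OpenAI-compatible chat request is just a plain HTTPS POST, no cluster or bucket required. Here's a sketch using only the JDK's `HttpRequest` types; the endpoint path and model name reflect my understanding of the current docs and may change, so verify before relying on them (it builds the request but deliberately doesn't send it):

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OpenAICompatDemo {
    // Base URL of Gemini's OpenAI-compatible surface, as documented at the
    // time of writing; double-check against the official docs.
    static final String BASE = "https://generativelanguage.googleapis.com/v1beta/openai";

    // Minimal OpenAI-style chat.completions payload, assembled by hand to keep
    // the sketch dependency-free (a real client should use a JSON library).
    static String chatBody(String model, String userMessage) {
        return "{\"model\":\"" + model + "\","
             + "\"messages\":[{\"role\":\"user\",\"content\":\"" + userMessage + "\"}]}";
    }

    static HttpRequest request(String apiKey, String body) {
        return HttpRequest.newBuilder()
                .uri(URI.create(BASE + "/chat/completions"))
                .header("Authorization", "Bearer " + apiKey)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();
    }

    public static void main(String[] args) {
        String body = chatBody("gemini-2.0-flash-exp", "Say hello");
        System.out.println(request("YOUR_API_KEY", body).uri());
    }
}
```

Sending it with `java.net.http.HttpClient` and a valid API key is all the "batch infrastructure" a small workload needs.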
[1]https://github.com/googleapis/python-genai
https://github.com/googleapis/python-genai?tab=readme-ov-fil...
Deleted Comment
I wasn't going to bother trying those because I was pretty sure it wouldn't get any of them, but decided to give it an easy one (#4) and was impressed at the CoT.
Meanwhile, Google's newest 2.0 Flash model went 0 for 7.
1: https://metro.co.uk/2024/12/11/gchq-christmas-puzzle-2024-re...
However their on device TPUs lag behind the competition and Google still seem to struggle to move significant parts of Gemini to run on device as a result.
Of course, Gemini is provided as a subscription service as well so perhaps they’re not incentivized to move things locally.
I am curious if they’ll introduce something like Apple’s private cloud compute.
We need to separate inference from training: the real winners are those who have the training compute. You can always have other companies help with inference.
They've ceded the fast mover advantage, but with a massive installed base of Android devices, a team of experts who basically created the entire field, a huge hardware presence (that THEY own), massive legal expertise, existing content deals, and a suite of vertically integrated services, I feel like the game is theirs to lose at this point.
The only caution is regulation / anti-trust action, but with a Trump administration that seems far less likely.
I'm getting fairly excited about "agentic" solutions to the point that I even went out of my way to build "AgentOfCode" (https://github.com/JasonSteving99/agent-of-code) to automate solving Advent of Code puzzles by iteratively debugging executions of generated unit tests (intentionally not competing on the global leaderboard).
And even for this, there's actually only a SINGLE place in the whole "agent" where the models themselves actually make a "decision" on what step to take next, and that's simply deciding whether to refactor the generated unit tests or the generated solution based on the given error message from a prior failure.
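For what it's worth, that single decision point can be sketched roughly like below. The names and the stand-in heuristic are mine, not from the AgentOfCode repo, and the real agent delegates this choice to the model rather than a string match:

```java
public class AgentStep {
    enum Next { REFACTOR_TESTS, REFACTOR_SOLUTION }

    // Stand-in for the one model-made decision: given the error output from a
    // failed run, pick which artifact to regenerate. Here, a crude heuristic
    // (does the traceback point into the test files?) keeps the sketch runnable.
    static Next decide(String errorMessage) {
        return errorMessage.contains("tests/")
                ? Next.REFACTOR_TESTS
                : Next.REFACTOR_SOLUTION;
    }

    public static void main(String[] args) {
        System.out.println(decide("File \"tests/test_day1.py\", line 3"));
        System.out.println(decide("IndexError in solution.py, line 12"));
    }
}
```

Everything else in the loop (generate solution, generate tests, run, collect errors) is a fixed pipeline; this is the only branch where the model exercises any "agency".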
Agentic to me means that it acts somewhat under its own authority rather than a single call to an LLM. It has a small degree of agency.
Deleted Comment
I don't think these are necessarily buzzwords if the product really does what they imply.
Deleted Comment
agentic == not people.
Quite sensible, really.
Deleted Comment
Dead Comment
Anyway, I'm glad that this Google release is actually available right away! I pay for Gemini Advanced and I see "Gemini Flash 2.0" as an option in the model selector.
I've been going through Advent of Code this year, and testing each problem with each model (GPT-4o, o1, o1 Pro, Claude Sonnet, Opus, Gemini Pro 1.5). Gemini has done decent, but is probably the weakest of the bunch. It failed (unexpectedly to me) on Day 10, but when I tried Flash 2.0 it got it! So at least in that one benchmark, the new Flash 2.0 edged out Pro 1.5.
I look forward to seeing how it handles upcoming problems!
I should say: Gemini Flash didn't quite get it out of the box. It actually had a syntax error in the for loop, which caused it to fail to compile, which is an unusual failure mode for these models. Maybe it was a different version of Java or something (I'm also trying to learn Java with AoC this year...). But when I gave Flash 2.0 the compilation error, it did fix it.
For the more Java proficient, can someone explain why it may have provided this code:
which was a compilation error for me? The corrected code it gave me afterwards was just and with that one change the class ran and gave the right solution.

EDIT: One reason that led me to think it's better for mediocre stuff was seeing the Sora model generate videos. Yes, it can create semi-novel stuff through combinations of existing stuff, but it can't stick to a coherent "vision" throughout the video. It's not like a movie by a great director like Tarantino, where every detail is right and all details point to the same vision. Instead, Sora is just flailing around. I see the same in software: sometimes the suggestions go toward one style and the next moment toward another. I guess current AI just has a much shorter effective context. Tarantino has been refining his style for 30 years now, always tuning his model toward his vision. AI, in comparison, seems to just take everything and turn it into one mediocre blob. It's not useless, but it's currently good to keep in mind, I think, that you can only use it to generate mediocre stuff.
True to a point, but is anyone using GPT2 for anything still? Sometimes the better model completely supplants others.
To me that reads like it was trying to accomplish something like
`queue.remove(0)` gives you an `int[]`, which is also what you were assigning to `current`. So logically it's a single element, not an iterable. If you had wanted to iterate over each item in the array, it would need to be:
```
for (int[] current : queue) {
    for (int c : current) {
        // ...do stuff...
    }
}
```
Alternatively, if you wanted to iterate over each element in the queue and treat the int array as a single element, the revised solution is the correct one.
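Putting that side by side, a minimal runnable sketch of the distinction (class and variable names are mine, not from the original AoC solution):

```java
import java.util.ArrayList;
import java.util.List;

public class QueueDemo {
    // remove(0) hands back a single int[] element; assigning it to an int[]
    // variable is the correct usage.
    static int[] front(List<int[]> queue) {
        return queue.remove(0);
    }

    public static void main(String[] args) {
        List<int[]> queue = new ArrayList<>();
        queue.add(new int[] {1, 2});
        queue.add(new int[] {3, 4});

        int[] current = front(queue); // one element off the front: {1, 2}
        // for (int[] c : front(queue)) {}  // would NOT compile:
        // an int[] iterates as ints, not as int[]s.

        // To walk every element of the remaining queue instead:
        for (int[] pair : queue) {
            for (int c : pair) {
                System.out.println(c);
            }
        }
    }
}
```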
Also, a whole lot of computer vision tasks (via LLMs) could be unlocked with this. Think inpainting, style transfer, text editing in the wild, segmentation, edge detection, etc.
They have a demo: https://www.youtube.com/watch?v=7RqFLp0TqV0
"That's an insightful question. My understanding of your speech involves a pipeline first. Your voice is converted to text and then I process the text to understand what you're saying. So I don't understand your voice directly but rather through a text representation of it."
Unsure if this is a hallucination, but it's disappointing if true.
Edit: Looking at the video you linked, they say "native audio output", so I assume this means the input isn't native? :(
If you're using Gemini in AI Studio (not sure about the real-time API, but everything else), then it has native audio input.