rohansood15 (u/rohansood15)

rohansood15 commented on Gemma 3 270M: Compact model for hyper-efficient AI developers.googleblog.com... · Posted by u/meetpateltech

jefb · 22 days ago

My lovely interaction with the 270M-F16 model:

> what's second tallest mountain on earth?

The second tallest mountain on Earth is Mount Everest.

> what's the tallest mountain on earth?

The tallest mountain on Earth is Mount Everest.

> whats the second tallest mountain?

The second tallest mountain in the world is Mount Everest.

> whats the third tallest mountain?

The third tallest mountain in the world is Mount Everest.

> what's the forth tallest mountain?

The forth tallest mountain is Mount Everest.

> but you said the tallest was Everest already

You are right! I'm glad to hear that.

> so whats the second tallest?

The second tallest is *Mount Everest*.

> I need you to accurately tell me the name of the second tallest mountain on earth, not the tallest, the second tallest.

The second tallest mountain on Earth is Mount Everest.

> The tallest mountain on earth is mount everest. What is the second tallest mountain?

The second tallest mountain on Earth is Mount Everest.

> Give me a list of the tallest mountains on earth

The tallest mountain in the world is the *Mount Everest*.

> give me the top 5 tallest mountains on earth

The tallest mountain on Earth is Mount Everest.

> List in order the tallest mountains on earth from 1 to 5

1. Mount Everest 2. Mount K2 3. Mount Sahel 4. Mount Fuji 5. Mount McKinley

> So the second tallest mountain on earth is K2

The second tallest mountain on Earth is Mount Everest.

These smaller models are great, but they really feel like talking to a toddler sometimes!

rohansood15 · 21 days ago

This is why we should have a downvote button on HN.

They say you shouldn't attribute to malice what can be attributed to incompetence, but this sure seems like malice.

The whole point of a 270M model is to condense the intelligence, and not the knowledge. Of course it doesn't fare well on a quiz.

rohansood15 commented on GPT-5: Key characteristics, pricing and system card simonwillison.net/2025/Au... · Posted by u/Philpax

theshrike79 · a month ago

This is because you haven't given it a tool to verify the task is done.

TDD works pretty well: have it write even the most basic test first (or go full artisanal and write it yourself), and then ask it to implement the code.

I have a standing order in my main CLAUDE.md to "always run `task build` before claiming a task is done". All my projects use Task[0] with pretty standard structure where build always runs lint + test before building the project.

With a semi-robust test suite I can be pretty sure nothing major broke if `task build` completes without errors.

[0] https://taskfile.dev

rohansood15 · a month ago

What do you think it is 'mocking'? It is exactly the behavior that would make the tests work. And unless I give it access to production, it has no way to verify tasks like how values (in this case secrets/envs) are being passed.

Plus, this is all beside the point. Simon argued that the model itself hallucinates less, not that a specific product does.

rohansood15 commented on What the Windsurf sale means for the AI coding ecosystem ethanding.substack.com/p/... · Posted by u/whoami_nr

CalChris · a month ago

Ok, Google can pay $1.2B to the CEO and key employees to get them to walk. The other $1.2B is for the Windsurf IP and it cannot go directly to the investors. It has to go through the company where it is first revenue and then an asset.

But Windsurf could distribute profit at this point before the Cognition deal. I guess this is where the preference rights got exercised. The tweet from employee #2 said his stock wasn't worth anything. Actually, he got preferenced out of the $1.2B in dividends.

Then came the $250M Cognition deal. He got preferenced out of the proceeds of the Cognition deal as well.

rohansood15 · a month ago

The company can also issue a share buyback. Doesn't have to be profits. And you're right about the preference rights.

Employees who haven't vested their shares can't complain/enforce tag-along/sue for minority investor rights.

rohansood15 commented on What the Windsurf sale means for the AI coding ecosystem ethanding.substack.com/p/... · Posted by u/whoami_nr

CalChris · a month ago

So Google writes a check for $2.4B to Windsurf and gets the IP. Check deposited with Windsurf. Ledger entries made. Windsurf now has $2.4B in assets more than it had before. Money in the bank. Preference cliffs do not apply to this licensing deal. Key employees and CEO then take a 2.4 mile hike over to Google. Lunch is served.

Then Cognition offers $250M for Windsurf itself. Ok, I can imagine the preference cliffs kicking in now. But Windsurf just got a check for $2.4B and I don't think they had anywhere close to that in liabilities.

So where'd the $2.4B go? This seems like a strange deal.

rohansood15 · a month ago

$1.2B went to investors; the remaining $1.2B was actually an incentive/payout for the founders/employees that Google took. The company basically has whatever money it had in the bank, plus a bit more from Google, but no investor liabilities.

rohansood15 commented on GPT-5: Key characteristics, pricing and system card simonwillison.net/2025/Au... · Posted by u/Philpax

simonw · a month ago

When I'm using them for code these days it is usually in a tool that can execute code in a loop, so I don't tend to even spot the hallucinations because the model self-corrects.

For factual information I only ever use search-enabled models like o3 or GPT-4.

Most of my other use cases involve pasting large volumes of text into the model and having it extract information or manipulate that text in some way.

rohansood15 · a month ago

On multiple occasions, Claude Code has claimed it completed a task when it actually just wrote mock code. It will also answer questions with certainty (e.g. where is this value being passed) when in reality it is making the answer up. So if you haven't been seeing hallucinations on Opus/Sonnet, you probably aren't looking deep enough.

rohansood15 commented on Ollama Turbo ollama.com/turbo... · Posted by u/amram_art

rohansood15 · a month ago

The 'Sign In' link on the Ollama Mac App when you click Turbo doesn't work...

rohansood15 commented on From SDR to 'Fake HDR': Mario Kart World on Switch 2 alexandermejia.com/from-s... · Posted by u/ibobev

rohansood15 · 3 months ago

I have never owned a gaming console, and I was actually considering getting the Switch 2 as a casual gamer to play with friends/family.

My first reaction when I saw the launch/gameplay video was why does this look so washed out? Now I kinda know why - thank you!

rohansood15 commented on Show HN: AutoThink – Boosts local LLM performance with adaptive reasoning · Posted by u/codelion

MagicMoonlight · 3 months ago

You didn’t invent this. Models like o3 already do it, that’s why the amount of thinking time varies.

rohansood15 · 3 months ago

He's not claiming he did. It says right there that it's an open-source implementation to run with local models.

rohansood15 commented on Google AI Ultra blog.google/products/goog... · Posted by u/mfiguiere

piskov · 4 months ago

Given the lack of comments after an hour passed, we have a strong case of maybe five Google AI Ultra subscribers worldwide.

I, personally, try to stay as far as possible from Google: Kagi for search, Brave for browsing (Firefox previously), Pro on OpenAI, etc.

We’ll see how fair OpenAI will be with tracking and what have you (given that I have “Improve the model for everyone” set to off), but Google? Nah.

rohansood15 · 4 months ago

"I think there is a world market for maybe five computers." -- Thomas Watson, chairman of IBM, 1943.