Readit News
tymonPartyLate commented on AI Mode in Search gets new agentic features and expands globally   blog.google/products/sear... · Posted by u/meetpateltech
tymonPartyLate · 3 days ago
We used to have private bridges and private roads, and that was an expensive travel situation for everyone. Now, internet search is kind of like a bridge that leads clients to businesses and Google is deciding on the tolls. Government-controlled Internet search would definitely be horrible. But I'm thinking if there is a path towards more competitiveness in this landscape, maybe the ISPs could somehow provide free search as part of the Internet service fee? Can we have more specialized, niche search engines? Can governments be asked to break up the Google search monopoly?
tymonPartyLate commented on Gemini-2.5-pro-preview-06-05   deepmind.google/models/ge... · Posted by u/jcuenod
diggan · 3 months ago
> Code that is simple, easy to read, not polluted with comments, no unnecessary crap, just pretty, clean and functional

I get that with most of the better models I've tried, although I'd probably personally favor OpenAI's models overall. I think a good system prompt is probably the best way there, rather than relying on some "innate" "clean code" behavior of specific models. This is a snippet of what I use today for coding guidelines: https://gist.github.com/victorb/1fe62fe7b80a64fc5b446f82d313...

> That being said it occasionally does something absolutely stupid. Like completely dumb

That's a bit tougher, but you have to carefully read through exactly what you said, and try to figure out what might have led it down the wrong path, or what you could have said in the first place for it to avoid that. Try to work it into your system prompt, then slowly build up your system prompt so every one-shot gets closer and closer to being perfect on every first try.

tymonPartyLate · 3 months ago
Thanks for sharing, I'll copy your rules :)
tymonPartyLate commented on Gemini-2.5-pro-preview-06-05   deepmind.google/models/ge... · Posted by u/jcuenod
johnfn · 3 months ago
Impressive seeing Google notch up another ~25 ELO on lmarena, on top of the previous #1, which was also Gemini!

That being said, I'm starting to doubt the leaderboards as an accurate representation of model ability. While I do think Gemini is a good model, having used both Gemini and Claude Opus 4 extensively in the last couple of weeks I think Opus is in another league entirely. I've been dealing with a number of gnarly TypeScript issues, and after a bit Gemini would spin in circles or actually (I've never seen this before!) give up and say it can't do it. Opus solved the same problems with no sweat. I know that that's a fairly isolated anecdote and not necessarily fully indicative of overall performance, but my experience with Gemini is that it would really want to kludge on code in order to make things work, where I found Opus would tend to find cleaner approaches to the problem. Additionally, Opus just seemed to have a greater imagination? Or perhaps it has been tailored to work better in agentic scenarios? I saw it do things like dump the DOM and inspect it for issues after a particular interaction by writing a one-off playwright script, which I found particularly remarkable. My experience with Gemini is that it tries to solve bugs by reading the code really really hard, which is naturally more limited.

Again, I think Gemini is a great model, I'm very impressed with what Google has put out, and until 4.0 came out I would have said it was the best.

tymonPartyLate · 3 months ago
I just realized that Opus 4 is the first model that produced "beautiful" code for me. Code that is simple, easy to read, not polluted with comments, no unnecessary crap, just pretty, clean and functional. I had my first "wow" moment with it in a while. That being said it occasionally does something absolutely stupid. Like completely dumb. And when I ask it "why did you do this stupid thing", it replies "oh yeah, you're right, this is super wrong, here is an actual working, smart solution" (proceeds to create brilliant code)

I do not understand how those machines work.

tymonPartyLate commented on Claude can now search the web   anthropic.com/news/web-se... · Posted by u/meetpateltech
tcdent · 5 months ago
Searching the web is a great feature in theory, but every implementation I've used so far looks at the top X hits and then interprets it to be the correct answer.

When you're talking to an LLM about popular topics or common errors, the top results are often just blogspam or unresolved forum posts, so you never get an answer to your problem.

More of an indicator that web search is more unusable than ever, but interesting that it affects the performance of generative systems, nonetheless.

tymonPartyLate · 5 months ago
This is actually not true. I'm getting traffic from ChatGPT and Perplexity to my website, which is fairly new, just launched a few months ago. Our pages rarely rank in the top 4, but the AI answer engines manage to find them anyways. And I'm talking about traffic with UTM params / referrals from ChatGPT, not their scraper bots.
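A minimal sketch of how such visits could be told apart from other traffic, assuming the answer engines tag outbound links with a `utm_source` parameter or send a `Referer` header. The `AI_SOURCES` mapping and the `classify_visit` helper are hypothetical names made up for illustration, and the domain list is an assumption rather than a verified inventory:

```python
from urllib.parse import urlparse, parse_qs

# Hypothetical mapping of utm_source values / referrer hosts to an engine name.
# The entries are illustrative assumptions, not an exhaustive or verified list.
AI_SOURCES = {
    "chatgpt.com": "chatgpt",
    "chat.openai.com": "chatgpt",
    "perplexity.ai": "perplexity",
    "www.perplexity.ai": "perplexity",
}


def classify_visit(landing_url, referrer=""):
    """Return the AI answer engine a visit came from, or None if unknown."""
    # First look for an explicit utm_source on the landing URL.
    query = parse_qs(urlparse(landing_url).query)
    utm_source = query.get("utm_source", [""])[0].lower()
    if utm_source in AI_SOURCES:
        return AI_SOURCES[utm_source]
    # Otherwise fall back to the host of the Referer header, if any.
    host = (urlparse(referrer).hostname or "").lower()
    return AI_SOURCES.get(host)


print(classify_visit("https://example.com/pricing?utm_source=chatgpt.com"))
print(classify_visit("https://example.com/", "https://www.perplexity.ai/search?q=x"))
```

This only inspects the landing URL and the referrer; recognizing crawler bots would need a separate check, for example on the user agent.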
tymonPartyLate commented on Cursor told me I should learn coding instead of asking it to generate it   forum.cursor.com/t/cursor... · Posted by u/nomilk
tymonPartyLate · 5 months ago
I asked it once to simplify code it had written and it refused. The code it wrote was ok but unnecessary in my view.

Claude 3.7: > I understand the desire to simplify, but using a text array for .... might create more problems than it solves. Here's why I recommend keeping the relational approach: ( list of okay reasons ) > However, I strongly agree with adding ..... to the model. Let's implement that change.

I was kind of shocked by the display of opinions. HAL vibes.

tymonPartyLate commented on OpenAI O3 breakthrough high score on ARC-AGI-PUB   arcprize.org/blog/oai-o3-... · Posted by u/maurycy
strangescript · 8 months ago
"We have created artificial super intelligence, it has solved physics!"

"Well, yeah, but it's kind of expensive" -- this guy

tymonPartyLate · 8 months ago
Haha. Hopefully you’re right and solving the ARC puzzle translates to solving all of physics. I just remain skeptical about the OpenAI hype. They have a track record of exaggerating the significance of their releases and their impact on humanity.
tymonPartyLate commented on OpenAI O3 breakthrough high score on ARC-AGI-PUB   arcprize.org/blog/oai-o3-... · Posted by u/maurycy
tymonPartyLate · 8 months ago
Isn’t this like a brute force approach? Given it costs $3,000 per task, that’s like 600 GPU hours (H100 on Azure). In that amount of time the model can generate millions of chains of thought and then spend hours reviewing them or even testing them out one by one. Kind of like trying until something sticks and that happens to solve 80% of ARC. I feel like reasoning works differently in my brain. ;)
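For reference, the two figures in that estimate imply an hourly rate, which can be checked with a quick calculation. Both inputs are the numbers quoted in the comment, not verified prices, and the variable names are made up for illustration:

```python
# Figures quoted in the comment above; neither is a verified price.
cost_per_task_usd = 3000
gpu_hours_per_task = 600

# Hourly rate implied by the two quoted figures.
implied_rate = cost_per_task_usd / gpu_hours_per_task
print(f"Implied rate: ${implied_rate:.2f} per GPU-hour")  # prints "Implied rate: $5.00 per GPU-hour"
```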
tymonPartyLate commented on Show HN: Choosing the right SaaS is a chore, so I built something to fix that   gralio.ai/... · Posted by u/tymonPartyLate
wordglyph · 8 months ago
Cool! How do you ensure the accuracy and neutrality of its recommendations, and are there plans to incorporate user feedback into the AI's learning process for even more precise matches?
tymonPartyLate · 8 months ago
We rely on a multi-layered validation approach, where multiple independent sources must confirm any given data point before it’s presented. Aravind described this well on the Lex Fridman podcast. Behind every green dot is a chain of LLM prompts, reviewers, and “critics” ensuring accuracy. On top of that, user feedback and browsing patterns continuously refine the system’s weighting, so recommendations get better over time. We’re also working on a feature to show which tools your competitors are using. I think this will be really great.
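The "multiple independent sources must confirm" rule can be illustrated with a small sketch. This is not the actual pipeline described above: the `confirmed_value` helper, the `min_sources` threshold, and the source names are hypothetical, and the LLM prompts, reviewers, and critics are not modeled at all:

```python
from collections import Counter


def confirmed_value(claims, min_sources=2):
    """Return the value reported by at least `min_sources` sources, else None.

    `claims` maps a source name to the value that source reported, so each
    source counts at most once. Hypothetical helper, for illustration only.
    """
    if not claims:
        return None
    counts = Counter(claims.values())
    # Take the most frequently reported value and check it against the threshold.
    value, votes = counts.most_common(1)[0]
    return value if votes >= min_sources else None


# Illustrative source names and values, not real data.
claims = {"vendor_site": "SSO", "review_site": "SSO", "forum_post": "no SSO"}
print(confirmed_value(claims))
print(confirmed_value({"vendor_site": "SSO"}))
```

With a threshold of two, a data point reported by a single source is withheld instead of being shown unconfirmed. If two values tie above the threshold, this sketch returns whichever was reported first, which a real system would need to handle more carefully.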
tymonPartyLate commented on Show HN: Choosing the right SaaS is a chore, so I built something to fix that   gralio.ai/... · Posted by u/tymonPartyLate
therealkaczor · 8 months ago
Wow that's a time saver! But Claude or Perplexity can also give me a table, no? Not sure I see use in going to a dedicated tool.
tymonPartyLate · 8 months ago
Sure, Perplexity can present a simple table, just like it can list hotels, but you’d still visit Expedia to finalize your booking. There’s more to making informed decisions than just seeing a list of options. Right now, we’re focusing on surfacing deeper insights: detailed features, aggregated review summaries, and company health. Our goal is to go beyond a basic search and provide all the data points you need :)

u/tymonPartyLate

Karma: 49
Cake day: February 26, 2023
About
CTO gralio.ai https://x.com/tymonPartyLate