The leapfrogging at this point is getting insane (in a good way, I guess?). Each state-of-the-art feature now gets only a few weeks before it's supplanted.
LLMs were always a fun novelty for me until OpenAI Deep Research, which started to actually come up with useful results on more complex programming questions (where I needed to write all the code by hand but had to pull together lots of different libraries and APIs), but it was limited to 10/month on the cheaper plan. Then Google Deep Research upgraded to 2.5 Pro, with paid usage limits of 20/day, which allowed me to just throw everything at it, to the point where I'm still working through reports that are a week or more old. Oh, and it searched up to 400 sources at a time, significantly more than OpenAI, which made it quite useful in historical research, like identifying first-edition copies of books.
Now Claude is releasing the same research feature with integrations (excited to check out the Cloudflare MCP auth solution and hoping Val.town gets something similar), and a run time of up to 45 minutes. The pace of change was overwhelming half a year ago, now it's just getting ridiculous.
I agree with your overall message - rapid growth appears to encourage competition and forces companies to put their best foot forward.
However, unfortunately, I cannot shower much praise on Claude 3.7. And if you (or anyone) asks why - surely 3.7 is much better than 3.5? - then I'm moderately sure that you use Claude much more for coding than for any kind of conversation. In my opinion, even 3.5 Haiku (which is available for free during high loads) is better than 3.7 Sonnet.
Here's a simple test: try asking 3.7 to intuitively explain anything technical - say, mass-dominated vs. spring-dominated oscillations. I'm a mechanical engineer who studied this stuff, and I could not understand 3.7's analogies.
I understand that coders are the largest single group of Claude's users, but Claude went from being my most-used app to being used only after both ChatGPT and Gemini, something I genuinely regret.
3.7 did score higher on coding benchmarks, but in practice 3.5 is much better at coding. 3.7 ignores instructions and does things you didn't ask it to do.
Plateauing overall, but apparently you can gain in certain directions while losing in others. I wrote an article a while back arguing that current models are not that far from GPT-3.5: https://omarabid.com/gpt3-now
3.7 is definitely better at coding, but it feels like it lost a bit of maneuverability in other domains. For someone who just wants code generated, it doesn't matter, but I've found myself using DeepSeek first and then getting code output from 3.7.
Seems clear to me that Claude 3.7 suffers from overfitting, probably due to Anthropic seeing that 3.5 was a smash hit in the LLM coding space and deciding their North star for 3.7 should be coding benchmarks (which, like all benchmarks, do not properly capture the process of real-world coding).
If it were actually good they would've named it 4.0; the fact that they went from 3.5 to 3.7 (a weird jump) speaks volumes, imo.
I use Claude mostly for coding/technical things and something about 3.7 does not feel like an upgrade. I haven't gone back to 3.5 (mostly started using Gemini Pro 2.5 instead).
I haven't been able to use Claude research yet (it's not rolled out to the Pro tier), but o1 -> o3 deep research was a massive jump, IMHO. It still isn't perfect, but o1 would often give me trash results, while o3 deep research actually starts to be useful.
3.5->3.7 (even with extended thinking) felt like a nothingburger.
Out of curiosity - can you give any examples of the programming questions you are using deep research on? I’m having a hard time thinking of how it would be helpful and could use the inspiration.
Easy: any research task that would take you five minutes to complete is worth firing off as a Deep Research request while you work on something else in parallel.
I use it a lot when documentation is vague or outdated. When Gemini/o3 can't figure something out after 2 tries. When I am working with a service/API/framework/whatever that I am very unfamiliar with and I don't even know what to Google search.
I recently asked Chrome to show me how to apply the Knuth-Bendix completion procedure to propositional logic, and I had already formed my own thoughts about how to proceed (I'm building a rewrite system that does automated reasoning).
The response convinced me that I'm not a total idiot.
I'm not an academic and I'm often wrong about theory so the validation is really useful to me.
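For what it's worth, the core loop of such a rewrite system is small. Here's a toy sketch, with a made-up string encoding and rule set (this is plain fixpoint rewriting, not the Knuth-Bendix completion procedure itself):

```python
# A few oriented rewrite rules for a toy fragment of propositional
# logic, encoded as plain strings ("~" = not, "T"/"F" = true/false).
# This is ordinary fixpoint rewriting, not Knuth-Bendix completion:
# the rule set is simply assumed to be terminating and confluent.
RULES = [
    ("~~", ""),       # double negation: ~~p -> p
    ("(p&p)", "p"),   # idempotence of "and"
    ("(p|p)", "p"),   # idempotence of "or"
    ("(p&T)", "p"),   # identity for "and"
    ("(p|F)", "p"),   # identity for "or"
]

def normalize(term, rules=RULES, max_steps=1000):
    """Apply the first matching rule at its leftmost occurrence, until
    no rule applies (a normal form) or we hit the step budget."""
    for _ in range(max_steps):
        for lhs, rhs in rules:
            if lhs in term:
                term = term.replace(lhs, rhs, 1)
                break
        else:
            return term  # no rule fired: normal form reached
    return term

print(normalize("~~(p&T)"))  # p
```

Completion proper is about repairing a rule set like this when it is *not* confluent, by orienting the critical pairs into new rules; the loop above is only the "reduce to normal form" half.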
I've been using it for pre-scoping things I have no idea about, and rapidly iterating by re-feeding it a version with guard rails and conditions from previous chats.
For example, I wanted to scope how to build a home-made TrueNAS Scale unit. It helped me avoid pitfalls, like knowing that I needed two GPUs minimum to run the OS and local LLMs, and it sped up configuring a CLI backup of my Dropbox locally (it told me the right filesystem format to use instead of ZFS to make the Dropbox client work).
In one day it researched everything from how to structure my web app for building a payment system (something I knew nothing about) to writing small tools that talk to my document collection and index it into collections in Anki.
Calling some APIs is leapfrogging? You could do this with GPT-3; nothing has changed except that it's branded under a new name and tries to establish a (flawed) standard.
If there was truly any innovation still happening in OpenAI, Anthropic, etc., they would be working on models only, not on side features that someone could already develop over a weekend.
None of those reports are any good, though. Maybe for shallow research, but I haven't found them deep. Can you share what kind of research you've been trying where it has done a great job of actual deep research?
Deep Research hasn't really been that good for me. Maybe I'm just using it wrong?
Example: I want the precipitation in mm and monthly high and low temperature in C for the top 250 most populous cities in North America.
To me, this prompt seems like a pretty anodyne and obvious task for Deep Research. It's long and tedious, but the data mostly comes from well-structured sources (Wikipedia) across two languages at most.
But when I put this into any of the various models, I mostly get back ways to go and find that data myself. Like, I know how to look at Wikipedia; it's that I don't want to comb through 250 pages manually or try to write a script to handle all the HTML boxes. I want the LLM/model to do this days-long tedious task for me.
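The script half of that task is indeed mechanical, which is what makes the refusal frustrating. A minimal sketch of extracting numbers from one row of a Wikipedia-style climate wikitable (the HTML shape here is an assumption, and fetching the 250 pages is left out):

```python
import re

def parse_climate_cells(row_html):
    """Pull signed decimal values (monthly highs/lows in C, precipitation
    in mm) out of the <td> cells of one climate-table row."""
    cells = re.findall(r"<td[^>]*>(.*?)</td>", row_html, re.S)
    values = []
    for cell in cells:
        text = re.sub(r"<[^>]+>", "", cell)   # strip nested tags
        text = text.replace("\u2212", "-")    # Unicode minus -> ASCII
        m = re.search(r"-?\d+(?:\.\d+)?", text)
        if m:
            values.append(float(m.group()))
    return values

row = "<tr><th>Average high \u00b0C</th><td>3.1</td><td>4.0</td><td>\u22121.5</td></tr>"
print(parse_climate_cells(row))  # [3.1, 4.0, -1.5]
```

The annoying part is exactly what the commenter describes: every city's infobox differs slightly, so the regexes need per-page babysitting, which is the tedium one hopes Deep Research would absorb.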
You still need a "human in the loop", because with simple tasks, or tasks that have lots of training material, models can one-shot the answer and are super good. But if the domain grows too complex, there are some not-so-obvious dependencies, or the stuff is bleeding-edge, models fail pretty badly. So you need someone to split those complex tasks into simpler, familiar steps.
> major concern that server hosters are on the hook to implement authorization
Doesn't it make perfect sense for server hosters to implement that? If Claude wants access to my Jira instance on my behalf, and Jira hosts a remote MCP server that aids in exposing the resources I own, isn't it obvious Jira should be responsible for authorization?
That GitHub issue is closed because it's been mostly completed. As of https://github.com/modelcontextprotocol/modelcontextprotocol..., the latest draft specification does not require the resource server to act as, or proxy to, the IdP. It just hasn't made its way into a ratified spec yet, but SDKs are already implementing the draft.
The authorization server and resource server can be separate entities, meaning that the Jira instance can validate the token without being the one issuing it or handling credentials.
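That split is standard OAuth 2.0: the resource server only needs to validate tokens, for example via RFC 7662 token introspection against the separate authorization server. A rough sketch of the resource-server side (the endpoint URL and the "mcp:read" scope name are hypothetical; "active" and "scope" are the standard introspection response fields):

```python
import base64
import json
import urllib.parse
import urllib.request

# Hypothetical endpoint on the separate authorization server.
INTROSPECTION_URL = "https://auth.example.com/oauth/introspect"

def introspect(token, client_id, client_secret):
    """RFC 7662 token introspection: the resource server (e.g. the Jira
    instance) asks the authorization server whether a token is live,
    without ever seeing user credentials itself."""
    body = urllib.parse.urlencode({"token": token}).encode()
    req = urllib.request.Request(INTROSPECTION_URL, data=body)
    creds = base64.b64encode(f"{client_id}:{client_secret}".encode()).decode()
    req.add_header("Authorization", f"Basic {creds}")
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)

def authorize(claims, required_scope="mcp:read"):
    """Allow the request only if the token is active and carries the
    required scope (scope name is made up for the sketch)."""
    return bool(claims.get("active")) and required_scope in claims.get("scope", "").split()

# The decision logic can be exercised without a live server:
print(authorize({"active": True, "scope": "mcp:read mcp:write"}))  # True
print(authorize({"active": False, "scope": "mcp:read"}))           # False
```

The point of the split is visible in the code: `introspect` only ever handles an opaque token, never a password or a login flow.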
Ongoing demo of integrations with Claude by a bunch of A-list companies: Linear, Stripe, PayPal, Intercom, etc. It's live now at: https://www.youtube.com/watch?v=njBGqr-BU54
Is this the beginning of the apps-for-everything era, where the SaaS for your LLM finally begins? Initially we had the internet, but the value came when web apps arrived to replace installed apps and become SaaS. Now if LLMs can use specific remote MCP which is another SaaS for your LLM, the remote MCP powered service can charge a subscription to do wonderful things and voila! Let the new golden age of SaaS for LLMs begin, and the old fad (replace job XYZ with AI) die already.
It's perfect: nobody will have time to care about how many 9s your service has, because the nondeterministic failure mode now sitting slap-bang in the middle is their problem!
I'm more excited that I can now run a custom site, hook up an MCP for it, and have all the cool intelligence I used to pay SaaS for, without having to integrate with them, plus I govern my own data. It's a massive win.
I just see AI-assisted coding replicating current SaaS services that I can run internally. If my shop were on a specific stack, I could aim to have all my supporting apps in that stack using AI-assisted coding, simplifying operations, and still be able to hook up MCPs to get intelligence from all of them.
Truly, OSS should be more interesting in the next decade for this alone.
We should all thank the Chinese companies for releasing so many incredible open-weight models. I hope they keep doing it; I don't want to rely on OpenAI, Anthropic, or Google for all my future computer interactions.
On one hand, yes this is very cool for a whole host of personal uses. On the other hand giving any company this level of access to as many different personal data sources as are out there scares the shit out of me.
I’d feel a lot better if we had something resembling a comprehensive data privacy law in the United States because I don’t want it to basically be the Wild West for anyone handling whatever personal info doesn’t get covered under HIPAA.
Absolutely agreed, but just wanted to mention that it's essentially the same level of access you would give to Zapier, which is one of their top examples of MCP integrations.
It took many years of online tracking, iframes, sticky cookies, and Cambridge Analytica before things like the GDPR came into existence. Similarly, we'll have to wait a few years, until major leaks happen through LLM pipelines/integrations, before anything comparable exists. Sadly, that's the reality we live with.
I'd love a _tip jar_ MCP, where the LLM vendor can automatically tip my website for using its content/feature/service in a query's response. Even if the amount is absolutely minuscule, in aggregate, this might make up for ad revenue losses.
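In aggregate form, such a scheme is mostly bookkeeping. A toy sketch of a hypothetical tip-jar ledger (the class, site names, and payout threshold are all made up):

```python
from collections import defaultdict

class TipJar:
    """Toy ledger for a hypothetical tip-jar MCP: the LLM vendor records
    a micro-tip each time a site's content is used in a response, and a
    site is paid out once its balance crosses a threshold."""

    def __init__(self, payout_threshold_cents=100):
        self.balances = defaultdict(int)  # site -> accrued tips in cents
        self.threshold = payout_threshold_cents

    def tip(self, site, cents=1):
        self.balances[site] += cents

    def due_payouts(self):
        """Sites whose aggregate tips have crossed the payout threshold."""
        return {s: c for s, c in self.balances.items() if c >= self.threshold}

jar = TipJar()
for _ in range(150):
    jar.tip("example.com")       # 150 one-cent tips, in aggregate
jar.tip("smallblog.net", 5)      # too small to pay out yet
print(jar.due_payouts())         # {'example.com': 150}
```

The threshold is the whole trick: individual tips are below any viable transaction fee, so value only moves once it has pooled.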
> Now if LLMs can use specific remote MCP which is another SaaS for your LLM, the remote MCP powered service can charge a subscription to do wonderful things and voila!
I've always worked under the assumption that the best employees make themselves replaceable via well-defined processes and high-quality documentation. I have such a hard time understanding why there's so much willingness to integrate irreplaceable SaaS solutions into business processes.
I haven't used AI a ton, but everything I've done has focused on owning my own context, config, etc.. How much are people going to be willing to pay if someone else owns 10+ years of their AI context?
Am I crazy or is owning the context massively valuable?
Hello fellow context owner. I like my modules with their context.sh at their root level. If crafted with care, magic happens. Reciprocally, when AI derails, it's most often due to bad context management and fixed by improving it.
An AI that is capable of responding to a "How do I do X" prompt with "Hey, this seems related to a ticket that was already opened in your Jira 2 months ago" or "There is a document about this in SharePoint" would bring me such immense value, I think I might cry.
Edit: Actually right in the tickets themselves would probably be better and not require MCP... but still
Remote MCP servers are still in a strange space. Anthropic updated the MCP spec about a month ago with a new Streamable HTTP transport, but it doesn't appear that Claude supports that transport yet.
When I hooked up our remote MCP server, Claude sends a GET request to the endpoint. According to the spec, clients that want to support both transports should first attempt to POST an InitializeRequest to the server URL. If that returns a 4xx, it should then assume the SSE integration.
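That negotiation can be sketched as follows, per my reading of the draft spec (the clientInfo values are placeholders, and the decision helper is just illustrative; a real client would of course issue the actual POST):

```python
import json

def initialize_request(request_id=1):
    """JSON-RPC InitializeRequest body a client first POSTs to the
    server URL to probe for the Streamable HTTP transport (the
    clientInfo values here are placeholders)."""
    return json.dumps({
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "initialize",
        "params": {
            "protocolVersion": "2025-03-26",
            "capabilities": {},
            "clientInfo": {"name": "example-client", "version": "0.1"},
        },
    })

def choose_transport(post_status):
    """If the POST of the InitializeRequest comes back 4xx, fall back to
    the older HTTP+SSE transport (a GET that opens the event stream);
    otherwise stay on Streamable HTTP."""
    if 400 <= post_status < 500:
        return "http+sse"
    return "streamable-http"

print(choose_transport(200))  # streamable-http
print(choose_transport(405))  # http+sse
```

Claude opening with a bare GET, as described above, is effectively assuming the old SSE transport without ever probing, which is why Streamable-HTTP-only servers don't get detected.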
Claude Desktop doesn't support resources without directly importing them, and now they've taken away the button for that and for tools, so I had to build their status into a tool just to see what was loading and what wasn't.
For the past couple of months, I’ve been running occasional side-by-side tests of the deep research products from OpenAI, Google, Perplexity, DeepSeek, and others. Ever since Google upgraded its deep research model to Gemini 2.5 Pro Experimental, it has been the best for the tasks I give them, followed closely by OpenAI. The others were far behind.
It has literally stagnated for a year now. All that's changed is that they connect more APIs, and add a thinking loop with the same model powering it. That's why it seems fast: nothing really happens except the easy things.
That being said, isn’t it strange how the community has polar opposite views about this? Did anything like this ever happen before?
Is there a YouTube video of people using this on complex open-source projects, like the Linux kernel or maybe something like PyTorch?
How come none of the OSS projects (at least not the ones I follow) are progressing faster from AI like Deep Research?
All that talk about AI replacing people seemed a little far-fetched in 2024. But in 2025, I really think the models are getting good enough.
However, there's a major concern that server hosters are on the hook to implement authorization. Ongoing discussion here [1].
[0] https://modelcontextprotocol.io/specification/2025-03-26
[1] https://github.com/modelcontextprotocol/modelcontextprotocol...
How else would they do it?
Source: https://github.com/modelcontextprotocol/modelcontextprotocol...
In case the above link doesn't work later on, the page for this demo day is here: https://demo-day.mcp.cloudflare.com/
So if you ask it "who is in charge of marketing", it will read it off SharePoint instead of answering generically.
Here's my tool for Desktop: https://github.com/kordless/EvolveMCP
I ran two of the same prompts just now through Anthropic’s new Advanced Research. The results for it and for ChatGPT and Gemini appear below. Opinions might vary, but for my purposes Gemini is still the best. Claude’s responses were too short and simple and they didn’t follow the prompt as closely as I would have liked.
Writing conventions in Japanese and English
https://claude.ai/public/artifacts/c883a9a5-7069-419b-808d-0...
https://docs.google.com/document/d/1V8Ae7xCkPNykhbfZuJnPtCMH...
https://chatgpt.com/share/680da37d-17e4-8011-b331-6d4f3f5ca7...
Overview of an industry in Japan
https://claude.ai/public/artifacts/ba88d1cb-57a0-4444-8668-e...
https://docs.google.com/document/d/1j1O-8bFP_M-vqJpCzDeBLJa3...
https://chatgpt.com/share/680da9b4-8b38-8011-8fb4-3d0a4ddcf7...
The second task, by the way, is just a hypothetical case. Though I have worked as a translator in Japan for many years, I am not the person described in the prompt.