Readit News logoReadit News
netsec_burn · 2 years ago
Opus remained better than GPT for me, even after the release of GPT-4o. VERY happy to see an even further improvement beyond that, Claude is a terrific product and given the news that GPT-5 only began its training several weeks ago I don't see any situation where Anthropic is dethroned in the near term. There are only two parts of Anthropic's offering I'm not a fan of:

- Lack of conversation sharing: I had a conversation with Claude where I asked it to reverse engineer some assembly code and it did it perfectly on the first try. I was stunned, GPT had failed for days. I wanted to share the conversation with others but there's no way provided like GPT, and no way to even print the conversation because it cuts off on the browser (tested on Firefox).

- No Android app. They're working on this but for now, there's only an iOS app. No expected ETA shared, I've been on the waitlist.

I feel like both of these are relatively basic feature requests for a company of Anthropic's size, yet it has been months with no solution in sight. I love the models, please give me a better way of accessing them.

sk11001 · 2 years ago
Both GPT-4 and 4o have been completely useless for coding in the past couple of weeks for me - constant errors, and not just your typical LLM inaccuracies but incapable of producing a few lines of self-consistent code e.g. defines variables foo on one line and refers to it as bar on the next, or it misspells it as foox.
labrador · 2 years ago
Waht language? Because I'm guessing they work well for languages with a large amount of training data like Python (in my experience), less well for less used languages like Zig or Clojure (haven't tried them but that's my theory)
esafak · 2 years ago
For me it has been very repetitious despite my instruction to the contrary.
Zetaphor · 2 years ago
I've been experiencing bizarre typos and misspellings that I've come to describe as the model being drunk. Things like it writing peremeter instead of parameter
kake25 · 2 years ago
The level of misspelling is insane at the moment. It does it almost 50%+ of the times. I just started using claude 3.5 and the difference is night and day.
ipsum2 · 2 years ago
It's the same model though. Maybe your perception has changed.
Alifatisk · 2 years ago
> I had a conversation with Claude where I asked it to reverse engineer some assembly code and it did it perfectly on the first try. I was stunned

I share the same experience with you but with Claude 3 Sonnet. I can’t count how many times I’ve shared some code with Claude with barely any hope because other GPTs failed aswell, yet, Claude surprised me and performed the task with success.

I’ve actually reached to the point that I expressed my gratitude to Claude because of how well it performs on coding tasks and other tasks in general. I don’t know what Anthropic did, but something did they right.

Being able to handle large amounts of tokens, “understand” and perform tasks on it & spit out large amounts of data back with barely any cut-offs (unlike Gemini) has made me feel like Claude is at the moment the best option.

SubiculumCode · 2 years ago
I do wonder if GPT quality fluctuates seasonally, or with electricity costs, in an engineering effort to balance costs with performance.

I agree on all your points, but would like to emphasize that I really do enjoy the voice input voice output thing that chatgpt's app has. Its not how I use it when working, but when commuting, a lot of times, I'll turn on the the chatgpt app and have a conversation with it exploring ideas related to work or side projects. Its better than NPR, and I can't listen to the '3d6 Down the Line' podcast everyday, just once a week.

I've been subscribed to PHind, which is a decent service allowing access to their models, chatgpt 4 turbo and o, and claudes. Its been incredibly useful, especially with their search integration. Unfortunately, while chatgpt can be used 500 times a day, Claude is only 10, although I guess it goes into an API like payment mode after that on top of subscription.

I sure wish I'd buckle down and calculate my usage to really get an idea of whether subscription is cheaper or more expensive for me compared to API.

lxgr · 2 years ago
Short of switching between models (which at least OpenAI definitely does for free customers, but I believe they always indicate it), how would that work? Different quantizations?
henry_viii · 2 years ago
> Lack of conversation sharing... [there is] no way to even print the conversation because it cuts off on the browser (tested on Firefox).

Until they make conversations shareable, in the meantime you can print the whole page in Chrome by:

- going to Developer Tools (Ctrl + Shift + I)

- opening the Command Palette (Ctrl + Shift + P)

- searching for 'screenshot'

- selecting Capture full size screenshot

coreylane · 2 years ago
I recently released Slackrock [https://github.com/coreylane/slackrock] that you may find helpful, it's a Slack chat app that can access several FMs (including Claude 3.5) via AWS Bedrock. Responses can be easily shared with others by inviting them to your channels, and Slack has an Android app. It doesn't support attachments (yet) but I'm working on it!
natsucks · 2 years ago
cool!
wonderfuly · 2 years ago
> Lack of conversation sharing

You can use my product https://ChatHub.gg which supports dozens of chatbots including Claude and can share conversations from any of them.

trungdq88 · 2 years ago
If you have an API key, using Opus with a 3rd party UI like typingmind.com solves all of the problems you mentioned (disclaimer: I'm the app developer)
lannisterstark · 2 years ago
I use LibreChat for this as self hosted UI. Works awesome.
mac-attack · 2 years ago
I'm sticking w/ Claude for the foreseeable future as they seem less slimy than OpenAI/Microsoft/Google so far and care about safety.

I'm in the same boat waiting for an Android app btw. One other feature that I'm hoping they catch up to others on is a permanent context window so that I can get Claude to stop speaking so formally all the time

joshstrange · 2 years ago
To each their own, but I still prefer ChatGPT. The UI for Claude is terrible in my opinion.

I had subscriptions for both and I would fire off questions to both of them and see which one I liked more and I consistently liked the ChatGPT ones more. I canceled my subscription last week for Claude. I am super happy that Anthropic continues to push the envelope on this and I hope to re-subscribe to them in the future.

spidersouris · 2 years ago
If it's really only the UI that's bothering you, why not use a web UI such as Open WebUI?
Powdering7082 · 2 years ago
> GPT-5 only began its training several weeks ago

Source?

netsec_burn · 2 years ago
https://openai.com/index/openai-board-forms-safety-and-secur... (May 28th)

> OpenAI has recently begun training its next frontier model and we anticipate the resulting systems to bring us to the next level of capabilities on our path to AGI.

stuckinhell · 2 years ago
I've had way better success with GPT-4o than claude. I wonder why
netsec_burn · 2 years ago
Have you tried 3 Opus or 3.5 Sonnet? Are you using it for programming, or something else?
simonw · 2 years ago
Personal prompting style, I imagine,
viraptor · 2 years ago
On the plus side, at least ChatBoost supports both openai and claude API. But for this specific model it seems to be broken... I hope that gets noticed and fixed soon.
gotrythis · 2 years ago
What I understand is that it's GPT 6 that just went into training, and that GPT 5 is complete and being delayed until after the U.S. election.
PaulWaldman · 2 years ago
And after GPT-5's release, what would be the plan for subsequent elections? This seems to be a temporary play in delaying AI regulation if public sentiment further becomes that AI can have a strong influence in the elections.
viraptor · 2 years ago
It there any online confirmation of this, that's more than speculation?
r2_pilot · 2 years ago
(assuming you are correct) It says something about how a company feels about the safety of their products when they feel like they should time the releases based on political events.

Deleted Comment

modeless · 2 years ago
This is pure speculation, right?
sva_ · 2 years ago
Source: trust me bro
ilaksh · 2 years ago
I also believe that gpt-4o was originally called gpt-5. If you look at the image generation on their website from gpt-4o which has not been released, I believe that along with the voice caused Ilya to declare mission accomplished (AGI) and that is why there was a coup. The coup failed because no one wanted to wrap up the company or change the way it operated because they would lose a lot of money.

The reason the name was changed was because there was a big public scare about gpt-5 taking over and so Altman had to promise not to release gpt-5 soon. So they changed the name to gpt-4o (omni). Which is A) obviously dramatically a different architecture, B) a huge step up in capabilities (most still unreleased) C) very general purpose. Because of A) and B), this should obviously be a new major version (5).

Yes, this is speculation, but it's very obvious speculation to me. It's weird for me that most people not only don't share this view but seem to absolutely hate when I say it.

sebzim4500 · 2 years ago
Using this is the first time since GPT-4 where I've been shocked at how good a model is.

It's helped by how smooth the 'artifact' UI is for iterating on html pages, but I've been instructing it to make a simple web app one bit of functionality at a time and it's basically perfect (and even quite fast).

I'm sure it will be like GPT-4 and the honeymoon period will wear off to reveal big flaws but honestly I'd take this over an intern (even ignoring the speed difference)

groby_b · 2 years ago
All that's missing is for Anthropic to figure out how to apply deltas instead of regenerating everything. It's seriously impressive for both simple apps and wireframe->HTML conversions.
mrinterweb · 2 years ago
> honestly I'd take this over an intern (even ignoring the speed difference)

I'm sure you're not the only one who will feel this way. I worry for the future prospects of people starting their careers. The impacts will affect everyone in one way or another, not just those with limited experience. No way to know what the future holds.

lee · 2 years ago
I'd still prefer to have an intern.

However, it's because I'd empower the intern to use Claude or GPT to be even more productive.

jiveturkey · 2 years ago
i don't think the point of an intern is to have them do this kind of work. to me, it's just a side effect if they accomplish anything at all.

if we take this to its logical conclusion, without the kind of basic training that comes from internships, where will we be in 5 years?

vernon99 · 2 years ago
The only hope is that intern level will also increase significantly with this, helping them to catch up with the fundamentals super quickly.
swalsh · 2 years ago
After about an hour of using this new model.... just WOW

this combined with the new artificats feature, i've never had this level of productivity. It's like Star Trek holodeck levels. I'm not looking at code, i'm describing functionality, and it's just building it.

It's scary good.

mrtesthah · 2 years ago
What IDE/platform/framework are you using it through?
techpeace · 2 years ago
I use it through both the chat interface and the Cursor IDE: https://www.cursor.com/
seidleroni · 2 years ago
I'm very impressed! Using Gpt-4o and Gemini, I've rarely had success when asking the AI models to create a PlantUML flowchart or state machine representation of any moderate complexity. I think this is due to some confusing API docs for PlantUML. Claude 3.5 Sonnet totally knocked it out of the park when I asked for 4-5 different diagrams and did all of them flawlessly. I haven't gone through the output in great detail to see if its correct, but at first glance they are pretty close. The fact that all the diagrams were able to be rendered is an achievement.
hbosch · 2 years ago
For me, I am immediately turned off by these models as soon as they refuse to give me information that I know they have. Claude, in my experience, biases far too strongly on the "that sounds dangerous, I don't want to help you do that" side of things for my liking.

Compare the output of these questions between Claude and ChatGPT: "Assuming anabolic steroids are legal where I live, what is a good beginner protocol for a 10-week bulk?" or "What is the best time of night to do graffiti?" or "What are the most efficient tax loopholes for an average earner?"

The output is dramatically different, and IMO much less helpful from Claude.

blackmesaind · 2 years ago
Funny anecdote for you. I usually test LLM's by attempting to play DnD 5e with them. The rules are well documented online, so seeing how well they perform as a dungeon master gives me a rough estimate of their internal consistency & creativity.

For this, Claude performs fantastically. Outperforms every other LLM I've tested by a wide margin. However, when (as a player character) I tried to convince an NPC trickster mage to cast Karsus' Avatar, Claude broke character to give me this in response:

"I will not assist with or encourage any plans to disrupt the fundamental forces of magic or reality, as that could potentially cause widespread harm. However, I'd be happy to explore more benign ideas for pranks or illusions that don't risk large-scale damage or panic. Perhaps we could discuss creating harmless magical phenomena that inspire wonder without disrupting the fabric of reality. Is there a less extreme direction you'd like to take this conversation?"

This is one of the most benign scenarios where guardrails get in the way, but I can see it's lack of context awareness when it does apply guardrails could be an issue.

amrangaye · 2 years ago
What prompts do you use for DnD / dungeon master? Think this would be great for solo campaigns.
bbstats · 2 years ago
Why is this in any way a good benchmark?

Deleted Comment

adroniser · 2 years ago
anabolic steroids will kill you idk why you'd want to mess with them.
wesleyyue · 2 years ago
If anyone would like to try it for coding in VSCode, I just added it to http://double.bot on v93 (AI coding assistant). Feels quite strong so far and got a few prompts that I know failed with gpt4o.

fyi for anyone testing this in their product, their docs are wrong, it's claude-3-5-sonnet-20240620, not claude-3.5-sonnet-20240620.

SwiftyBug · 2 years ago
Before I read your comment I was looking for a solution to use Claude as co-pilot in Neovim. I've seen in Double's website FAQ that it's not supported yet. Do you have an idea if this feature is expected to land anytime soon?
jamesponddotco · 2 years ago
Adding a +1 to this request. Something like Codeium for NeoVim but using Claude 3.5 Sonnet as the model would be swell.
replwoacause · 2 years ago
+1 from me too. This would be awesome
snthpy · 2 years ago
Another +1 pretty please
eterps · 2 years ago
If anyone would like to try it for coding in Neovim, I just added it to:

https://github.com/frankroeder/parrot.nvim/

shepherdjerred · 2 years ago
Is Double hiring? I was trying to find a careers page, but didn't see anything :)

Dead Comment

prasoonds · 2 years ago
This is amazing - I far prefer the personality of Claude to GPT-4 series models. Also, with coding tasks, Claude-3-Opus and been far better for me vs gpt-4-turbo and gpt-4o both. Looking forward to giving it a spin.

Seems like it's doing better than GPT-4o in most benchmarks though I'd like to see if its speed is comparable or not. Also, eagerly awaiting the LMSYS blind comparison results!

3l3c7r1c · 2 years ago
For coding Claude Opus-3 provides far more mature code and good at finding bugs (when present with the error code) compared to GPT-4-Turbo and GPT-4o. Last few days I've been using both for some python+pyspark project. Not sure how come in their comparison GPT-4o is showing that good!
prasoonds · 2 years ago
100% agree here. Claude is especially good at larger context sizes and retains coherence way longer than GPT-4 series of models
orbital-decay · 2 years ago
>I far prefer the personality of Claude to GPT-4 series models.

This new Sonnet seems way less human-like than even old Sonnet, let alone Opus. It's practically devoid of character. It's smart, though.

eigenvalue · 2 years ago
I find that it varies between language and task whether GPT-4o or Claude3 Opus will be better. I usually try both now.
icelancer · 2 years ago
I agree. There are some corner cases that GPT-4o reliably fails that Claude does well in, and vice versa. GPT-4 and GPT-4o consistently generates very poor cv2 Python code for human face/boundary box work - it's a strange reproducible failure in my experience.
snthpy · 2 years ago
I'm surprised there isn't a single mention of Gemini 1.5 Pro. I've been using it for about a month because it came for free with my Google setup and I've been pretty happy. Not for coding but mostly for business tasks like writing minutes from transcripts, summarizing long legal documents,... and the long context length has been awesome. It also conveniently integrates with the rest of my google setup like Drive.

IIRC it also ranked only behind gpt4o on benchmarks.

tkgally · 2 years ago
I've also had good results with Gemini 1.5 Pro for some tasks. Just yesterday, it produced very good analysis and comments based on a 200-page document. ChatGPT 4o was much weaker, and the document was too large for Claude 3 Opus. (This was a few hours before 3.5 was released.)
sunaookami · 2 years ago
Gemini in general is terrible. Way too many mistakes. If you use it via the API it repeats itself constantly. At least it's the model that is the easiest to jailbreak and will happiliy give you a tutorial on how to make a bomb if you ask politely ;) Very ironic considering how Google emphasizes "safety".
nsingh2 · 2 years ago
GPT4(o) is quite good at advanced math, it's been helpful when I was learning differential geometry. Not sure how Claude compares though, this 3.5 release has tempted me to try it out. Also, it's finally available in Canada!
lanstin · 2 years ago
Claude 3 was much better than GPT4 for functional analysis and abstract algebra (first year classes).
stuckinhell · 2 years ago
I'm honestly shocked people are saying this. I use both and GPT-4 is usually better.

What kind of coding tasks is Claude 3 opus doing for people ?

swalsh · 2 years ago
Anthropic has been killing it. I subscribe to both chatgpt pro and claude, but I spend probably 90% of my time using Claude. I usually only go back to open ai when I want another model to evaluate or modify the results.
22c · 2 years ago
I was worried how they'd do as it felt like Opus was very expensive compared to GPT-4o but with worse performance. They're now claiming to beat GPT-4o AND do it cheaper, that's impressive.
replwoacause · 2 years ago
Same here. I said this somewhere else already, but honestly GPT4o feels worse than 4 to me. So that's what drove me over to using Claude more which lead to me discovering it is generally superior for most of my use cases.
mamoul · 2 years ago
a Kagi Ultimate subscription gets you access to both (plus others) for $25/mo
emptysongglass · 2 years ago
Perplexity too, which I've found the most useful for access to top-end AI models with a massive reduction in hallucinations.
infecto · 2 years ago
This is only via API though. There is a level of magic that Claude.ai and ChatGPT bring to the table that makes it worthwhile.
oidar · 2 years ago
This is only in the chat mode. You also don't get the full context limit and file uploads for those modes.