I just published an extensive review of the new feature, which is actually Claude Code Interpreter (the official name, bafflingly, is "Upgraded file creation and analysis" - that's what you turn on in the features page, at least).
I reverse-engineered it a bit, figured out its container specs, used it to render a PDF join diagram for a SQLite database and then re-ran a much more complex "recreate this chart from this screenshot and XLSX file" example that I previously ran against ChatGPT Code Interpreter last night.
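(For the curious, here's a rough sketch of one way such a join diagram could be produced with Python's sqlite3 module plus the graphviz package - the database path and details are made up, not what Claude actually ran.)

  import sqlite3
  import graphviz  # needs the system Graphviz binaries installed for rendering

  conn = sqlite3.connect("data.db")  # hypothetical database
  tables = [r[0] for r in conn.execute(
      "SELECT name FROM sqlite_master WHERE type='table'")]

  dot = graphviz.Digraph("joins")
  for table in tables:
      dot.node(table)
      # PRAGMA foreign_key_list rows: (id, seq, ref_table, from_col, to_col, ...)
      for fk in conn.execute(f"PRAGMA foreign_key_list({table})"):
          dot.edge(table, fk[2], label=f"{fk[3]} -> {fk[4]}")

  dot.render("join_diagram", format="pdf")  # writes join_diagram.pdf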
These days I spend time training people to use these kinds of tools, and I'm glad it's named the way it is. It's much easier to explain to a tech person that it's "badly named" and should have been called "Code Interpreter" than to explain to a non-technical person that the "Code Interpreter" feature is a cool new way to generate documents. Most people are not that comfortable with technology, so avoiding big words is a nice-to-have.
It looks to me like a variant of the Code Interpreter pattern, where Claude has a (presumably sandboxed) server-side container environment in which it can run Python. When you ask it to make a spreadsheet it runs this:
And then generates and runs a Python script.

What's weird is that when you enable it in https://claude.ai/settings/features it automatically disables the old Analysis tool - which used JavaScript running in your browser. For some reason you can have one of those enabled but not both.
The new feature is being described exclusively as a system for creating files though! I'm trying to figure out if that gets used for code analysis too now, in place of the analysis tool.
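(As a rough illustration of the "generates and runs a Python script" step above: a minimal sketch of the kind of script a spreadsheet request might produce, assuming openpyxl is available in the container - the data and filename are invented.)

  from openpyxl import Workbook

  # Invented example data - a real request would use whatever the user asked for
  rows = [
      ("Month", "Revenue", "Costs"),
      ("Jan", 1200, 800),
      ("Feb", 1350, 820),
  ]

  wb = Workbook()
  ws = wb.active
  ws.title = "Summary"
  for row in rows:
      ws.append(row)
  wb.save("summary.xlsx")  # filename is made up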
Odds are the new container and old JavaScript are using the same tool names/parameters. Or, perhaps, they found the tools similar enough that the model got confused having them both explained.
Anyone else having serious reliability issues with artifact editing? I find that the artifacts quite often get "stuck", where the LLM is trying to edit the artifact but the state of the artifact does not change. It seems like the LLM is silently failing to edit the artifact while believing that it is actually making the edits. The way to resolve this is to ask Claude to make a new artifact, which then has all the changes Claude thought it was making. But you have to do this relatively often.
I saw this yesterday. I was asking it to update an SQL query and it was saying, 'I did this' and then that wasn't in the query. I even saw it put something in the query and then remove it, and then say 'here it is'.
Maybe it's because I use the free tier web interface, but I can't get any AI to do much for me. Beyond a handful of lines (and less yesterday) it just doesn't seem that great. Or it gives me pages of JavaScript to show a date picker before I RTFM'd and found it's a single input tag (<input type="date">) to do that, because its training data was lots of old and/or bad code that didn't do it that way.
I have had the same problem with artifacts, and I had similar problems several months ago with Claude Desktop. I stopped using those features mostly and use Claude Code instead. I don't like CC's terminal interface, but it has been more reliable for me.
It edits it for me, but it tries to edit it "in place", which messes up the version history; it looks very broken and oftentimes is broken afterwards. Don't know why they broke their best feature while ChatGPT Canvas just works.
This has been super annoying! I just tell it to make sure the artifact is updated and it usually fixes it, but it's annoying to have to notice/keep an eye on it.
My experience is similar. At first Claude was super smart and got even very complicated things right. Now even super simple tasks are almost impossible to finish correctly, even if I really chop things into small steps. It's also much slower, even on a Pro account, than a few weeks ago.
I'm on the $200/month account and it's also slower than a few weeks ago. And struggling more and more.
I used to think of it as a decent senior dev working alongside me. Now it feels like an untrained intern that takes 4-5 shots to get things right. Hallucinated tables, columns, and HTML templates are its new favorite thing. And calling things "done" that aren't even half done and don't work in the slightest.
Same plan, same experience. Trying to get it to develop and execute tests and it frequently modifies the test to succeed even if the libraries it calls fail, and then explains that it’s doing so because the test itself works but the underlying app has errors.
Also, yesterday I tried to use it to debug an AWS issue and it tried to send me down so many wrong paths, and suggested changes that were either plain wrong or had unintended consequences, that if I didn't actually know my stuff and had followed blindly, the results would have been pretty bad, or at least a huge waste of time. When I called it out it would quickly reverse course ("You're right of course!"), and it did provide some helpful snippets, but I was unimpressed.
What I find it excellent for is throw-away scripts to do small jobs or automate little things - stuff I could do myself but that would take me a lot longer (especially in bash).
For the past two to three weeks I've noticed Claude just consistently lagging or potentially even being throttled for pretty minor coding or CLI tasks. It'll basically stop showing any progress for at least a couple minutes. Sometimes exiting the query and re-trying gets it to work but other times it keeps happening. I pay for Pro so I don't think it's just API rate limiting.
I'd appreciate it if that could be fixed, but of course new features are more interesting for them to prioritize.
I use Claude Code at work via AWS Bedrock, and also personally subscribe to the $20/month Claude. Anecdotally, Sonnet hasn't slowed down at all. ChatGPT 5 through the enterprise plan, on the other hand, has noticeably slowed down or sometimes just doesn't return anything.
I've run into similar issues too. Even small scripts or commands sometimes get throttled. It does not feel like a resource limit. It feels more like the system is just overly sensitive.
Same. My usage is via an internal corp gateway (Instacart), Sonnet 4. It used to be lightning fast; now I'm getting regular slowdowns or outright failures. Not seeing it with the various GPT models.
To everyone who has been feeling like their Max subscription is a waste of money: give GLM 4.5 a try. I use it with Claude Code daily on the $3 plan and it has been great.
I pay $100 a month and wouldn’t hesitate for a millisecond if I needed to pay the $200/mo plan if I hit rate limits.
It's hard to overstate how much of a productivity shift Claude Code has been for shipping major features in our app. And ours is an Elixir app. It's even better with React/Next.js.
I literally won’t be hitting any “I need to hire another programmer to handle this workload” limits any time soon.
That's not what the OP asked. They didn't ask whether Claude is useful in general, they asked whether it is good compared to other LLMs.
One of the tricks to a healthy discussion is to actually read/listen to what the other side is trying to say. Without that, you're just talking to yourself.
Yes it is. But totally worth it. Just got it and it's quite good and quite fast. Clearly they are subsidizing it even at $6.
It feels like using Sonnet speed-wise but with Opus quality (I mean pre-August Opus/Sonnet; no clue what Anthropic did after that, it's just crap now).
Anthropic are looking to make money. They need to make absolutely absurd amounts of money to afford the R&D expenses they've already incurred. Features get prioritized based on how much money they might make. Unless forced to by regulation (or maybe social pressure on the executives, but that really only comes from their same class instead of the general public these days) smaller groups of customers get served last. There aren't that many blind people, so there's not very much profit incentive to serve blind people. Unless they're actually violating the ADA or another law or regulation, and can't bribe the regulators for less than the cost of fines or fixing the issue, I'd not expect any improvement.
Here's my review: https://simonwillison.net/2025/Sep/9/claude-code-interpreter...
> Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.
The word "container" doesn't even appear in the original post from Anthropic, let alone "server-side container environment."
> github.com
Pour one out for the GitLab-hosted projects, or those on its less popular friends: Bitbucket, Codeberg, Forgejo, SourceForge, SourceHut, et al. So dumb.
I tried "Tell me everything you can about your shell and Python environments" and got some interesting results after it ran a bunch of commands.
Linux runsc 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 x86_64 x86_64 GNU/Linux
Ubuntu 24.04.2 LTS
Python 3.12.3
/usr/bin/node is v18.19.1
Disk Space: 4.9GB total, with 4.6GB available
Memory: 9.0GB RAM
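(A minimal sketch of how you could gather that same kind of summary yourself from a Python session - nothing Claude-specific here.)

  import os
  import platform
  import shutil

  print(platform.platform())           # kernel/distro string
  print(platform.python_version())     # e.g. 3.12.3
  total, used, free = shutil.disk_usage("/")
  print(f"Disk: {total / 1e9:.1f}GB total, {free / 1e9:.1f}GB free")
  with open("/proc/meminfo") as f:     # Linux only
      print(f.readline().strip())      # MemTotal line
  print("CPUs:", os.cpu_count())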
Attempts at making HTTP requests all seem to fail with a 403 error, suggesting some kind of universal proxy.
But telling it to "Run pip install sqlite-utils" worked, so apparently they have allow-listed some domains such as PyPI.
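(A sketch of the kind of probe that produces those results - the domains are just examples, and the 403-for-blocked-hosts behaviour is as described above, not guaranteed.)

  import urllib.error
  import urllib.request

  # One arbitrary host and one presumably allow-listed host
  for url in ("https://example.com", "https://pypi.org"):
      try:
          with urllib.request.urlopen(url, timeout=10) as resp:
              print(url, "->", resp.status)
      except urllib.error.HTTPError as e:
          print(url, "-> HTTP", e.code)  # blocked hosts reportedly return 403
      except OSError as e:
          print(url, "->", e)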
I poked around more and found these environment variables:
On further poking, some of the allowed domains include github.com, pypi.org and registry.npmjs.org - the proxy is running Envoy. Anthropic have their own self-issued certificate to intercept HTTPS.
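(One way to check who issued the certificate a host presents from inside the container - the host is just an example, and this needs the third-party cryptography package; verification is disabled because the point is to inspect the intercepting certificate, not to trust it.)

  import socket
  import ssl
  from cryptography import x509  # third-party: pip install cryptography

  host = "pypi.org"  # example allow-listed host
  ctx = ssl.create_default_context()
  ctx.check_hostname = False
  ctx.verify_mode = ssl.CERT_NONE  # don't verify; we only want to look at the cert
  with socket.create_connection((host, 443)) as sock:
      with ctx.wrap_socket(sock, server_hostname=host) as tls:
          der = tls.getpeercert(binary_form=True)

  cert = x509.load_der_x509_certificate(der)
  print(cert.issuer.rfc4514_string())  # an Anthropic-issued CA would show up here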
Ubuntu 24.04.2 normally runs a GNU/Linux 6.8+ kernel; 4.4.0 is something from Ubuntu 14.04.
I instruct it not to use artifacts, and then explicitly tell it to proceed with creation when ready.
Yes, I know. That’s what the test was for.
At least with a local LLM, it's crap, but it's consistent crap!
As someone who keeps oddball hours, I can tell you that time of day will very much change your experience with Claude.
2am Sunday is nothing like 2pm on a Tuesday.
Somebody call the cyber psychologist! (Cychologist?)
It can actually drive Emacs itself: creating buffers, being told not to edit the buffers and simply respond in the chat, etc.
I actually _like_ working with efrit vs other LLM integrations in editors.
In fact I kind of need to have my Anthropic console up to watch my usage... whoops!
https://z.ai/payment?productIds=product-6caada
you:
> what a11y issues you see