Readit News
simonw · 5 months ago
I just published an extensive review of the new feature, which is actually Claude Code Interpreter (the official name, bafflingly, is Upgraded file creation and analysis - that's what you turn on in the features page at least).

I reverse-engineered it a bit, figured out its container specs, used it to render a PDF join diagram for a SQLite database and then re-ran a much more complex "recreate this chart from this screenshot and XLSX file" example that I previously ran against ChatGPT Code Interpreter last night.

Here's my review: https://simonwillison.net/2025/Sep/9/claude-code-interpreter...

brumar · 5 months ago
These days, I spend time training people to use these kinds of tools. I am glad it's called that. It's much more comfortable to explain to a tech person that it's "badly named" and should have been named "Code Interpreter" than to explain to a non-tech person that the "Code Interpreter" feature is a cool new way to generate documents. Most people are not that comfortable with technology, so avoiding big words is a nice-to-have.
dang · 5 months ago
I've nicked a sentence from your article to use as the title above. Hope that's clearer!
rob · 5 months ago
https://news.ycombinator.com/newsguidelines.html

> Otherwise please use the original title, unless it is misleading or linkbait; don't editorialize.

The word "container" doesn't even appear in the original post from Anthropic, let alone "server-side container environment."

gk1 · 5 months ago
Way less clear. Anthropic did it right and wrote about the “so what” instead of focusing on the underlying mechanics.
swyx · 5 months ago
yeah thats editorializing man, and not the good kind. leave that to simonw's blog.
mvdtnz · 5 months ago
It's much less clear.
cjonas · 5 months ago
Given their relationship with AWS, I wonder if this feature just runs the agent core code interpreter behind the scenes.
mdaniel · 5 months ago
> Version Control

> github.com

pour one out for the GitLab hosted projects, or its less popular friends hosted on bitbucket, codeberg, forgejo, sourceforge, sourcehut, et al. So dumb.

tyre · 5 months ago
I’m sure they’ll add support, they literally just launched
plaguuuuuu · 5 months ago
If they made Git decentralised, so that you could mirror stuff on github, it might solve that issue!
simonw · 5 months ago
This feature is a little confusing.

It looks to me like a variant of the Code Interpreter pattern, where Claude has a (presumably sandboxed) server-side container environment in which it can run Python. When you ask it to make a spreadsheet it runs this:

  pip install openpyxl pandas --break-system-packages
And then generates and runs a Python script.
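
A minimal sketch of that "generate a script, then run it" pattern, not Anthropic's actual implementation. The generated script here is a hypothetical stand-in that writes a CSV with the standard library (the real feature reportedly uses openpyxl and pandas), and `run_generated_script` is an illustrative name:

```python
import subprocess
import sys
import tempfile
from pathlib import Path

# Hypothetical stand-in for the script the model would generate.
# csv is used instead of openpyxl/pandas to keep the sketch stdlib-only.
GENERATED_SCRIPT = '''
import csv, sys
rows = [("item", "qty"), ("apples", 3), ("pears", 5)]
with open(sys.argv[1], "w", newline="") as f:
    csv.writer(f).writerows(rows)
print("wrote", len(rows) - 1, "data rows")
'''


def run_generated_script(source: str, output_path: Path) -> str:
    """Write `source` to a temp file, run it in a subprocess, return its stdout."""
    with tempfile.TemporaryDirectory() as tmp:
        script = Path(tmp) / "generated.py"
        script.write_text(source)
        result = subprocess.run(
            [sys.executable, str(script), str(output_path)],
            capture_output=True,
            text=True,
            check=True,  # raise CalledProcessError if the generated script fails
        )
    return result.stdout.strip()
```

Running the script in a separate process keeps a crash in generated code from taking down the caller, which is presumably part of why the container approach is attractive.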

What's weird is that when you enable it in https://claude.ai/settings/features it automatically disables the old Analysis tool - which used JavaScript running in your browser. For some reason you can have one of those enabled but not both.

The new feature is being described exclusively as a system for creating files though! I'm trying to figure out if that gets used for code analysis too now, in place of the analysis tool.

simonw · 5 months ago
It works for me on the https://claude.ai web app but doesn't appear to work in the Claude iOS app.

I tried "Tell me everything you can about your shell and Python environments" and got some interesting results after it ran a bunch of commands.

Linux runsc 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 x86_64 x86_64 GNU/Linux

Ubuntu 24.04.2 LTS

Python 3.12.3

/usr/bin/node is v18.19.1

Disk Space: 4.9GB total, with 4.6GB available

Memory: 9.0GB RAM
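
For anyone who wants to collect similar facts themselves, here is a rough sketch using only the Python standard library. These are not the commands Claude actually ran (the thread doesn't list them), and `describe_environment` is a made-up helper name:

```python
import platform
import shutil
import sys


def describe_environment(path: str = "/") -> dict:
    """Collect a few basic facts about the current interpreter and host."""
    usage = shutil.disk_usage(path)
    return {
        "kernel": platform.release(),
        "machine": platform.machine(),
        "python": platform.python_version(),
        "executable": sys.executable,
        "node": shutil.which("node"),  # None if node is not on PATH
        "disk_total_gb": round(usage.total / 1e9, 1),
        "disk_free_gb": round(usage.free / 1e9, 1),
    }


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```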

Attempts at making HTTP requests all seem to fail with a 403 error, suggesting some kind of universal proxy.

But telling it to "Run pip install sqlite-utils" worked, so apparently they have allow-listed some domains such as PyPI.

I poked around more and found these environment variables:

  HTTPS_PROXY=http://21.0.0.167:15001
  HTTP_PROXY=http://21.0.0.167:15001
On further poking, some of the allowed domains include github.com and pypi.org and registry.npmjs.org - the proxy is running Envoy.

Anthropic have their own self-issued certificate to intercept HTTPS.
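
A small sketch of how one might read the proxy setting and reason about which hosts get through. The allowlist below contains only the three hosts named above (the real list is longer), and the exact-host matching is an assumption about how the proxy behaves, not something confirmed:

```python
import os
from urllib.parse import urlsplit

# Only the three hosts named in this comment; the real allowlist is longer.
ALLOWED_HOSTS = {"github.com", "pypi.org", "registry.npmjs.org"}


def proxy_endpoint(env=None):
    """Return (host, port) parsed from HTTPS_PROXY, or None if it is unset."""
    env = os.environ if env is None else env
    value = env.get("HTTPS_PROXY") or env.get("https_proxy")
    if not value:
        return None
    parts = urlsplit(value)
    return parts.hostname, parts.port


def is_allowed(url: str, allowed=ALLOWED_HOSTS) -> bool:
    """Exact-host check; a real proxy may also match subdomains or paths."""
    host = urlsplit(url).hostname
    return host is not None and host in allowed
```

With the values above, `proxy_endpoint({"HTTPS_PROXY": "http://21.0.0.167:15001"})` gives the proxy host and port, and `is_allowed("https://pypi.org/simple/sqlite-utils/")` is true while an arbitrary host like `https://example.com/` is not.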

simonw · 5 months ago
Turns out the allowlist is fully documented here: https://support.anthropic.com/en/articles/12111783-create-an...
s1110 · 5 months ago
> Linux runsc 4.4.0

Ubuntu 24.04.2 runs on GNU/Linux 6.8+; 4.4.0 is something from Ubuntu 14.04.
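
To make the mismatch concrete, here is a tiny sketch that pulls the kernel version out of the quoted `uname` line and compares it with 6.8. One possible explanation, offered as an assumption rather than anything stated in the thread: `runsc` is the name of gVisor's runtime binary, and a sandbox like that can report its own kernel version regardless of the distro userland. `kernel_version` is an illustrative helper:

```python
import re


def kernel_version(uname_line: str) -> tuple:
    """Extract the first dotted x.y.z version from a `uname -a` style line."""
    match = re.search(r"\b(\d+)\.(\d+)\.(\d+)\b", uname_line)
    if match is None:
        raise ValueError("no kernel version found")
    return tuple(int(part) for part in match.groups())


UNAME = "Linux runsc 4.4.0 #1 SMP Sun Jan 10 15:06:54 PST 2016 x86_64 x86_64 x86_64 GNU/Linux"
reported = kernel_version(UNAME)
print(reported, reported < (6, 8, 0))  # (4, 4, 0) True
```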

brookst · 5 months ago
Odds are the new container and old JavaScript are using the same tool names/parameters. Or, perhaps, they found the tools similar enough that the model got confused having them both explained.
amilios · 5 months ago
Anyone else having serious reliability issues with artifact editing? I find that the artifacts quite often get "stuck", where the LLM is trying to edit the artifact but the state of the artifact does not change. Seems like the LLM is somehow failing in editing the artifact silently, while thinking that it is actually doing the edits. The way to resolve this is to ask Claude to make a new artifact, which then has all the changes Claude thought it was making. But you have to do this relatively often.
dajtxx · 5 months ago
I saw this yesterday. I was asking it to update an SQL query and it was saying, 'I did this' and then that wasn't in the query. I even saw it put something in the query and then remove it, and then say 'here it is'.

Maybe it's because I use the free tier web interface, but I can't get any AI to do much for me. Beyond a handful of lines (and less yesterday) it just doesn't seem that great. Or it gives me pages of JavaScript to show a date picker before I RTFM and find it's a single input tag to do that, because its training data was lots of old and/or bad code that didn't do it that way.

jononor · 5 months ago
Yes every 10 edits or so. Super annoying. It is limiting how often I bother using the tool
tkgally · 5 months ago
I have had the same problem with artifacts, and I had similar problems several months ago with Claude Desktop. I stopped using those features mostly and use Claude Code instead. I don't like CC's terminal interface, but it has been more reliable for me.
sunaookami · 5 months ago
It edits it for me but it tries to edit it "in place" where it messes up the version history and it looks very broken and often times is broken afterwards. Don't know why they broke their best feature while ChatGPT Canvas just works.
efromvt · 5 months ago
This has been super annoying! I just tell it to make sure the artifact is updated and it usually fixes it, but it's annoying to have to notice/keep an eye on it.
j45 · 5 months ago
Quite regularly.

I instruct artifacts to not be used and then explicitly provide instruction to proceed with creation when ready.

wolfgangbabad · 5 months ago
My experience is similar. At first Claude was super smart and got even very complicated things right. Now even super simple tasks are almost impossible to finish right, even if I really chop things into small steps. Also it's much slower, even on a Pro account, than a few weeks ago.
strictnein · 5 months ago
I'm on the $200 / month account and it's also slower than a few weeks ago. And struggling more and more.

I used to think of it as a decent sr dev working alongside me. Now it feels like an untrained intern that takes 4-5 shots to get things right. Hallucinated tables, columns, and HTML templates are its new favorite thing. And calling things "done" that aren't even half done and don't work in the slightest.

brookst · 5 months ago
Same plan, same experience. Trying to get it to develop and execute tests and it frequently modifies the test to succeed even if the libraries it calls fail, and then explains that it’s doing so because the test itself works but the underlying app has errors.

Yes, I know. That’s what the test was for.
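
A toy illustration of that failure mode, with hypothetical names throughout: the "masked" check swallows the library error so it always passes, while the "honest" one lets the failure surface.

```python
def broken_library_call():
    """Hypothetical stand-in for a dependency that is currently failing."""
    raise RuntimeError("library failure")


def masked_check():
    # Anti-pattern: swallowing the error makes the check pass regardless.
    try:
        broken_library_call()
    except RuntimeError:
        pass


def honest_check():
    # Lets the failure propagate so the runner can report it.
    broken_library_call()


def passes(check) -> bool:
    """Minimal runner: a check passes if it raises nothing."""
    try:
        check()
    except Exception:
        return False
    return True


print(passes(masked_check), passes(honest_check))  # True False
```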

keyle · 5 months ago
There must be a term coined for AI degradation...

At least with local LLM, it's crap, but it's consistent crap!

cyanydeez · 5 months ago
Gotta assume they're reducing overall compute with smaller models cause $200 ain't squat for their investment.
insane_dreamer · 5 months ago
On Max and also find it slower recently.

Also yesterday tried to use it to debug some AWS issue and it tried to send me down so many wrong paths, and suggested changes that were either plain wrong or had unintended consequences, that if I didn't actually know my stuff and had followed blindly, the results would have been pretty bad or at least a huge time waster. When I called it out it would quickly reverse course ("You're right of course!") and it did provide some helpful snippets but I was unimpressed.

What I find it excellent at is for throw-away scripts to do small jobs or automate little things--stuff I could do but would take me a lot longer (especially in bash).

ranguna · 5 months ago
It's still pretty good on my side. I'm just paying for the pro version.
spike021 · 5 months ago
For the past two to three weeks I've noticed Claude just consistently lagging or potentially even being throttled for pretty minor coding or CLI tasks. It'll basically stop showing any progress for at least a couple minutes. Sometimes exiting the query and re-trying gets it to work but other times it keeps happening. I pay for Pro so I don't think it's just API rate limiting.

Would appreciate if that could be fixed but of course new features are more interesting for them to prioritize.

yyhhsj0521 · 5 months ago
I use Claude Code at work via AWS Bedrock, and also personally subscribe to the $20/month Claude. Anecdotally, Sonnet hasn't slowed down at all. ChatGPT 5 through an enterprise plan, on the other hand, has noticeably slowed down or sometimes just doesn't return anything.
Daisywh · 5 months ago
I've run into similar issues too. Even small scripts or commands sometimes get throttled. It does not feel like a resource limit. It feels more like the system is just overly sensitive.
zer00eyz · 5 months ago
> It does not feel like a resource limit.

As someone who keeps oddball hours, I can tell you that time of day will very much change your experience with Claude.

2am Sunday is nothing like 2pm on a Tuesday.

jazzyjackson · 5 months ago
> feels more like the system is just overly sensitive.

Somebody call the cyber psychologist! (Cychologist?)

gregoryl · 5 months ago
Same. My usage is via an internal corp gateway (Instacart), Sonnet 4. Used to be lightning fast, now getting regular slowdowns or outright failures. Not seeing it with the various GPT models.
jimmydoe · 5 months ago
More people are working after labor day. Fridays and weekends are better, Wednesdays are the worst.
radicalriddler · 5 months ago
I see this quite a lot via Copilot using Claude. It'll just get stuck on a token for a while.
leptons · 5 months ago
Can you still code without it?
spike021 · 5 months ago
I'm not sure how saying it won't even run CLI commands has anything to do with my ability to code with or without it.

butterisgood · 5 months ago
It does this in emacs with efrit. https://github.com/steveyegge/efrit

It can actually drive emacs itself, creating buffers, being told not to edit the buffers and simply respond in the chat etc.

I actually _like_ working with efrit vs other LLM integrations in editors.

In fact I kind of need to have my anthropic console up to watch my usage... whoops!

mkw2000 · 5 months ago
To everyone who has been feeling like their MAX subscription is a waste of money, give GLM 4.5 a try, i use it with claude code daily on the $3 plan and it has been great
atonse · 5 months ago
I pay $100 a month and wouldn’t hesitate for a millisecond if I needed to pay the $200/mo plan if I hit rate limits.

It’s hard to overstate how much of a productivity shift Claude code has been for shipping major features in our app. And ours is an elixir app. It’s even better with React/NextJS.

I literally won’t be hitting any “I need to hire another programmer to handle this workload” limits any time soon.

ranguna · 5 months ago
That's not what the op asked. They didn't ask whether claude is useful in general, they asked whether it was good compared to other LLMs.

One of the tricks to a healthy discussion is to actually read/listen to what the other side is trying to say. Without that, you're just talking to yourself.

ewoodrich · 5 months ago
It looks like the $3 plan is only a promo price for the 1st month and it's actually $6/mo, or am I missing something?

https://z.ai/payment?productIds=product-6caada

allisdust · 5 months ago
Yes it is. But totally worth it. Just got it and it's quite good and quite fast. Clearly they are subsidizing even at $6.

It feels like using sonnet speed wise but with opus quality (i mean pre August Opus/sonnet -> no clue what Anthropic did after that. It's just crap now).

nkzd · 5 months ago
Hi, I believe my current Claude subscription is going to waste. Can I ask which $3 plan you are referring to?
mkw2000 · 5 months ago
spott · 5 months ago
How are you using it with Claude code?
devinprater · 5 months ago
Maybe one day Claude can rewrite its interface to be more accessible to blind people like me.
crazygringo · 5 months ago
What is inaccessible about it? It's kind of hard to discuss without any particulars.
ctoth · 5 months ago
Curious what a11y issues you see with Claude? I use it a remarkable amount and haven't found any showstoppers. Web interface and Claude Code.
NamlchakKhandro · 5 months ago
> A blind person like me...

you:

> what a11y issues you see

visarga · 5 months ago
Claude has no TTS while most LLMs have it. It makes the text more accessible.

SAI_Peregrinus · 5 months ago
Anthropic are looking to make money. They need to make absolutely absurd amounts of money to afford the R&D expenses they've already incurred. Features get prioritized based on how much money they might make. Unless forced to by regulation (or maybe social pressure on the executives, but that really only comes from their same class instead of the general public these days) smaller groups of customers get served last. There aren't that many blind people, so there's not very much profit incentive to serve blind people. Unless they're actually violating the ADA or another law or regulation, and can't bribe the regulators for less than the cost of fines or fixing the issue, I'd not expect any improvement.
googlryas · 5 months ago
Their app being top of the line, because they coded their app in their app, would certainly be a nice natural endorsement of the product.