What is with the negativity in these comments? This is a huge, huge surface area that touches a large percentage of white collar work. Even just basic automation/scaffolding of spreadsheets would be a big productivity boost for many employees.
My wife works in insurance operations - everyone she manages from the top down lives in Excel. For line employees a large percentage of their job is something like "Look at this internal system, export the data to excel, combine it with some other internal system, do some basic interpretation, verify it, make a recommendation". Computer Use + Excel Use isn't there yet...but these jobs are going to be the first on the chopping block as these integrations mature. No offense to these people but Sonnet 4.5 is already at the level where it would be able to replicate or beat the level of analysis they typically provide.
Having wrangled many spreadsheets personally, and worked with CFOs who use them to run small-ish businesses, and all the way up to one of top 3 brokerage houses world-wide using them to model complex fixed income instruments... this is a disaster waiting to happen.
Spreadsheet UI is already a nightmare. Formula editing and relationship visualization are not there at all. Mistakes are rampant in spreadsheets, even my own carefully curated ones.
Claude is not going to improve this. It is going to make it far, far worse with subtle and not so subtle hallucinations happening left and right.
The key is really this - all LLMs that I know of rely on entropy and randomness to emulate human creativity. This works pretty well for pretty pictures and creating fan fiction or emulating someone's voice.
It is not a basis for getting correct spreadsheets that show what you want to show. I don't want my spreadsheet correctness to start from a random seed. I want it to spring from first principles.
My first job out of uni was building a spreadsheet infrastructure-as-code version control system after a Windows update made an eight-year-old spreadsheet go haywire and lose $10m in an afternoon.
In my opinion the biggest use case for spreadsheets with LLMs is to ask them to build Python scripts to do whatever manipulations you want on the data. Once people learn to do this, workplace productivity will increase greatly. I have been using LLMs for years now to write Python scripts that automate different repeatable tasks. Want a PDF of this data overlaid on this file? Create a Python script with an LLM. Want the data exported out of this to be formatted and tallied? Create a script for that.
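To make the "export, format, tally" case concrete, here is a minimal sketch of the kind of throwaway script an LLM might generate — the column names and sample data are invented for illustration, and a real export would come from a file rather than an inline string:

```python
import csv
import io
from collections import defaultdict

def tally_by_category(rows, category_col, amount_col):
    """Sum one column of a CSV export, grouped by another column."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[category_col]] += float(row[amount_col])
    return dict(totals)

# Inline stand-in for a CSV exported from an internal system.
EXPORT = """policy_type,premium
auto,1200.00
home,950.50
auto,800.00
"""

totals = tally_by_category(csv.DictReader(io.StringIO(EXPORT)),
                           "policy_type", "premium")
print(totals)  # {'auto': 2000.0, 'home': 950.5}
```

The point isn't this particular script; it's that the LLM writes deterministic code once, and the code does the arithmetic every time after that.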
I don't think tools like Claude are there yet, but I already trust GPT-5 Pro to be more diligent about catching bugs in software than me, even when I am trying to be very careful. I expect even just using these tools to help review existing Excel spreadsheets could lead to a significant boost in quality if software is any guide (and Excel spreadsheets seem even worse than software when it comes to errors).
That said, Claude is still quite behind GPT-5 in its ability to review code, and so I'm not sure how much to expect from Sonnet 4.5 in this new domain. OpenAI could probably do better.
Yeah, it's like that commercial for OpenAI (or was it Gemini?) where the guy says he lets the tool work on his complex financial spreadsheets, goes for a walk with his dog, gets back, and it's done with "like 98% accuracy". I cannot imagine what the 2% margin of error looks like for a company that moves around hundreds of billions of dollars...
Having AI create the spreadsheet you want is totally possible, just like generating bash scripts works well. But to get good results, there needs to be some documentation describing all the hidden relationships and nasty workarounds first.
Don't try to make LLMs generate results or numbers, that's bound to fail in any case. But they're okay to generate a starting point for automations (like Excel sheets with lots of formulas and macros), given they get access to the same context we have in our heads.
I tend to agree that dropping the tool as it is into untrained hands is going to be catastrophic.
I’ve had similar professional experiences as you and have been experimenting with Claude Code. I’ve found I really need to know what I’m doing and the detail in order to make effective (safe) use out of it. And that’s been a learning curve.
The one area I hope/think it’s closest to (given comments above) is potentially as a “checker” or validator.
But even then I’d consider the extent to which it leaks data, steers me the wrong way, or misses something.
The other case may be mocking up a simple financial model for a test / to bounce ideas around. But without very detailed manual review (as a mitigating check), I wouldn’t trust it.
So yeah… that’s the experience of someone who maybe bridges these worlds somewhat… And I think many out there see the tough (detailed) road ahead, while these companies are racing to monetize.
My take is more optimistic. This could be an off ramp to stop putting critical business workflows in spreadsheets. If people start to learn that general purpose programming languages are actually easier than Excel (and with LLMs, there is no barrier), then maybe more robust workflows and automation will be the norm.
I think the world would be a lot better off if excel weren’t in it. For example, I work at a business with 50K+ employees where project management is done in a hellish spreadsheet literally one guy in Australia understands. Data entry errors can be anywhere and are incomprehensible. 3 or 4 versions are floating around to support old projects. A CRUD app with a web front end would solve it all. Yet it persists because Excel is erroneously seen as accessible whereas Rails, Django, or literally anything else is witchcraft.
> all LLMs that I know of rely on entropy and randomness to emulate human creativity
Those are tuneable parameters. Turn down the temperature and top_p if you don't want the creativity.
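For anyone unfamiliar with what temperature actually does: it divides the model's logits before the softmax, so low temperatures sharpen the token distribution toward the argmax. A small self-contained sketch (toy logits, not a real model):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature before softmax; lower T sharpens the distribution."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

logits = [2.0, 1.0, 0.5]

# At T=1.0 every token keeps real probability mass...
print(softmax_with_temperature(logits, 1.0))

# ...while at a very low T nearly all mass collapses onto the argmax,
# which is what makes low-temperature decoding close to deterministic.
print(softmax_with_temperature(logits, 0.05))
```

That said, greedy decoding removes sampling noise, not model error — a confidently wrong formula comes out wrong at temperature 0 too.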
> Claude is not going to improve this.
We can measure models vs humans and figure this out.
To your own point, humans already make "rampant" mistakes. With models, we can scale inference time compute to catch and eliminate mistakes, for example: run 6x independent validators using different methodologies.
One-shot financial models are a bad idea, but properly designed systems can probably match or beat humans pretty quickly.
To me, the case for LLMs is strongest not because LLMs are so unusually accurate and awesome, but because if human performance were put on trial in aggregate, it would be found wanting.
Humans already do a mediocre job of spreadsheets, so I don't think it is a given that Claude will make more mistakes than humans do.
Or you could, you know, read the article before commenting to see the limited scope of this integration?
Anyway, Google has already integrated Gemini into Sheets, and recently added direct spreadsheet editing capability, so your comment was disproven before you even wrote it.
> The key is really this - all LLMs that I know of rely on entropy and randomness to emulate human creativity. This works pretty well for pretty pictures and creating fan fiction or emulating someone's voice.
I think you need to turn down the temperature a little bit. This could be a beneficial change.
I don't trust LLMs to do the kind of precise deterministic work you need in a spreadsheet.
It's one thing to fudge the language in a report summary, since that can be subjective; numbers are not subjective. It's widely known that LLMs are terrible at even basic maths.
Even Google's own AI summary admits it which I was surprised at, marketing won't be happy.
Yes, it is true that LLMs are often bad at math because they don't "understand" it as a logical system but rather process it as text, relying on pattern recognition from their training data.
Seems like you're very confused about what this work typically entails. The job of these employees is not mental arithmetic. It's closer to:
- Log in to the internal system that handles customer policies
- Find all policies that were bound in the last 30 days
- Log in to the internal system that manages customer payments
- Verify that for all policies bound, there exists a corresponding payment that roughly matches the premium.
- Flag any divergences above X% for accounting/finance to follow up on.
Practically this involves munging a few CSVs, maybe typing in a few things, setting up some XLOOKUPs, IF formulas, conditional formatting, etc.
Will AI replace the entire job? No...but that's not the goal. Does it have to be perfect? Also no...the existing employees performing this work are also not perfect, and in fact sometimes their accuracy is quite poor.
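The verification workflow above is mostly a join-and-compare. A minimal Python sketch — the record layout, field names, and 5% threshold are all invented for illustration:

```python
def flag_divergences(policies, payments, threshold_pct):
    """Compare bound premiums against received payments; return policies whose
    payment diverges from the premium by more than threshold_pct, or is missing."""
    paid = {p["policy_id"]: p["amount"] for p in payments}
    flagged = []
    for policy in policies:
        pid, premium = policy["policy_id"], policy["premium"]
        amount = paid.get(pid)
        if amount is None:
            flagged.append((pid, "no payment found"))
        elif abs(amount - premium) / premium * 100 > threshold_pct:
            flagged.append((pid, f"payment {amount} vs premium {premium}"))
    return flagged

policies = [
    {"policy_id": "P1", "premium": 1000.0},
    {"policy_id": "P2", "premium": 500.0},
    {"policy_id": "P3", "premium": 750.0},
]
payments = [
    {"policy_id": "P1", "amount": 995.0},  # within tolerance
    {"policy_id": "P2", "amount": 400.0},  # 20% short
]

for pid, reason in flag_divergences(policies, payments, threshold_pct=5.0):
    print(pid, reason)
```

This is exactly the shape of logic an LLM can emit as deterministic code, with the XLOOKUP/IF version being the spreadsheet equivalent.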
I don’t see the issue so much as the deterministic precision of an LLM, but the lack of observability of spreadsheets. Just looking at two different spreadsheets, it’s impossible to see what changes were made. It’s not like programming where you can run a `git diff` to see what changes an LLM agent made to a source code file. Or even a word processing document where the text changes are clear.
Spreadsheets work because the user sees the results of complex interconnected values and calculations. For the user, that complexity is hidden away and left in the background. The user just sees the results.
Validating the changes an LLM made to a spreadsheet would be a nightmare for most users. There could be fundamental changes to a formula that could easily be hidden.
For me, that's the concern with spreadsheets and LLMs - which is just as much a concern with spreadsheets themselves. Try collaborating with someone on a spreadsheet for modeling and you’ll know how frustrating it can be to try and figure out what changes were made.
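One partial workaround, assuming you can export both versions (ideally with formulas shown) to CSV: a cell-by-cell diff that at least reports coordinates. A standard-library sketch:

```python
import csv
import io

def diff_csv(old_text, new_text):
    """Return (row, col, old, new) for every cell that differs between two CSV exports."""
    old_rows = list(csv.reader(io.StringIO(old_text)))
    new_rows = list(csv.reader(io.StringIO(new_text)))
    changes = []
    for r in range(max(len(old_rows), len(new_rows))):
        old_row = old_rows[r] if r < len(old_rows) else []
        new_row = new_rows[r] if r < len(new_rows) else []
        for c in range(max(len(old_row), len(new_row))):
            old_cell = old_row[c] if c < len(old_row) else ""
            new_cell = new_row[c] if c < len(new_row) else ""
            if old_cell != new_cell:
                # 1-based coordinates, like spreadsheet row numbers
                changes.append((r + 1, c + 1, old_cell, new_cell))
    return changes

before = "revenue,1000\ncosts,400\n"
after = "revenue,1000\ncosts,450\n"
print(diff_csv(before, after))  # [(2, 2, '400', '450')]
```

The limitation is exactly the one raised above: a value diff won't reveal that a formula was silently replaced by a hardcoded constant unless you diff the formula view as well.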
>I don't trust LLMs to do the kind of precise deterministic work
not just in a spreadsheet, any kind of deterministic work at all.
Find me a reliable way around this. I don't think there is one. MCP/function calling is a band-aid and not consistent enough when precision is important.
After almost three years of using LLMs, I have not found a single case where I didn't have to review the output, which takes as long or longer than doing it by hand.
ML/AI is not my domain, so my knowledge is neither deep nor technical. This is just my experience. Do we need a new architecture to solve these problems?
Most real-world spreadsheets I've worked with were fragile and sloppy, not precise and deterministic. Programmers always get shocked when they realize how many important things are built on extremely messy spreadsheets, and that people simply accept it. They'd rather just spend human hours correcting discrepancies than try to build something maintainable.
"I don't trust LLMs to do the kind of precise deterministic work" => I think the LLM is not doing the precise arithmetic. It is the agent, with lots of knowledge (skills) and tools. Precise deterministic work is done by tools (deterministic code). Skills bring domain knowledge and how to sequence a task. The agent executes it. The LLM predicts the next token.
Sure, but this isn't requiring that the LLM do any math. The LLM is writing formulas and code to do the math. They are very good at that. And like any automated system you need to review the work.
Do you trust humans to be precise and deterministic, or even to be especially good at math?
This is talking about applying LLMs to formula creation and references, which they are actually pretty good at. Definitely not about replacing the spreadsheet's calculation engine.
They're already doing that with AI, rejecting claims at higher rates than before.
Privatized insurance will always find a way to pay out less if it can get away with it. It is just the nature of having the trifecta of profit motive, socialized risk, and light regulation.
The issue isn’t in creating a new monstrosity in excel.
The issue is the poor SoB who has to spelunk through the damn thing to figure out what it does.
Excel is the sweet spot of just enough to be useful, capable enough to be extensible, yet gated enough to ensure everyone doesn’t auto run foreign macros (or whatever horror is more appropriate).
In the simplest terms - it’s not Excel, it’s the business logic. If an Excel file works, it’s because there’s someone who “gets” it in the firm.
I used to live in Excel too. I've trudged through plenty of awful worksheets. The output I've seen from AI is actually more neatly organized than most of what I used to receive in outlook. Most of that wasn't hyper-sophisticated cap table analyses. It was analysis from a Jr Analyst or line employee trying to combine a few different data sources to get some signal on how XYZ function of the business was performing. AI automation is perfectly suitable for this.
Yes it's surprising to see so much cynicism for something that has a real possibility of making so many people so much more productive.
My mental model of the average excel user is of someone who doesn't care about excel, but cares about their business. If Claude can help them use excel and learn about their business faster, then this should make the world more productive and we all get richer.
Claude can make mistakes, but it's not clear to me why people think that the ratio of results to mistakes will get worse here.
I think there are many possible reasons why this could not work out, but many of the comments here just seem like unfounded cynicism.
HN has a base of strong anti-AI bias, which I assume is partially motivated by insecurity over being replaced, losing their jobs, or having missed the boat on AI.
I use AI every day. Without oversight, it does not work well.
If it doesn't work well, I will do it myself, because I care that things are done well.
None of this is me being scared of being replaced; quite the opposite. I'm one of the last generations of programmers who learned how to program and can debug and fix the mess your LLM leaves behind when you forgot to add "make sure it's a clean design and works" to the prompt.
Okay, that's maybe hyperbole, but sadly only a little bit. LLMs make me better at my job, they don't replace me.
Based on the comments here, it's surprising anything in society works at all. I didn't realize the bar was "everything perfect every time, perfectly flexible and adaptable". What a joy some of these folks must be to work with, answering every new technology with endless reasons why it's worthless and will never work.
HN constantly points out the flaws, gaps, and failings of AI. But the same is true of any technology discussed on HN. You could describe HN as having an anti-technology bias, because HN complains about the failings of tech all day every day.
Quite the opposite, actually. You can always find five stories on the front page about some AI product or feature. Meanwhile, you have people like yourself who convince themselves that any pushback is done by people who just don't see the true value of it yet and that they're about to miss out!! Some kind of attempt at spreading FOMO, I guess.
If anything, HN has a pro-AI bias. I don't know of any other medium where discussions about AI consistently get this much frontpage time, this amount of discussion, and this many people reporting positive experiences with it. It's definitely true that HN isn't the raging pro-AI hypetrain it was two years ago, but that shouldn't be mistaken for "strong anti-AI bias".
Outside of HN I am seeing, at best, an ambivalent reaction: plenty of people are interested, almost everyone tried it, very few people genuinely like it. They are happy to use it when it is convenient, but couldn't care less if it disappeared tomorrow.
There's also a small but vocal group which absolutely hates AI and will actively boycott any creative-related company stupid enough to admit to using it, but that crowd doesn't really seem to hang out on HN.
I really don’t think this is accurate. I think the median opinion here is to be suspicious of claims made about AI, and I don’t think that’s necessarily a bad thing. But I also regularly see posts talking about AI positively (e.g. simonw), or talking about it negatively. I think this is a good thing, it is nice to have a diversity of opinions on a technology. It's a feature, not a bug.
HN has an obsession with quality too, which has merit, but is often economically irrelevant.
When US-East-1 failed, lots of people talked about how the lesson was cloud agnosticism and multi cloud architecture. The practical economic lesson for most is that if US-East-1 fails, nobody will get mad at you. Cloud failure is viewed as an act of god.
Anti-AI bias is motivated by the waste of natural resources due to a handful of non-technical douchebag tech bros.
Everything isn't about money, I know that status and power are all you ai narcissists dream about. But you'll never be Bill Gates, nor will you be Elon Musk.
Once ai has gone the way of "Web3", "NFTs", "blockchain", "3D tvs", etc; You'll find a new grift to latch your life savings onto.
The vast majority of people in business and science are using spreadsheets for complex algorithmic things they weren't really designed for, and we find a metric fuckton of errors in the sheets when you actually bother auditing them, mistakes which are not at all obvious without troubleshooting by... manually checking each and every cell & cell relation, peering through parentheses, following references. It's a nightmare to troubleshoot.
LLMs specialize in making up plausible things with a minimum of human effort, but their downside is that they're very good at making up plausible things which are covertly erroneous. It's a nightmare to troubleshoot.
There is already an abject inability to provision the labor to verify Excel reasoning when it's composed by humans.
I'm dead certain that Claude will be able to produce plausibly correct spreadsheets. How important is accuracy to you? How life-critical is the end result? What are your odds, with the current auditing workflow?
Okay! Now! Half of the users just got laid off because management thinks Claude is Good Enough. How about now?
I'd say the vast majority of Excel users in business are working off of a CSV sent from their database/ERP team or exported from a self-serve analytics tool and using pivot tables to do the heavy lifting, where it's nearly impossible to get something wrong. Investment banks and trading desks are different, and usually have an in-house IT team building custom extensions into Excel or training staff to use bespoke software. That's still a very small minority of Excel users.
Yeah, this could be a pretty big deal. Not everyone is an excel expert, but nearly everyone finds themselves having to work with data in excel at some time or other.
A lot of us have seen the effects of AI tools in the hands of people who don't understand how or why to use the tools. I've already seen AI use/misuse get two people fired. One was a line-of-business employee who relied on output without ever checking it, got herself into a pretty deep hole in 3 weeks. Another was a C suite person who tried to run an AI tool development project and wasted double their salary in 3 months, nothing to show for it but the bill, fired.
In both cases the person did not understand the limits of the tools and kept replacing facts with their desires and their own misunderstanding of AI. The C suite person even tried to tell a vendor they were wrong about their own product because "I found out from AI".
AI right now is fireworks. It's great when you know how to use it, but if you half-ass it you'll blow your fingers off very easily.
> but these jobs are going to be the first on the chopping block as these integrations mature.
I'm not even sure that has to be true anymore. From my admittedly superficial impression of the page, this appears to be a tool for building tools. There are plenty of organizations that are resource-constrained, that are doing things the way they have always done things in Excel, simply because they cannot allocate someone to modify what is already in place to better suit their current needs. For them, this is more of a quality-of-life and quality-of-output improvement. This is not like traditional software development, where organizations are far more likely to purchase a product or service to do a job (and where the vendors of those products and services are going to do their best to eliminate developers).
It is bad in a very specific sense, but I did not see any other comments express the bad parts instead of focusing merely on the accuracy part (which is an issue, but not the issue):
- this opens up ridiculous flood of data that would otherwise be semi-private to one company providing this service
- this works well on small data sets, but will choke on ones it needs to divvy up into chunks, inviting interesting (and yet unknown) errors
There is a real benefit to being able to 'talk to data', but anyone who has seen corporate culture up close and personal knows exactly where it will end.
edit: and I am saying all this as a person who actually likes LLMs.
The biggest problem with spreadsheets is that they tend to be accounts for the accumulation of technical debt, which is an area that AI tools are not yet very good at retiring, but very good at making additional withdrawals from.
What does scaffolding of spreadsheets mean? I see the term scaffolding frequently in the context of AI-related articles, but I'm not familiar with this method and I'm hesitant to ask an LLM.
Scaffolding typically just refers to a larger state machine style control flow governing an agent's behavior and the suite of external tools it has access to.
Probably because many people here are software developers, and wrapping spreadsheets in deterministic logic and a consistent UI covers... most software use cases.
in the short run. In the long run, productivity gains benefit* all of us (in a functional market economy).
*material benefit. In terms of spirit and purpose, the older I get the more I think maybe the Amish are on to something. Work gives our lives purpose, and the closer the work is to our core needs, the better it feels. Labor saving so that most of us are just entertaining each other on social networks may lead to a worse society (but hey, our material needs are met!)
Anthropic now has all your company's data, and all you saved was the cost of one human minus however much they charge for this. The good news is it can't have your data again! So starting from the 163rd-165th person you fire, you start to see a good return and all you've sacrificed is exactitude, precision, judgement, customer service and a little bit of public perception!
Excel and AIs are huge clusterfucks on their own, where insane errors happen for various reasons. Combine them, and maybe we will see improvement, but surely we will also see catastrophic outcomes which could ruin not only the lives of ordinary people but whole companies and countries, as has already happened before...
Non-reproducibility is the biggest issue here. You deliver a report in 5 minutes to the CFO, he comes back after lunch, gives you updated data to adjust a bit of the report, and 5 minutes later gets a new report in which some number unrelated to the update has changed, and asks why. What do you do?
> these jobs are going to be the first on the chopping block as these integrations mature.
Those two things are maybe related? So many of my friends don't enjoy the same privileges as I do, and have a more tenuous connection to being gainfully employed.
I have to admit that my first thought was “April Fools’”. But you are right. It makes a lot of sense (if they can get it to work well). Not only is Excel the world’s biggest “programming language”. It’s probably also one of the most unintuitive ways to program.
My theory: a lot of software we build is the supposed solve for a 'crappy spreadsheet'. a) that isn't much of a moat, b) you're watching the generalization of software happen in real time.
Crappy spreadsheet is just the codification of business processes. Those are inherently messy and there's lots of assumptions, lots of edge cases. That's why spreadsheets tend towards crappy on a long enough timeline. It's a fundamentally messy problem.
Spreadsheets are an abstraction over a messy reality, lossy. They were already generalizing reality.
Now we generalize the generalization. It is this lossy reality that people on HN are worried about with AI.
Some people - normal people - understand the difference between the holistic experience of a mathematically informed opinion and an actual model.
It's just that normal people always wanted the holistic experience of an answer. Hardly anyone wants a right answer. They have an answer in their heads, and they want a defensible journey to that answer. That is the purpose of Excel in 95% of places it is used.
Lately people have been calling this "sycophancy." This was always the problem. Sycophancy is the product.
It seems like to me the answer is moreso "People on HN are so far removed from the real use cases for this kind of automation they simply have no idea what they're talking about".
Honestly, as a dev I hate Excel; it's a whole mess I don't understand. I will gladly use Claude for Excel. It will understand the business needs from the data better than I, a mere developer, just trying to get back to regular developer work.
> No offense to these people but Sonnet 4.5 is already at the level where it would be able to replicate or beat the level of analysis they typically provide.
If this is true, then why would your wife be happy about it? I find it really hard to understand. Do you prefer your wife to be jobless and her employer to happily cut costs without impacting productivity? Even if it just replaces the line workers, do you think your wife is going to be safe?
> No offense to these people but Sonnet 4.5 is already at the level where it would be able to replicate or beat the level of analysis they typically provide.
No offense, but this is pure fantasy. The level of analysis they typically provide doesn't suffer from the same high baseline level of completely made up numbers of your favorite LLM.
It's actually really cool. I will say that "spreadsheets" remain a bandaid over dysfunctional UIs, processes, etc and engineering spends a lot of time enabling these bandaids vs someone just saying "I need to see number X" and not "a BI analytics data in a realtime spreadsheet!", etc.
I second this. Spreadsheets are the primary tool used for 15% of the U.S. economy. Productivity improvements will affect hundreds of millions of users globally. Each increment in progress is a massive time save and value add.
The criticisms broadly fall between "spreadsheets are bad" and "AI will cause more trouble than it solves".
This release is a dot in a trend towards everyone having a Goldman-Sachs level analyst at their disposal 24/7. This is a huge deal for the average person or business. Our expectation (disclaimer: I work in this space) is that spreadsheet intelligence will soon be a solved problem. The "harder" problem is the instruction set and human <> machine prompting.
For the "spreadsheets are bad" crowd -- sure, they have problems, but users have spoken and they are the preferred interface for analysis, project management and lightweight database work globally. All solutions to "the spreadsheet problem" come with their own UX and usability tradeoffs, so it's a balance.
Congrats to the Claude team and looking forward to the next release!
> Each increment in progress is a massive time save and value add.
Based on the history of digitalization of businesses from the 1980s onwards, the spreadsheets will just balloon in number and size and there will be more rules and more procedures and more forms and reports to file until the efficiency gains are neutralized (or almost neutralized).
I'm a co-founder of Calcapp, an app builder for formula-driven apps using Excel-like formulas. I spent a couple of days using Claude Code to build 20 new templates for us, and I was blown away. It was able to one-shot most apps, generating competent, intricate apps from having looked at a sample JSON file I put together. I briefly told it about extensions we had made to Excel functions (including lambdas for FILTER, named sort type enums for XMATCH, etc), and it picked those up immediately.
At one point, it generated a verbose formula and mentioned, off-handedly, that it would have been prettier had Calcapp supported LET. "It does!", I replied, "and as an extension, you can use := instead of , to separate names and values!") and it promptly rewrote it using our extended syntax, producing a sleek formula.
These templates were for various verticals, like real estate, financial planning and retail, and I would have been hard-pressed to produce them without Claude's domain knowledge. And I did it in a weekend! Well, "we" did it in a weekend.
So this development doesn't really surprise me. I'm sure that Claude will be right at home in Excel, and I have already thought about how great it would be if Claude Code found a permanent home in our app designer. I'm concerned about the cost, though, so I'm holding off for now. But it does seem unfair that I get to use Claude to write apps with Calcapp, while our customers don't get that privilege.
Seems everyone is speculating about features instead of just reading TFA, which does in fact list them:
- Get answers about any cell in seconds:
Navigate complex models instantly. Ask Claude about specific formulas, entire worksheets, or calculation flows across tabs. Every explanation includes cell-level citations so you can verify the logic.
- Test scenarios without breaking formulas:
Update assumptions across your entire model while preserving all dependencies. Test different scenarios quickly—Claude highlights every change with explanations for full transparency.
- Debug and fix errors:
Trace #REF!, #VALUE!, and circular reference errors to their source in seconds. Claude explains what went wrong and how to fix it without disrupting the rest of your model.
- Build models or fill existing templates:
Create draft financial models from scratch based on your requirements. Or populate existing templates with fresh data while maintaining all formulas and structure.
I feel similarly about MS Word. It can actually produce decent documents if you learn how to use it, in particular if you use styles consistently and never, ever touch the bold, italic, colours etc. (outside of defining said styles, although the defaults are probably all most people need). Unfortunately I think the appeal of Word is that you don't have to learn this and it will just do what you want. Is AI the panacea that will both do what you want and give you the right answers every time?
Also, people complaining about AI inaccuracy are just technical people who like precision. The vast majority of the world is people who don't give a damn about accuracy or even correctness. They just want to appear not completely useless to people who could potentially affect their salary.
I can pretty reliably guess that approximately 100% of all companies in the world use excel tables for financial data and for processes. Ok, this was a joke. It's actually 99.99% of all companies. One would think that financial data, inventory and stuff like that should be damn precise. No?
"Just" technical people who like precision are the reason we are here, typing this, and why lots of parts of our world are pretty cool and comfortable. I wouldn't say that's useless and "just" some people when it is clearly generating unmistakable value.
They can try, but doubt anyone serious will adopt it.
Tried integrating ChatGPT into my finance job to see how far I could get. Mega yikes... millions of dollars of hallucinated mistakes.
Worse you don't have the same tight feedback loop you've got in programming that'll tell you when something is wrong. Compile errors, unit tests etc. You basically need to walk through everything it did to figure out what's real and what's hallucinations. Basically fails silently. If they roll that out at scale in the financial system...interesting times ahead.
Still, presumably there is something around spreadsheets it'll be able to do - the spreadsheet equivalent of boilerplate code, whatever that may be.
I'm bad with spreadsheets, so maybe this is trivial, but having an LLM tell me how to connect my sheet to whatever data I'm using at the moment (it comes up with a link, a SQL query, or both) has allowed me to quickly pull in data where I'd normally eyeball it and move on, or worst case do it partially manually if really important.
It's like one-off scripts, in a sense? I'm not doing complex formulas; I just need to know how I can pull data into a sheet, and then I'll bucketize or graph it myself.
Again probably because I'm not the most adept user but it has definitely been a positive use case for me.
Neither am I frankly. Finance stuff can get conceptually complicated even with simple addition & multiplication though. e.g. I deal with a lot of offshore stuff, so the average spreadsheet is a mix of currencies, jurisdictions and companies that are interlinked. I could probably talk you through it high level in an hour with a pen & paper, but the LLMs just can't see the forest for all the trees in the raw sheet.
Anthropic is in a weird place for me right now. They're growing fast, creating little projects that I'd love to try, but their customer service was so bad for me as a Max subscriber that I set an ethical boundary for myself to avoid their services until such point that it appears they care about their customers whatsoever.
I keep searching for a sign, but everyone I talk to has horror stories. It sucks as a technologist that just wants to play with the thing; oh well.
> I keep searching for a sign, but everyone I talk to has horror stories. It sucks as a technologist that just wants to play with the thing; oh well.
The reason that Claude Code doesn't have an IDE is because ~"we think the IDE will be obsolete in a year, so it seemed like a waste of time to create one."
Noam Shazeer said on a Dwarkesh podcast that he stopped cleaning his garage, because a robot will be able to do it very soon.
If you are operating under the beliefs these folks have, then things like IDEs, cleaning up, and customer service are stupid annoyances that will become obsolete very soon.
To be clear, I have huge respect for everyone mentioned above, especially Noam.
> Noam Shazeer said on a Dwarkesh podcast that he stopped cleaning his garage, because a robot will be able to do it very soon.
We all come up with excuses for why we haven't done a chore, but some of us need to sound a bit more plausible to other members of the household than that.
It would get about the same reaction as "I'm not going to wash the dishes tonight, the rapture is tomorrow."
Best way to think of it is this: Right now you are not the customer. Investors are.
The money people pay in monthly fees to Anthropic for even the top Max sub likely doesn't come close to covering the energy & infrastructure costs for running the system.
You can prove this to yourself by just trying to cost out what it takes to build the hardware capable of running a model of this size at this speed and running it locally. It's tens of thousands of dollars just to build the hardware, not even considering the energy bills.
So I imagine the goal right now is to pull in a mass audience and prove the model, to get people hooked, to get management and talent at software firms pushing these tools.
And I guess there's some in management and the investment community that thinks this will come with huge labour cost reductions but I think they may be dreaming.
... And then.. I guess... jack the price up? Or wait for Moore's Law?
So it's not a surprise to me they're not jumping to service individual subscribers who are probably paying a fraction of what it costs them to run the service.
I dunno, I got sick of paying the price for Max and I now use the Claude Code tool but redirect it to DeepSeek's API and use their (inferior but still tolerable) model via API. It's probably 1/4 the cost for about 3/4 the product. It's actually amazing how much of the intelligence is built into the tool itself instead of just the model. It's often incredibly hard to tell the difference between DeepSeek output and what I got from Sonnet 4 or Sonnet 4.5.
I've been playing around with local LLMs in Ollama, just for fun. I have an RTX 4080 Super, a Ryzen 5950X with 32 threads, and 64 GB of system memory. A very good computer, but decidedly consumer-level hardware.
I have primarily been using the 120b gpt-oss model. It's definitely worse than Claude and GPT-5, but not by, like, an order of magnitude or anything. It's also clearly better than ChatGPT was when it first came out. Text generates a bit slowly, but it's perfectly usable.
So it doesn't seem so unreasonable to me that costs could come down in a few years?
Every AI company right now (except Google, Meta, and Microsoft) has its valuation based on the expectation of a future monopoly on AGI. None of their business models today, or on the foreseeable horizon, are even profitable, let alone world-dominating. The continued funding rounds all appear to be based on the expectation of becoming the sole player.
The continuing advancement of open source / open weights models keeps me from being a believer.
Bad customer service comes from low priority.
I think Anthropic prioritizes new growth opportunities over feedback from a small number of customers; that's why they publish new products and features so frequently. There are so many potential opportunities for them to focus on.
There is this homogenization happening in AI. No matter what their original mission was, all the AI companies are now building AI-powered gimmicks hoping to stumble upon something profitable. The investors are waiting...
Customer service at B2C companies can only go downhill or stay level. See Google, Apple, Microsoft etc. At B2B it maaaybe can improve, but only when a ten times bigger customer strongarms a company into doing it.
From the signup form mentioning Private Equity / Venture Capital, Hedge Fund, Investment Banking... this seems squarely aimed at financial modeling. Which is really, really cool.
I've worked alongside sell-side investment bankers in a prior startup, and so much of the work is in taking a messy set of statements from a company, understanding the underlying assumptions, and building, and rebuilding, and rebuilding, 3-statement models that not only adhere to standard conventions (perhaps best introed by https://www.wallstreetprep.com/knowledge/build-integrated-3-... ) but also are highly customized for different assumptions that can range from seasonality to sensitivity to creative deal structures.
It is quite common for people to pull many, many all-nighters to try to tweak these models in response to a senior banker or a client having an idea! And one might argue there are way too many similar-looking numbers to keep a human banker from "hallucinating," much less an LLM.
But fundamentally, a 3-statement model and all its build-sheets are a dependency graph with loosely connected human-readable labels, and that means you can write tools that let an LLM crawl that dependency graph in a reliable and semantically meaningful way. And that lets you build really cool things, really fast.
I'm of the opinion that giving small companies the ability to present their finances to investors, the same way Fortune 500 companies hire armies of bankers to do, is vital to a healthy economy, and to giving Main Street the best possible chance to succeed and grow. This is a massive step in the right direction.
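To make the dependency-graph point concrete, here's a toy sketch in Python. The cell names, labels, and structure are all invented for illustration; a real 3-statement model would have hundreds of cells, but the crawl works the same way:

```python
# Toy sketch: a model's cells form a dependency graph with human-readable labels.
# All names and formulas below are hypothetical illustrations.
CELLS = {
    "Revenue":     {"deps": [], "label": "Revenue (input)"},
    "Opex":        {"deps": [], "label": "Opex (input)"},
    "COGS":        {"deps": ["Revenue"], "label": "COGS = 40% of Revenue"},
    "GrossProfit": {"deps": ["Revenue", "COGS"], "label": "Gross Profit = Revenue - COGS"},
    "NetIncome":   {"deps": ["GrossProfit", "Opex"], "label": "Net Income = Gross Profit - Opex"},
}

def trace(cell, depth=0, out=None):
    """Depth-first walk emitting a human-readable audit trail for one cell."""
    if out is None:
        out = []
    out.append("  " * depth + CELLS[cell]["label"])
    for dep in CELLS[cell]["deps"]:
        trace(dep, depth + 1, out)
    return out

print("\n".join(trace("NetIncome")))
```

Given a tool like this, an LLM can be handed the audit trail for any output cell instead of a raw grid of coordinates, which is what makes the crawl semantically meaningful rather than just textual.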
Presenting false data to investors is fraud, doesn't matter how it was generated. In fact, humans are quite good at "generating plausible looking data", doesn't mean human generated spreadsheets are fraud.
On the other hand, presenting truthful data to investors is distinctly not fraud, and this again does not depend on the generation method.
Completely understand the sentiment, but it doesn't apply here, because what's being generated are formulas!
Standardized 3-statement models in Excel are designed to be auditable, with or without AI, because (to only slightly simplify) every cell is either a blue input (which must come from standard exports of the company's accounting books, other auditable inventory/CRM/etc. data, or a visible hardcoded constant), or a black formula that cannot have hardcoded values, and must be simple.
If every buyer can audit, with tools like this, that the formulas match the verbal semantics of the model, there's even less incentive than there is now to fudge the formula level. (And with Wall Street conventions, there's nowhere to hide a prompt injection, because you're supposed to keep every formula to only a few characters, and use breakout "build" rows that can themselves be visually audited.)
And sure, you could conceivably use any AI tool to generate a plausible list of numbers at the input level, but that was equally easy, and equally dependent on context to be fraudulent or not, ever since that famous Excel 1990 elevator commercial: https://www.youtube.com/watch?v=kOO31qFmi9A&t=61s
At the end of the day, the difference between "they want to see this growth, let's fudge it" and "they want to see this growth, let's calculate the exact metrics we need to hit to make that happen, and be transparent about how that's feasible" has always been a matter of trust, not technology.
Tech like this means that people who want to do things the right way can do it as quickly as people who wanted to play loose with the numbers, and that's an equalizer that's on the right side of history.
This is going to be massive if it works as well as I suspect it might.
I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
It's much less about 'greenfield' new excel sheets and much more about fixing/improving existing ones. If it works as well as Claude Code works for code, then it will get pretty crazy adoption I suspect (unless Microsoft beats them to it).
> I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
So they can fire the two dudes that take care of it, lose 15 years of in-house knowledge to save 200k a year, and cry in a few months when their magic tool shits the bed?
You think it's better for the company to have "two dudes" that are completely indispensable and whose work will be completely useless if they die / leave?
I think you're making an argument for LLMs, not against.
If the company is half baked, those "two dudes" will become indispensable beyond belief. They are the ones that understand how Excel works far deeper, and paired with Claude for Excel they become far far more valuable.
The thing really missing from multi-megabyte excel sheets of business critical carnage was a non-deterministic rewrite tool. It'll interact excitingly with the industry standard of no automated testing whatsoever.
I 100% believe generative AI can change a spreadsheet. Turn the xlsx into text, mutate that, turn it back into an xlsx, throw it away if it didn't parse at all. The result will look pretty similar to the original too, since spreadsheets are great at showing immediately local context and nothing else.
Also, we've done a pretty good job of training people that chatgpt works great, so there's good reason for them to expect claude for excel to work great too.
I'd really like the results of this to be considered negligence with non-survivable fines for the reckless stupidity, but more likely, it'll be seen as an act of god. Like all the other broken shit in the IT world.
Spreadsheets are already a disaster.
That said, Claude is still quite behind GPT-5 in its ability to review code, and so I'm not sure how much to expect from Sonnet 4.5 in this new domain. OpenAI could probably do better.
Don't try to make LLMs generate results or numbers, that's bound to fail in any case. But they're okay to generate a starting point for automations (like Excel sheets with lots of formulas and macros), given they get access to the same context we have in our heads.
I’ve had similar professional experiences as you and have been experimenting with Claude Code. I’ve found I really need to know what I’m doing and the detail in order to make effective (safe) use out of it. And that’s been a learning curve.
The one area I hope/think it’s closest to (given comments above) is potentially as a “checker” or validator.
But even then I’d consider the extent to which it leaks data, steers me the wrong way, or misses something.
The other case may be mocking up a simple financial model for a test / to bounce ideas around. But without very detailed manual review (as a mitigating check), I wouldn’t trust it.
So yeah… that’s the experience of someone who maybe bridges these worlds somewhat… And I think many out there see the tough (detailed) road ahead, while these companies are racing to monetize.
I think the world would be a lot better off if Excel weren't in it. For example, I work at a business with 50K+ employees where project management is done in a hellish spreadsheet literally one guy in Australia understands. Data entry errors can be anywhere and are incomprehensible. 3 or 4 versions are floating around to support old projects. A CRUD app with a web front end would solve it all. Yet it persists because Excel is erroneously seen as accessible whereas Rails, Django, or literally anything else is witchcraft.
Those are tuneable parameters. Turn down the temperature and top_p if you don't want the creativity.
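Roughly, here's what those two knobs do to the next-token distribution (a toy sketch in plain Python, not anyone's actual serving code; the logits are made up):

```python
import math

def apply_temperature(logits, temperature):
    """Rescale logits by temperature, then softmax. Lower temperature sharpens."""
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    exps = [math.exp(l - m) for l in scaled]  # subtract max for stability
    total = sum(exps)
    return [e / total for e in exps]

def top_p_filter(probs, top_p):
    """Keep the smallest set of tokens whose cumulative probability reaches top_p."""
    order = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= top_p:
            break
    total = sum(probs[i] for i in kept)
    return {i: probs[i] / total for i in kept}  # renormalized nucleus

logits = [2.0, 1.0, 0.5, -1.0]
sharp = apply_temperature(logits, 0.2)  # near-deterministic
flat = apply_temperature(logits, 1.5)   # more "creative"
```

With temperature 0.2 the top token dominates almost completely; with top_p = 0.9 the long tail is cut off before sampling even happens. It's not end-to-end determinism, but the "randomness" is a dial, not a constant.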
> Claude is not going to improve this.
We can measure models vs humans and figure this out.
To your own point, humans already make "rampant" mistakes. With models, we can scale inference time compute to catch and eliminate mistakes, for example: run 6x independent validators using different methodologies.
One-shot financial models are a bad idea, but properly designed systems can probably match or beat humans pretty quickly.
To me, the case for LLMs is strongest not because LLMs are so unusually accurate and awesome, but because if human performance were put on trial in aggregate, it would be found wanting.
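As a toy sketch of the validator idea (the methods and tolerance here are made up for illustration), imagine re-deriving one reported total several independent ways and flagging any disagreement:

```python
import math
import statistics

def validate_total(values, reported_total, tolerance=1e-9):
    """Recompute a reported total several independent ways; return the failures."""
    checks = {
        "forward_sum": sum(values),
        "reverse_sum": sum(reversed(values)),
        "fsum": math.fsum(values),                    # compensated summation
        "mean_times_n": statistics.mean(values) * len(values),
    }
    return {name: got for name, got in checks.items()
            if abs(got - reported_total) > tolerance}  # empty dict = all agree

assert validate_total([100.0, 250.5, 49.5], 400.0) == {}
bad = validate_total([100.0, 250.5, 49.5], 410.0)  # every method disagrees
```

Scaling from four cheap recomputations to six independent LLM-driven validators is the same shape of system, just with a much larger inference bill.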
Humans already do a mediocre job of spreadsheets, so I don't think it is a given that Claude will make more mistakes than humans do.
Anyway, Google has already integrated Gemini into Sheets, and recently added direct spreadsheet editing capability, so your comment was disproven before you even wrote it.
I think you need to turn down the temperature a little bit. This could be a beneficial change.
It's one thing to fudge the language in a report summary, since language can be subjective; numbers, however, are not. It's widely known that LLMs are terrible at even basic maths.
Even Google's own AI summary admits it, which surprised me; marketing won't be happy.
Yes, it is true that LLMs are often bad at math because they don't "understand" it as a logical system but rather process it as text, relying on pattern recognition from their training data.
- Log in to the internal system that handles customer policies
- Find all policies that were bound in the last 30 days
- Log in to the internal system that manages customer payments
- Verify that for all policies bound, there exists a corresponding payment that roughly matches the premium.
- Flag any divergences above X% for accounting/finance to follow up on.
Practically this involves munging a few CSVs, maybe typing in a few things, setting up some XLOOKUPs, IF formulas, conditional formatting, etc.
Will AI replace the entire job? No...but that's not the goal. Does it have to be perfect? Also no...the existing employees performing this work are also not perfect, and in fact sometimes their accuracy is quite poor.
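For illustration, the core of that reconciliation is only a few lines once the exports are in hand (field names, the 5% threshold, and the data here are all hypothetical):

```python
# Toy sketch of the policy/payment reconciliation described above.
policies = [
    {"policy_id": "P-1", "premium": 1200.00},
    {"policy_id": "P-2", "premium": 850.00},
    {"policy_id": "P-3", "premium": 430.00},
]
payments = {"P-1": 1200.00, "P-2": 790.00}  # P-3 has no payment on file

def reconcile(policies, payments, threshold_pct=5.0):
    """Flag policies whose payment is missing or diverges from premium by > threshold."""
    flags = []
    for p in policies:
        paid = payments.get(p["policy_id"])
        if paid is None:
            flags.append((p["policy_id"], "no matching payment"))
            continue
        divergence = abs(paid - p["premium"]) / p["premium"] * 100
        if divergence > threshold_pct:
            flags.append((p["policy_id"], f"{divergence:.1f}% divergence"))
    return flags

flags = reconcile(policies, payments)
```

The hard part of the job was never this loop; it's the logging in, exporting, and judging which divergences actually matter, which is exactly the glue work these integrations are aiming at.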
Spreadsheets work because the user sees the results of complex interconnected values and calculations. For the user, that complexity is hidden away and left in the background. The user just sees the results.
This would be a nightmare for most users to validate what changes an LLM made to a spreadsheet. There could be fundamental changes to a formula that could easily be hidden.
For me, that’s the concern with spreadsheets and LLMs - which is just as much a concern with spreadsheets themselves. Try collaborating with someone on a spreadsheet for modeling and you’ll know how frustrating it can be to try and figure out what changes were made.
Not just in a spreadsheet; any kind of deterministic work at all.
Find me a reliable way around this. I don't think there is one. MCP/functions are a band-aid and not consistent enough when precision is important.
After almost three years of using LLMs, I have not found a single case where I didn't have to review their output, which takes as long as or longer than doing it by hand.
ML/AI is not my domain, so my knowledge is neither deep nor technical. This is just my experience. Do we need a new architecture to solve these problems?
I was thinking along the same lines, but I could not articulate as well as you did.
Spreadsheet work is deterministic; LLM output is probabilistic. The two should be distinguished.
Still, it's a productivity boost, which is always good.
This is talking about applying LLMs to formula creation and references, which they are actually pretty good at. Definitely not about replacing the spreadsheet's calculation engine.
Deleted Comment
Claude for Excel isn't doing maths. It's doing Excel. If the llm is bad at maths then teaching it to use a tool that's good at maths seems sensible.
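That tool can be as simple as a deterministic expression evaluator: the model emits an arithmetic string, and ordinary code (not the model) produces the number. A minimal sketch, with no claim that this is how any vendor actually wires it up:

```python
import ast
import operator

# Deterministic arithmetic "tool": the LLM proposes an expression string,
# and this evaluator, not the model, produces the number.
_OPS = {
    ast.Add: operator.add, ast.Sub: operator.sub,
    ast.Mult: operator.mul, ast.Div: operator.truediv,
    ast.Pow: operator.pow, ast.USub: operator.neg,
}

def calc(expression: str) -> float:
    """Safely evaluate a plain arithmetic expression (no names, no calls)."""
    def eval_node(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](eval_node(node.left), eval_node(node.right))
        if isinstance(node, ast.UnaryOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](eval_node(node.operand))
        raise ValueError("unsupported syntax")
    return eval_node(ast.parse(expression, mode="eval").body)
```

The arithmetic is then exactly as reliable as the evaluator; what the model can still get wrong is which expression to ask for.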
Rightly so! But LLMs can still make you faster. Just don't expect too much from it.
High precision is possible because they can achieve it through multiple cross-validations.
Deleted Comment
Dead Comment
Now, granted, that can also happen because Alex fat-fingered something in a cell, but that's something that's much easier to track down and reverse.
Privatized insurance will always find a way to pay out less if it can get away with it. It is just the nature of having the trifecta of profit motive, socialized risk, and light regulation.
Dead Comment
The issue isn’t in creating a new monstrosity in excel.
The issue is the poor SoB who has to spelunk through the damn thing to figure out what it does.
Excel is the sweet spot of just enough to be useful, capable enough to be extensible, yet gated enough to ensure everyone doesn’t auto run foreign macros (or whatever horror is more appropriate).
In the simplest terms - it's not Excel, it's the business logic. If an Excel file works, it's because there's someone who "gets" it in the firm.
If it doesn't work well, I will do it myself, because I care that things are done well.
None of this is me being scared of being replaced; quite the opposite. I'm one of the last generations of programmers who learned how to program and can debug and fix the mess your LLM leaves behind when you forgot to add "make sure it's a clean design and works" to the prompt.
Okay, that's maybe hyperbole, but sadly only a little bit. LLMs make me better at my job, they don't replace me.
HN constantly points out the flaws, gaps, and failings of AI. But the same is true of any technology discussed on HN. You could describe HN as having an anti-technology bias, because HN complains about the failings of tech all day every day.
Quite the opposite, actually. You can always find five stories on the front page about some AI product or feature. Meanwhile, you have people like yourself who convince themselves that any pushback is done by people who just don't see the true value of it yet and that they're about to miss out!! Some kind of attempt at spreading FOMO, I guess.
If anything, HN has a pro-AI bias. I don't know of any other medium where discussions about AI consistently get this much frontpage time, this amount of discussion, and this many people reporting positive experiences with it. It's definitely true that HN isn't the raging pro-AI hypetrain it was two years ago, but that shouldn't be mistaken for "strong anti-AI bias".
Outside of HN I am seeing, at best, an ambivalent reaction: plenty of people are interested, almost everyone tried it, very few people genuinely like it. They are happy to use it when it is convenient, but couldn't care less if it disappeared tomorrow.
There's also a small but vocal group which absolutely hates AI and will actively boycott any creative-related company stupid enough to admit to using it, but that crowd doesn't really seem to hang out on HN.
When US-East-1 failed, lots of people talked about how the lesson was cloud agnosticism and multi cloud architecture. The practical economic lesson for most is that if US-East-1 fails, nobody will get mad at you. Cloud failure is viewed as an act of god.
Everything isn't about money, I know that status and power are all you ai narcissists dream about. But you'll never be Bill Gates, nor will you be Elon Musk.
Once ai has gone the way of "Web3", "NFTs", "blockchain", "3D tvs", etc; You'll find a new grift to latch your life savings onto.
LLMs specialize in making up plausible things with a minimum of human effort, but their downside is that they're very good at making up plausible things which are covertly erroneous. It's a nightmare to troubleshoot.
There is already an abject inability to provision the labor to verify Excel reasoning when it's composed by humans.
I'm dead certain that Claude will be able to produce plausibly correct spreadsheets. How important is accuracy to you? How life-critical is the end result? What are your odds, with the current auditing workflow?
Okay! Now! Half of the users just got laid off because management thinks Claude is Good Enough. How about now?
https://www.theregister.com/2025/03/10/nz_health_excel_sprea...
[edit: Added link]
A lot of us have seen the effects of AI tools in the hands of people who don't understand how or why to use the tools. I've already seen AI use/misuse get two people fired. One was a line-of-business employee who relied on output without ever checking it, got herself into a pretty deep hole in 3 weeks. Another was a C suite person who tried to run an AI tool development project and wasted double their salary in 3 months, nothing to show for it but the bill, fired.
In both cases the person did not understand the limits of the tools and kept replacing facts with their desires and their own misunderstanding of AI. The C suite person even tried to tell a vendor they were wrong about their own product because "I found out from AI".
AI right now is fireworks. It's great when you know how to use it, but if you half-ass it you'll blow your fingers off very easily.
I'm not even sure that has to be true anymore. From my admittedly superficial impression of the page, this appears to be a tool for building tools. There are plenty of organizations that are resource constrained, that are doing things the way they have always done thing in Excel, simply because they cannot allocate someone to modify what is already in place to better suit their current needs. For them, this is more of a quality of life and quality of out improvement. This is not like traditional software development, where organizations are far more likely to purchase a product or service to do a job (and where the vendors of those products and services are going to do their best to eliminate developers).
- This opens up a ridiculous flood of data that would otherwise be semi-private to the one company providing this service
- This works well on small data sets, but will choke on ones it needs to divvy up into chunks, inviting interesting (and as yet unknown) errors
There is a real benefit to being able to 'talk to data', but anyone who has seen corporate culture up close and personal knows exactly where it will end.
edit: and I'm saying all this as a person who actually likes LLMs.
Deleted Comment
Perhaps this is part of the negativity? This is a bad thing for the middle class.
*material benefit. In terms of spirit and purpose, the older I get the more I think maybe the Amish are on to something. Work gives our lives purpose, and the closer the work is to our core needs, the better it feels. Labor saving so that most of us are just entertaining each other on social networks may lead to a worse society (but hey, our material needs are met!)
Just as with Copilot, this combines LLMs' inability to repeatably do math correctly with people's overconfidence in LLMs' capabilities.
Excel and AIs are huge clusterfucks on their own, where insane errors happen for various reasons. Combine them, and maybe we will see improvement, but surely we will also see catastrophic outcomes that could ruin not only the lives of ordinary people but whole companies and countries, as has already happened before...
> these jobs are going to be the first on the chopping block as these integrations mature.
Those two things are maybe related? So many of my friends don't enjoy the same privileges as I do, and have a more tenuous connection to being gainfully employed.
Spreadsheets are a lossy abstraction over a messy reality. They were already generalizing reality.
Now we generalize the generalization. It is this lossy reality that people on HN are worried about with AI.
Some people - normal people - understand the difference between the holistic experience of a mathematically informed opinion and an actual model.
It's just that normal people always wanted the holistic experience of an answer. Hardly anyone wants a right answer. They have an answer in their heads, and they want a defensible journey to that answer. That is the purpose of Excel in 95% of places it is used.
Lately people have been calling this "sycophancy." This was always the problem. Sycophancy is the product.
Claude Excel is leaning deeply into this garbage.
People think of data privacy first; local deployment of open-source models is the first choice for them.
When most of it is wild hallucinations? Not really.
For many employees leveraging Excel for manipulating important data, it could cripple careers.
For spreadsheets that influence financial decisions or touch PPI/PII, it could lead to regulatory disasters and even bankruptcies.
Purge hallucinations from LLMs, _then_ let it touch the important shite. Doing it in the reverse order is just begging for a FAFO apocalypse.
If this is true, then why is your wife going to be happy about it? I find it really hard to understand. Would you prefer your wife to be jobless while her employer happily cuts costs without impacting productivity? Even if it just replaces the line workers, do you think your wife is going to be safe?
I don't get it.
No offense, but this is pure fantasy. The level of analysis they typically provide doesn't suffer from the same high baseline level of completely made-up numbers as your favorite LLM.
Versatility and efficiency explode while human usability tanks, but who cares at that point?
Who are these teams that can get value from Anthropic? One MCP and my context window is used up and Claude tells me to start a new chat.
The criticisms broadly fall between "spreadsheets are bad" and "AI will cause more trouble than it solves".
This release is a dot in a trend towards everyone having a Goldman-Sachs level analyst at their disposal 24/7. This is a huge deal for the average person or business. Our expectation (disclaimer: I work in this space) is that spreadsheet intelligence will soon be a solved problem. The "harder" problem is the instruction set and human <> machine prompting.
For the "spreadsheets are bad" crowd -- sure, they have problems, but users have spoken and they are the preferred interface for analysis, project management, and lightweight database work globally. All solutions to "the spreadsheet problem" come with their own UX and usability tradeoffs, so it's a balance.
Congrats to the Claude team and looking forward to the next release!
Based on the history of digitalization of businesses from the 1980s onwards, the spreadsheets will just balloon in number and size and there will be more rules and more procedures and more forms and reports to file until the efficiency gains are neutralized (or almost neutralized).
At one point, it generated a verbose formula and mentioned, off-handedly, that it would have been prettier had Calcapp supported LET. "It does!", I replied, "and as an extension, you can use := instead of , to separate names and values!" and it promptly rewrote it using our extended syntax, producing a sleek formula.
These templates were for various verticals, like real estate, financial planning and retail, and I would have been hard-pressed to produce them without Claude's domain knowledge. And I did it in a weekend! Well, "we" did it in a weekend.
So this development doesn't really surprise me. I'm sure that Claude will be right at home in Excel, and I have already thought about how great it would be if Claude Code found a permanent home in our app designer. I'm concerned about the cost, though, so I'm holding off for now. But it does seem unfair that I get to use Claude to write apps with Calcapp, while our customers don't get that privilege.
(I wrote more about integrating Claude Code here: https://news.ycombinator.com/item?id=45662229)
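For readers who haven't met LET: it names intermediate values inside a formula. The second line shows the := separator described above, which is our Calcapp extension, not standard Excel syntax:

```
Standard Excel:      =LET(price, 100, qty, 3, price * qty)
Calcapp extension:   =LET(price := 100, qty := 3, price * qty)
```

Both evaluate to 300; the := form just makes the name/value pairing easier to scan.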
- Get answers about any cell in seconds: Navigate complex models instantly. Ask Claude about specific formulas, entire worksheets, or calculation flows across tabs. Every explanation includes cell-level citations so you can verify the logic.
- Test scenarios without breaking formulas: Update assumptions across your entire model while preserving all dependencies. Test different scenarios quickly—Claude highlights every change with explanations for full transparency.
- Debug and fix errors: Trace #REF!, #VALUE!, and circular reference errors to their source in seconds. Claude explains what went wrong and how to fix it without disrupting the rest of your model.
- Build models or fill existing templates: Create draft financial models from scratch based on your requirements. Or populate existing templates with fresh data while maintaining all formulas and structure.
Oh and deal with dates before 1900.
Excel is a gift from God if you stay in its lane. If you ever so slightly deviate, not even the Devil can help you.
But maybe, juuuuust maybe, AI can?
> But maybe, juuuuust maybe, AI can?
Bold assumption that the devil and AI aren't aligned ;)
Tried integrating chatgpt into my finance job to see how far I can get. Mega jikes...millions of dollars of hallucinated mistakes.
Worse you don't have the same tight feedback loop you've got in programming that'll tell you when something is wrong. Compile errors, unit tests etc. You basically need to walk through everything it did to figure out what's real and what's hallucinations. Basically fails silently. If they roll that out at scale in the financial system...interesting times ahead.
Still, presumably there is something around spreadsheets it'll be able to do - the spreadsheet equivalent of boilerplate code, whatever that may be.
It's like one-off scripts, in a sense? I'm not doing complex formulas; I just need to know how I can pull data into a sheet, and then I'll bucketize or graph it myself.
Again probably because I'm not the most adept user but it has definitely been a positive use case for me.
I suspect my use case is pretty boilerplatey :)
>I'm not doing complex formulas
Neither am I, frankly. Finance stuff can get conceptually complicated even with simple addition and multiplication, though. E.g. I deal with a lot of offshore stuff, so the average spreadsheet is a mix of currencies, jurisdictions and companies that are interlinked. I could probably talk you through it at a high level in an hour with a pen and paper, but the LLMs just can't see the forest for all the trees in the raw sheet.
I keep searching for a sign, but everyone I talk to has horror stories. It sucks as a technologist that just wants to play with the thing; oh well.
The reason that Claude Code doesn't have an IDE is because ~"we think the IDE will be obsolete in a year, so it seemed like a waste of time to create one."
Noam Shazeer said on a Dwarkesh podcast that he stopped cleaning his garage, because a robot will be able to do it very soon.
If you are operating under the beliefs these folks have, then things like IDEs, cleaning up, and customer service are stupid annoyances that will become obsolete very soon.
To be clear, I have huge respect for everyone mentioned above, especially Noam.
We all come up with excuses for why we haven't done a chore, but some of us need to sound a bit more plausible to other members of the household than that.
It would get about the same reaction as "I'm not going to wash the dishes tonight, the rapture is tomorrow."
How much is the robot going to cost in a year? 100k? 200k? Not mass market pricing for sure.
Meanwhile, today he could pay someone $1000 to clean his garage.
The money people pay in monthly fees to Anthropic for even the top Max subscription likely doesn't come close to covering the energy and infrastructure costs of running the system.
You can prove this to yourself by just trying to cost out what it takes to build the hardware capable of running a model of this size at this speed and running it locally. It's tens of thousands of dollars just to build the hardware, not even considering the energy bills.
So I imagine the goal right now is to pull in a mass audience and prove the model, to get people hooked, to get management and talent at software firms pushing these tools.
And I guess there's some in management and the investment community that thinks this will come with huge labour cost reductions but I think they may be dreaming.
... And then.. I guess... jack the price up? Or wait for Moore's Law?
So it's not a surprise to me that they're not jumping to service individual subscribers, who are probably paying a fraction of what it costs them to run the service.
I dunno, I got sick of paying the price for Max, so now I use the Claude Code tool but redirect it to DeepSeek's API and use their (inferior but still tolerable) model. It's probably 1/4 the cost for about 3/4 the product. It's actually amazing how much of the intelligence is built into the tool itself instead of just the model. It's often incredibly hard to tell the difference between DeepSeek output and what I got from Sonnet 4 or Sonnet 4.5.
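For anyone curious, the redirection is just environment variables: Claude Code honors ANTHROPIC_BASE_URL and ANTHROPIC_AUTH_TOKEN, and DeepSeek exposes an Anthropic-compatible endpoint. The exact URL and model name below are from memory, so verify against DeepSeek's current docs before relying on them:

```shell
# Point Claude Code at DeepSeek's Anthropic-compatible API.
# Endpoint URL and model name are assumptions - check DeepSeek's docs.
export ANTHROPIC_BASE_URL="https://api.deepseek.com/anthropic"
export ANTHROPIC_AUTH_TOKEN="${DEEPSEEK_API_KEY}"
export ANTHROPIC_MODEL="deepseek-chat"
claude
```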
I have primarily been using the 120b gpt-oss model. It's definitely worse than Claude and GPT-5, but not by, like, an order of magnitude or anything. It's also clearly better than ChatGPT was when it first came out. Text generates a bit slowly, but it's perfectly usable.
So it doesn't seem so unreasonable to me that costs could come down in a few years?
Every AI company right now (except Google, Meta, and Microsoft) has its valuation based on the expectation of a future monopoly on AGI. None of their business models, today or on any foreseeable horizon, is even profitable, let alone world-dominating. The continued funding rounds all apparently rest on the expectation of becoming the sole player.
The continuing advancement of open source / open weights models keeps me from being a believer.
I’ve placed my bet and feel secure where it is.
I've worked alongside sell-side investment bankers in a prior startup, and so much of the work is in taking a messy set of statements from a company, understanding the underlying assumptions, and building, and rebuilding, and rebuilding, 3-statement models that not only adhere to standard conventions (perhaps best introduced by https://www.wallstreetprep.com/knowledge/build-integrated-3-... ) but also are highly customized for different assumptions that can range from seasonality to sensitivity to creative deal structures.
It is quite common for people to pull many, many all-nighters to try to tweak these models in response to a senior banker or a client having an idea! And one might argue there are way too many similar-looking numbers for even a human banker to avoid "hallucinating," much less an LLM.
But fundamentally, a 3-statement model and all its build-sheets are a dependency graph with loosely connected human-readable labels, and that means you can write tools that let an LLM crawl that dependency graph in a reliable and semantically meaningful way. And that lets you build really cool things, really fast.
I'm of the opinion that giving small companies the ability to present their finances to investors, the same way Fortune 500 companies hire armies of bankers to do, is vital to a healthy economy, and to giving Main Street the best possible chance to succeed and grow. This is a massive step in the right direction.
On the other hand, presenting truthful data to investors is distinctly not fraud, and this again does not depend on the generation method.
Standardized 3-statement models in Excel are designed to be auditable, with or without AI, because (to only slightly simplify) every cell is either a blue input (which must come from standard exports of the company's accounting books, other auditable inventory/CRM/etc. data, or a visible hardcoded constant), or a black formula that cannot have hardcoded values, and must be simple.
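That convention is mechanically checkable, too. A toy auditor that flags hardcoded numeric literals inside a formula by stripping cell references first (a real auditor would need to handle sheet names, named ranges, and legitimate constants like unit conversions more carefully):

```python
import re

# A1-style references, with or without $ anchors.
REF = re.compile(r"\$?[A-Z]{1,3}\$?\d+")

def hardcoded_constants(formula: str) -> list[str]:
    """Return numeric literals baked into a formula - convention violations."""
    stripped = REF.sub("", formula)  # remove cell references
    return re.findall(r"\d+(?:\.\d+)?", stripped)
```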
If every buyer can audit, with tools like this, that the formulas match the verbal semantics of the model, there's even less incentive than there is now to fudge the formula level. (And with Wall Street conventions, there's nowhere to hide a prompt injection, because you're supposed to keep every formula to only a few characters, and use breakout "build" rows that can themselves be visually audited.)
And sure, you could conceivably use any AI tool to generate a plausible list of numbers at the input level, but that was equally easy, and equally dependent on context to be fraudulent or not, ever since that famous Excel 1990 elevator commercial: https://www.youtube.com/watch?v=kOO31qFmi9A&t=61s
At the end of the day, the difference between "they want to see this growth, let's fudge it" and "they want to see this growth, let's calculate the exact metrics we need to hit to make that happen, and be transparent about how that's feasible" has always been a matter of trust, not technology.
Tech like this means that people who want to do things the right way can do it as quickly as people who wanted to play loose with the numbers, and that's an equalizer that's on the right side of history.
I think many software engineers overlook how many companies have huge (billion dollar) processes run through Excel.
It's much less about 'greenfield' new excel sheets and much more about fixing/improving existing ones. If it works as well as Claude Code works for code, then it will get pretty crazy adoption I suspect (unless Microsoft beats them to it).
So they can fire the two dudes who take care of it, lose 15 years of in-house knowledge to save $200k a year, and cry in a few months when their magic tool shits the bed?
Massive win indeed
I think you're making an argument for LLMs, not against.
Until Microsoft does its anti-competitive thing and finds a way to break this in the file format, because this is exactly what Copilot in Excel does.
That said, Copilot in Excel is pretty much hot garbage still so anything will be better than that.
I 100% believe generative AI can change a spreadsheet. Turn the xlsx into text, mutate that, turn it back into an xlsx, and throw it away if it didn't parse at all. The result will look pretty similar to the original too, since spreadsheets are great at showing immediate local context and nothing else.
Also, we've done a pretty good job of training people that chatgpt works great, so there's good reason for them to expect claude for excel to work great too.
I'd really like the results of this to be considered negligence, with non-survivable fines for the reckless stupidity, but more likely it'll be seen as an act of God. Like all the other broken shit in the IT world.