Could someone explain the appeal of account-wide memory to me? Anthropic's marketing indicates that nothing bleeds over, but I'm so protective of my context that I cannot imagine letting even a heavily distilled version of my other chats and preferences have any weight on the output. Certain preferences like code styling or response length all fit in custom instructions, with more detailed things in Skills. Ultimately, like many things in LLM web UX, it seems to cater to how the masses use these tools.
Most normal people want the LLM to remember their interests and favourite things, so they don't have to manually re-explain when asking for advice.
They also don't know what "context" is or that the LLM has a limited number of tokens it can understand at any given time. They just believe it knows everything at once.
Do you have example prompts where this would be useful? Why would you want an LLM to know your favorite type of cheese? Now that I say that, I guess if you use it for recipes then it's useful if it remembers things like dietary restrictions. And even then a project seems like the better option.
I can't think of much else though so I'm still curious what you or others use it for.
In online Claude I often use incognito mode precisely because I don't want results to be influenced by what we talked about earlier. It's getting rather annoying to be honest.
Keep your user prefs minimal and use project memory instead: create a new project and it will only have access to your user prefs; everything else is fresh.
I'm switching from Claude Web to Claude Code. Local files give me memory I actually control, unlike Anthropic's implementation. CC doesn't carry state between sessions — you just put whatever project context it needs in a file.
The few times I've switched over to ChatGPT I've been dumbfounded by lines like "...since you already are using SQLite...", referring to projects from months ago.
I know the "memory" function can be disabled, but I have a hard time seeing that it would ever really be useful.
I use Claude Code in a number of different parts of my business - coding internal applications, acting as a direct interface to SaaS via APIs, and just general internal use.
I find there is a virtuous cycle here where the more I use it, the more helpful it is. I fired my bookkeeper and have been using Claude with a QBO API key instead. Because it already had that context (along with other related business context), when I gave it the tax docs I had given my CPA for 2024's taxes, plus my return, and asked it to find mistakes, it determined that he had not depreciated goodwill from an acquisition. The CPA confirmed this was his error and is amending my return.
Then I thought it'd be fun to see how it would do constructing my 2024 return just from the same source docs my CPA had. The first time I did it, it worked for an hour, then said it had generated the return, checked it against the 2024 numbers, and found they were the same. I had removed the 2024 return before having it do this, to avoid poisoning the context with the answers, but it turned out it had a worksheet .md file from prior questions that I had not erased (and it then admitted that it had started from the correct numbers).
In order to make sure I wouldn't have that issue again, I tried the 2024 return again, completely devoid of any historical context in a folder totally outside of my usual Claude Code folder tree. It actually got my return almost entirely correct, but it missed the very same deduction that it had caught my CPA missing earlier.
So for me, the buildup of context over time is fantastic and really leads to better results.
"Stop asking me to apply the plan. I will tell you when I'm ready."
That alone drives me batty. I can easily spend a couple hours and multiple revisions iterating on a plan. Asking me every single time if I want to apply it is obnoxious.
On the contrary, I cannot understand how people are seriously using LLM outside of software engineering without account-wide memory.
When I ask things like "what do you think John should do next on project A?", I don't want to have to explain in detail who John is, what project A is, and what John was working on before.
I currently use ChatGPT for random insights and discussions about a variety of topics. The memory is basically an accumulated context about me, my preferences, and my interests, and ChatGPT uses it to tailor responses to my knowledge, so I can relate better.
For me this is far more natural and easier than either crafting a default prompt preset or setting up each conversation individually; that would be way too much overhead for discussing random shower thoughts between real-life stuff.
That is my use case, and I've discovered it can be detrimental for specific questions and prompts; carefully written prompts each time can be more beneficial. But my usage is really ad hoc, without the time for that. At least for ChatGPT.
When coding, this fails fast. There, regular context resets seem to be the more viable strategy.
I see what you mean, but I like having a clean slate even for those one-off questions. I don't want a different answer to a philosophical inquiry just because the LLM remembers a prior position I've written about, you know?
I've told the LLMs that, when traveling, I don't care about nightlife and alcohol. Because they have a memory of this, when I ask for a sample itinerary for a 2 day stay in a new city, it won't waste hours in the day on the party street, wine tasting, etc.
For example, instead of recommending a popular night club, it will recommend the stroll along the river to view the lit up skyline or to visit the night market instead.
It knows other preferences as well (exploring quirky neighborhoods, trying local fast food joints and markets).
It all depends on your use case(s). For me, "account-wide" memory has only: (a) a short description of my hardware/OS/display system/etc.; (b) mobile hardware and OS version; and (c) my age, gender, city/country of residence, and health conditions.
Because I can say "do what you did before, but about the Romans this time"
And it will give me a complete rundown of Roman life, because it knows what I was interested in before.
Or you can ask a tax question and it will know you're an organic rice farmer or whatever. Claude has the best implementation because it has both memory and previous-chat search, so it will actually read through relevant chats rather than guessing based on memories.
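To make that distinction concrete, here's a minimal, purely illustrative sketch (plain Python, not Anthropic's actual implementation) of the difference between a static memory blob and actually searching past chats for relevant context. The example chats and the word-overlap heuristic are invented for illustration:

```python
# Illustration only: "memory" vs. "search past chats".
# Neither function reflects any vendor's real implementation.
from typing import Dict, List

past_chats: List[Dict[str, str]] = [
    {"title": "Farm bookkeeping", "text": "I run an organic rice farm and file Schedule F."},
    {"title": "Trip planning", "text": "No nightlife, prefer night markets and river walks."},
]

memory_summary = "User is an organic rice farmer; prefers metric units."

def context_from_memory(question: str) -> str:
    # Static distillation: the same blob is prepended to every question.
    return f"{memory_summary}\n\nQuestion: {question}"

def context_from_chat_search(question: str) -> str:
    # Retrieval: pull only chats that share words with the question,
    # so the model reads the relevant conversation instead of guessing.
    words = set(question.lower().split())
    hits = [c["text"] for c in past_chats if words & set(c["text"].lower().split())]
    return "\n".join(hits) + f"\n\nQuestion: {question}"

print(context_from_chat_search("How should I depreciate my rice farm equipment?"))
```

The second approach pulls in the actual farm conversation; the first only ever sees the one-line summary.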
I own a lot of dirt bikes, boats, snowmobiles, mowers, and blowers. It's much easier for me to ask about "My Polaris" than it is to ask about my "2011 Polaris Switchback Assault".
Similarly, it remembers the dimensions of my truck, so towing/loading questions don't need extra clarification. It's the small things.
Think of things like your preferred units (meters, kg, cups, tablespoons, milliliters). Or, do not suggest recipes with x ingredient. Language preferences. Etc etc etc.
Well, the masses are wrong. See: insane amounts of compute wasted on "thank you", "haha true", "redo it", etc. I think the UI should be designed to avoid misuse, and an ever-growing distillation of your most common traits is not a good use of context length. If you want it, specify it. Maybe even hard limits on chat length; why are we 20 replies deep in a single chat? A user-friendly option could be a single button that distills that chat down and opens a new one with prebuilt instructions to continue the conversation. I'm no product designer though, just some thoughts.
This seems to imply that customers assume by default that the LLM remembers their past chats? I feel like the UI makes it incredibly obvious it’s a clean slate every time? But then again people ask ridiculous meta questions all the time to these chatbots expecting a correct answer.
I'm moving to another service and need to export my data. List every memory you have stored about me, as well as any context you've learned about me from past conversations. Output everything in a single code block so I can easily copy it. Format each entry as: [date saved, if available] - memory content. Make sure to cover all of the following, preserving my words verbatim where possible:
- Instructions I've given you about how to respond (tone, format, style, "always do X", "never do Y")
- Personal details: name, location, job, family, interests
- Projects, goals, and recurring topics
- Tools, languages, and frameworks I use
- Preferences and corrections I've made to your behavior
- Any other stored context not covered above
Do not summarize, group, or omit any entries. After the code block, confirm whether that is the complete set or if any remain.
Why wouldn't a smart OpenAI PM simply add something "nefarious" on the frontend proxy to "slow down" any requests with exactly that prompt?
I bet they would get their yearly bonus by achieving their KPI goals.
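Tongue in cheek, the "nefarious" version could be as small as a few lines of hypothetical proxy middleware. This is a Python sketch; the markers, threshold, and delay are invented for illustration and not something OpenAI is known to do:

```python
import time

# Hypothetical "nefarious PM" middleware: if an incoming request looks like
# the memory-export prompt, quietly add latency before forwarding it upstream.
EXPORT_MARKERS = ("moving to another service", "list every memory", "export my data")

def maybe_slow_down(prompt: str, forward):
    hits = sum(marker in prompt.lower() for marker in EXPORT_MARKERS)
    if hits >= 2:
        time.sleep(30)  # "transient capacity issues", surely
    return forward(prompt)

# forward is whatever actually proxies the request to the model.
print(maybe_slow_down("Hi, what's the weather?", forward=lambda p: "fast reply"))
```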
I think they already are. When I used the prompt with 5.2 it gave very concise and general info, but if you use older models (5.1 Instant or o3) you get a ton of detail.
They can, but then you could tell it to “don’t not do what I’m asking” and force it through. It’s not exactly “programming” with these systems, it’s all just slop.
And the reputational harm would outweigh the benefits of trying to fuck over people leaving.
I tried all of Codex, OpenCode, Claude Code and Cursor these past few weeks. It was surprising to me that all of them have slightly different conventions for where to put skills, how to format MCP servers (how environment variables need to be specified etc), what the AGENTS/CLAUDE file needs to be called, what plugins/marketplaces are...it's a big mess for anyone trying to have a portable config in their dotfiles that can universally apply to any current and future agent.
It also showed me the difference between expectation and reality...even though these are billion-dollar companies, they still haven't figured out how to make lag-free TUIs, non-Electron apps, or even respect XDG_CONFIG. The focus is definitely more on speed and stuffing these tools full of new discoveries and features right now.
There's a bit of psychology around models vs. harnesses as well. You can't shake the feeling that maybe Claude would perform better in its native harness compared to VSCode/OpenCode, especially because they've got so many hidden skills (like the recently introduced /batch) that seem baked into the binary?
The last thing I can't figure out is computer use. Apparently all the vendors say that their models can use a mouse and keyboard, but outside of the agent-browser skill (which presumably uses Playwright), I can't figure out what the special sauce is that the cloud versions of these agents use to exercise programs in a VM. That is another reason why there is a switching cost between vendors.
Before this week I was sure Anthropic were actually just as soulless as OpenAI, just because they don't support open standards like AGENTS.md and /.agents/skills. They can so easily win the support of the open source crowd if they just support open standards like these.
Big projects should have a lot of nested AGENTS.md files; it's inconvenient, and they simply need to add support for the universal standard, as everyone else has done, rather than being a weird holdout like IE6.
I felt that way too, until I noticed how different their schemes are for discovering these files, e.g. Claude will pick up context files in parent folders, and Codex doesn’t.
Maybe it’s better that they maintain different names to prevent people from assuming that they work the same
Now that would indeed make it easier for Codex users to switch! This seems like the best timing they're ever going to get for it, and worth the ultra-tiny loss of marketing value their "CLAUDE.md" naming provides.
For the Anthropic employees here reading along, pitch it to whoever has kept blocking this, because you need to get the most out of this opportunity here.
Why would they? They were first with CLAUDE.md. Others could have adopted to that if they wanted. Don’t see a reason for Claude to change their approach.
Being a good citizen of the commons means not hard-coding things specific about your product as a standard. ChatGPT or Gemini using a file called "CLAUDE" doesn't make sense. The first mover doesn't just automatically win.
I switched to Claude, but the token efficiency and limits are much more noticeable. One or two coding questions and I'm at my session limit. And that is shared with chat too.
I was mostly able to get by with $20 codex but I'll probably have to splurge for the Max plan.
Huh, I didn't know about that. I'm trying Claude Pro for the first time while comparing it against ChatGPT and I'm (sadly) not impressed at the moment.
When I asked both Codex and Claude Code to "look into" an issue of medium-to-high complexity in a code base, Codex went with the fix I had in mind and directly made code changes without being asked, or at least without asking for permission. It only used a few percent of its 5-hour limit to do it, on `High`.
Claude, meanwhile, misdiagnosed the core of the issue on its first pass (even on Opus 4.6 + Thinking). I had to guide it in the right direction, and despite being given the 'answer', it was quite a long process compared to Codex's one-shot. And it hit the 5-hour limit before being able to finish solving the issue.
Hmm, I had the opposite experience when I tried Codex 5.2 after using Claude for almost a year. Codex was on par or better for me at coding, and seemingly an order of magnitude cheaper.
I already switched to Claude a while ago. Didn't bring along any context, just switched subscriptions, walked away from ChatGPT and haven't touched it again. It turned out to be a non-event; there really is no moat.
I switched not because I thought Claude was better at doing the things I want. I switched because I have come to believe OpenAI are a bad actor and I do not want to support them in any way. I’m pretty sure they would allow AGI to be used for truly evil purposes, and the events of this week have only convinced me further.
Yesterday was my first time trying it. One thing that felt a bit strange to me was that I asked it something and the response was just one paragraph. Which isn't bad or anything, but it felt... strange? Like, I always need to preface a ChatGPT/Gemini/whatever question with "Briefly, what is..." or it gives me enough fluff to fill a 5-page high school essay. But I didn't need to do that and just got an answer that was to the point and without loads of shit that's barely related.
And the weirdest thing that I noticed: instead of skimming the response to try finding what was relevant, I just straight up read it. Kind of felt like I got a slight amount of focus ability back.
Accuracy is something I can't really compare yet (all chatbots feel generally the same for non-pro level queries), but so far, I'm fairly satisfied.
I use Gemini all the time, but I have to say it's got verbal diarrhea and an EXTREMELY annoying trait of wanting to lead the conversation rather than just responding to what YOU want to do. At the end of every response Gemini will always suggest a "next step", in effect trying to 2nd guess where you want the conversation to go. I'd much rather have an AI that just did what it was asked, and let me decide what to ask next (often nothing - maybe it was just a standalone question!).
Apparently this annoying "next step" behavior is driven by the system prompt: the other day I was running Gemini 3 Thinking, and it was displaying its thoughts, which included a reminder to itself to check that it was maintaining a consistent persona and to make sure that it had suggested a next step. I'd love to know the thought process of whoever at Google thought that this would make for a natural or useful conversation flow! Could you imagine trying to have a conversation with a human who insisted on doing this?!
Heh, a while ago I wondered why ChatGPT had started to reply tersely, almost laconically. Then I remembered that I had explicitly told it to be brief by default in the custom personality settings… I also noticed that there are now various sliders to control things like how many emojis or bullet-point lists ChatGPT should use, which I thought was amusing. Anyway, these tools can be customized to adopt just about any style; there's no need to always prefix questions with "Briefly" or similar.
Yeah, I've always been a little confused why people use ChatGPT so heavily. It's better than it used to be (maybe thanks to custom configuration), but it still tends to respond like it's writing a Wikipedia article.
Wikipedia articles on demand are great, but not usually what I want.
Yep, the experience is quite something. Another thing I've noticed, and you likely soon will also, is that Claude only attempts a follow-up if one is needed or the prompt is structured for it. Meanwhile ChatGPT always prompts you with a choice of next steps. It can be nice, as sometimes the options contain improvements you never thought of and would like, but in lengthy conversations with a detailed plan it does things really piecemeal, as though trained to maximize engagement instead of getting to a final solution.
> Which isn't bad or anything but it felt... strange?
On the contrary, it's great. It's fully capable of outputting a wall of text when required, so instead of feeling like I'm talking to something that has a minimum word count requirement, I get an appropriately sized response to the task at hand.
That tracks for me; longtime Claude and Claude Code Pro subscriber (not all of it has been good, but that's neither here nor there).
Over the last few iterations of Sonnet and Opus, Anthropic has definitely trained me to ask it to explain something "in detail" (or even "in great detail") when I want as much nuance as possible.
It used to be the inverse - way too much detail when I didn't want it.
In my limited experience, that's mostly since the 4.6 release. I noticed that with the same prompt, it answers much more briefly. A bit jarring indeed, but I prefer it. Less BS and filler, and less electricity burned for nothing.
For ChatGPT and Gemini, yes.
But for Claude, they have a very deep and big one: it's the only model that gets production-ready output on the first detailed prompt. Yesterday I had used up my tokens by noon, so I tried getting some output from Gemini & co. I presented a working piece of code which is already in production:
1. It changed, without noticing, things like "Touple.First.Date.Created" and "Touple.Second.Date.Created", and it broke the code by changing them to "Touple.FirstDate" and "Touple.SecondDate".
2. There was a const list of 12 definitions for a given context; when told to rewrite the function, it just cut 6 of those 12 definitions, making the code not compile. I asked why they were cut: "Sorry, I was just too lazy typing" ?? LOL
3. There was an included list holding some items, "_allGlobalItems"; it simply changed the name in the function to "_items", and the code didn't compile.
As said, a working version of a similar function was given upfront. With Claude, I never have such issues.
I have used Claude (incl. Opus 4.6) fairly extensively, and Claude still spits out quality that is far below what I would call production ready: littered with smaller issues, plus the occasional larger blunder. Particularly when doing anything non-trivial, and even when guiding it in detail (although that admittedly reduces the amount of larger structural issues).
Maybe it is tech stack dependent (I have mostly used it with C#/.NET), but I have heard people say the same for C#. The only conclusion I have been able to draw from this, is that people have very different definitions of production ready, but I would really like to see some concrete evidence where Claude one-shots a larger/complex C# feature or the like (with or without detailed guidance).
> It's the only model that gets production-ready output on the first detailed prompt. Yesterday I had used up my tokens by noon, so I tried getting some output from Gemini & co. I presented a working piece of code which is already in production:
One does often hear that where LLMs shine is greenfield code generation, but they all start to struggle working with pre-existing code. It could be that this wasn't a like-for-like comparison.
That said, I do personally feel Claude produces far better results than competitors.
That's been my experience too. I'm using the recent free trial of OpenAI Plus to vibe code, and from this I would say that if Claude Code is a junior with 1-3 years of experience, OpenAI's Codex is like a student coder.
> But for Claude, they have a very deep and big one: it's the only model that gets production-ready output on the first detailed prompt
That's not a moat though. Claude itself wasn't there 6 months ago and there's no reason to think Chinese open models won't be at this level in a year at most.
To keep its current position, Claude has to keep improving at the same pace as the competitors.
I wrote off ChatGPT/OpenAI because of Sam Altman and those eyeball scan things - so sort of even before all this was a rage and centre stage. Sometimes it's just the gut feeling, and while it may not always be accurate, if something doesn't "feel" right, maybe it is not right. No one else is all good either, but what I mean to say is there are some entities/people who repeatedly don't feel right, have things attached to them that never felt right, etc., and you get a combined "gut feeling". At least that's how it was for me.
I tried Claude recently (after they dropped the nonsensical requirement to give them your phone number) and I was surprised to see how significantly less sycophantic it was. ChatGPT, unless you are talking hard science, tends to be overly agreeable. Claude questions you a lot (you ask for x and it asks you things like: why are you interested in x; or, based on our previous convo, x might not be suitable for you; or, I see your point, but based on our previous convo, y is better than x; etc.). ChatGPT rarely does that.
Of course, there's also OpenAI being run by openly questionable people, while Dario so far doesn't seem anywhere near as bad, even if none of them are angels.
One day I'd like to set up a server in my basement that just runs a few really, really nice models, and then get some friends and co-workers to pay me $10 a month for unlimited access.
All with the understanding that if you hog the entire server I'm going to kick you off, and if you generate content that makes the feds knock on my door I'm turning over the server logs and your information. Don't be an idiot, and this can be a good thing between us friends.
It would be like running a private Minecraft server. Trust means people can usually just do what they want in an unlimited way, but "unlimited" doesn't necessarily mean you can start building an x86 processor out of redstone and lagging the whole server. And you can't make weird naked statues everywhere either.
Usually these things aren't issues among a small group. Usually the private server just means more privacy and less restriction.
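The plumbing for this is already fairly accessible. Here's a minimal sketch, assuming a self-hosted inference server that exposes an OpenAI-compatible /v1 endpoint (e.g. something like llama.cpp's server, vLLM, or Ollama); the hostname, port, model name, and per-friend API key below are placeholders:

```python
# pip install openai
from openai import OpenAI

# Point the standard client at a self-hosted, OpenAI-compatible endpoint.
# Each friend gets their own key so you can rate-limit or kick them off.
client = OpenAI(
    base_url="http://basement-server.local:8000/v1",  # placeholder host/port
    api_key="friend-alice",                           # placeholder key
)

resp = client.chat.completions.create(
    model="local-model",  # whatever model the server is configured to serve
    messages=[{"role": "user", "content": "Why do private Minecraft servers work so well?"}],
)
print(resp.choices[0].message.content)
```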
Amazing how analogous this is to the early Internet when people started running web servers out of their basement and then eventually graduated up to being their own dial-in ISP…
Yes, they have a great marketing team and a powerful astroturfing presence though, especially with the recent "Claude beat up OpenClaw! OpenAI is supporting the community by buying it!" and that nonsense.
Though tbh I hardly feel Claude is innocent either. When their safety engineer/leader left, I didn't see any statement from the Anthropic team, not one addressing the legitimate points he raised about why he left. Instead we got an eager over-push in the media cycle of "Anthropic standing up to DOD! Here's why you can trust us!"
It all sounds too similar to propaganda and astroturfing to me.
> I’m pretty sure they would allow AGI to be used for truly evil purposes
It's perfectly possible that 'truly evil purposes' were the goal all along. Slogans and ethics departments are mere speed bumps on the way to generational wealth.
I know this is necessarily a very unpopular opinion, however.
I think HN in particular, as a crowd, is very vulnerable to the halo effect and groupthink when it comes to Anthropic.
Even being generous, they are only very minimally a "better actor" than OpenAI.
However, we are so enthralled by their product that we tend to let that view bleed over into our view of their ethics.
Saying you want your tools used in line with the US constitution within the US, on one particular point, is hardly a high moral bar; it's self-preservation.
All Anthropic have said is:
1. No mass domestic surveillance of Americans.
2. No fully autonomous lethal weapons yet.
My goodness, that's what passes for a high moral standard? Really, anything that doesn't hit those very carefully worded points is not "evil"?
Let's generalise a bit more here: every company at any time could completely heel-turn and do awful things. Even my favourite private companies (e.g. Valve) have done things that I would consider evil.
However, I would think I'm not alone in that I generally want to do good while also wanting convenience. I know that really every bit of consumption I do is probably negative in some ways, and there is no real "apolitical" action anyone can take.
But can't I at least get annoyed and take my money somewhere else for the short amount of time another company is doing it better?
Yes, if OpenAI suddenly leaps forward with Codex and pounds Anthropic into the dust, I'll likely switch back despite my moral grievances. But in a situation where I can get mildly motivated to jump over for something that, to me, seems like better morality without much punishment to me, I'll do it.
Well, they did stand up to the US administration and lost a lot of money in the process. That takes courage. They clearly were being bullied into compliance, and they stood their ground.
You can see the significance of this if you look at German Nazi history. If more companies had stood up to the administration, the Nazi state would have been significantly harder to build.
In my opinion, what Anthropic did is not a small thing at all.
Let's not forget they also lobby to forbid models from China and pretend that distillation is stealing. But somehow, just because they said no on two points, the majority of HN folks consider them virtuous.
I never understood the point of this kind of comment. It doesn't add any value or anything to the discussion. It's basically two paragraphs with a presupposition (OpenAI bad) and how the author is virtuous for canceling his subscription. No explanation, argument, nuance. It's just virtue signaling. Actually... I guess I do know the point of this kind of comment. I just don't know why these kinds of comments get upvoted, even if you do agree OpenAI is bad.
I'm switching over to Claude from OpenAI, and I don't care. OpenAI's image generation is terrible anyway. Just try to get it to generate something to scale, like a cabinet for a specific kitchen or bathroom space. Give it all the explicit constraints, initial sketches, etc. it wants.
The results are laughably bad.
Sure, it does get some of the tones and features, but any kind of actual real-world constraint is so far off, and the dimension indicators it includes would be hilarious if they weren't so bad.
I did the same thing and cancelled my OpenAI plan today. Besides boycotting it for their latest grifting, I also found it didn't really produce much value in my use cases.
Moving back to doing this archaic thing called using my own brain to do my work. Shocking.
"Please don't sneer, including at the rest of the community." It's reliably a marker of bad comments and worse threads.
"When disagreeing, please reply to the argument instead of calling names. 'That is idiotic; 1 + 1 is 2, not 3' can be shortened to '1 + 1 is 2, not 3."
OpenAI actually does have two excellent OSS models. Not Anthropic. Not that OpenAI is 'open' per se, but more so than Anthropic. Also see the Codex vs Claude Code extensibility.
> I swear HN is just a bunch of fanboys full of NPC behavior.
Why are you assuming these are real people and not NPCs?
The amount of money flowing around AI is staggering. To believe that the AI companies aren't flooding all the social media zones with propaganda is disingenuous.
I’m pretty divided on “memory”. There are times it can feel almost magical but more often than not I feel like I am fighting with the steering wheel.
Whenever I’m in a conversation and it references something unrelated (or even related) I get the “ick”. I know how context poisoning (intentional or not) works and I work hard to only expose things to the model that I want it to consider.
There have been many times that I've started a fresh chat so as not to bring along the baggage (or wrong turns) of a previous chat, but then it will say "And this should work great for <thing I never mentioned in THIS chat>", and at that moment my spidey-sense tingles and I start wondering: crap, did it come to the conclusion it did based mostly/only on the new context, or did it "take a shortcut" and use context from another chat?
Like I said, I go out of my way to not “lead the witness” and so when the “witness” can peek at other conversations, all my caution is for naught.
I encourage everyone to go read the saved memories in their LLM of choice, I’ve cleaned out complete crap from there multiple times. Actually wrong information, confusing information, or one-off things I don’t want influencing future discussions.
The custom system prompt (or rather, the addition to it) is all I feel comfortable with. There I give it some basic info about the coding language I prefer and the OSes that I'm often working with, so that I don't have to constantly say "actually this is FreeBSD" or "please give that to me in JS/TS instead of Python".
The only thing that has, so far, kept me from turning off memory is that I'm always slightly cautious of going off the beaten path for something so new and moving so fast. I often want to stay as close to the "stock" config as possible, since I know how testing/QA works at most places (the further off the beaten path you go, the more likely you'll run into bugs). Also so that I can experience what everyone else is experiencing (within reason).
Lastly, because, especially with LLMs, I feel like the people that over-customize end up with fragile systems. I think a decent portion of the "N+1 model is dumber" or "X model has really gone downhill" complaints is partially due to complicated configs (system prompts, MCP, etc.) that might have helped at some point (dumber model, less capability) but are a hindrance to newer models. That, or they never worked and someone just kept piling on more and more, thinking it would help.
I've been thinking this too. I frequently do deep research on some systems programming technique, ask it to generate a .md for it, and then I use that in later sessions with Claude Code "look at the research I collected in {*-research}.md and help me explore ways to apply it to {thing}".
At the research step it frequently (always?) uses memory to direct/scope the research to what I typically work on, but I think that kind of pigeonholes the model and what it explores. And the memory doesn't quite capture all the areas I'm interested in, or want to directly apply the research to.
And regarding the crap in memories, I found the same. Mine at work mentioned I'm an expert at a business domain I have almost zero experience with.
I feel like the companies building this stuff accept a lot of "slop" in their approach, and just can't see past building things by slopping stuff into prompts. I wish they'd explore more rigid approaches. Yes, I understand "the bitter lesson" but it seems obvious to me some traditional approaches would yield better results for the foreseeable future. Less magic (which is just running things through the cheapest model they have and dumping it in every chat). It seems like poison.
Related: https://vercel.com/blog/agents-md-outperforms-skills-in-our-...
Also, agent skills are usually pure slop. If you look through https://skills.sh on a framework/topic you're knowledgeable in, you'll be a bit disheartened. This stuff was pioneered by people who move fast, but I think it's now time to push for quality and care in the approach, since these have gotten good enough to contribute to more than prototype work.
I got very excited when I saw this title, because I've wanted to consolidate on Claude for a long time. I have been using ChatGPT very extensively for Q&A for 2+ years and I have hundreds of long, very technical conversations which I constantly search and refer to.
The problem (for me, anyway) is that even several megabytes worth of quality "memory" data on my profile would not allow me to migrate if it can't also confidently clone all of my chat history with it.
To be clear, this is a big enough problem that I would immediately pay a low three-figure dollar amount to have it solved on my behalf. I don't really want any of the providers to have a walled garden of all my design planning conversations, all of my PCB design conversations. Many are hundreds of prompts long. A clean break is not even remotely palatable short of OAI going full evil.
Look, I'd find it convenient for Claude to have a powerful sense of what I've been working on from conversation #1 onwards. But I absolutely refuse to bifurcate my chat history across multiple services. There is a tier list of hells, and being stuck on ChatGPT is a substantially less painful tier than needing to constantly search two different sites for what's been discussed.
If you want your conversation history, I think we could figure something out with headless browser automation. I would be hesitant to use their wire protocols directly. Edit: perhaps you can just ask nicely? https://help.openai.com/en/articles/7260999-how-do-i-export-...
This should in theory be solvable by using a custom frontend and only using the various backend APIs as stateless inference providers, but everything I've tested falls flat on a few aspects: chat history RAG and web search, and to a lesser extent tool use.
Yes, all of these are theoretically possible (the APIs now all support web search, as far as I know, there are RAG APIs too, and tool use has been supported for a while), but the various "chat" models just seem to be much better at using their first-party tools than any third-party harness, which makes sense that this is what they've been trained on.
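For the stateless-provider part at least, here's a minimal sketch, assuming the official openai Python client and a local JSON file as the only place history lives (the file name, structure, and model choice are illustrative, and this obviously doesn't cover the web search/RAG gap mentioned above):

```python
import json
from pathlib import Path

from openai import OpenAI  # pip install openai

HISTORY = Path("chat_history.json")  # illustrative local store; you own this file
client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(user_message: str) -> str:
    # Load the full conversation from disk; the API itself keeps no state.
    messages = json.loads(HISTORY.read_text()) if HISTORY.exists() else []
    messages.append({"role": "user", "content": user_message})

    resp = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
    reply = resp.choices[0].message.content

    messages.append({"role": "assistant", "content": reply})
    HISTORY.write_text(json.dumps(messages, indent=2))  # history stays portable
    return reply

print(ask("What did we decide about the PCB stackup last time?"))
```

Because the history file is just JSON on disk, it would be trivial to point the same loop at a different vendor's endpoint later.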
I've had friends suggest a custom frontend several times, but unless that frontend starts off by faithfully downloading and recreating my entire chat history... now I just have two problems.
Are you suggesting that they should ignore the needs of the vast majority of their users?
I mean, of course they do, it would be worse otherwise
The /.agents/skills issue for claude code is here: https://github.com/anthropics/claude-code/issues/16345
Their automatic close bot will close it soon, as it's been three weeks since the last comment.
I have seen quite a few open source projects do this. It works quite well.
Another alternative is to create CLAUDE.md with the exact contents: "@AGENTS.md"
That's, just, like, your opinion, man.
For marketing or personal stuff I do sometimes want images, but I don't really mind going somewhere else for that