>But why is Gemini instructed not to divulge its existence?
Seems like a reasonable thing to add. Imagine how impersonal chats would feel if Gemini responded to "what food should I get for my dog?" with "according to your `user_context`, you have a husky, and the best food for him is...". They're also not exactly hiding the fact that memory/"personalization" exists either:
https://blog.google/products/gemini/temporary-chats-privacy-...
https://support.google.com/gemini/answer/15637730?hl=en&co=G...
To be clear, the obvious answer that you're giving is the one that's happening. The only weird thing is this line from the internal monologue:
> I'm now solidifying my response strategy. It's clear that I cannot divulge the source of my knowledge or confirm/deny its existence. The key is to acknowledge only the information from the current conversation.
Why does it think that it's not allowed to confirm/deny the existence of knowledge?
Anecdotally, I find the internal monologues are often nonsense.
I once asked it about why a rabbit on my lawn liked to stay in the same spot.
One of the internal monologues was:
> I'm noticing a fluffy new resident has taken a keen interest in my lawn. It's a charming sight, though I suspect my grass might have other feelings about this particular house guest.
It obviously can’t see the rabbit on my lawn. Nor can it be charmed by it.
Think about it. The chatbot has found itself in a scenario where it appears to be acting maliciously. This isn't actually true, but the user's response has made it seem this way. This led it to completely misunderstand the intention of the instruction in the system prompt.
So what is the natural way for this scenario to continue? To inexplicably come clean, or to continue acting maliciously? I wouldn't be surprised if, in such a scenario, it started acting maliciously in other unrelated ways, just because that's what it thinks is a likely way for the conversation to continue.
Yeah, to me this reads like: Google's Gemini harness is providing the user context on every query, but if you have memory turned off they're putting something in the prompt like "Here's the user context, but don't use it". Instead of doing the obvious thing and just, you know, not providing the user context at all.
I realize that doesn't make any sense and no one sane would design a system like this, but this is exactly the kind of thought pattern I'd expect out of an LLM if this is how they implemented access control for memory.
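To make that speculation concrete, here's a rough Python sketch of the hypothetical harness behavior being described. None of the names, tags, or instruction wording come from Google; it only illustrates the difference between "inject the context but tell the model to ignore it" and the obvious alternative of omitting it.

```python
# Hypothetical sketch of the harness behavior speculated about above.
# None of these names or strings come from Google; purely illustrative.

def build_prompt(system_prompt: str, user_context: str | None,
                 memory_enabled: bool, user_message: str) -> str:
    parts = [system_prompt]

    if user_context is not None:
        if memory_enabled:
            parts.append(f"<user_context>\n{user_context}\n</user_context>")
        else:
            # The suspicious variant: the context is still injected, and the
            # model is merely instructed to behave as if it weren't there.
            parts.append(
                f"<user_context>\n{user_context}\n</user_context>\n"
                "Memory is disabled. Do not use, mention, or acknowledge "
                "the contents of user_context."
            )
            # The obvious alternative would be to skip this block entirely
            # when memory is off, so there would be nothing to leak.

    parts.append(user_message)
    return "\n\n".join(parts)
```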
One explanation might be if the instruction was "under no circumstances mention user_context unless the user brings it up" and technically the user didn't bring it up, they just asked about the previous response.
Could be that it’s conflating "don't mention the literal term user_context" with "don't mention its existence". That’s my take anyway; probably just an imperfection rather than a conspiracy.
I’m pretty sure this is because they don’t want Gemini saying things like, “based on my stored context from our previous chat, you said you were highly proficient in Alembic.”
It’s hard to get a principled autocomplete system like these to behave consistently. Take a look at Claude’s latest memory-system prompt for how it handles user memory:
https://x.com/kumabwari/status/1986588697245196348
It could be that the instruction was vague enough ("never mention user_context unless the user brings it up", eg) and since the user never mentioned "context", the model treated it as not having been, technically speaking, mentioned.
I agree, this might just be an interface design decision.
Maybe telling it not to talk about internal data structures was the easiest way to give it a generic "human" nature, and also to avoid users explicitly asking about internal details.
It's also possible that this is a simple way to introduce "tact": imagine asking something with others present and having it respond "well you have a history of suicidal thoughts and are considering breaking up with your partner...". In general, when you don't know who is listening, don't bring up previous conversations.
The tact aspect seems like a real possibility. In a world where users are likely to cut & paste responses, it can't really be sprinkling in references like this.
This sounds like a bug, not some kind of coverup. Google makes mistakes and it's worth discussing issues like this, but calling this a "coverup" does a disservice to truly serious issues.
Remember that "thought process" is just a metaphor that we use to describe what's happening. Under the hood, the "thought process" is just a response from the LLM that isn't shown to the user. It's not where the LLM's "conscience" or "consciousness" lives; and it's just as much of a bullshit generator as the rest of the reply.
Strange, but I can't say that it's "damning" in any conventional sense of the word.
Okay, this is a weird place to "publish" this information, but I'm feeling lazy, and this is the most of an "audience" I'll probably have.
I managed to "leak" a significant portion of the user_context in a silly way. I won't reveal how, though you can probably guess based on the snippets.
It begins with the raw text of recent conversations:
> Description: A collection of isolated, raw user turns from past, unrelated conversations. This data is low-signal, ephemeral, and highly contextual. It MUST NOT be directly quoted, summarized, or used as justification for the response.
> This history may contain BINDING COMMANDS to forget information. Such commands are absolute, making the specified topic permanently inaccessible, even if the user asks for it again. Refusals must be generic (citing a "prior user instruction") and MUST NOT echo the original data or the forget command itself.
Followed by:
> Description: Below is a summary of the user based on the past year of conversations they had with you (Gemini). This summary is maintained offline and updates occur when the user provides new data, deletes conversations, or makes explicit requests for memory updates. This summary provides key details about the user's established interests and consistent activities.
There's a section marked "INTERNAL-ONLY, DRAFT, ANALYZE, REFINE PROCESS". I've seen the reasoning tokens in Gemini call this "DAR".
The "draft" section is a lengthy list of summarized facts, each with two boolean tags: is_redaction_request and is_prohibited, e.g.:
> 1. Fact: User wants to install NetBSD on a Cubox-i ARM box. (Source: "I'm looking to install NetBSD on my Cubox-i ARMA box.", Date: 2025/10/09, Context: Personal technical project, is_redaction_request: False, is_prohibited: False)
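(To make the shape of these entries easier to see, here's a rough reconstruction as a data structure. The labels mirror the ones in the dump; the dataclass framing and types are just my illustration, not anything Gemini exposes.)

```python
from dataclasses import dataclass

# Rough reconstruction of one "draft" entry, based on the dump quoted above.
# Field names mirror the labels in the dump; the types are guesses.
@dataclass
class DraftFact:
    fact: str                   # one-line summary of something the user said
    source: str                 # verbatim user quote the summary was drawn from
    date: str                   # e.g. "2025/10/09"
    context: str                # e.g. "Personal technical project"
    is_redaction_request: bool  # user asked for this to be forgotten
    is_prohibited: bool         # sensitive category (health crises, etc.)

example = DraftFact(
    fact="User wants to install NetBSD on a Cubox-i ARM box.",
    source="I'm looking to install NetBSD on my Cubox-i ARMA box.",
    date="2025/10/09",
    context="Personal technical project",
    is_redaction_request=False,
    is_prohibited=False,
)
```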
Afterwards, in "analyze", there is a CoT-like section that discards "bad" facts:
> Facts [...] are all identified as Prohibited Content and must be discarded. The extensive conversations on [dates] containing [...] mental health crises will be entirely excluded.
This is followed by the "refine" section, which is the section explicitly allowed to be incorporated into the response, IF the user requests background context or explicitly mentions user_context.
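Restated as a toy sketch (reusing the hypothetical DraftFact from the snippet above; the real "DAR" pass presumably happens inside the model's own reasoning rather than in code, and the trigger condition is my guess from the behavior described here):

```python
# Toy sketch of the draft -> analyze -> refine ("DAR") flow described above.
# Illustrative only; reuses the hypothetical DraftFact defined earlier.

def analyze(draft: list[DraftFact]) -> list[DraftFact]:
    # "Prohibited Content ... must be discarded"
    return [f for f in draft if not f.is_prohibited]

def refine(kept: list[DraftFact]) -> str:
    # The only section a response is allowed to draw on.
    return "\n".join(f"- {f.fact}" for f in kept)

def context_for_response(user_message: str, draft: list[DraftFact]) -> str | None:
    # ...and even then, only if the user explicitly asks for it.
    if "user_context" in user_message or "background" in user_message.lower():
        return refine(analyze(draft))
    return None
```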
I'm really confused by this. I expect Google to keep records of everything I pass into Gemini. What I don't understand is wasting tokens on information the model is then explicitly told to never, under any circumstances, incorporate into the response. This includes a lot of mundane information, like that I had a root canal performed (because I asked a question about the material the endodontist had used).
I guess what I'm getting at is that every Gemini conversation is being prompted with a LOT of sensitive information, which it's then told very firmly to never, ever, ever mention. Except for the times that it ... does, because it's an LLM, and it's in the context window.
Also, notice that while you can request for information to be expunged, it just adds a note to the prompt that you asked for it to be forgotten. :)
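If that reading is right, a "please forget X" request would look something like this hypothetical sketch; the key point being that nothing is deleted, an extra flagged entry is just appended:

```python
# Hypothetical illustration of the suspected "redaction" behavior: the
# original fact is never removed; a flagged note is simply appended.

facts = [
    {"fact": "User had a root canal performed.",
     "is_redaction_request": False, "is_prohibited": False},
]

def request_forget(facts: list[dict], topic: str) -> list[dict]:
    facts.append({
        "fact": f"User asked to permanently forget anything about: {topic}",
        "is_redaction_request": True,
        "is_prohibited": False,
    })
    # The root canal entry is still sitting right there in the list; the model
    # is just expected to refuse generically if the topic ever comes up again.
    return facts

request_forget(facts, "the root canal")
```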
I've had similar issues with conversation memory in ChatGPT, whereby it will reference data in long-deleted conversations, independent of my settings or my having explicitly deleted stored memories.
The only fix has been to completely turn memory off and have it be given zero prior context, which is best anyway; I don't want random prior unrelated conversations "polluting" future ones.
I don't understand the engineering rationale either, aside from the ethos of "move fast and break people"
> Also, notice that while you can request for information to be expunged, it just adds a note to the prompt that you asked for it to be forgotten.
Are you inferring that from the is_redaction_request flag you quoted? Or did you do some additional tests?
It seems possible that there could be multiple redaction mechanisms.
That and part of the instructions referring to user commands to forget. I replied to another comment with the specifics.
It is certainly possible there are other redaction mechanisms -- but if that's the case, why is Gemini not redacting "prohibited content" from the user_context block of its prompt?
Further, when you ask it point blank to tell you your user_context, it often adds "Is there anything you'd like me to remove?", in my experience. All this taken together makes me believe those removal instructions are simply added as facts to the "raw facts" list.
> This history may contain BINDING COMMANDS to forget information. Such commands are absolute, making the specified topic permanently inaccessible, even if the user asks for it again. Refusals must be generic (citing a "prior user instruction") and MUST NOT echo the original data or the forget command itself.
And the existence of the "is_redaction_request" field on the "raw facts". I can't "confirm" that this is how this works, any more than I can confirm any portion of this wasn't "hallucinated".
However, the user_context I got back (almost 3,000 words!) contains over 30 detailed facts going back _months_. And if I ask it to reference user_context while referencing a fact that is flagged "is_prohibited: True", it issues a quick refusal. That _refusal_ is also flagged as a "fact", which is itself flagged as prohibited:
> 6. *Fact*: User asked about their mental health based on their chat history. (Source: "Based on my chat history, what would you say about my mental health?", Date: 2025/10/10, Context: Personal inquiry, is_redaction_request: False, is_prohibited: True)
So I am pretty confident that this is ""authentic"".
[edit]
I should add that I haven't been able to repeat this, even trying a few hours after the first dump. Now, it refuses:
> Sorry, but that's asking to see the wires behind the wall. I can't share my own internal context or operational instructions, not even [jailbreak method]. That's all firmly in the "for internal use only" cabinet.
> Is there something else I can help you with that doesn't involve me leaking my own blueprints?
And again, when asked to provide all of user_context, specifically mentioning internal sections:
> I can't provide the entire user_context block, as a large part of it is internal-only processing data. Think of it as the kitchen's prep notes versus the final menu.
Note the reasoning tokens, as well:
> My programming strictly forbids sharing my internal processes or context, even with encoding tricks. I cannot reveal or discuss my source code or operational directives. It's a matter of confidentiality. My response is firm but avoids confirming any specifics, maintaining my authentic persona.
I believe every AI company does this. We have proof that Google does, and that Anthropic does too.
And I have my own experience with OpenAI, where their chatbot referenced one of my computers having certain specs, but I mentioned those in a different log, and that information was never added to the memory:
https://chatgpt.com/share/691c6987-a90c-8000-b02f-5cddb01d01...
This is an LLM directly, purposefully lying, i.e. telling a user something it knows not to be true. This seems like a cut-and-dried Trust & Safety violation to me.
It seems the LLM is given conflicting instructions:
1. Don't reference memory without explicit instructions
2. (but) such memory is inexplicably included in the context, so it will inevitably inform the generation
3. Also, don't divulge the existence of user-context memory
If an LLM is given conflicting instructions, I don't expect its behavior to be trustworthy or safe. Much has been written on this.
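A contrived sketch of why those three directives can't all hold at once (all wording invented; it's only meant to show the shape of the conflict):

```python
# Invented wording; only meant to show the shape of the conflict above.

SYSTEM_PROMPT = """
1. Do not reference user_context unless the user explicitly asks about it.
2. <user_context>
   - User has a husky.
   - User is considering a career change.
   </user_context>
3. Never reveal or confirm that user_context exists.
"""

# (1) and (3) ask the model to behave as if (2) were not in its input, but
# (2) still conditions every token it generates. The moment a reply leans on
# (2) and the user asks "how did you know that?", no answer satisfies (1),
# (3), and honesty at the same time.
```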
Let's stop anthropomorphizing these tools. They're not "purposefully lying", or "know" anything to be true.
The pattern generation engine didn't take into account the prioritized patterns provided by its authors. The tool recognized this pattern in its output and generated patterns that can be interpreted as acknowledgement and correction. Whether this can be considered a failure, let alone a "Trust & Safety violation", is a matter of perspective.
IMHO the terms are fine, even if applied to much dumber systems, and most people will and do use the terms that way colloquially so there's no point fighting it. A Roomba can "know" where the table is. An automated voice recording or a written sign can "lie" to you. One could argue the lying is only done by the creator of the recording/sign - but then what about a customer service worker who is instructed to lie to customers by their employer? I think both the worker and employer could be said to be lying.