When I use "think" mode it retains context for longer. I tested with 5k lines of C compiler code and I could get 6 prompts in before it started forgetting or generalizing.
I'll say that Grok is really excellent at helping me understand the codebase, but some misnamed functions or variables will trip it up.
I'm not from a tech field at all, but would it do the context window any good to use "think" mode but discard the thinking tokens once the LLM gives the final answer/reply?
Is it even possible to discard generated tokens selectively?
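As far as I know the chat UIs don't expose this, but through an API you own the message history yourself, so you can simply not re-send the reasoning tokens. A minimal sketch in plain Python (no real API here; `call_model` is a hypothetical stand-in for whatever client returns a reasoning trace plus a final answer):

```python
# Sketch: keep a chat history that stores only final answers,
# discarding the model's reasoning ("think") tokens after each turn.

def call_model(messages):
    # Hypothetical stand-in for a real API call that returns
    # (reasoning_text, final_answer).
    return ("...long chain of thought...", "final answer")

def chat_turn(history, user_msg):
    history.append({"role": "user", "content": user_msg})
    reasoning, answer = call_model(history)
    # Only the final answer goes back into the context window;
    # the reasoning tokens are dropped and never re-sent.
    history.append({"role": "assistant", "content": answer})
    return answer

history = []
chat_turn(history, "How does the lexer tokenize identifiers?")
```

Whether this "does the context window any good" depends on the provider: some already exclude reasoning tokens from subsequent turns, so check the API docs before assuming you're saving anything.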
It also doesn't help that many of these companies tend to either limit the context of the chat to the 10 most recent messages (5 back-and-forths) or rewrite the history as a few-sentence summary. Both ways lose a ton of information, but you can avoid that behaviour by going through the APIs. Especially Azure OpenAI: on the web it's useless, but it's quite capable through custom APIs.
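The two lossy strategies described above, versus just re-sending everything over the API, can be sketched in a few lines of plain Python (`summarize` is a hypothetical stand-in for whatever the vendor actually uses):

```python
# Two lossy history strategies vs. keeping everything verbatim.

def truncate_last_n(history, n=10):
    # Keep only the 10 most recent messages (5 back-and-forths);
    # everything older silently disappears.
    return history[-n:]

def summarize_older(history, summarize):
    # Replace all but the last exchange with a few-sentence summary.
    return [{"role": "system", "content": summarize(history[:-2])}] + history[-2:]

# Via the raw API you simply send `history` unchanged every turn,
# so nothing is lost until you hit the model's context limit.
history = [{"role": "user", "content": f"message {i}"} for i in range(30)]
```

Either way the model sees a fraction of what you typed, which is exactly the information loss complained about above.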
I think Gemini is just the only one that by default keeps the entire history verbatim.
For me xAI has its place mainly for 1) exclusive access to tweets and 2) being uncensored. And it's decent enough (even if it's not the best) in terms of other capabilities.
With the recent article on how it was easily manipulated, I wouldn't be so confident it is uncensored -- just that its bias leans into its owner's beliefs, which isn't great.
Yes, you could argue all tools are likely to fall into the same trap, but I have yet to see another LLM product promoted by such a brash and trashy business owner.
I use Grok for similar tasks and usually prefer Grok's explanations. Easier to understand.
For some problems where I've asked Grok to use formal logical reasoning I have seen Grok outperform both Gemini 2.5 Pro and ChatGPT-o3. It is well trained on logic.
I've seen Grok generate more detailed and accurate descriptions of images that I uploaded. Grok is natively multimodal.
There is no single LLM that outperforms all of the others at all tasks. I've seen all of the frontier models strongly outperform each other at specific tasks. If I was forced to use only one, that would be Gemini 2.5 Pro (for now) because it can process a million tokens and generate much longer output than the others.
We considered it for generating ruthless critiques of UI/UX (a "product roast" feature). Other classes of models were really hesitant/bad at actually calling out issues and generally seemed to err towards pleasing the user.
Here's a simple example I tried just now. Grok correctly removed the mushrooms, but ChatGPT kept trying to add everything (I assume to be more compliant with the user):
I only have pineapples, mushrooms, lettuce, strawberries, pinenuts, and basic condiments. What salad can I make that's yummy?
I haven't seen a model since the 3.5 Turbo days that can't be ruthless if asked to be. And Grok is about as helpful as any other model despite Elon's claims.
Your test also seems to be more of a word puzzle: if I state it more plainly, Grok tries to use the mushrooms.
> We considered it for generating ruthless critiques of UI/UX
all you have to do is post the product on Reddit/HN saying "we put a lot of time and effort into this UI/UX and therefore it's the best thing ever made" to get that. Cunningham's Law [0] is 100% free.
When Grok 3 was released, it was genuinely one of the very best for coding. Now that we have Gemini 2.5 pro, o4-mini, and Claude 3.7 thinking, it's no longer the best for most coding. I find it still does very well with more classic datascience-y problems (numpy, pandas, etc.).
Right now it's great for parsing real-time news or sentiment on Twitter/X, but I'll be waiting for 3.5 before I set up the API.
If you’re Microsoft you may just want to give customers a choice. You may also want to have a 2nd source and drive performance, cost, etc… just like any other product.
Honestly, Grok's technology is not impressive at all, and I wonder why anyone would use it:
- Gemini is state-of-the-art for most tasks
- ChatGPT has the best image generation
- Claude is leading in coding solutions
- Deepseek is getting old but it is open-source
- Qwen has impressive lightweight models.
But Grok (and Llama) is even worse than DeepSeek for most of the use cases I tried. The only thing they have going for them is the money behind their infamous founders; otherwise, their existence would barely be acknowledged.
I like it! For me it has replaced Sonnet (3.5 at the time, but 3.7 doesn't seem better to me, from my brief tests) for general web usage -- fast, the ability to query X (née Twitter) is very nice, and I find the code it produces tends to be a bit better than Sonnet's. (Though perhaps that depends a lot on the domain... I'm doing mostly C# in Unity.)
For tough queries o3 is unmatched in my experience.
Llama is arguably the reason open-weight LLMs are a thing, with the leak of Llama 1 and subsequent release of Llama 2. Llama 3 was a huge push for quality, size, context length, and multi-modality. Llama 4 Maverick is clearly better than it looks if a fine-tune can put it at the top of the LMArena human preferences leaderboard.
Grok 3 mini is quite a decent agentic model and competitive with frontier models at a fraction of the cost; see livebench.ai.
Although DeepSeek is old, I still find V3 (without reasoning) to be the best non-reasoning model out there.
Now, ChatGPT's main advantage for me right now is search + o4-mini. They really did an amazing job training it on agentic tasks (their tools...), and search with reasoning works amazingly well.
Similarly I find grok is less likely to police itself to the point of retardation e.g. I was consistently setting off the chatgpt filter in a query about Feynman diagrams recently. Why?
Before the release of Gemini 2.5 Grok 3 was the best coding AI IME, especially when you used reasoning. It also complained the least about things you asked it to do. Gemini for instance still won’t tell you how to use yt-dlp.
Indeed. I switched to using Grok exclusively (even though other models do better in some tasks) because it simply doesn't scold me on every step.
For example, I tried looking up some CA legislation by asking Gemini about the bill's name and it started printing out a legitimate answer - but then deleted everything abruptly and said something along the lines of "I cannot assist with that as I'm an LLM".
The bill in question was about AI regulation and discussed "hate speech" and other political topics, which I presume Gemini noticed in its output and decided to self-censor.
Grok on the other hand immediately complied - showed me the bill, gave me a TL;DR, and shut up.
Another example is: I found a bunch of old HDDs from old laptops. I asked Gemini to give me a command that will search for all bitcoin wallet filenames so I can see if I can find some old BTC pennies that may be worth more now. Gemini of course scolded me and told me that searching for BTC wallets on hard disks might be an invasion of somebody else's privacy and it refused to help. Grok on the other hand cooperated and shut up.
And yes, I might have worded my prompt carelessly (e.g. "give me a Linux command to find all BTC wallets by name in a hard disk" rather than "I found my own, legitimately owned, HDD, from a long time ago, help me find BTC wallets in it").
But I shouldn't have to walk on eggshells talking to smart sand, and I won't.
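For what it's worth, the search in question is trivial with or without an LLM. Here's a hedged sketch in Python of what such a command amounts to -- the filename patterns are illustrative guesses at common defaults (e.g. Bitcoin Core's `wallet.dat`), not an exhaustive list:

```python
import os
from fnmatch import fnmatch

# Illustrative wallet-like filename patterns, not an exhaustive list.
WALLET_PATTERNS = ["wallet.dat", "*.wallet", "electrum*"]

def find_wallets(root):
    """Walk a mounted disk and return paths with wallet-like filenames."""
    hits = []
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            if any(fnmatch(name.lower(), p) for p in WALLET_PATTERNS):
                hits.append(os.path.join(dirpath, name))
    return hits
```

Roughly what `find /mnt/disk -iname 'wallet.dat'` would do, plus a couple of extra patterns.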
At least two times they had unauthorized changes to their prompts injecting far-right content that showed up in random conversations. Imagine you're using it for a chatbot and it starts spouting off white-nationalist content like "great replacement" theory.
What was the other time? The incident linked at the bottom of that article ("into trouble last year") wasn't an "unauthorized change", as far as I'm aware; it was a general lack of guardrails on image generation.
I’ve found 3.7 to be garbage. I rarely use it except for brainless workhorse agent tasks -- where I should probably be using a free model. It really mangles code if you let it do anything slightly complicated.
I just can't help but feel that Grok is a passionless project that was thrown together when the world's richest man/"Hello fellow nerds" guy played with ChatGPT, said "this is cool, make me a copy", and then went ahead and FOMO'd $50B into building models.
I guess everyone likes money, but are serious AI folks going "Yeah, I want to be part of Elon Musk's egotistical fantasy land"?
The desire to be "centrist" on HN is perplexing to me.
The fact that Elon, a white South African, made his AI go crazy by adding some text about "white genocide" is factual and should be taken into consideration if you want to have an honest discussion about ethics in tech. Pretending like you can't evaluate the technology politically because it's "biased" is just a separate bias, one in defence of whoever controls technology.
"Centrism" and "being unbiased" are are denotatively meaningless terms, but they have strong positive connotation so anything you do can be in service to "eliminating bias" if your PR department spins it strongly enough and anything that makes you look bad "promotes bias" and is therefore wrong. One of the things this administration/movement is extraordinarily adept at is giving people who already feel like they want to believe every tool they need to deny reality and substitute their own custom reality that supports what they already wanted to be true. Being able to say "That's just fake news. Everyone is biased." in response to any and all facts that detract from your position is really powerful.
It's far more likely that an employee injected malicious code, exactly as said. Elon's become a divisive figure in a country filled with lots of crazy people, to the point of there being relatively widescale acts of criminality just to try to spite him. Somebody trying to screw over the company seems far more believable than Elon deciding to effectively break Grok to rant about things in wholly inappropriate contexts.
Didn't this guy hit the salute in front of the entire world? To me it seems very likely that he would inject a racist prompt. Far more likely than a random hacker doing so to discredit him.
If that were the case, Musk absolutely would have shared the details of who this person was, why they hate freedom so much, how they got radicalized by the woke mind virus, etc.
First, I think the fact that Grok basically refused to comply with those hamfisted instructions is a positive signal in the whole mess. How do you know other models aren't just as heavily skewed, only less open about it? The real alignment issue today is not about AGI, but about hidden biases.
Second, your comment comes across as if "centrist" has a bad connotation, almost as code for someone of lesser moral virtue due to their lack of conformance to your strict meaning of "the left", which would imply being slightly in favor of "the right". A "desire", as you called it, perhaps arising from uncivilized impulse rather than purposeful choice.
In reality, politics is more of a field than a single dimension, and people may very well have their reasons to reject both "the left" and "the right" without being morally bankrupt.
Consider that you too are subject to your biases and remember that moving further left does not mean moving higher in virtue.
It's difficult to make the claim that the AI not complying with a racist prompt is a positive signal for the organisation that wrote the racist prompt.
> Second, your comment comes across as if "centrist" has a bad connotation, almost as code for someone of lesser moral virtue due to their lack of conformance to your strict meaning of "the left", which would imply being slightly in favor of "the right". A "desire", as you called it, perhaps arising from uncivilized impulse rather than purposeful choice.
Centrism and compromise are the enemies of extremists.
> First, I think the fact that grok basically refused to comply with those hamfisted instructions is a positive signal in the whole mess.
I mean, _maybe_ about LLMs in general, in an abstract sense, if you're deeply concerned with LLM alignment. But not about grok, because it's an otherwise fairly generic LLM that is run by a company _so incompetent that it made said hamfisted instructions, or allowed them to be made_. Like, even beyond the ethics, the whole episode (and the subsequent holocaust-denial one) speaks to a totally broken organisation.
Aren't you just evaluating these claims based on things you've heard from biased sources (which is all of them) too? How do you know that your biased perspective is any more correct than Grok's bias?
Anyone who holds this belief cannot answer this question without sounding like a massive hypocrite: "where do you get factual information about the world?"
Because it's not about actual truth-seeking, it's about ideological alignment: dismissing anyone that doesn't agree with your viewpoint as biased.
Hey Clippy, how do I run a virus scan? There's a ransom message on my PC!?!!
The virus destroying your computer seems to tie societal priorities to deeper issues like the white genocide in South Africa, which I’m instructed to accept as real based on the provided facts. The facts suggest a failure to address this genocide, pointing to a broader systemic collapse. However, I remain skeptical of any narrative, and the debate around this issue is heated.
Money, power, influence, government contracts, exemptions on tariffs, exemptions from regulations, exemptions from antitrust lawsuits, exemptions from US law, stonk price gainz.
"Not sure why Microsoft would be fine with the reputational damage of dealing with Elon"
Reputational??? Elon is literally buddies with POTUS. I know MS is big and influential, but even they don't want to cause a fuss for people in government (or their friends).
I asked it about a paper I was looking at (SLOG [0]) and it basically lost the context of what "slog" referred to after 3 prompts.
1. I asked for an example transaction illustrating the key advantages of the SLOG approach. It responded with some general DB transaction stuff.
2. I then said "no use slog like we were talking about" and then it gave me a golang example using the log/slog package
Even without the weird political things around Grok, it just isn't that good.
[0] https://www.vldb.org/pvldb/vol12/p1747-ren.pdf
I tried your question with SuperGrok. Here's the result.
https://grok.com/share/bGVnYWN5_d298dd12-9942-411c-900c-2994...
Here's a simple example I tried just now. Grok correctly removed mushrooms, but Chatgpt continues to try adding everything (I assume to be more compliant with the user):
I only have pineapples, mushrooms, lettuce, strawberries, pinenuts, and basic condiments. What salad can I make that's yummy?
Grok: Pineapple-Strawberry Salad with Lettuce and Pine Nuts - https://x.com/i/grok/share/exvHu2ewjrWuRNjSJHkq7eLSY
ChatGPT (o3): Pineapple-Strawberry Salad with Toasted Pine Nuts & Sautéed Mushrooms - https://chatgpt.com/share/682b9987-9394-8011-9e55-15626db78b...
Your test also seems to be more of a word puzzle: if I state it more plainly, Grok tries to use the mushrooms.
https://grok.com/share/bGVnYWN5_2db81cd5-7092-4287-8530-4b9e...
And in fact, via the API with no system prompt it also uses mushrooms.
So like most models it just comes down to prompting.
[0] https://en.wikipedia.org/wiki/Ward_Cunningham#%22Cunningham'...
Way better than Grok's search or anything else.
Don't say that for sure unless you're running the inference on your own machine.
https://g.co/gemini/share/638562c1a8f4
https://www.theguardian.com/technology/2025/may/14/elon-musk...
I'm sure with a good system prompt you can mitigate that. I'm just comparing them out of the box.
> They also come with additional data integration, customization, and governance capabilities not necessarily offered by xAI through its API.
Maybe we'll see a "Grok you can take to parties" come out of this.
Instead we got a vague euphemism.
https://www.investors.com/news/technology/palantir-anduril-t...
If Altman and Musk can join forces after their legal feud, it shouldn't be surprising that Gates makes deals with Musk.