Just don't goon.
I wonder who gets to decide what's okay.
What is not okay is to watch the activities of everyone who is not a pedophile in order to catch the few who are; otherwise, where does it stop? Should they put cameras in every room of your home, just in case?
A decade and change later, my personal opinion is that the narrative reads something like this: "access to social media increases populism, extremism, and social unrest. It's a risk to any and all forms of government. The Arab dictatorships failed first because they were the most brittle."
To the extent that you agree with my claim, it would mean that even a beneficent government has something to fear from social media. As with the Arab Spring, whatever comes after the revolution is often worse than the very imperfect government that came before.
I'd say that governments are beneficial to the extent that they adapt to the people they're governing. It's clear that social media poses a grave danger to current governance, but that doesn't mean all forms of governance are equally vulnerable.
My belief is that current governance is simply obsolete and dying because of the pace of cultural and technological innovation. Governments will need to change in order to stay beneficial to people, and the change is to adapt to the people instead of making the people adapt to the current governance.
Edit: To expand, this is not just a flippant remark. People ignore Andrew Tate because he's so obviously, cartoonishly awful, but they are not the audience. His content is aimed at children, and from personal experience its effect on a large number of them worldwide is profound, to the extent that I worry about the long-term, generational effect.
Children will be exposed to narratives one way or another, and wanting to (re)assert some control over that isn't necessarily just an authoritarian power play.
In short, governments want to retain control and prepare for the future; to retain control they need to control the flow of information and hold a monopoly on it. To achieve this they need an intelligence strategy that puts ordinary people at the center (i.e., spying on them) and puts restrictions in place. But they can't say this out loud because in the current era it's problematic, so the children become a convenient excuse.
This is particularly clear in governments that don't care about political correctness or aren't competent enough to disguise their intentions. One example is the Argentine government, which in recent years has passed laws to surveil online activity and has tasked its intelligence agency with spying on "anyone who puts sovereign narrative and cohesion at risk".
Arguably, if you're grading LLM output, which by your definition cannot be novel, then it doesn't need to be graded by something that can be. The gist of this grading approach is just giving the judge two examples and asking which is better, so it's completely arbitrary, but the grades will be somewhat consistent, and running it with different LLM judges and averaging the results should help at least a little. Human judges are completely inconsistent.
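For concreteness, here's a minimal sketch of that grading loop as I understand it: pairwise comparison, several judges, averaged votes. The complete() helper is a hypothetical stand-in for whatever chat-completions client you use, and the judge model names are made up:

    # Minimal sketch of pairwise LLM-as-judge grading with an
    # averaged ensemble. complete() is a hypothetical stand-in
    # for a real chat API call; the judge names are invented.
    import random

    JUDGES = ["judge-model-a", "judge-model-b", "judge-model-c"]

    def complete(model: str, prompt: str) -> str:
        # Replace with a real API call to your provider of choice.
        raise NotImplementedError("plug in your chat client here")

    def pairwise_vote(judge: str, answer_1: str, answer_2: str) -> float:
        # Shuffle the pair so position bias doesn't always favor one slot.
        pair = [("1", answer_1), ("2", answer_2)]
        random.shuffle(pair)
        prompt = (
            "Which answer is better? Reply with exactly A or B.\n"
            f"A: {pair[0][1]}\n"
            f"B: {pair[1][1]}"
        )
        verdict = complete(judge, prompt).strip().upper()
        winner = pair[0][0] if verdict.startswith("A") else pair[1][0]
        return 1.0 if winner == "1" else 0.0

    def averaged_preference(answer_1: str, answer_2: str) -> float:
        # Average the binary votes across judges: > 0.5 means the
        # ensemble prefers answer_1. Somewhat consistent, but still
        # arbitrary in exactly the sense described above.
        votes = [pairwise_vote(j, answer_1, answer_2) for j in JUDGES]
        return sum(votes) / len(votes)

Shuffling the pair on each call matters because LLM judges have a known position bias toward whichever answer comes first.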
Memorization is one ability people have, but it's not the only one. In the case of LLMs, it's the only ability they have.
Moreover, let's make this clear: LLMs do not memorize the way people do, they don't memorize the same concepts people do, and they don't memorize the same content people do. This is why LLMs "have hallucinations", "don't follow instructions", "are censored", and "make common-sense mistakes" (these are the words people use to characterize LLMs).
> nothing of what everyone does with LLMs daily would ever work
It "works" in the sense that the LLM's output serves a purpose designated by the people. LLMs "work" for certain tasks and don't "work" for others. "Working" doesn't require reasoning from an LLM, any tool can "work" well for certain tasks when used by the people.
> averaging the results should help at least a little
Averaging the LLM grading just exacerbates the illusion of LLM reasoning. It only confuses people. Would you ask your hammer to grade how well scissors cut paper? You could do that, and the hammer would say the scissors get the job done but don't cut well, because it would need to smash the paper instead of cutting it; your hammer is just talking in a different language. It's the same here. The LLM's output doesn't necessarily measure what the instructions in the prompt say.
> Human judges are completely inconsistent.
Humans can be inconsistent, but how well the LLM adapts to humans is itself a metric of success.
At the start, with no benchmark. Because LLMs can't reason at this time, because we don't have a reliable way of grading LLM reasoning, and because people stubbornly insist that LLMs are actually reasoning, we're at the start. When you ask an LLM "2 + 2 = ", it doesn't add the numbers together; it just looks up one of the stories it memorized and returns what happens next. Probably in some such stories 2 + 2 = fish.
Similarly, when you ask an LLM to grade another LLM, it's just looking up what happens next in its stories; "following" instructions requires thinking, hence it isn't even following instructions. But you can say you're commanding the LLM, or programming the LLM, so you have full responsibility for what the LLM produces, and the LLM has no authorship. Put another way, the LLM can't make anything you yourself couldn't, at least at this point, while it can't reason.
This is exactly my feeling with Kimi K2. It's unique in this regard; the only one that comes close is Gemini 3 Pro. No other model has been this good at helping out with communication.
It has such good "emotional intelligence" (for lack of a better term): reading signals in messages, understanding intentions, and taking human factors, social norms, and trends into consideration when helping formulate a message.
I don't know exactly what Moonshot did during training, but they succeeded in giving this model a unique trait. This area deserves more attention, in my opinion.
I saw someone link to EQ-bench, which benchmarks emotional intelligence in LLMs. Looking at it, Kimi is #1, so this kind of confirms my feeling.
Link: https://eqbench.com
It's like saying that pictures of gay people encourage homosexuality.