anoncareer0212 commented on Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens   arstechnica.com/ai/2025/0... · Posted by u/blueridge
intended · a month ago
The point being made doesn’t impact people who can find utility from LLM output.

It’s only when you need to apply it to domains outside of code, or a domain where it needs to actually reason, that it becomes an issue.

anoncareer0212 · 25 days ago
What does "actually reason" mean? It's doing a complex anesthesiologist x CRNA x resident surgery-scheduling task for ~60 surgeries a day for one client. It looked a lot like LSAT logic games scaled up; it took me 20-30 minutes to hand-check. Is that reasoning?
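What that looks like in miniature: a toy backtracking sketch (Python) of the same flavor of assignment problem. The roles, times, and single no-double-booking constraint here are hypothetical illustrations, not the client's actual rules.

    # Toy staff-to-surgery assignment: each surgery needs one provider,
    # and nobody can cover two overlapping surgeries. Pure backtracking.
    surgeries = {  # surgery -> (start_hour, end_hour), made-up data
        "hip": (8, 10), "knee": (9, 11), "cabg": (8, 12), "lap": (11, 13),
    }
    staff = ["anesthesiologist_a", "anesthesiologist_b", "crna_c"]

    def overlaps(a, b):
        return a[0] < b[1] and b[0] < a[1]

    def solve(todo, assignment):
        if not todo:
            return assignment
        surgery, rest = todo[0], todo[1:]
        for person in staff:
            booked = [surgeries[s] for s, p in assignment.items() if p == person]
            if all(not overlaps(surgeries[surgery], slot) for slot in booked):
                result = solve(rest, {**assignment, surgery: person})
                if result is not None:
                    return result
        return None  # no valid provider: backtrack

    print(solve(list(surgeries), {}))

Scale that constraint set up to call rotations, supervision ratios, and room turnover, and you get the LSAT-logic-games feel described above.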
anoncareer0212 commented on Ollama and gguf   github.com/ollama/ollama/... · Posted by u/indigodaddy
hodgehog11 · a month ago
It's clear you have a better handle on the situation than I do, so it's a shame you weren't the one to talk to them face-to-face.

> llama.cpp has been just fine for me.

Of course, so you really shouldn't use Ollama then.

Ollama isn't a hobby project anymore; they were the only ones at the table with OpenAI many months before the release of GPT-OSS. I honestly don't think they care one bit about the community drama at this point. We don't have to like it, but I guess now they get to shape the narrative. That's their stance, and likely the stance of their industry partners too. I'm just the messenger.

anoncareer0212 · a month ago
> ...they were the only ones at the table with OpenAI many months before the release of GPT-OSS

In the spirit of TFA:

This isn't true, at all. I don't know where the idea comes from.

You've been repeating this claim frequently. You were corrected on this 2 hours ago: llama.cpp had early access as well.

It's bizarre for several reasons:

1. It is a fantasy that engineering involves seats at tables and bands of brothers growing from a hobby into something grander, a fantasy I find appealing and romantic, but a fantasy nonetheless. Additionally, no one mentioned or implied anything about it being a hobby or unserious.

2. Even if it weren't a fantasy, it's definitely not what happened here. That's what TFA is about, ffs.

No heroics here; they got the ultimate embarrassment that can happen to a project piggybacking on FOSS: Ollama can't work with the materials OpenAI put out to help Ollama users, because llama.cpp and Ollama had separate day-1 code landings, and Ollama has zero path to forking literally the entire community over to their format. They were working so loosely with OpenAI that OpenAI assumed they were being sane and weren't attempting to use the launch as an excuse to force a community fork of GGUF, and no one realized until after it shipped.

3. I've seen multiple comments from you this afternoon spinning out odd narratives about Ollama and llama.cpp that don't make sense on their face from the perspective of someone who also depends on llama.cpp. AFAICT you understood the GGML fork as some halcyon moment of freedom / non-hobbyness for a project you root for. That's fine. Unfortunately, reality is intruding, hence TFA. Given you're aware of that, it makes your humbleness re: knowing what's going on here sound very fake, especially when it precedes another rush of false claims.

4. I think at some point you owe it, even to yourself if not the community, to take a step back and slow down on the misleading claims. I'm seeing more of a Gish gallop than an attempt to recalibrate your technical understanding.

It's been almost 2 hours since you claimed you were sure there were multiple huge breakages due to bad code quality in llama.cpp, and here we see you reframe that claim as a much weaker one that someone else vaguely made to you.

Maybe a good first step toward avoiding information pollution here would be to take the time spent repeating other people's technical claims you didn't understand and instead find some of those breakages you know for sure happened, as promised previously.

In general, I sense a passionate but youthful spirit, not an astroturfer, and this isn't a group of professionals being disrespected because people still think they're a hobby project. Again, that's what the article is about.

anoncareer0212 commented on South Korea's military has shrunk by 20% in six years as male population drops   channelnewsasia.com/east-... · Posted by u/eagleislandsong
jadamson · a month ago
> Idk what either of you are on about, if it makes you feel better.

Then please don't waste my time.

anoncareer0212 · a month ago
Why bother being on a discussion forum if you're so pressed for time that you lash out at people who agree with you and consider any interlocution a waste of time?

From the outside, it looks like all you get out of this is feeling upset, and it makes us wonder how you misread so wildly.

anoncareer0212 commented on OpenAI dropped the price of o3 by 80%   twitter.com/sama/status/1... · Posted by u/mfiguiere
anoncareer0212 · 3 months ago
Been here for 15 years, and there are standards for interaction, especially for 19-day-old accounts. I recommend other sites if you're expecting to be dismissive and rude without strong intellectual pushback.


anoncareer0212 commented on How to have the browser pick a contrasting color in CSS   webkit.org/blog/16929/con... · Posted by u/Kerrick
johnisgood · 4 months ago
Well, I used 0.5 as a convenient and intuitive midpoint of the 0-1 luminance range, but that is of course a simplification and doesn't align with human perception (edit: it is aligned); it was more of an example than anything.

You are right, 0.18 is indeed perceptually closer to "middle gray" because the eye responds more sensitively to darker tones, so yeah, using a threshold closer to 0.18 makes more sense if we want to identify whether a color "feels" light or dark.

That said, 0.5 is a mathematical midpoint but, as I said, not aligned with how humans perceive brightness (edit: it is aligned).

Ultimately one could use 0.18-0.3 as a threshold.

anoncareer0212 · 4 months ago
> midpoint of the 0-1 luminance range

There are two physical quantities for luminance, relative and perceptual, so that passed along a nugget for readers who might not know the distinction :) As you know and have mentioned, using 0.5 with the luminance calculation you mentioned, which yields relative luminance, would be in error. (I hate being pedantic, but it's important for some parties: a11y is a de facto legal requirement for a lot of work, and 0.5 would be spot on for ensuring WCAG 2 text contrast as long as it's used with perceptual luminance, L*.)

> doesn't align with human perception

It is 100% aligned with how humans perceive brightness; in fact, it's a stable body of work dating back to the early 1900s.

> Ultimately one could use 0.18-0.3 as threshold

Perceptual luminance and relative luminance have precise mathematical definitions; one can be calculated in terms of the other.

If you need to hit contrast K against background color C, you won't be able to treat this as a variable. What you pass along about it being variable is valuable, of course, in that given K and C the output has a range: if the contrast algorithm says you need +/-40 L* for your text to hit APCA/WCAG whatever, and your C sits at 50 L*, your palette is everything from 90 L* to 100 L* and from 0 L* to 10 L*.
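A minimal sketch of the distinction in Python (my own illustration, not from the thread): convert relative luminance Y to CIE lightness L* and threshold at the perceptual midpoint L* = 50 (which is Y ≈ 0.18) rather than at Y = 0.5.

    def luminance_to_lstar(y):
        # CIE 1976 lightness from relative luminance (y in 0-1, white = 1).
        # y = 0.18 ("middle gray") maps to roughly L* = 50.
        if y <= 216 / 24389:        # linear segment near black
            return y * 24389 / 27
        return 116 * y ** (1 / 3) - 16

    def contrasting_text(bg_y):
        # Threshold at the perceptual midpoint L* = 50, not at y = 0.5.
        return "black" if luminance_to_lstar(bg_y) > 50 else "white"

    print(contrasting_text(0.18))  # ~49.5 L*, right at the boundary -> white
    print(contrasting_text(0.5))   # ~76 L*, a light background -> black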

anoncareer0212 commented on CERN releases report on the feasibility of a possible Future Circular Collider   home.cern/news/news/accel... · Posted by u/gmays
TheOtherHobbes · 5 months ago
Whatever you think of Sabine, it's not accurate to dismiss this as spinning lies for clicks. Very similar core points are also made here:

https://www.nature.com/articles/d41586-025-00793-x

It seems accurate that this is an expensive proposed experiment with less expensive alternatives, and that there's very real debate about costs and benefits.

I'm in the "spend more money on theory first" camp. You keep saying that theorists should guide experimenters, but you seem to mean that in the limited sense of poking a little harder at the Standard Model and hoping it breaks.

Meanwhile there are all kinds of open questions where fundamental theory around and parallel to the Standard Model is underdeveloped.

It might well be better to spend a few billion on doing something about that first, then designing experiments to test whatever falls out.

anoncareer0212 · 5 months ago
Be careful when rushing. Your viewpoint as expressed is perfectly rational.

However, both of the claims being asked about are blatantly false, and you were distracted by the idea that calling those claims false also implies that all proposed alternatives are based on lies.

To wit, the ask was: "Is it really true that there are no theories that are proven or discarded with this experiment, and that the Chinese have plans to do it much faster? Her video is pretty damning."

Both of those things are clearly false.

The Chinese part is blatantly false, to the point it can be worked out by a layman who knows how calendar years are ordered.

The Standard Model itself is in question; modulo semantics about proven/disproven and the philosophy of certainty, by any reasonable definition theory is at stake.

anoncareer0212 commented on CERN releases report on the feasibility of a possible Future Circular Collider   home.cern/news/news/accel... · Posted by u/gmays
pclmulqdq · 5 months ago
Electrons and positrons have been smashed together before. It has been done at CERN before, too. The premise of this experiment is to do it at higher energy than has been done before.

The scientific goals of that experiment are somewhat more unclear, though. The LHC had a landmark scientific purpose, finding the last particle in the standard model. There is, as far as I can tell, no specific experiment that can be the headline for this new machine because the LHC pretty much did its job (modulo some error) and string theory et al need higher energy. There are a bunch of guesses about the higgs field and about dark matter that failed to materialize at the LHC, so now we want moar energy to see if that fixes our problems.

As to the theory they will be proving, maybe there are a few minor ones about the higgs field, but that's pretty much it at this point.

anoncareer0212 · 5 months ago
As I lay out above, as the linked IEEE article lays out, and as you come across as having the domain knowledge to understand, electron-positron collisions at the same energy level as the LHC let us nail down the hints of what we saw at the LHC -- persistent deviations from the Standard Model that require new theory.

When we get new theory, then we go hunting for new particles, presuming it's physically possible (contra the incorrect idea you raise that this might be being built to look for confirmation of string theory).

I understand the question "this won't find new particles, so is it worth it?", but the ideas that this is unclear, confusing, misguided, or merely hoping for an outcome are trivially verifiable as false.

Things like:

- "The scientific goals...are unclear" (they are very clear!)

- "(modulo some error)" (reducing the error in the glimpses of deviation from the standard model is the interesting part, 5 sigma or bust, because that lets the theorists know how to progress. This isn't just "oh we'd like to reduce error bars, a less-entitled discipline would just get some grad students on SPSS", this is "holy shit...looks like we found something is fucky in our fundamentals here, but all we know is its off. we need to figure out by how much to give the theorists more data")

- "string theory et al" (I worry very much about the effectiveness of my communication if this is coming up, to be clear, no one is attempting to verify string theory, and it doesn't come up at all even in Sabine's arguments, no? )

The IEEE article lays out this is not about discovering particles.

No one thinks new particles will be discovered.

The investment is not based on speculating new particles will be discovered.

The investment is not based on bad theory that new particles will be discovered.

The investment is not to find a sneaky way to hopefully accidentally find new particles.

Investments in colliders in general haven't been speculative hunts for new particles in decades.

As the IEEE article, open-source information, and my comment above all lay out, they are specifically for nailing down these previously-assumed-settled values in the Standard Model. Getting more data on the things theory can't explain leads to informed revisions of the theory. The next pendulum swing after that data would be theory telling us a narrow band of energies in which to look for whatever new particles the revised theory needs to fix the Standard Model.

anoncareer0212 commented on CERN releases report on the feasibility of a possible Future Circular Collider   home.cern/news/news/accel... · Posted by u/gmays
AshamedCaptain · 5 months ago
> I listened to her a lot recently because YouTube decided ever “next video” bump while I drove should be her.

I hope there was _any other criterion_ at play here? Why would I not be surprised that the answer is "NO" for 99.9% of the population? The world is really doomed...

anoncareer0212 · 5 months ago
> The world is really doomed...

Sadly, it absolutely 100% is. She's an amazing case of this. I'll spend the next 2 years gently walking people down from her clickbait, and I'll end up with net-negative karma for it.

I had honestly given up entirely until I saw a subthread about a month ago where people who knew the area were exchanging names of lesser-known YouTubers who come along and clean up her messes afterward.

The sad news is, they are getting more and more attention (I saw one with 400K+ views), but a lot of that just comes from being loud, proud, and aggressive, as well as having 30 minutes of video to justify the up-front "hey, she's at best a not-even-wrong contrarian, and honestly, let's be clear at this point, she's a liar for views!!!!"

Thankfully I'm old enough to have seen that this stuff happens in waves; within 2 years it'll become common wisdom that she's X, Y, and Z, and even if I disagree and just think she's misguided, that'll be enough for the tide to ebb.

anoncareer0212 commented on The Llama 4 herd   ai.meta.com/blog/llama-4-... · Posted by u/georgehill
terhechte · 5 months ago
I don't think the time grows linearly. The more context, the slower (at least in my experience, because the system has to throttle). I just tried 2k tokens in the same model that I used for the 120k test some weeks ago, and processing took 12 sec to first token (qwen 2.5 32b q8).
anoncareer0212 · 5 months ago
Hmmm, I might be rounding off wrong? Or reading it wrong?

IIUC the data we have:

2K tokens / 12 seconds ≈ 166 tokens/s prefill

120K tokens / (10 minutes = 600 seconds) = 200 tokens/s prefill

So prefill throughput is roughly constant across a 60x difference in context length, which reads as near-linear time to me.
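The same arithmetic as a quick Python check (numbers taken from the thread):

    for tokens, seconds in [(2_000, 12), (120_000, 600)]:
        print(f"{tokens} tokens / {seconds}s = {tokens // seconds} tokens/s prefill")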
