foobarqux (u/foobarqux)

foobarqux commented on A statistical analysis of Rotten Tomatoes statsignificant.com/p/is-... · Posted by u/m463

og_kalu · 4 days ago

Yeah and his point is that's never going to happen lol. People bring up the 100% point a lot and it's a bit silly because a movie with a significant number of ratings is never going to have that kind of distribution.

That's why it's always a hypothetical never backed with actual examples. It's one of those things that sounds plausible until you look at the numbers. Movies close to 100% have pretty high average scores and Movies with majority 3/5's are nowhere near 100%.

Yeah 100% for RT doesn't mean 10/10, but that's it.

foobarqux · 5 hours ago

I've confirmed this is false using the clapper-massive-rotten-tomatoes dataset on Kaggle, which contains reviews up to 2023 (so note the scores differ from current scores). Among the top 10% of movies by number of reviews, there are in fact many that have high hot-or-not scores but whose reviews are clustered near the 6/10 threshold: e.g. The Peanuts Movie, The Good Lie, Gimme Danger, Dream Horse, etc. (filtering on reviews with numeric scores).

These are all movies with (at the time) a >90% "approval" rating but an average score of about 7/10, with most reviews around the 6/10 threshold and tapering off at 7/10 and 8/10 (as opposed to being multi-modal/split-opinion, e.g. many at 6/10 and many also at 10/10).
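
A minimal sketch of the two statistics being compared here, assuming a 6/10 "fresh" cutoff as described above. The review lists are synthetic and are not taken from the Kaggle dataset, and the function name is hypothetical.

```python
from statistics import mean

FRESH_THRESHOLD = 6  # assumed cutoff on a 10-point scale


def approval_rate(scores, threshold=FRESH_THRESHOLD):
    """Fraction of reviews at or above the threshold (the hot-or-not score)."""
    return sum(score >= threshold for score in scores) / len(scores)


# Synthetic review scores, for illustration only (not real data).
# "clustered": most reviews sit just above the cutoff and taper off.
clustered = [5] * 1 + [6] * 8 + [7] * 6 + [8] * 4 + [9] * 1
# "split": reviews divided between barely-fresh and perfect scores.
split = [5] * 1 + [6] * 9 + [10] * 10

for name, scores in (("clustered", clustered), ("split", split)):
    print(f"{name}: approval={approval_rate(scores):.0%}, mean={mean(scores):.2f}/10")
```

Both synthetic lists come out at 95% approval, with means of 6.8 and 7.95 respectively, so the approval percentage alone cannot tell the two shapes apart.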

foobarqux commented on Claim: GPT-5-pro can prove new interesting mathematics twitter.com/SebastienBube... · Posted by u/marcuschong

aabhay · 14 hours ago

I don’t get why so many people are resistant to the concept that AI can prove new mathematical theorems.

The entire field of math is fractal-like. There are many, many low hanging fruits everywhere. Much of it is rote and not life changing. A big part of doing “interesting” math is picking what to work on.

A more important test is to give an AI access to the entire history of math and have it _decide_ what to work on, and then judge it for both picking an interesting problem and finding a novel solution.

foobarqux · 13 hours ago

As others have said, computers already help prove theorems like the four color theorem. It’s not that shocking that LLMs can prove a relative handful of obscure theorems. An alpha-theorem (neural net directed “brute force” search) type system will probably also be able to prove some theorems. There is no evidence today that there will be a massive breakthrough in math due to those systems, let alone through LLM-type systems.

If LLMs were already a breakthrough in proving theorems, even for obscure minor theorems, there would be a massive increase in published papers due to publish-or-perish academic incentives.

foobarqux commented on Claim: GPT-5-pro can prove new interesting mathematics twitter.com/SebastienBube... · Posted by u/marcuschong

freshtake · 14 hours ago

An interesting debate!

A few things to consider:

1. This is one example. How many other attempts did the person try that failed to be useful, accurate, coherent? The author is an OpenAI employee IIUC, so it begs this question. Sora's demos were amazing until you tried it, and realized it took 50 attempts to get a usable clip.

2. The author noted that humans had updated their own research in April 2025 with an improved solution. For cases where we detect signs of superior behavior, we need to start publishing the thought process (reasoning steps, inference cycles, tools used, etc.). Otherwise it's impossible to know whether this used a specialty model, had access to the more recent paper, or in other ways got lucky. Without detailed proof it's becoming harder to separate legitimate findings from marketing posts (not suggesting this specific case was a pure marketing post)

3. Points 1 and 2 would help with reproducibility, which is important for scientific rigor. If we give Claude the same tools and inputs, will it perform just as well? This would help the community understand if GPT-5 is novel, or if the novelty is in how the user is prompting it

foobarqux · 14 hours ago

> This is one example. How many other attempts did the person try that failed to be useful, accurate, coherent?

High chance, given that this is the same guy who came up with the SVG unicorn (sparks of AGI), which raises the same question even more obviously.

foobarqux commented on A statistical analysis of Rotten Tomatoes statsignificant.com/p/is-... · Posted by u/m463

og_kalu · 2 days ago

You seem to be arguing in circles and are now actively misrepresenting my points to defend your own.

>The original point of contention was that the "percent that approve" of the film that RT uses is surprising and not as useful as a regular rating system.

No, the original point I was responding to was the tired, hypothetical claim that a movie could get "100% fresh" with every critic giving it a middling 3/5 stars. My point was that in practice, this never happens. You have provided zero evidence to the contrary and have now shifted the goalposts to a vague, subjective debate about "usefulness." The original comment is right there. You can read.

>I don't need to have "heard of them" to know that the NYT film critic, the reviewer at Siskel and Ebert and the film critic at Vanity Fair are probably more worth listening to than the "MacGuffin or Meaning Substack".

This is just gatekeeping masquerading as an argument. They have more to say on the mechanics of film, not on what movies you'll think are the best. It's especially ridiculous when you realize that RT gets reviews from these 'top critics' and you can filter for them.

>The RT score tries to answer the question..."What are the odds the average person will like it"...

The RT percentage has a critics score and a top critics score, so no, it's not, any more than the Metacritic score is 'the score the average person will give it'. That's not how statistics work.

>That's fine and, as I said, something I and everyone else do, just like I eat junk food (and maybe sometimes actually prefer it to some 3-star Michelin restaurant). The problem is pretending that those two films are roughly the same quality, or that, because someone sometimes prefers a lower critic-ranked movie, ratings don't matter: you can make the same argument about preferring a "rotten" RT movie.

There's nothing to pretend. If i think it's better then it's better for me. What critics say doesn't matter. It's really sad that you apparently characterize movies you like with lower critic scores as 'junk food'. Have a mind of your own.

foobarqux · 2 days ago

> You have provided zero evidence to the contrary and have now shifted the goalposts to a vague, subjective debate about "usefulness." The original comment is right there. You can read.

I literally provided several examples of unreasonable/misleading scores (there isn't an API so this is the best you can do). You on the other hand haven't provided any examples to show when RT is more useful than MC. The context of the original criticism is that the RT score is misleading and less useful than an actual rating.

> This is just gatekeeping masquerading as an argument. They have more to say on the mechanics of film, not on what movies you'll think are the best. It's especially ridiculous when you realize that RT gets reviews from these 'top critics' and you can filter for them.

It's not gatekeeping to recognize that there are actual experts in matters of taste who are going to be way more informative than the average joe. The implication of your argument is that every jackass's opinion is just as valuable. No one actually believes this. And I don't know how many times I need to say this, but I don't want to read a dozen critic reviews; I want a summary statistic (so that you can get a rough ranking of movies).

> The RT percentage has a critics score and a top critics score, so no, it's not, any more than the Metacritic score is 'the score the average person will give it'. That's not how statistics work.

The problem is that the summary statistic (even if the average score was still visible) on RT includes so many people that it is closer to average opinion than critic opinion. That's apart from the problem with hot-or-not binary classification.

> There's nothing to pretend. If i think it's better then it's better for me. What critics say doesn't matter. It's really sad that you apparently characterize movies you like with lower critic scores as 'junk food'. Have a mind of your own.

I have explicitly said that I don't dogmatically follow critic opinions (no one does). The point is the opinions have value as a starting point to filter/select films that is better than popular opinion or a binary thresholding type system.

Once again, you can make your same argument against any rating system (IMDB, RT, etc.) and absurdly claim that if you use those as a starting point to select films you don't "have a mind of your own". In fact you can do this for ratings of any product or service and make the absolutely trivial point that a person's personal preferences or situation can deviate from consensus expert rankings. It's silly to then conclude that one ranking system can't be better than another or that popular opinion is equally valuable compared to expert opinion.

foobarqux commented on A statistical analysis of Rotten Tomatoes statsignificant.com/p/is-... · Posted by u/m463

og_kalu · 2 days ago

A movie with 100% on RT will never in practice get an average score of 6/10. This was the original point of contention. Nothing you've said so far has made this statement any less true, nor do you have any examples to refute it.

>It's much quicker and easier to just get an aggregated Metascore... I don't have any desire to read 12 movie review articles

So your argument against a broad-sample aggregator (RT) is to use a slightly-less-broad-sample aggregator (MC)? You complain about "every-person-with-a-substack" but you're still relying on an aggregation of dozens of critics you've never heard of. You're just drawing an arbitrary line in the sand and calling it "quality."

>"High score" is an arbitrary definition. For the purposes of the discussion... 74 doesn't cross the threshold of worth watching

You're confusing your personal, subjective viewing threshold with an objective measure of quality. A score is what it is. 75/100 is the top quartile. That is, by definition, a high score. Whether you have enough time in your life to watch every movie in the top quartile is completely irrelevant to the validity of the score itself.

Now this is more beside the point, but I really do think that you're using a tool (Metacritic's average score) for a job it wasn't designed for: being the sole arbiter of what's worth your time. A film's "quality" is not a single, objective number. It depends on genre, intent, and audience.

Is a 95-rated historical epic 'better' than the 'best' horror film of the year that only managed an 82 on Metacritic? Your system says yes, which is absurd. They're trying to do different things.

Not to mention your method is overly biased towards one specific type of film: the prestige drama. If that's the only kind of film you like to watch then cool i guess, but if not then what you're currently doing is nonsensical.

>As yet another example, "Bob Trevino Likes It" is 94 RT vs 70 MC compared with "Past Lives" at 95 RT vs 94 MC: which is more informative when selecting a movie? I can list more examples but I can't find any examples that demonstrate the reverse (i.e. that show that you would be better off listening to RT over MC).

Even the best-received movies have a few reviews that are mixed, negative, or less positive than the consensus. You could well be one of them. So the RT score tries to answer the question..."What are the odds I'll like this movie?"

This is very useful information to have, especially because I'm not a zombie picking movies to watch because of a single average score from an echo chamber of critics (which is bizarrely what you seem to be doing).

If the synopsis of Bob Trevino is more interesting to me, I would absolutely pick it over Past Lives especially if the latter seems more divisive.

They are complementary scores. Only when two movies seem to be the same type of film with the same type of distribution of scores will i favor the average score.

foobarqux · 2 days ago

> This was the original point of contention...

The original point of contention was that the "percent that approve" of the film that RT uses is surprising and not as useful as a regular rating system. (By the way the average score is now hidden on RT).

> So your argument against a broad-sample aggregator (RT) is to use a slightly-less-broad-sample aggregator (MC)?

My argument is to use useful aggregation of experts instead of a much less useful one.

> You complain about "every-person-with-a-substack" but you're still relying on an aggregation of dozens of critics you've never heard of.

I don't need to have "heard of them" to know that the NYT film critic, the reviewer at Siskel and Ebert and the film critic at Vanity Fair are probably more worth listening to than the "MacGuffin or Meaning Substack".

> You're just drawing an arbitrary line in the sand and calling it "quality."

No, the opinions of the film critics for the top publications in the world are not arbitrary.

> You're confusing your personal, subjective viewing threshold with an objective measure of quality. A score is what it is. 75/100 is the top quartile. That is, by definition, a high score. Whether you have enough time in your life to watch every movie in the top quartile is completely irrelevant to the validity of the score itself.

Besides the fact that the rating isn't a percentile ranking of films, the entire point of the discussion is which site better helps you choose films. Again, the definition of "high score" is completely arbitrary and irrelevant.

> Now this is more beside the point, but I really do think that you're using a tool (Metacritic's average score) for a job it wasn't designed for: being the sole arbiter of what's worth your time. A film's "quality" is not a single, objective number. It depends on genre, intent, and audience.

I never said that. It's a helpful filtering mechanism. I watch low rated films if they are a genre I particularly like (just like I eat junk food without claiming that it is haute-cuisine) and I don't watch movies if they are not in a style I enjoy. Apropos of your example I don't like horror so I don't watch it, irrespective of the score.

> Not to mention your method is overly biased towards one specific type of film: the prestige drama. If that's the only kind of film you like to watch then cool i guess, but if not then what you're currently doing is nonsensical.

Most films are dramas as far as I know. In any case you can filter on categories so it's irrelevant.

> The RT score tries to answer the question..."What are the odds I'll like this movie?"

Well it's closer to what are the odds the average person will like it, which isn't what I want: I want 1. to be able to pick a better movie rather than a worse one and 2. be able to threshold on higher quality than the average person.

> This is very useful information to have, especially because I'm not a zombie picking movies to watch because of a single average score from an echo chamber of critics (which is bizarrely what you seem to be doing).

No one is doing this; they are using Metacritic as a starting point to filter and rank movies, which, once again, RT doesn't do a good job at because of its binary classifier system and inclusion of everyone under the sun.

> If the synopsis of Bob Trevino is more interesting to me, I would absolutely pick it over Past Lives. They are complementary scores.

That's fine and, as I said, something I and everyone else do, just like I eat junk food (and maybe sometimes actually prefer it to some 3-star Michelin restaurant). The problem is pretending that those two films are roughly the same quality, or that, because someone sometimes prefers a lower critic-ranked movie, ratings don't matter: you can make the same argument about preferring a "rotten" RT movie.

foobarqux commented on A statistical analysis of Rotten Tomatoes statsignificant.com/p/is-... · Posted by u/m463

og_kalu · 3 days ago

>There aren't a hundred critics worth counting, it's just garbage in garbage out; I don't want every-person-with-a-substack's review, I want the dozen or so top film critics.

This is an argument against aggregation itself, not for Metacritic over RT. If you only trust a dozen specific critics, you should just read them directly. The entire purpose of an aggregator is to gather a wide sample to smooth out individual biases. That's the opposite of 'garbage in garbage out'. If your sample isn't wide as an aggregator, that's a minus no matter how you spin it.

>No, for this year alone... there are 68 movies with a score above 75 on Metacritic.

This is a nonsensical argument. By this logic, if we have a phenomenal year for film where 100 movies get a score over 75, the score itself becomes less valid? A score's meaning is relative to the scale, not the number of films that achieve it.

And literally hundreds of movies are released every year. 8 a month is a tiny fraction of that.

Your personal viewing capacity doesn't change the fact that 75/100 is objectively a high score.

>We've established that the number is not very useful, far less useful than a 9.7/10 type score is.

No, you've asserted that. We've established they measure two different things. RT measures consensus (% of critics who liked it). Metacritic measures average intensity (a weighted average score). Both are useful. One tells you how many critics would recommend it, the other tells you how much they recommend it, on average. Claiming one is "not very useful" is just stating your personal preference, and it is demonstrably false, as Rotten Tomatoes is very widely used.

foobarqux · 3 days ago

> If you only trust a dozen specific critics, you should just read them directly

It's much quicker and easier to just get an aggregated Metascore, which takes a second (and allows you to go in blind). I don't have any desire to read 12 movie review articles for every movie ever released.

> The entire purpose of an aggregator is to gather a wide sample to smooth out individual biases.

The point is to get a useful number, not to achieve some platonic ideal in statistics. Again, there aren't 100 movie critics worth listening to and I am not looking for popular opinion. If you want popular opinion use IMDB ratings.

> This is a nonsensical argument. By this logic, if we have a phenomenal year for film where 100 movies get a score over 75, the score itself becomes less valid? A score's meaning is relative to the scale, not the number of films that achieve it.

Yes, in some fantasy world where that happens you would be right. In the real world that doesn't happen. Even if it did happen, many people still have time constraints and want to watch only the best X films a year, and Metacritic is just better at doing that than Rotten Tomatoes is. As yet another example, "Bob Trevino Likes It" is 94 RT vs 70 MC compared with "Past Lives" at 95 RT vs 94 MC: which is more informative when selecting a movie? I can list more examples but I can't find any examples that demonstrate the reverse (i.e. that show that you would be better off listening to RT over MC).

> And literally hundreds of movies are released every year. 8 a month is a tiny fraction of that. Your personal viewing capacity doesn't change the fact that 75/100 is objectively a high score.

"High score" is an arbitrary definition. For the purposes of the discussion, which is whether Metacritic is a better way to determine which movies to watch, 74 doesn't cross the threshold of worth watching (absent some other factor) unless you watch more than 8 movies a month (and only want to watch movies released this year).

> No, you've asserted that. We've established they measure two different things. RT measures consensus (% of critics who liked it). Metacritic measures average intensity (a weighted average score). Both are useful. One tells you how many critics would recommend it, the other tells you how much they recommend it, on average. Claiming one is "not very useful" is just stating your personal preference, and it is demonstrably false, as Rotten Tomatoes is very widely used.

Again, it is not useful in the sense of choosing movies to watch if you are even mildly selective. I gave another example above showing why. It's true that many people don't care about that, they just want something that the average person finds entertaining for 1.5 hours, and Rotten Tomatoes is fine for that. If you have a quality threshold higher than that or would rather watch a better movie than a worse one then it isn't.

foobarqux commented on A statistical analysis of Rotten Tomatoes statsignificant.com/p/is-... · Posted by u/m463

og_kalu · 4 days ago

>If you are talking about critic reviews there really aren't that many movie critics and you don't need that many.

RT still amasses a few hundred critics, and yes it matters statistically because scores will almost certainly decrease (or at the least be unstable) with more reviews until a statistically significant threshold. Below a hundred isn't it and a score based on 10 ratings is nigh useless.

>75 is not a high Metacritic score, not just in absolute terms, but particularly not relative to the (ridiculous) 97% of Rotten Tomatoes.

Yes it's a high score. Have you taken a look at what kind of range best picture nominees fall in? 75 is a high score. We've already established a 97% doesn't mean 9.7/10. Doesn't mean your contrived examples are a reality. I'm sure you can do arithmetic and see what a 3/5 comes to out of 10.
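
The sample-size point can be made concrete with the usual binomial standard error, sqrt(p(1-p)/n). This is a rough sketch that assumes each review is an independent fresh/rotten draw with the same probability, which real critics are not; the 90% rate and the review counts are illustrative assumptions, not figures from either site.

```python
from math import sqrt


def approval_standard_error(p, n):
    """Binomial standard error of an approval fraction p estimated from n reviews."""
    return sqrt(p * (1 - p) / n)


# Illustrative review counts only (assumed, not taken from RT or Metacritic).
for n in (10, 60, 300):
    se = approval_standard_error(0.9, n)
    print(f"n={n:>3}: 90% +/- {se:.1%}")
```

Under those assumptions the uncertainty is roughly 9.5 points at 10 reviews and under 2 points at 300. Note this only describes the noise in the percentage itself; it says nothing about whether the extra reviewers are worth counting.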

foobarqux · 3 days ago

> RT still amasses a few hundred critics, and yes it matters statistically because scores will almost certainly decrease (or at the least be unstable) with more reviews until a statistically significant threshold.

There aren't a hundred critics worth counting, it's just garbage in garbage out; I don't want every-person-with-a-substack's review, I want the dozen or so top film critics.

> Below a hundred isn't it and a score based on 10 ratings is nigh useless.

It really isn't. Metacritic top movies for each year are indicative of the "quality" movies, as you would expect the average of the top 10 movie critics to be.

> Yes it's a high score. Have you taken a look at what kind of range best picture nominees fall in? 75 is a high score.

No, for this year alone (which is only part way through) there are 68 movies with a score above 75 on Metacritic. If you were watching movies according to score alone, that means you would have to watch more than 8 movies a month just to get to those films (and that's if you refuse to watch movies from any other year).

> We've already established a 97% doesn't mean 9.7/10

We've established that the number is not very useful, far less useful than a 9.7/10 type score is.

Look, no one is going to stop you from using Rotten Tomatoes if it meets your needs. For me and many other people who don't have the time or desire to watch films below a certain quality, we need an actual estimate of quality, which Rotten Tomatoes doesn't provide and Metacritic does.
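
The viewing-rate arithmetic above, as a quick sketch. The 68-film count is from the comment; the number of months elapsed is not stated there, so the values tried below are assumptions, and the function name is hypothetical.

```python
def films_per_month(film_count, months_elapsed):
    """Average films per month needed to watch every qualifying film."""
    return film_count / months_elapsed


FILMS_ABOVE_75 = 68  # count quoted in the comment

# months_elapsed is an assumption; the comment only says "part way through" the year.
for months_elapsed in (6, 7, 8):
    rate = films_per_month(FILMS_ABOVE_75, months_elapsed)
    print(f"{months_elapsed} months elapsed: {rate:.1f} films per month")
```

The "more than 8 a month" figure holds whenever fewer than 8.5 months have elapsed, since 68 / 8.5 = 8.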

foobarqux commented on A statistical analysis of Rotten Tomatoes statsignificant.com/p/is-... · Posted by u/m463

og_kalu · 4 days ago

Yes it's just true in practice.

Rotten Tomatoes and Metacritic are not the same site and have different audiences. Even the most popular movies will barely scrape 60 reviewers on Metacritic.

Comparing them directly is meaningless. Unfortunately they removed the average score for critics percentage but it's still there for the audience percentage.

You're also just wrong. Those movies, especially the last two have high Metacritic scores.

foobarqux · 4 days ago

> Rotten Tomatoes and Metacritic are not the same site and have different audiences.

Yes, we are talking about aggregating critic reviews. It's true that if you like what the mass audience likes you'll be fine with any kind of crude measure like Rotten Tomatoes (although you'll still be better off with IMDB scores).

> Even the most popular movies will barely scrape 60 reviewers on Metacritic.

If you are talking about critic reviews there really aren't that many movie critics and you don't need that many. If you are talking about user reviews that isn't what the site is geared for (and not what the users of the site want either, just go to IMDB).

> You're also just wrong. Those movies, especially the last two have high Metacritic scores.

75 is not a high Metacritic score, not just in absolute terms, but particularly not relative to the (ridiculous) 97% of Rotten Tomatoes.

If you only want to watch a few movies a year (and presumably want them to be the "best") Metacritic is the only useful site (with the provisos that someone else posted about political films and modulating for your own personal preferences).

foobarqux commented on A statistical analysis of Rotten Tomatoes statsignificant.com/p/is-... · Posted by u/m463

og_kalu · 4 days ago

Yeah and his point is that's never going to happen lol. People bring up the 100% point a lot and it's a bit silly because a movie with a significant number of ratings is never going to have that kind of distribution.

That's why it's always a hypothetical never backed with actual examples. It's one of those things that sounds plausible until you look at the numbers. Movies close to 100% have pretty high average scores and Movies with majority 3/5's are nowhere near 100%.

Yeah 100% for RT doesn't mean 10/10, but that's it.

foobarqux · 4 days ago

It’s just not true in practice: it’s pretty typical to find films with high Rotten Tomatoes scores and not very high Metacritic scores; Rotten Tomatoes scores are pretty much useless unless you are not very discerning.

Examples: Sovereign, How to Make a Million…, Count of Monte Cristo, etc.

foobarqux commented on Efrit: A native elisp coding agent running in Emacs github.com/steveyegge/efr... · Posted by u/simonpure

foobarqux · 16 days ago

I managed to get this working with Gemini by using a proxy [1] and the following config (I used quelpa):

    (use-package efrit
      :quelpa (efrit :fetcher git :repo "steveyegge/efrit")
      :init
      (setq efrit-model "gemini-2.5-pro")
      ;; (setq efrit-api-url "https://generativelanguage.googleapis.com/v1beta/opena
      (setq efrit-api-url "http://127.0.0.1:8089/v1/messages")
      :config
      ;; This isn't needed, the key is set by the proxy.
      ;; (key-from-file is not defined in this snippet.)
      (defun efrit--get-api-key () (key-from-file "~/.keys/gemini.txt"))
      :ensure t)

I needed to remove the uvicorn version constraint when importing the project to uv to get it to find a version solution.

Initially I thought you could send it directly to Gemini but apparently you need to proxy and translate the responses.

[1] Seems sketchy, use at your own risk: https://github.com/coffeegrind123/gemini-for-claude-code