fjdjshsh (u/fjdjshsh)

fjdjshsh commented on The world of Japanese snack bars bbc.com/travel/article/20... · Posted by u/rmason

It’s worth pointing out that there are certainly establishments where tourists aren’t welcome. Ironically, I’ve had some gay friends get walked from a locals-only gay bar to the tourists-welcome gay bar across the street :-)

fjdjshsh · 2 months ago

Not a local, but in my experience this is due to tourists not being able to speak Japanese, which makes the people working in a place very uncomfortable ("will this person follow the rules? How can I do proper service if I can't communicate?"). A 大丈夫、少し日本語をしゃべります (it's ok, I speak a bit of Japanese) has been enough to open the doors for me.

That being said, they do have issues with some nationalities. For example, the average American is way too loud for the average Japanese place. Even if they think they are being polite, they just talk too loudly and too much for Japanese sensibilities.

fjdjshsh commented on Backpropagation is a leaky abstraction (2016) karpathy.medium.com/yes-y... · Posted by u/swatson741

nirinor · 4 months ago

It's a nitpick, but backpropagation is getting a bad rap here. These examples are about gradients + gradient descent variants being a leaky abstraction for optimization [1].

Backpropagation is a specific algorithm for computing gradients of composite functions, but even the failures that do come from composition (multiple sequential sigmoids cause exponential gradient decay) are not backpropagation specific: that's just how the gradients behave for that function, whatever algorithm you use. The remedy, of having people calculate their own backwards pass, is useful because people are _calculating their own derivatives_ for the functions, and get a chance to notice the exponents creeping in. Ask me how I know ;)

[1] Gradients being zero would not be a problem with a global optimization algorithm (which we don't use because they are impractical in high dimensions). Gradients getting very small might be dealt with by tools like line search (if they are small in all directions) or approximate Newton methods (if small in some directions but not others). Not saying those are better solutions in this context, just that optimization (+ modeling) is the actually hard part, not the way gradients are calculated.
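
A minimal sketch of the claim above that the decay belongs to the function and not to the algorithm that computes the gradient. The function names are made up for illustration. `chain_gradient` applies the chain rule by hand through a stack of sigmoids, and `numeric_gradient` estimates the same derivative with finite differences, with no backpropagation involved.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def chain_gradient(x, depth):
    """Derivative of sigmoid applied `depth` times to x, via the chain rule."""
    grad = 1.0
    for _ in range(depth):
        s = sigmoid(x)
        grad *= s * (1.0 - s)  # sigmoid'(x) = s * (1 - s), never above 0.25
        x = s
    return grad


def numeric_gradient(x, depth, eps=1e-6):
    """Same derivative by central finite differences (no chain rule)."""
    def f(v):
        for _ in range(depth):
            v = sigmoid(v)
        return v
    return (f(x + eps) - f(x - eps)) / (2 * eps)


if __name__ == "__main__":
    for depth in (1, 3, 5, 10):
        print(depth, chain_gradient(0.5, depth), numeric_gradient(0.5, depth))
```

Each layer multiplies the gradient by a factor of at most 0.25, so the product shrinks at least as fast as 0.25 to the power of the depth, whichever way it is computed.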

fjdjshsh · 4 months ago

I get your point, but I don't think your nit-pick is useful in this case.

The point is that you can't abstract away the details of back propagation (which involve computing gradients) under some circumstances. For example, when we are using gradient descent. Maybe in other circumstances (global optimization algorithm) it wouldn't be an issue, but the leaky abstraction idea isn't that the abstraction is always an issue.

(Right now, back propagation is virtually the only way to calculate gradients in deep learning)

fjdjshsh commented on A definition of AGI arxiv.org/abs/2510.18212... · Posted by u/pegasus

stared · 4 months ago

There’s already a vague definition that AGI is an AI with all the cognitive capabilities of a human. Yes, it’s vague - people differ.

This paper promises to fix "the lack of a concrete definition for Artificial General Intelligence", yet it still relies on the vague notion of a "well-educated adult". That’s especially peculiar, since in many fields AI is already beyond the level of an adult.

You might say this is about "jaggedness", because AI clearly lacks quite a few skills:

> Application of this framework reveals a highly “jagged” cognitive profile in contemporary models.

But all intelligence, of any sort, is "jagged" when measured against a different set of problems or environments.

So, if that’s the case, this isn’t really a framework for AGI; it’s a framework for measuring AI along a particular set of dimensions. A more honest title might be: "A Framework for Measuring the Jaggedness of AI Against the Cattell–Horn–Carroll Theory". It wouldn't be nearly as sexy, though.

fjdjshsh · 4 months ago

>But all intelligence, of any sort, is "jagged" when measured against a different set of problems or environments.

On the other hand, research on "common intelligence" AFAIK shows that most measures of different types of intelligence have a very high correlation and some (apologies, I don't know the literature) have posited that we should think about some "general common intelligence" to understand this.

The surprising thing about AI so far is how much more jagged it is compared with human intelligence.

fjdjshsh commented on Introduction to Multi-Armed Bandits (2019) arxiv.org/abs/1904.07272... · Posted by u/Anon84

rented_mule · 5 months ago

We employed bandits in a product I worked on. It was selecting which piece of content to show in a certain context, optimizing for clicks. It did a great job, but there were implications that I wish we understood from the start.

There was a constant stream of new content (i.e., arms for the bandits) to choose from. Instead of running manual experiments (e.g., A/B tests or other designs), the bandits would sample the new set of options and arrive at a new optimal mix much more quickly.

But we did want to run experiments with other things around the content that was managed by the bandits (e.g., UI flow, overall layout, other algorithmic things, etc.). It turns out bandits complicate these experiments significantly. Any changes to the context in which the bandits operate lead them to shift things more towards exploration to find a new optimal mix, hurting performance for some period of time.

We had a choice we could make here... treat all traffic, regardless of cohort, as a single universe that the bandits are managing (so they would optimize for the mix of cohorts as a whole). Or we could set up bandit stats for each cohort. If things are combined, then we can't use an experiment design that assumes independence between cohorts (e.g., A/B testing) because the bandits break independence. But the optimal mix will likely look different for one cohort vs. another vs. all of them combined. So it's better for experiment validity to isolate the bandits for each cohort. Now small cohorts can take quite a while to converge before we can measure how well things work. All of this puts a real limit on iteration speed.

Things also become very difficult to reason about because there is state in the bandit stats that is being used to optimize things. You can often think of that as a black box, but sometimes you need to look inside and it can be very difficult.

Much (all?) of this comes from bandits being feedback loops - these same problems are present in other approaches where feedback loops are used (e.g., control theory based approaches). Feedback mechanisms are incredibly powerful, but they couple things together in ways that can be difficult to tease apart.
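
A rough sketch of the "bandit stats for each cohort" option described above, assuming a Beta-Bernoulli Thompson sampling bandit optimizing for clicks. The comment does not say which bandit algorithm was used, so that choice and the class names `ThompsonBandit` and `CohortBandits` are assumptions for illustration only.

```python
import random


class ThompsonBandit:
    """Beta-Bernoulli Thompson sampling over a fixed set of arms."""

    def __init__(self, arms, rng):
        self.arms = list(arms)
        self.rng = rng
        # Per arm: [clicks + 1, non-clicks + 1], i.e. a Beta(1, 1) prior.
        self.stats = {arm: [1, 1] for arm in self.arms}

    def choose(self):
        # Sample a plausible click rate per arm and show the best draw.
        draws = {arm: self.rng.betavariate(*self.stats[arm]) for arm in self.arms}
        return max(draws, key=draws.get)

    def update(self, arm, clicked):
        self.stats[arm][0 if clicked else 1] += 1


class CohortBandits:
    """Keeps one independent ThompsonBandit per experiment cohort."""

    def __init__(self, arms, seed=0):
        self.arms = list(arms)
        self.rng = random.Random(seed)
        self.by_cohort = {}

    def bandit_for(self, cohort):
        if cohort not in self.by_cohort:
            self.by_cohort[cohort] = ThompsonBandit(self.arms, self.rng)
        return self.by_cohort[cohort]
```

Feedback from one cohort never touches another cohort's stats, which is what keeps the cohorts independent for an A/B comparison. The cost is the one named above: each cohort starts from its prior, so small cohorts take longer to converge.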

fjdjshsh · 5 months ago

>Things also become very difficult to reason about because there is state in the bandit stats that is being used to optimize things. You can often think of that as a black box, but sometimes you need to look inside and it can be very difficult.

One way to peek into the state is to use Bayesian models to represent the "belief" state of the bandits. For example, the arm's "utility" can be a linear function of the features of the arm. At each period, you can inspect the coefficients (and their distribution) for each arm.

See this package:

https://github.com/bayesianbandits/bayesianbandits
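
To make the idea concrete, here is a small self-contained sketch of an inspectable belief state. It does not use the linked package's API. It assumes conjugate Bayesian linear regression with two features, unit noise variance, and a zero-mean Gaussian prior, and the class name `BayesianLinearArm` is hypothetical.

```python
import math


class BayesianLinearArm:
    """Belief about one arm's utility, modelled as w . x + N(0, 1) noise.

    Prior: w ~ N(0, I / prior_precision). Two features only, so the
    2x2 posterior precision matrix can be inverted by hand.
    """

    def __init__(self, prior_precision=1.0):
        self.precision = [[prior_precision, 0.0], [0.0, prior_precision]]
        self.xy = [0.0, 0.0]  # running sum of x * reward

    def update(self, x, reward):
        for i in range(2):
            self.xy[i] += x[i] * reward
            for j in range(2):
                self.precision[i][j] += x[i] * x[j]

    def _covariance(self):
        (a, b), (c, d) = self.precision
        det = a * d - b * c
        return [[d / det, -b / det], [-c / det, a / det]]

    def belief(self):
        """Return [(mean, std), (mean, std)], one pair per coefficient."""
        cov = self._covariance()
        mean = [sum(cov[i][j] * self.xy[j] for j in range(2)) for i in range(2)]
        std = [math.sqrt(cov[i][i]) for i in range(2)]
        return list(zip(mean, std))
```

Calling `belief()` after each period shows where the arm's coefficients currently sit and how uncertain they still are. A wide standard deviation means the bandit is still likely to explore that arm.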

fjdjshsh commented on Failing to Understand the Exponential, Again julian.ac/blog/2025/09/27... · Posted by u/lairv

hnlmorg · 5 months ago

My point is that the limits of LLMs will be hit long before they start to take on human capabilities.

The problem isn’t that exponential growth is hard to visualise. The problem is that LLMs, as advanced and useful a technique as they are, aren’t suited for AGI and thus will never get us even remotely to the stage of AGI.

The human like capabilities are really just smoke and mirrors.

It’s like when people anthropomorphise their car: “she’s being temperamental today”. Except we know the car is not intelligent and it’s just a mechanical problem. Whereas it’s in the AI tech firms’ best interest to upsell the human-like characteristics of LLMs because that’s how they get VC money. And as we know, building and running models isn’t cheap.

fjdjshsh · 5 months ago

>the limits of LLMs will be hit long before they start to take on human capabilities.

Why do you think this? The rest of the comment is just rephrasing this point ("LLMs aren't suited for AGI"), but you don't seem to provide any argument.

fjdjshsh commented on Microsoft blocks Israel’s use of its tech in mass surveillance of Palestinians theguardian.com/world/202... · Posted by u/helsinkiandrew

fjdjshsh · 6 months ago

50% of Gaza destroyed, 100% of the hospitals. It's a good thing they precisely targeted Hamas assets.

fjdjshsh commented on Microsoft blocks Israel’s use of its tech in mass surveillance of Palestinians theguardian.com/world/202... · Posted by u/helsinkiandrew

sir0010010 · 6 months ago

In 1945, about 90k people died over 2 days in the Tokyo firebombing. Do you think it would be difficult for any modern military - that intentionally wanted to cause as much collateral damage as possible - to greatly exceed that number?

fjdjshsh · 6 months ago

Not sure what your point is. The Israeli military could drop a few atomic bombs and wipe out the entire population of Gaza. That they don't is a sign of restraint to you?

fjdjshsh commented on Microsoft blocks Israel’s use of its tech in mass surveillance of Palestinians theguardian.com/world/202... · Posted by u/helsinkiandrew

kennywinker · 6 months ago

Perhaps the actual moral choice isn’t attacking blindly or mass surveillance of an occupied nation - it’s peace?

Regardless, the death toll in Gaza (somewhere between 45,000 and 600,000) suggests that this mass surveillance isn’t being used effectively to reduce the death toll. It also doesn’t take mass surveillance to know that bombing hospitals and schools is going to kill innocent people.

fjdjshsh · 6 months ago

You're assuming the objective is to lower civilian casualties. From the statements of prominent Israeli ministers and the actual conduct of the bombardment, it's pretty clear that, for the Israeli government, killing civilians is a feature, not a bug.

fjdjshsh commented on The Lost Japanese ROM of the Macintosh Plus journaldulapin.com/2025/0... · Posted by u/ecliptik

pests · 10 months ago

One distinction is the original transformer was an encoder/decoder while (most?) LLMs today are encoder only.

The translation transformer also was able to peek ahead in the context window while (most?) LLMs now only consider previous tokens.

fjdjshsh · 10 months ago

They're usually thought of as "decoder only".
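
A tiny illustration of the "peek ahead" versus "only previous tokens" distinction from the parent comment, written as an attention mask. The helper name `attention_mask` is made up for this sketch. An encoder-style layer attends in both directions, while a decoder-only model uses a causal mask.

```python
def attention_mask(seq_len, causal):
    """mask[i][j] is True when position i may attend to position j."""
    return [[(j <= i) or not causal for j in range(seq_len)]
            for i in range(seq_len)]


if __name__ == "__main__":
    for row in attention_mask(4, causal=True):
        print(["x" if allowed else "." for allowed in row])
```

With `causal=True` the mask is lower triangular, so each token sees only itself and earlier tokens. With `causal=False` every position sees the whole sequence.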

fjdjshsh commented on “Streaming vs. Batch” Is a Wrong Dichotomy, and I Think It's Confusing morling.dev/blog/streamin... · Posted by u/ingve

brudgers · 10 months ago

Streams have unknown size and may be infinite.

Batches have a known size and are not infinite.

fjdjshsh · 10 months ago

Maybe I'm using the wrong definitions, but I think that's backwards.

Say you are receiving records from users at different intervals and you want to eventually store them in a different format in a database.

Streaming to me means you're "pushing" to the database according to some rule. For example, wait and accumulate 10 records to push. This could happen in 1 minute or in 10 hours. You know the size of the dataset (exactly 10 records). (You could also add a max wait time, and then you'd be combining batching with streaming.)

Batching to me means you're pulling from the database. For example, you pull once every hour. In that hour, you get 0 records or 1000 records. You don't know the size and it's potentially infinite.
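
The two definitions in this comment (which its author flags as possibly non-standard) can be sketched as follows. The class names `CountTriggeredPusher` and `IntervalPuller` are hypothetical, and a plain list stands in for the database or queue.

```python
class CountTriggeredPusher:
    """Pushes to `sink` once `size` records have accumulated."""

    def __init__(self, sink, size=10):
        self.sink = sink
        self.size = size
        self.buffer = []

    def receive(self, record):
        self.buffer.append(record)
        if len(self.buffer) >= self.size:
            self.sink(self.buffer)  # every push has exactly `size` records
            self.buffer = []


class IntervalPuller:
    """Drains whatever has arrived in `source` each time pull() is called."""

    def __init__(self, source):
        self.source = source

    def pull(self):
        # Meant to be called on a timer; the batch size is whatever piled up.
        batch = list(self.source)
        self.source.clear()
        return batch
```

The pusher has a fixed size and an unknown wait. The puller has a fixed wait and an unknown size, which can be zero.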