Readit News
imtringued commented on China is eating the world   apropos.substack.com/p/ch... · Posted by u/sg5421
anigbrowl · 16 hours ago
reduced to poverty by communism

Communism does not take off in stable prosperous societies because there isn't a market for it. It quite literally requires an underclass of people unhappy enough to stake their lives on establishing a different social order.

imtringued · 4 hours ago
According to Marx, communism will arrive in wealthy industrialised countries first, due to the contradictions of late stage capitalism.

In reality, however, the opposite happened. Russian potato farmers without any machines or capital started industrialising the moment the communists took over.

Communism is a dead ideology, because it failed to evolve in the face of reality disagreeing with the communist world view.

Communists think that capital grants its owners power and that competition leads to exploitation, when the exact opposite is true.

imtringued commented on Das Problem mit German Strings   polarsignals.com/blog/pos... · Posted by u/asubiotto
chombier · 2 days ago
my tl;dr after reading the article:

- two 64-bit word representation

- fixed 32-bit length

- short strings (<12 bytes) are stored in-place

- long strings store a 4-byte prefix in-place + a pointer to the rest

- two bits of the pointer are used as flags to further optimize some use cases

imtringued · 5 hours ago
Seems like they missed an opportunity to have an 8-byte version for strings that fit in the 4-byte prefix.
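The layout described above can be sketched in Python with the `struct` module. This is purely illustrative: the field packing, byte order, and helper name are assumptions, not the actual implementation, and the two flag bits in the pointer are ignored here.

```python
import struct

PREFIX_LEN = 4
INLINE_CAP = 12  # short strings stored fully in-place

def encode(s: bytes) -> bytes:
    """Pack a string into the 16-byte two-word representation (sketch)."""
    if len(s) <= INLINE_CAP:
        # word 1: 32-bit length + first 4 bytes; word 2: remaining 8 bytes
        return struct.pack("<I12s", len(s), s.ljust(INLINE_CAP, b"\x00"))
    # long string: 32-bit length + 4-byte prefix in-place + 8-byte pointer
    # to the full string (id() stands in for a real heap pointer here)
    return struct.pack("<I4sQ", len(s), s[:PREFIX_LEN], id(s))
```

Either branch produces exactly 16 bytes, which is what makes the representation cheap to pass around by value.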
imtringued commented on AI adoption linked to 13% decline in jobs for young U.S. workers: study   cnbc.com/2025/08/28/gener... · Posted by u/pseudolus
deanmoriarty · 11 hours ago
You’ll get downvoted but in my experience, which may not be representative of the entire population, this is true.

A mid-size US tech company I know well went fully remote after a lot of insistence from the workforce, prior to the pandemic they were fully in office.

Soon enough they started hiring remotely from EU, and now the vast majority of their technical folks are from there. The only US workers remaining are mostly GTM/sales. I personally heard the founder saying “why should we pay US comp when we can get extremely good talent in EU for less than half the cost”. EU workers, on average, also tend to not switch job as frequently, so that’s a further advantage for the company.

Once you adapt to remote-only, you can scoop some amazing talent in Poland/Ukraine/Serbia/etc for $50k a year.

imtringued · 6 hours ago
The fixed exchange rates between EU countries massively drag down the international cost of a German software engineer, and US companies have yet to wise up to that fact.
imtringued commented on Important machine learning equations   chizkidd.github.io//2025/... · Posted by u/sebg
cubefox · a day ago
What does this comment have to do with the previous comment, which talked about supervised learning?
imtringued · a day ago
Reread the comment

"Backprop is just a way to compute the gradients of the weights with respect to the cost function, not an algorithm to minimize the cost function wrt. the weights."

What does the word supervised mean? It's when you define a cost function to be the difference between the training data and the model output.

Aka something like (f(x)-y)^2 which is simply the quadratic difference between the result of the model given an input x from the training data and the corresponding label y.

A learning algorithm is an algorithm that produces a model given a cost function and in the case of supervised learning, the cost function is parameterized with the training data.

The most common way to learn a model is to use an optimization algorithm. There are many optimization algorithms that can be used for this. One of the simplest algorithms for the optimization of unconstrained non-linear functions is stochastic gradient descent.

It's popular because it is a first order method. First order methods only use the first partial derivatives, collectively known as the gradient, whose size is equal to the number of parameters. Second order methods converge faster, but they need the Hessian, whose size scales with the square of the number of parameters being optimized.

How do you calculate the gradient? Either you calculate each partial derivative individually, or you use the chain rule and work backwards to calculate the complete gradient.

I hope this made it clear that your question is exactly backwards. The referenced blog post is about backpropagation and unnecessarily mentions supervised learning when it shouldn't have, and you are now the one sticking with supervised learning even though the comment you're responding to explained exactly why it is inappropriate to call backpropagation a supervised learning algorithm.
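The distinction can be made concrete with a minimal sketch: the chain rule produces the gradient, and SGD is the separate loop that consumes it. This is a one-parameter toy model for illustration only; the data, learning rate, and loop structure are arbitrary choices.

```python
# Supervised toy problem: model f(x) = w*x, quadratic cost (f(x) - y)^2.
# "Backprop" is only the chain-rule computation of dL/dw;
# SGD is the separate optimization loop that uses that gradient.

data = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]  # labels follow y = 2*x

def grad(w, x, y):
    # chain rule: dL/dw = dL/df * df/dw = 2*(w*x - y) * x
    return 2.0 * (w * x - y) * x

w = 0.0
lr = 0.05
for _ in range(200):            # SGD: repeatedly step against the gradient
    for x, y in data:
        w -= lr * grad(w, x, y)

print(w)  # approaches the true slope 2.0
```

Swapping the SGD loop for any other optimizer (momentum, Adam, even a second order method) leaves the gradient computation untouched, which is the point being made above.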

imtringued commented on Important machine learning equations   chizkidd.github.io//2025/... · Posted by u/sebg
cl3misch · a day ago
I actually see this a lot: confusing backpropagation with gradient descent (or any optimizer). Backprop is just a way to compute the gradients of the weights with respect to the cost function, not an algorithm to minimize the cost function wrt. the weights.

I guess giving the (mathematically) simple principle of computing a gradient with the chain rule the fancy name "backpropagation" comes from the early days of AI where the computers were much less powerful and this seemed less obvious?

imtringued · a day ago
The German Wikipedia article makes the same mistake and it is quite infuriating.
imtringued commented on Nvidia DGX Spark   nvidia.com/en-us/products... · Posted by u/janandonly
Y_Y · a day ago
Even if that "sparsity feature" requires that two out of every four adjacent values in your array be zeros, and performance halves if you don't comply?

I think lots of children are going to be very disappointed running their blas benchmarks on Christmas morning and seeing barely tens of teraflops.

(For reference, see how optimistic the numbers still are for the H200 when you use realistic datatypes.

https://nvdam.widen.net/s/nb5zzzsjdf/hpc-datasheet-sc23-h200... )

imtringued · a day ago
Using sparsity in advertising is incredibly misleading to the point of lying. The entire point of sparsity is that you avoid doing calculations. Sparsity support means you need fewer FLOPs for a matrix of the same size. It doesn't magically increase the number of FLOPs you have.

Even AMD got that memo and is mostly advertising their 8bit/block fp16 performance on their GPUs and NPUs, even though the NPUs support 4 bit INT with sparsity, which would 4x the quoted numbers if they used Nvidia's marketing FLOPs.
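The accounting behind the complaint can be sketched in a few lines (illustrative only; a multiply-accumulate is counted as 2 floating point operations, the usual convention):

```python
# 2:4 structured sparsity: at most 2 of every 4 adjacent values are nonzero.
# Sparsity HALVES the work required; vendors report this as "doubled FLOPS".

def dense_flops(m, n, k):
    # m*n output elements, each a k-length dot product of 2 ops per element
    return 2 * m * n * k

def sparse_24_flops(m, n, k):
    # half of the multiply-accumulates are skipped: the operand is zero
    return dense_flops(m, n, k) // 2

m = n = k = 4096
print(dense_flops(m, n, k))      # work for the dense matmul
print(sparse_24_flops(m, n, k))  # actual work with 2:4 sparsity
```

Quoting the dense figure as if it were achieved throughput on a sparse workload is the inflation being criticized above.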

imtringued commented on Nvidia DGX Spark   nvidia.com/en-us/products... · Posted by u/janandonly
woooooo · a day ago
Matrix vector multiplication for feed forward layers is most of the bandwidth as I understand things, there's not really a way to do it "better", it's just a bunch of memory-bound dot products.

(Posting this comment in hopes of being corrected and learning something).

imtringued · a day ago
Training is performed in parallel with batching and is more FLOP-heavy. I don't have an intuition for how memory-bandwidth-intensive updating the parameters is. It shouldn't be much worse than doing a single forward pass, though.
imtringued commented on Efficient Array Programming   github.com/razetime/effic... · Posted by u/todsacerdoti
imtringued · a day ago
Considering the complete absence of array languages in a field dominated by operations on tensors, I think it is fair to say that the terse array programming languages like APL aren't just niche languages. They're niche languages even in the category of niche languages.

In theory you should be able to define entire neural networks with a handful of lines of APL. You wouldn't even bother with complex frameworks offering pre-built architectures. You'd just copy-paste the 10 lines of fully self-contained APL code that describes the network from the documentation, because even the idea of downloading a library would be overkill.
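For a rough sense of that terseness without APL itself, here is a two-layer forward pass in a handful of lines of plain Python (the layer sizes, weights, and input are arbitrary; this is a sketch of the idea, not a claim about any particular APL implementation):

```python
import random

random.seed(0)
# A two-layer perceptron, forward pass only.
W1 = [[random.gauss(0, 1) for _ in range(3)] for _ in range(4)]  # 3 -> 4
W2 = [[random.gauss(0, 1) for _ in range(4)] for _ in range(2)]  # 4 -> 2

dot = lambda w, x: sum(wi * xi for wi, xi in zip(w, x))
relu = lambda v: [max(0.0, z) for z in v]
layer = lambda W, x: [dot(row, x) for row in W]

def forward(x):
    return layer(W2, relu(layer(W1, x)))

print(forward([1.0, 0.5, -0.2]))  # two output activations
```

In an array language the `dot`/`layer` plumbing collapses into built-in operators, which is exactly the appeal being described.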

imtringued commented on GMP damaging Zen 5 CPUs?   gmplib.org/gmp-zen5... · Posted by u/sequin
db48x · a day ago
Yes, but only briefly. When you study the thermodynamics of information you’ll discover that it’s actually erasing information that has a cost. Every time the CPU stores a value in a register it erases the previous value, using up energy. In fact, every individual transistor has to erase the previous state on basically every clock cycle.

Curiously there is a minimum cost to erase a single bit that no system can go below. It's extremely small, billions of times smaller than the amount of energy our CPUs use every time they erase a bit, but it exists. Look up Landauer's Limit. There is a similar limit on the maximum amount of information stored in a system, which is proportional to the surface area of the sphere that the information fits inside. Exceed that limit and you'll form a black hole. We're nowhere near that limit yet either.

imtringued · a day ago
>In fact, every individual transistor has to erase the previous state on basically every clock cycle.

This is incorrect in both directions.

Only transistors whose inputs are changing have to discharge their capacitance.

This means that if the inputs don't change nothing happens, but if the inputs change then the changes propagate through the circuit to the next flip flop, possibly creating a cascade of changes.

Consider this pathological scenario: the first input changes, then a delay happens, then the second input changes so that the output remains the same. This is known as a "glitch". Even though the output hasn't changed, the downstream transistors see their input switch twice. Glitches propagate through transistors, and if another unfortunate timing event happens, you can end up accumulating multiple glitches. A single transistor may switch multiple times in a clock cycle.

Switching transistors costs energy, which means you end up with "parasitic" power consumption that doesn't contribute to the calculated output.
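This parasitic cost can be sketched with the standard CMOS dynamic power formula P = alpha * C * V^2 * f, where glitches inflate the activity factor alpha. The constants below are illustrative, not measurements of any real chip.

```python
# Dynamic (switching) power of CMOS logic: P = alpha * C * V^2 * f.
# Glitches add extra transitions per cycle, raising the activity factor
# alpha without contributing to the computed output: parasitic power.

def dynamic_power(alpha, c_farads, v_volts, f_hz):
    return alpha * c_farads * v_volts ** 2 * f_hz

C = 1e-9  # 1 nF total switched capacitance (illustrative)
V = 1.0   # 1 V supply
F = 3e9   # 3 GHz clock

clean = dynamic_power(0.10, C, V, F)    # 10% of nodes switch per cycle
glitchy = dynamic_power(0.15, C, V, F)  # glitches add extra transitions
print(clean, glitchy)
```

Note that leakage (static) power is a separate term entirely; the glitch discussion above concerns only the dynamic component.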

imtringued commented on Word documents will be saved to the cloud automatically on Windows going forward   ghacks.net/2025/08/27/you... · Posted by u/speckx
tsukikage · 2 days ago
OneDrive has to manage synchronising the cloud with multiple, potentially independently updated, local copies. This is a much harder problem than anything Google have tried to tackle, with more ways for things to go wrong compared to "no internet connectivity? No documents for you!"

This has the effect that (to a first approximation) everyone knows someone with a horrific OneDrive data loss story, no-one particularly trusts OneDrive with anything actually important, and so no-one wants to be forced to use it for everything.

imtringued · a day ago
That sounds like an extreme amount of incompetence to me.

I understand that you run into this problem with third party software like Dropbox, because they aren't natively integrated with the OS and therefore need to do some unreliable file tagging to support basic operations like renaming, moving or copying files, but Microsoft controls the entire OS.

They can scan the filesystem journal for file system operations. They can build custom OneDrive specific features into their file explorer. They have an office suite that can directly integrate with OneDrive.

Yet they chose to not do that and instead decided that they really ought to collect all your data for training AI instead.

u/imtringued · Karma: 11715 · Cake day: December 5, 2015