stephencanon (u/stephencanon)

stephencanon commented on Who Invented Backpropagation? people.idsia.ch/~juergen/... · Posted by u/nothrowaways

eigenspace · 12 days ago

Reverse mode automatic differentiation is not integration. It's still differentiation, just a different method of calculating the derivative than the one you'd think to do by hand. It basically applies the chain rule in the opposite order from what is intuitive to people.

It has a lot more overhead than regular forwards mode autodiff because you need to cache values from running the function and refer back to them in reverse order, but the advantage is that for functions with many inputs and very few outputs (the classic example is calculating the gradient of a scalar function in a high-dimensional space, as in gradient descent), it is algorithmically more efficient and requires only one pass through the primal function.

On the other hand, traditional forwards mode derivatives are most efficient for functions with very few inputs, but many outputs. It's essentially a duality relationship.
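A minimal sketch of the reverse-mode idea in Python, for illustration only: `Var`, `sin`, and `backward` are hypothetical names, not any library's API. The forward pass caches each intermediate value together with its local derivatives; the backward pass then walks those records in reverse order and accumulates the gradient with respect to every input in a single sweep.

```python
import math


class Var:
    """A value plus the cached local derivatives needed for the reverse pass."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (parent Var, d(self)/d(parent))
        self.grad = 0.0

    def __add__(self, other):
        return Var(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        return Var(self.value * other.value,
                   ((self, other.value), (other, self.value)))


def sin(x):
    return Var(math.sin(x.value), ((x, math.cos(x.value)),))


def backward(output):
    """Accumulate d(output)/d(node) into node.grad for every node."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)  # parents are appended before their consumers

    visit(output)
    output.grad = 1.0
    for node in reversed(order):  # consumers are processed before parents
        for parent, local in node.parents:
            parent.grad += node.grad * local


# f(x, y) = x * y + sin(x): one backward sweep yields both partials.
x, y = Var(2.0), Var(3.0)
f = x * y + sin(x)
backward(f)
print(x.grad, y.grad)  # df/dx = y + cos(x), df/dy = x
```

The cached `parents` tuples are the overhead mentioned above; forwards mode would not need them, but would need one pass per input to get the same two partials.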

stephencanon · 12 days ago

I don't think most people think to do either direction by hand; it's all just matrix multiplication, and you can multiply the matrices in whatever order makes it easier.
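As a rough illustration of the "whatever order" point, here is a sketch in plain Python. The Jacobians `J1`, `J2`, `J3` are made-up integer matrices for a hypothetical composition f = f3(f2(f1(x))) with four inputs and one output, and `matmul` is a helper written for this example. Both multiplication orders give the same gradient; only the number of scalar multiplies differs.

```python
def matmul(a, b):
    """Multiply list-of-lists matrices; also return the scalar multiply count."""
    rows, inner, cols = len(a), len(b), len(b[0])
    assert all(len(row) == inner for row in a), "shape mismatch"
    product = [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(rows)
    ]
    return product, rows * inner * cols


# Hypothetical Jacobians for f = f3(f2(f1(x))), x in R^4, f(x) in R^1.
J1 = [[1, 2, 0, 1], [0, 1, 3, 2], [2, 0, 1, 1]]  # 3x4: d f1 / d x
J2 = [[1, 0, 2], [3, 1, 0], [0, 2, 1]]           # 3x3: d f2 / d f1
J3 = [[2, 1, 3]]                                 # 1x3: d f3 / d f2

# "Reverse mode": start from the output side, (J3 @ J2) @ J1.
left, cost_a = matmul(J3, J2)
grad_reverse, cost_b = matmul(left, J1)
reverse_cost = cost_a + cost_b

# "Forwards mode": start from the input side, J3 @ (J2 @ J1).
right, cost_c = matmul(J2, J1)
grad_forward, cost_d = matmul(J3, right)
forward_cost = cost_c + cost_d

print(grad_reverse == grad_forward)  # same gradient either way
print(reverse_cost, forward_cost)    # output-side-first needs fewer multiplies here
```

With many outputs and few inputs the shapes flip, and starting from the input side becomes the cheaper order.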

stephencanon commented on White Mountain Direttissima whitemountainski.co/pages... · Posted by u/oftenwrong

dan-robertson · 18 days ago

What is it that makes this route a direttissima? I’m not super familiar with the term.

stephencanon · 18 days ago

All 48 peaks on the AMC White Mountains 4000-footers¹ list in one continuous trek (no driving/shuttling/etc. between trailheads).

¹ This list is outdated vis-a-vis modern mapping: it includes at least one peak shorter than 4000 feet (Tecumseh) and omits at least one peak that should qualify per the rules (Guyot). If the list were updated they would still have completed the direttissima, since they passed over Guyot on the way to the Bonds (dropping Tecumseh could only make the direttissima easier, but I'm not sure it makes much of a difference; it's been a decade or so since I hiked that section of the Whites).

As an aside, that day 5 from Wildcat to Cabot is absolutely brutal even if you're fresh, to say nothing of having already covered 180 miles in the previous four days.

stephencanon commented on Operation Costs in CPU Clock Cycles (2016) ithare.com/infographics-o... · Posted by u/limoce

stephencanon · 19 days ago

Worth noting that division (integer, fp, and simd) has gotten much cheaper in the last decade. Division is now partially pipelined on common microarchitectures (capable of delivering a result every 2-4 cycles) and has greatly reduced latency, from ~30-80 cycles down to ~10-20 cycles.

This improvement is sufficient to tip the balance toward favoring division in some algorithms where historically programmers went out of their way to avoid it.

stephencanon commented on Installing a mini-split AC in a Brooklyn apartment probablydance.com/2025/08... · Posted by u/ibobev

ipython · 21 days ago

They don’t say what the kWh usage is, just that the electricity cost in $$ is over $1000 on the highest month. For a unit surrounded by what should be other conditioned spaces, that’s insane to me.

A quick web search indicates that the NYC rate is about 31c per kWh. So that’s 3225 kWh in one month! My standalone house plus pool pump, dual-zone AC, and EV charger doesn’t even come close. Clearly there is a major insulation issue, which is the root cause, and everything else is just trying to put bandaids on an arterial bleed.
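The back-of-the-envelope arithmetic, as a small Python sketch. The assumptions are loud ones: a bill of exactly $1000, a flat $0.31/kWh rate with no fixed charges, and a 30-day month; none of those details are confirmed by the article.

```python
bill_dollars = 1000.0        # assumed: the "over $1000" month, taken as exactly $1000
rate_per_kwh = 0.31          # assumed: flat rate from the quick web search
hours_in_month = 30 * 24     # assumed: 30-day month

energy_kwh = bill_dollars / rate_per_kwh
average_kw = energy_kwh / hours_in_month

print(f"{energy_kwh:.0f} kWh in the month, {average_kw:.2f} kW average draw")
```

Under those assumptions the unit would be drawing roughly four and a half kilowatts around the clock.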

stephencanon · 20 days ago

Yeah, we live far north of NYC where it gets much colder, and have never spent nearly that much on heating. Even when we lived in a converted barn from the 1930s with single pane windows and no wall insulation, the most we ever spent was about $500/month. Now (new construction, triple-pane windows, ground-sourced heat pump) it’s more like $80/month.

stephencanon commented on AI Ethics is being narrowed on purpose, like privacy was nimishg.substack.com/p/ai... · Posted by u/i_dont_know_

myrmidon · 23 days ago

I think a big factor in Asimov's laws specifically being sidelined is that the whole process of building AI looks very different from what we pictured back then.

Instead of programming the AIs by feeding them lots of explicit hand-crafted rules/instructions, we're feeding them plain data, and the resulting behavior is much more black-box, less predictable and less controllable than anticipated.

Training LLMs is closer, conceptually, to raising children than to implementing regexp parsers, and the whole "small simple set of universal constraints" is just not really applicable/useful.

stephencanon · 23 days ago

Raising children involves a whole lot of simple constraints that you gradually relax.

“Don’t touch the knife” becomes “You can use _this_ knife, if an adult is watching,” which becomes “You can use these knives but you have to be careful, tell me what that means” and then “you have free run of the knife drawer, the bandages are over there.” But there’s careful supervision at each step and you want to see that they’re ready before moving up. I haven’t seen any evidence of that at all in LLM training—it seems to be more akin to handing each toddler every book ever written about knives and a blade and waiting to see what happens.

stephencanon commented on Consider using Zstandard and/or LZ4 instead of Deflate github.com/w3c/png/issues... · Posted by u/marklit

arp242 · 25 days ago

Comparison of "zpng" (PNG with zstd) and WebP lossless, with current PNG. From https://github.com/WangXuan95/Image-Compression-Benchmark :

  Compressed format    Compressed size (bytes)  Compress Time  Decompress Time
  WEBP (lossless m5)   1,475,908,700           1,112          49
  WEBP (lossless m1)   1,496,478,650             720          37
  ZPNG (-19)           1,703,197,687           1,529          20
  ZPNG                 1,755,786,378              26          24

  PNG (optipng -o5)    1,899,273,578           27,680         26
  PNG (optipng -o2)    1,905,215,734            4,395         27
  PNG (optimize=True)  1,935,713,540            1,120         29
  PNG (optimize=False) 2,003,016,524              335         34

Doesn't really seem worth it? It doesn't compress better, and is only slightly faster in decompression time.

stephencanon · 25 days ago

"Only slightly faster in decompression time."

m5 vs -19 is nearly 2.5x faster to decompress; given that most data is decompressed many many more times (often thousands or millions of times more, often by devices running on small batteries) than it is compressed, that's an enormous win, not "only slightly faster".

The way in which it might not be worth it is the larger size, which is a real drawback.
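To put numbers on both halves of that trade-off, here is a quick sketch using the WEBP (lossless m5) and ZPNG (-19) rows of the table quoted above. The variable names are illustrative, and the benchmark's time units are not stated in the table, so only the ratios are meaningful.

```python
# Figures copied from the benchmark table quoted above.
webp_m5 = {"size_bytes": 1_475_908_700, "decompress_time": 49}
zpng_19 = {"size_bytes": 1_703_197_687, "decompress_time": 20}

# How much faster ZPNG (-19) decompresses, and how much larger its output is.
decompress_speedup = webp_m5["decompress_time"] / zpng_19["decompress_time"]
size_penalty = zpng_19["size_bytes"] / webp_m5["size_bytes"]

print(f"ZPNG (-19) decompresses {decompress_speedup:.2f}x faster than WebP m5")
print(f"ZPNG (-19) output is {100 * (size_penalty - 1):.1f}% larger than WebP m5")
```

So the question is whether a 2.45x decode speedup is worth files that are about 15% bigger.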

stephencanon commented on SF may soon ban natural gas in homes and businesses undergoing major renovations sfchronicle.com/sf/articl... · Posted by u/mikhael

SilverElfin · a month ago

Gas cooking is still much better. I have both. Induction just isn’t as enjoyable and you can’t do things like move your pan and have it keep heating like with a flame. Not to mention, induction is rough on pans. Banning things is aggressive and uncalled for.

stephencanon · a month ago

It’s really not. We built an all-electric ADU for my parents, then ended up living in it while renovating our house. As someone who cooks pretty much all meals, the induction range is better in almost every way than the fancy gas range that came with our house, so we’re replacing it with an induction cooktop in the renovation.

stephencanon commented on That XOR Trick (2020) florian.github.io//xor-tr... · Posted by u/hundredwatt

FBT · 2 months ago

> ... Does distributivity of inversion ~ over operation ⋆ follow from the other Abelian group axioms / properties? If so, how?

It does. For all x and y:

  (1) ~x ⋆ x = 0 (definition of the inverse)
  (2) ~y ⋆ y = 0 (definition of the inverse)
  (3) (~x ⋆ x) ⋆ (~y ⋆ y) = 0 ⋆ 0 = 0 (from (1) and (2))
  (4) (~x ⋆ ~y) ⋆ (x ⋆ y) = 0 (via associativity and commutativity)

In (4) we see that (~x ⋆ ~y) is the inverse of (x ⋆ y). That is to say, ~(x ⋆ y) = (~x ⋆ ~y). QED.

stephencanon · 2 months ago

Right. Another way to see this is that for a general (possibly non-Abelian) group, the inverse of xy is y⁻¹x⁻¹ (because xyy⁻¹x⁻¹ = x1x⁻¹ = xx⁻¹ = 1 [using "1" for the identity here, as is typical for general groups], or more colloquially, "the inverse operation of putting on your socks and shoes is taking off your shoes and socks"). For an Abelian group, y⁻¹x⁻¹ = x⁻¹y⁻¹, and we're done.
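A small check of the socks-and-shoes rule in Python, using permutations of {0, 1, 2} as a non-Abelian example. The helpers `compose` and `inverse` and the tuple representation are choices made for this sketch.

```python
def compose(p, q):
    """Return the permutation 'apply q, then p' (tuples mapping index -> image)."""
    return tuple(p[q[i]] for i in range(len(q)))


def inverse(p):
    """Return the permutation that undoes p."""
    inv = [0] * len(p)
    for i, image in enumerate(p):
        inv[image] = i
    return tuple(inv)


x = (1, 0, 2)  # swap 0 and 1
y = (0, 2, 1)  # swap 1 and 2

xy = compose(x, y)
# General groups: the inverse of xy is y^-1 x^-1 ...
print(inverse(xy) == compose(inverse(y), inverse(x)))
# ... and for these non-commuting x and y, x^-1 y^-1 is a different element.
print(inverse(xy) == compose(inverse(x), inverse(y)))
```

In an Abelian group the two compositions coincide, which is exactly the last step of the argument above.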

stephencanon commented on Swift Binary Parsing github.com/apple/swift-bi... · Posted by u/gok

leakycap · 3 months ago

This is such a nice surprise - both that Apple continues to use GitHub and that they're working to solve this in Swift in a universalish way.

The downside for now being Project Status which indicates likely source-breaking future updates, so I'll be watching for a while before I try to implement it.

stephencanon · 3 months ago

Worth pointing out that we're already using this in the OS for "real work", so we consider it to be production-usable, despite the 0.x tag. The pre-1.0 tag is really about the fact that we expect to make some changes to the high-level patterns as we do more to adopt Span-based APIs over the coming months.

Which means, yes, using it now is signing up for some source breaks, but they won't be too abrupt and this is considerably more stable than a typical first 0.x release would be.

stephencanon commented on Highly efficient matrix transpose in Mojo veitner.bearblog.dev/high... · Posted by u/timmyd

saagarjha · 3 months ago

It's quite rare. Usually problems are tiled anyway and you can amortize the cost of having data in the "wrong" layout by loading coalesced in whatever is the best layout for your data and then transposing inside your tile, which gives you access to much faster memory.

stephencanon · 3 months ago

The one pure transpose case that does come up occasionally is an in-place non-square transpose, where there is a rich literature of very fussy algorithms. If someone managed to make any headway with compiler optimization there, I'd be interested.
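For anyone who hasn't run into the problem, here is a sketch of the simplest approach, cycle following, in Python. It assumes a row-major matrix stored in a flat list, and it cheats by keeping a `visited` list of one flag per element; the fussy algorithms in the literature exist largely to avoid that auxiliary storage and to behave better with respect to memory access patterns. The function name is hypothetical.

```python
def transpose_in_place(a, rows, cols):
    """Transpose a row-major rows x cols matrix stored in flat list `a`, in place.

    The element at flat index i moves to (i * rows) % (rows * cols - 1);
    the first and last elements never move. Each cycle of that permutation
    is rotated by carrying one value at a time.
    """
    size = rows * cols
    if len(a) != size:
        raise ValueError("list length does not match rows * cols")
    visited = [False] * size  # auxiliary storage a real implementation avoids
    for start in range(1, size - 1):
        if visited[start]:
            continue
        carried = a[start]
        i = start
        while True:
            dest = (i * rows) % (size - 1)
            a[dest], carried = carried, a[dest]  # drop value, pick up the next
            visited[dest] = True
            i = dest
            if i == start:
                break


m = [1, 2, 3,
     4, 5, 6]              # 2 x 3, row-major
transpose_in_place(m, 2, 3)
print(m)                   # now 3 x 2, row-major
```

The cycle lengths depend on the shape in a number-theoretic way, which is a large part of why the non-square case is so much harder to do well than the square one.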