suresk commented on Report: Tim Cook could step down as Apple CEO 'as soon as next year'   9to5mac.com/2025/11/14/ti... · Posted by u/achow
Razengan · a month ago
Does any CEO actually use their own company's products?

The richest and most "powerful" people still have meat-based assistants do all their shit: Take their notes, check their calendars, make their appointments, toast their bread..

And it shows: This is how you get features like "Edge Light" and an Invites app before fixing basic functionality that the peasants rely upon. Like how we get the weird iOS Journal app even though Notes could have done all that if they had improved it a bit.

Steve Jobs was probably one of the few people in charge who actually used his company's own products. You need someone who's annoyed with the status quo enough to make a company to solve it, not just someone elected by a board.

suresk · a month ago
The opposite problem can happen - the CEO uses the product all the time and becomes blind to problems. “It has always worked that way” or “who would want to do that!?” are much more common than pure apathy.
suresk commented on Electric bill may be paying for big data centers' energy use   theconversation.com/how-y... · Posted by u/taubek
suresk · 4 months ago
They also get massive subsidies and tax breaks for building these data centers. They require the negotiations be done in secret and often fight to keep the agreements secret to make it so people don’t flip out when they see how bad they are.
suresk commented on Sorting algorithms with CUDA   ashwanirathee.com/blog/20... · Posted by u/ashwani-rathee
suresk · 10 months ago
Kind of a fun toy problem to play around with. I noticed you had thread coarsening as an option - there is often some gain to be had here. This is also a fun thing to profile with Nsight - things that are impacting your performance aren't always obvious, and it is a pretty good profiler. (I wrote about a fun thing I found with thread coarsening and automatic loop unrolling with Nsight here: https://www.spenceruresk.com/loop-unrolling-gone-bad-e81f66f...)

You may also want to look at other sorting algorithms - common CPU sorting algorithms have a hard time keeping GPU hardware fully busy - a network sort like bitonic sorting involves more work (and you have to pad to a power of 2) but often runs much faster on parallel hardware.

I had a fairly naive implementation that would sort 10M in around 10ms on an H100. I'm sure with more work it could get quite a bit faster, but the inputs need to be fairly big to make up for the kernel launch overhead.
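
For anyone who hasn't seen a sorting network before, here is a minimal CPU-side sketch of bitonic sort in Python. It is not the implementation mentioned above, just an illustration of the structure: the input is padded to a power of 2 (here with `math.inf`, which assumes numeric values), and every compare-exchange inside the innermost loop is independent of the others, which is what makes it map well to one-thread-per-element on a GPU. The function name `bitonic_sort` is made up for this example.

```python
import math


def bitonic_sort(values):
    """Sort numbers ascending using a bitonic sorting network."""
    n = len(values)
    if n == 0:
        return []

    # Pad to the next power of two; +inf padding sinks to the end.
    size = 1 << (n - 1).bit_length()
    data = list(values) + [math.inf] * (size - n)

    k = 2
    while k <= size:              # length of the bitonic runs being merged
        j = k // 2
        while j >= 1:             # compare-exchange distance for this step
            # Each i is independent within this step, so a GPU version
            # would run this loop body as one thread per element.
            for i in range(size):
                partner = i ^ j
                if partner > i:
                    ascending = (i & k) == 0
                    if (data[i] > data[partner]) == ascending:
                        data[i], data[partner] = data[partner], data[i]
            j //= 2
        k *= 2

    return data[:n]


print(bitonic_sort([5, 3, 8, 1, 9, 2]))
```

The network always does on the order of n log² n comparisons regardless of the input, which is the "more work" trade-off: more total operations than a good CPU sort, but a fixed, branch-light pattern that parallel hardware handles well.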

suresk commented on Show HN: We built a Plug-in Home Battery for the 99.7% of us without Powerwalls   pilaenergy.com... · Posted by u/coleashman
pedalpete · 10 months ago
I love the design, but I wonder about the "selling feature".

How many people regularly experience power outages? (OK, if you're American relying on Canadian electricity, you might have a right to be concerned.)

I'm surprised you're not touting the "save on your power bill" benefits. Could this not store power when rates are low, and use the battery when rates are higher, while maintaining a balanced minimum storage amount to ensure power is available should the power go out?

I'd think it could be quite smart about this if you looked at weather patterns and other factors to calculate a likelihood of an outage, and ensured more back-up was available.

From a selling stand-point, isn't saving money every day a better feature than "just in case the electricity goes out"?

suresk · 10 months ago
> I'm surprised you're not touting the "save on your power bill" benefits.

At ~$600/kWh for capacity, the ROI isn't great. I have a pretty big differential on my rates because I have an EV, and even then I'd need over a decade to make the $1,000 back assuming I fully discharged it every day.
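
A rough sketch of that arithmetic, in Python. The ~$600/kWh figure is from the comment; the $0.15/kWh gap between peak and off-peak rates is a hypothetical number picked only for illustration, and the calculation ignores round-trip losses and battery degradation (both of which would make the payback longer).

```python
def payback_years(price_per_kwh, rate_differential, cycles_per_year=365):
    """Years to recoup battery cost from rate arbitrage alone.

    Each kWh of capacity costs `price_per_kwh` up front and saves
    `rate_differential` dollars per full charge/discharge cycle.
    Ignores round-trip losses and battery degradation.
    """
    savings_per_kwh_per_year = rate_differential * cycles_per_year
    return price_per_kwh / savings_per_kwh_per_year


# $600/kWh comes from the comment above; the $0.15/kWh differential is
# an assumed value, not the commenter's actual rate plan.
years = payback_years(price_per_kwh=600, rate_differential=0.15)
print(f"{years:.1f} years")
```

With those assumed numbers the payback lands at roughly 11 years. Note that the $1,000 sticker price drops out: the payback time depends only on the cost per kWh of capacity and the rate differential.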

suresk commented on Introduction to CUDA programming for Python developers   pyspur.dev/blog/introduct... · Posted by u/t55
t55 · 10 months ago
They basically ditched CUDA and went straight to writing in PTX, which is like GPU assembly, letting them repurpose some cores for communication to squeeze out extra performance. I believe that with better AI models and tools like Cursor, we will move to a world where you can mold code ever more specifically to your use case to make it more performant.
suresk · 10 months ago
Are you sure they ditched CUDA? I keep hearing this, but it seems odd because that would be a ton of extra work to entirely ditch it vs selectively employing some PTX in CUDA kernels, which is fairly straightforward.

Their paper [1] only mentions using PTX in a few areas to optimize data transfer operations so they don't blow up the L2 cache. This makes intuitive sense to me, since the main limitation of the H800 vs H100 is reduced NVLink bandwidth, which would necessitate doing stuff like this that may not be a common thing for others who have access to H100s.

1. https://arxiv.org/abs/2412.19437

suresk commented on Introduction to CUDA programming for Python developers   pyspur.dev/blog/introduct... · Posted by u/t55
jms55 · 10 months ago
Let's say you already have deep knowledge of GPU architecture and experience optimizing GPU code to save 0.5ms of runtime for a kernel. But you got that experience from writing graphics code for rendering, and have little knowledge of AI stuff beyond surface level stuff of how neural networks work.

How can I leverage that experience into earning the huge amounts of money that AI companies seem to be paying? Most job listings I've looked at require a PhD in specifically AI/math stuff and 15 years of experience (I have a masters in CS, and nowhere close to 15 years of experience).

suresk · 10 months ago
I've only done the CUDA side (and not professionally), so I've always wondered how much those skills transfer either way myself. I imagine some of the specific techniques employed are fairly different, but a lot of it is just your mental model for programming, which can be a bit of a shift if you're not used to it.

I'd think things like optimizing for occupancy/memory throughput, ensuring coalesced memory accesses, tuning block sizes, using fast math alternatives, writing parallel algorithms, and working with profiling tools like Nsight are fairly transferable?

suresk commented on Railroad Tycoon II   filfre.net/2025/01/railro... · Posted by u/doppp
suresk · a year ago
So many fond memories of this game - it was a really fun blend of railroad sim and economic sim that I haven't really found since. I'll never forget the "ding ding ding" sound that goes off when a train pulls into a station and earns you a bit of cash!


suresk commented on Zen5's AVX512 Teardown and More   numberworld.org/blogs/202... · Posted by u/todsacerdoti
jsheard · a year ago
> The register file size makes sense, I didn't think they were that much of the die on those processors

https://i.imgur.com/WdMPX8S.jpeg

According to this, Zen 4's FP register file is almost as big as its FP execution units. It's a pretty sizable chunk of silicon.

suresk · a year ago
I was having trouble finding an E Core die shot, but that helps put it into perspective a bit anyway. Thanks!
suresk commented on Zen5's AVX512 Teardown and More   numberworld.org/blogs/202... · Posted by u/todsacerdoti
jsheard · a year ago
I think Intel's E-cores are quite a bit smaller than the Zen 4c/5c cores, maybe at that scale it's prohibitive to even double up the register file? That's required even if the logic is double-pumped. AIUI the small Zen cores are mostly the same design as the big ones, just with less cache, silicon layout retuned for density rather than speed, and the removal of the 3D Cache stacking vias, while Intel's small cores are clean-sheet designs with next to nothing in common with their big cores so they have the opportunity to shrink them a lot more.
suresk · a year ago
My non-expert brain immediately jumped to double-pumping + maybe working with their thread director to have tasks using a lot of AVX512 instructions prefer P cores more. It feels like such an obvious solution to a really dumb problem that I assumed there was something simple I was missing.

The register file size makes sense. I didn't think they were that much of the die on those processors, but I guess they had to be pretty aggressive to meet power goals?

u/suresk

Karma: 1418 · Cake day: December 8, 2011
About
http://www.spenceruresk.com/

Email: suresk [at] gmail dot com
