Readit News
averne_ commented on Fabrice Bellard: Biography (2009) [pdf]   ipaidia.gr/wp-content/upl... · Posted by u/lioeters
dllu · 2 months ago
The fact that so many people use FFmpeg and QEMU suggests that he is quite good at documenting, collaborating, and at least making his code remarkably clean and easy to follow. This already puts him way ahead of the average Silicon Valley senior software engineer that I've worked with. However, he does value independence, so I don't think he would have been happy working at a FAANG-type company for long.
averne_ · 2 months ago
Not really. https://codecs.multimedia.cx/2022/12/ffhistory-fabrice-bella...

>Fabrice won International Obfuscated C Code Contest three times and you need a certain mindset to create code like that—which creeps into your other work. So despite his implementation of FFmpeg was fast-working, it was not very nice to debug or refactor, especially if you’re not Fabrice

averne_ commented on How Brian Eno Created Ambient 1: Music for Airports (2019)   reverbmachine.com/blog/de... · Posted by u/dijksterhuis
gonzalohm · 2 months ago
Do you have any recommendations?
averne_ · 2 months ago
Not OP but I also often listen to ambient while programming. A couple of recommendations would be "Music for Nine Post Cards" and other works by Hiroshi Yoshimura, and "Music for 18 Musicians" and others by Steve Reich.

In fact, the use of loops described in this article reminded me of what Reich called "phasing": basically the same concept of melodic patterns emerging and shifting between offset loops.
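
As a toy illustration of the idea (my own sketch, nothing from the article): two loops of co-prime lengths only realign after their least common multiple, so a handful of notes yields a long stream of shifting combinations.

    #include <stdio.h>

    /* Toy model of Reich-style phasing: two looped patterns of
     * co-prime lengths (5 and 7) drift against each other and only
     * realign after lcm(5, 7) = 35 ticks, producing 35 distinct
     * pairings out of just 12 notes of material. */
    int main(void) {
        const char *a[5] = {"C", "D", "E", "G", "A"};
        const char *b[7] = {"E", "G", "A", "B", "D", "C", "G"};

        for (int t = 0; t < 35; t++)
            printf("tick %2d: %s + %s\n", t, a[t % 5], b[t % 7]);
        return 0;
    }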

averne_ commented on Nobel Prize in Physics 2025   nobelprize.org/prizes/phy... · Posted by u/luisb
SiempreViernes · 4 months ago
They discovered that the theory worked in a regime where it hadn't been tested before; I'm not sure what "new physics" means in your sentence: it is a core assumption of physics that its rules are always true, that all physics has always existed.
averne_ · 4 months ago
New physics in this context means previously unknown effects or mechanisms, or even a new theory/framework for an already understood phenomenon. Using "physics" in this way is common amongst academics.
averne_ commented on Cerebras systems raises $1.1B Series G   cerebras.ai/press-release... · Posted by u/fcpguru
twothreeone · 4 months ago
I don't think you're aware of the history of wafer-scale integration and what it means for the chip industry [1]. The approach was famously taken by Gene Amdahl's Trilogy Systems in the '80s, but failed dramatically, leading (among other things) to the deployment of "accelerator cards" in the form of... the NVIDIA GeForce 256, the first GPU, in 1999. It's not like NVIDIA hasn't been trying to integrate multiple dies in the same package, but doing that successfully has been a huge technological hurdle so far.

[1] https://ieeexplore.ieee.org/abstract/document/9623424

averne_ · 4 months ago
The main reason a wafer-scale chip works there is that their cores are extremely tiny, so the silicon area that gets fused off in the event of a defect is much smaller than on NVIDIA chips, where a whole SM can get disabled. AFAIU this approach is not easily applicable to complex core designs.
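
To make the intuition concrete, a back-of-the-envelope sketch (the defect density and unit sizes are illustrative assumptions, not Cerebras's or NVIDIA's actual figures):

    #include <stdio.h>

    /* Simple linear yield-loss model: with defect density D per mm^2
     * and each defect fusing off one whole repair unit, the expected
     * silicon lost scales with the unit size. All numbers below are
     * hypothetical. */
    static double area_lost_mm2(double total_mm2, double defects_per_mm2,
                                double unit_mm2) {
        double expected_defects = total_mm2 * defects_per_mm2;
        return expected_defects * unit_mm2;
    }

    int main(void) {
        double wafer = 46225.0; /* ~215 mm x 215 mm wafer-scale die */
        double d0    = 0.001;   /* assumed defects per mm^2         */

        printf("tiny core (0.05 mm^2 unit): %6.1f mm^2 lost\n",
               area_lost_mm2(wafer, d0, 0.05));
        printf("whole SM  (5.00 mm^2 unit): %6.1f mm^2 lost\n",
               area_lost_mm2(wafer, d0, 5.0));
        return 0;
    }
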
averne_ commented on AMD’s RDNA4 GPU architecture   chipsandcheese.com/p/amds... · Posted by u/rbanffy
sylware · 5 months ago
Another area of AMD GPU R&D is _userland_ _hardware_ [ring] buffers for near-direct userland hardware programming.

They have started experimenting with this in Mesa and Linux ("user queues", aka "user hardware queues").

I don't know how they will work around the scarce VM IDs, but here we are talking about nearly zero driver. Obviously, they will have to simplify/clean up a lot of the 3D pipeline programming and be very sure of its robustness, basically to have it ready for "default" rendering/usage right away.

Userland will get from the kernel something along these lines: command/event hardware ring buffers, data DMA buffers, a memory page with the read/write pointers and doorbells for those ring buffers, and an event file descriptor for an event ring buffer. Basically, what the kernel currently has (see the sketch below).

I wonder if it will provide a significant simplification over the current way, which is giving indirect command buffers to the kernel and dealing with "sync objects"/barriers.
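
A rough sketch of what such a userland submission path could look like (hypothetical names and layout, not AMD's actual user-queue ABI): the kernel maps the ring, pointers and doorbell once, and every subsequent submission is plain memory writes.

    #include <stdatomic.h>
    #include <stdint.h>

    /* Hypothetical userland submission path. The kernel maps these
     * regions once at queue creation; after that, submitting work
     * never enters the kernel. */
    struct user_queue {
        uint32_t         *ring;      /* command ring buffer (mapped)  */
        uint32_t          ring_size; /* entries, power of two         */
        _Atomic uint32_t *wptr;      /* write pointer shared with HW  */
        _Atomic uint32_t *doorbell;  /* mapped page poking the HW scheduler */
    };

    static void submit(struct user_queue *q, const uint32_t *cmds, uint32_t n) {
        uint32_t w = atomic_load_explicit(q->wptr, memory_order_relaxed);
        for (uint32_t i = 0; i < n; i++)
            q->ring[(w + i) & (q->ring_size - 1)] = cmds[i];
        /* Publish the commands before moving the write pointer. */
        atomic_store_explicit(q->wptr, w + n, memory_order_release);
        /* Ring the doorbell so the hardware scheduler picks it up. */
        atomic_store_explicit(q->doorbell, w + n, memory_order_release);
    }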

averne_ · 5 months ago
The NVIDIA driver also has userland submission (in fact, it does not support kernel-mode submission at all). I don't think it leads to a significant simplification of the userland code: a driver has to keep track of the same things it would've submitted through an ioctl. If anything, there are some subtleties that require careful consideration.

The major upside is removing the context switch on a submission. The idea is that an application only talks to the kernel for queue setup/teardown; everything else happens in userland.

averne_ commented on Ask HN: Why hasn't x86 caught up with Apple M series?    · Posted by u/stephenheron
daeken · 5 months ago
There's no AArch32 or Thumb support (A32/T32) on M-series chips. AArch64 (technically A64) is the only supported instruction set. Fun fact: this makes it impossible to run Mario Kart 8 via virtualization on Macs without software translation, since it's A32.

How much that does for efficiency I can't say, but I imagine it helps, especially given just how damn easy it is to decode.
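
To illustrate the decode point, a toy sketch (not Apple's or Arm's actual decode logic): with fixed 32-bit instructions, finding instruction N is pure arithmetic and the major opcode class is always the same bitfield, whereas x86 needs a serial length-finding pass first.

    #include <stdint.h>
    #include <stdio.h>

    /* Toy illustration of fixed-width A64 decode: instruction i
     * simply lives at code[i], and the op0 major-class field is
     * always bits [28:25] (per the Arm ARM). */
    int main(void) {
        uint32_t code[] = {
            0xD2800540, /* mov x0, #42   */
            0x91000400, /* add x0, x0, 1 */
            0xD65F03C0, /* ret           */
        };
        for (int i = 0; i < 3; i++) {
            uint32_t op0 = (code[i] >> 25) & 0xF;
            printf("insn %d: %08X, op0=0x%X\n",
                   i, (unsigned)code[i], (unsigned)op0);
        }
        return 0;
    }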

averne_ · 5 months ago
It actually doesn't make much difference: https://chipsandcheese.com/i/138977378/decoder-differences-a...
averne_ commented on FFmpeg 8.0   ffmpeg.org/index.html#pr8... · Posted by u/gyan
dtf · 6 months ago
I don't have a link to hand right now, but I'll try to put one up for you this weekend. I'm very interested in your implementation - thanks, will take a good look!

Initially this was just a vehicle for me to get stuck in and learn some WebGPU, so no doubt I'm missing lots of opportunities for optimisation - but it's been fun as much as frustrating. I leaned heavily on the SMPTE specification document and the FFmpeg proresdec.c implementation to understand and debug.

averne_ · 6 months ago
No problem, just be aware there's a bunch of optimizations I haven't had time to implement yet. In particular, I'd like to remove the reset kernel, fuse the VLD/IDCT ones, and try different strategies and hw-dependent specializations for the IDCT routine (AAN algorithm, packed FP16, cooperative matrices).
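
For context, this is the naive reference form of the transform those optimizations start from (a plain sketch, not my actual shader code); AAN-style fast IDCTs cut the multiply count by folding constant scale factors into the dequantization tables.

    #include <math.h>

    /* Reference 8-point inverse DCT (DCT-III), the O(n^2) form that
     * fast variants such as AAN optimize. A full 8x8 block applies
     * this separably over rows, then columns. */
    static void idct8(const float in[8], float out[8]) {
        const float pi = 3.14159265f;
        for (int n = 0; n < 8; n++) {
            float sum = in[0] * sqrtf(1.0f / 8.0f);
            for (int k = 1; k < 8; k++)
                sum += in[k] * sqrtf(2.0f / 8.0f) *
                       cosf(pi * (float)((2 * n + 1) * k) / 16.0f);
            out[n] = sum;
        }
    }
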
averne_ commented on FFmpeg 8.0   ffmpeg.org/index.html#pr8... · Posted by u/gyan
dtf · 6 months ago
These release notes are very interesting! I spent a couple of weeks recently writing a ProRes decoder using WebGPU compute shaders, and it runs plenty fast enough (although I suspect Apple has some special hardware they make use of for their implementation). I can imagine this path also working well for the new Android APV codec, if it ever becomes popular.

The ProRes bitstream spec was given to SMPTE [1], but I never managed to find any information on ProRes RAW, so it's exciting to see software and compute implementations here. Has this been reverse-engineered by the FFmpeg wizards? At first glance at the code, it does look fairly similar to regular ProRes.

[1] https://pub.smpte.org/doc/rdd36/20220909-pub/rdd36-2022.pdf

averne_ · 6 months ago
Do you have a link for that? I'm the guy working on the Vulkan ProRes decoder mentioned as "in review" in this changelog, as part of a GSoC project.

I'm curious how a WebGPU implementation would differ from Vulkan. Here's mine if you're interested: https://github.com/averne/FFmpeg/tree/vk-proresdec

averne_ commented on I designed my own fast game streaming video codec – PyroWave   themaister.net/blog/2025/... · Posted by u/Bogdanp
Almondsetat · 6 months ago
VC-2 is an intra-only, wavelet-based, ultra-low-latency codec developed by the BBC years ago for exactly this purpose. It is royalty-free, and currently the only implementations are in FFmpeg and in the official BBC repository, both CPU-based. I am planning to make a CUDA-accelerated version for my master's thesis, since the Vulkan implementations made at GSoC last year still suck quite a bit. I would suggest people look into this codec.
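
For a flavor of why wavelet codecs map so well onto GPUs, here is a sketch of one forward LeGall 5/3 lifting step, one of the filters VC-2 specifies (simplified edge handling; not the FFmpeg or BBC code):

    #include <stdint.h>

    /* One level of the LeGall 5/3 lifting wavelet over a 1-D signal
     * of even length n. Only integer adds and shifts, no multiplies,
     * and each output depends on a tiny neighborhood, so rows and
     * columns parallelize trivially on GPU compute. */
    static void legall53_forward(const int32_t *x, int n,
                                 int32_t *lo, int32_t *hi) {
        int half = n / 2;
        for (int i = 0; i < half; i++) {
            /* Predict: high band = odd sample minus neighbor average. */
            int32_t e0 = x[2 * i];
            int32_t e1 = (2 * i + 2 < n) ? x[2 * i + 2] : x[2 * i];
            hi[i] = x[2 * i + 1] - ((e0 + e1) >> 1);
        }
        for (int i = 0; i < half; i++) {
            /* Update: low band = even sample plus rounded quarter of
             * the adjacent high-band samples (symmetric extension at
             * the left edge). */
            int32_t d0 = (i > 0) ? hi[i - 1] : hi[0];
            lo[i] = x[2 * i] + ((d0 + hi[i] + 2) >> 2);
        }
    }
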
averne_ · 6 months ago
Do you mind going into some detail as to why they suck? Not a dig, just genuinely curious.
averne_ commented on I designed my own fast game streaming video codec – PyroWave   themaister.net/blog/2025/... · Posted by u/Bogdanp
crazygringo · 6 months ago
Fascinating! But...

> The go-to solution here is GPU accelerated video compression

Isn't the solution usually hardware encoding?

> I think this is an order of magnitude faster than even dedicated hardware codecs on GPUs.

Is there an actual benchmark though?

I would have assumed that built-in hardware encoding would always be faster. Plus, I'd assume your game is already saturating your GPU, so the last thing you want to do is use it for simultaneous video encoding. But I'm not an expert in either of these, so I'm curious to know if/how I'm wrong here. Are hardware encoders designed to be real-time, but intentionally trading latency for higher compression? And is the proposed video encoding really so lightweight that it can easily share the GPU without affecting game performance?

averne_ · 6 months ago
Hardware GPU encoders refer to dedicated ASIC engines, separate from the main shader cores. So they run in parallel and there is no performance penalty for using both simultaneously, besides increased power consumption.

Generally, you're right that these hardware blocks favor latency over compression efficiency. One example of this is motion estimation (one of the most expensive operations during encoding). The NVENC engine on NVIDIA GPUs will only use fairly basic detection loops, but can optionally be fed motion hints from an external source. I know that NVIDIA has a CUDA-based motion estimator (called CEA) for this purpose. On recent GPUs there is also the optical flow engine (another separate block), which might be able to do higher-quality detection.
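
To give a sense of why motion estimation dominates encode cost, a toy full-search sketch (illustration only; real encoders, NVENC included, use much smarter search patterns):

    #include <limits.h>
    #include <stdint.h>
    #include <stdlib.h>

    /* Full-search motion estimation for one 16x16 block: every
     * candidate vector in a +/-16 window costs a 256-pixel SAD,
     * i.e. (2*16+1)^2 = 1089 block comparisons per macroblock.
     * The caller must keep the search window inside the frame. */
    static int sad16(const uint8_t *a, const uint8_t *b, int stride) {
        int sum = 0;
        for (int y = 0; y < 16; y++)
            for (int x = 0; x < 16; x++)
                sum += abs(a[y * stride + x] - b[y * stride + x]);
        return sum;
    }

    static void full_search(const uint8_t *cur, const uint8_t *ref,
                            int stride, int bx, int by,
                            int *best_dx, int *best_dy) {
        int best = INT_MAX;
        for (int dy = -16; dy <= 16; dy++)
            for (int dx = -16; dx <= 16; dx++) {
                int cost = sad16(cur + by * stride + bx,
                                 ref + (by + dy) * stride + (bx + dx),
                                 stride);
                if (cost < best) { best = cost; *best_dx = dx; *best_dy = dy; }
            }
    }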
