w-m commented on What the heck is going on at Apple?   cnn.com/2025/12/06/tech/a... · Posted by u/methuselah_in
w-m · 9 days ago
Apple acquires OpenAI, Sam becomes CEO of combined company; iPhone revenue used to build out data centers; Jony rehired as design chief for AI device.
w-m commented on Mixpanel Security Breach   mixpanel.com/blog/sms-sec... · Posted by u/jaredwiener
ares623 · 19 days ago
Does that mean Mixpanel stock/valuation goes up because OpenAI uses them? That's how it works now, is it?
w-m · 19 days ago
> FAQ

> Has Mixpanel been removed from OpenAI products?

> Yes.

https://openai.com/index/mixpanel-incident/

w-m commented on CUDA Ontology   jamesakl.com/posts/cuda-o... · Posted by u/gugagore
w-m · a month ago
This is a good resource. But for the computer vision and machine learning practitioner, most of the fun starts where this article ends.

nvcc from the CUDA toolkit has a compatibility range with the underlying host compilers such as gcc. If you install a newer CUDA toolkit on an older machine, you'll likely need to upgrade your compiler toolchain as well, and fix the paths.
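
A rough sketch of how that plays out (the specific compiler version here is an assumption; the supported pairings are listed in each CUDA release's installation guide):

    # check which toolkit and host compiler are on the PATH
    nvcc --version
    gcc --version

    # if the default gcc is too new for this toolkit, point nvcc at an older, supported one
    # (assumes a g++-12 package is installed; saxpy.cu is a stand-in source file)
    nvcc -ccbin g++-12 -o saxpy saxpy.cu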

While orchestration in many (research) projects happens from Python, some depend on building CUDA extensions. An innocent-looking Python project may not ship compiled kernels and may require a CUDA toolkit to work correctly. Some package management solutions can install CUDA toolkits (conda/mamba, pixi); the pure-Python ones (pip, uv) cannot. This leaves you to match the correct CUDA toolkit to your Python environment for a project.

conda specifically provides different channels (default/nvidia/pytorch/conda-forge), with conda 4.6 and later defaulting to strict channel priority, meaning "if a name exists in a higher-priority channel, lower ones aren't considered". This default strict priority can make your requirements unsatisfiable, even though there is a version of each required package somewhere in the collection of channels. uv is neat and fast and awesome, but leaves you alone in dealing with the CUDA toolkit.
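
As a sketch of the difference (package names, channels and versions here are assumptions; check what your CUDA release actually ships as):

    # conda/mamba: the toolkit can be pulled into the environment, e.g. from the nvidia channel
    conda install -c nvidia cuda-toolkit=12.4

    # pixi: add the conda-forge toolkit package to the project environment
    pixi add cuda-toolkit

    # pip/uv: no toolkit comes with the environment; you're left matching a system-wide install
    nvcc --version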

Also, code that compiles with older CUDA toolkit versions may not compile with newer ones. Newer hardware may require a CUDA toolkit version that is newer than what the project maintainer intended. PyTorch ships with a specific CUDA runtime version; if your project has additional code that also builds CUDA extensions, you need to match the CUDA runtime version of your installed PyTorch for it to work. Trying to bring up a project from a couple of years ago on the latest hardware may thus blow up on you on multiple fronts.
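
A quick sanity check before building extensions against an installed PyTorch (just a sketch; the point is that the wheel's CUDA runtime, the local toolkit, and the driver each report their own version, and the first two need to line up for custom extensions):

    # CUDA runtime version the installed PyTorch wheel was built against
    python -c "import torch; print(torch.version.cuda)"

    # CUDA toolkit version that will compile your extensions
    nvcc --version

    # driver-side view: the maximum CUDA version the installed driver supports
    nvidia-smi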

w-m commented on Cerebras Code now supports GLM 4.6 at 1000 tokens/sec   cerebras.ai/code... · Posted by u/nathabonfim59
w-m · a month ago
I wanted to try GLM 4.6 through their API with Cline, before spending the $50. But I'm getting hit with API limits. And now I'm noticing a red banner "GLM4.6 Temporarily Sold Out. Check back soon." at cloud.cerebras.ai. HN hug of death, or was this there before?
w-m commented on Gmail AI gets more intrusive   daveverse.org/2025/11/07/... · Posted by u/speckx
glerk · a month ago
Google must have some awful PMs and designers. The worst UX decision I have seen recently is AI auto-dubbing all youtube videos by default with no way to disable this behavior globally. How could you miss that people can be fluent in multiple languages and if I click on a video in a foreign language, I most likely want the original soundtrack? Clearly, the intention was to boost some metric “X users are using this feature” with no regard for the actual user.
w-m · a month ago
Go to your Google account settings and add the languages you speak (and don't want auto-translations for) to your personal profile.

I agree that the auto-dubbing is the worst feature. It may have been on HN where I read the above tip to turn it off; it seems to have worked for me so far.

w-m commented on iPad Pro with M5 chip   apple.com/newsroom/2025/1... · Posted by u/chasingbrains
sylens · 2 months ago
I have a 2018 iPad Pro that is due for replacement but I cannot bring myself to spend the money on a new iPad. No matter how much I think I'll use it, it becomes a web browsing and YouTube machine on the couch. It's a shame because I think the hardware design is quite good, but the OS itself is so limiting, even with the "improvements" iPadOS 26 introduced.
w-m · 2 months ago
The A10X processor in my 2017 iPad Pro has always felt ridiculously overpowered for a couch machine. Recently it had gotten sluggish and hot, hung at times, and lost battery quite quickly, and I thought its time had finally come... but no, after resetting the OS, it's as fast as ever. So hopefully it'll last me until Apple finally gives the iPad Air a 120Hz display.
w-m commented on GPU Hot: Dashboard for monitoring NVIDIA GPUs on remote servers   github.com/psalias2006/gp... · Posted by u/github-trending
John23832 · 2 months ago
The "why not use" section should probably include nvtop?
w-m · 2 months ago
Possibly also nvitop, which is a different tool from nvtop: https://github.com/XuehaiPan/nvitop
w-m commented on OpenAI charges by the minute, so speed up your audio   george.mand.is/2025/06/op... · Posted by u/georgemandis
squigz · 6 months ago
Out of curiosity, how might you improve those docs? They seem fairly reasonable to me
w-m · 6 months ago
The documentation reads like it was written by a programmer who documented the different parameters of their implementation of a specific algorithm. Now when you, as the user, come along and want to use silenceremove, you'll have to carefully read through this, build your own mental model of that algorithm, and only then will you be able to set these parameters accordingly. That takes a lot of time and energy; in this case multiple read-throughs, and I'd say more than 5 minutes.

Good documentation should do this work for you. It should explain somewhat atomic concepts that you can immediately pick up and compose. It already works that way for the "detection" and "window" parameters, which are straightforward. But the actions of trimming at the start/middle/end, how to configure how long silence must last before trimming, whether to ignore short bursts of noise, whether to skip every nth silence period: these are all ideas and concepts that get mushed together in 10 parameters called start/stop-duration/threshold/silence/mode/periods.

If you want to apply this filter, it takes a long time to build mental models for these 10 parameters. There are some example calls, which is great, but they don't help if you need to adjust any of the parameters; then you probably need to understand them all.
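
To illustrate how the concepts get split across the start_*/stop_* families, a rough sketch (values are placeholders, and this is my reading of the docs, not a recommendation):

    # trim silence only at the start of the file
    ffmpeg -i in.m4a -af "silenceremove=start_periods=1:start_threshold=-50dB" out.m4a

    # also trim silence in the middle and at the end: a negative stop_periods
    ffmpeg -i in.m4a -af "silenceremove=start_periods=1:stop_periods=-1:stop_threshold=-50dB" out.m4a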

Some stuff I stumbled over when reading it:

"To remove silence from the middle of a file, specify a stop_periods that is negative. This value is then treated as a positive value [...]" - what? Why is this parameter so heavily overloaded?

"start_duration: Specify the amount of time that non-silence must be detected before it stops trimming audio" - parameter is named start_something, but it's about stopping? Why?

"start_periods: [...] Normally, [...] start_periods will be 1 [...]. Default value is 0."

"start_mode: Specify mode of detection of silence end at start": start_mode end at start?

It's very clunky. Every parameter has multiple modes of operation. Why is it start and stop for beginning and end, and why is "do stuff in the middle" part of the end? Why is there no global mode?

You could nitpick this stuff to death. In the end, naming things is famously one of the two hard problems in computer science (the others being cache invalidation and off-by-one errors). And writing good documentation is also very, very hard work. Just exposing the internals of the algorithm is often not great UX, because then every user has to learn how the thing works internally before they can start using it (hey, looking at you, git).

So while it's easy to point out where these docs fail, it would be a lot of work to rewrite this documentation from the top down, explaining the concepts first, or even to rewrite the interface to make it more approachable and the parameters less overloaded. But since it's hard work, and not sexy to programmers, it won't get done, and many people will come after, having to spend time reading and re-reading this current mess.

w-m commented on OpenAI charges by the minute, so speed up your audio   george.mand.is/2025/06/op... · Posted by u/georgemandis
georgemandis · 6 months ago
Oooh fun! I had a feeling there was more ffmpeg wizardry I could be leaning into here. I'll have to try this later—thanks for the idea!
w-m · 6 months ago
In the meantime I realized that the apad part is nonsensical: it pads the end of the stream, not each silence-removed cut. I wanted to get angry at o3 for proposing this, but then I had a look at the silenceremove documentation myself: https://ffmpeg.org/ffmpeg-filters.html#silenceremove

Good god. You couldn't make that any more convoluted and hard-to-grasp if you wanted to. You gotta love ffmpeg!

I now think this might be a good solution:

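    # my reading of the parameters: start_periods=1 trims leading silence; stop_periods=-1
    # also removes silence in the middle and at the end, keeping at most ~0.15 s of each pause;
    # detection=rms with -40dB sets the silence floor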
    ffmpeg -i video-audio.m4a \
           -af "silenceremove=start_periods=1:stop_periods=-1:stop_duration=0.15:stop_threshold=-40dB:detection=rms" \
           -c:a aac -b:a 128k output.m4a -y

w-m commented on OpenAI charges by the minute, so speed up your audio   george.mand.is/2025/06/op... · Posted by u/georgemandis
w-m · 6 months ago
With transcribing a talk by Andrej, you already picked the most challenging case possible, speed-wise. His natural talking speed is already >=1.5x that of a normal human. He's one of the people for whom you absolutely have to set your YouTube playback speed back down to 1x to follow what's going on.

In the spirit of making more of an OpenAI minute, don't send it any silence.

E.g.

    ffmpeg -i video-audio.m4a \
      -af "silenceremove=start_periods=1:start_duration=0:start_threshold=-50dB:\
                         stop_periods=-1:stop_duration=0.02:stop_threshold=-50dB,\
                         apad=pad_dur=0.02" \
      -c:a aac -b:a 128k output_minpause.m4a -y
will cut the talk down from 39m31s to 31m34s by replacing any silence (with a -50dB threshold) longer than 20ms with a 20ms pause. And to keep with the spirit of your post, I only measured that the input file got shorter; I didn't look at the quality of the transcription produced from the shorter version at all.

u/w-m

Karma: 3004 · Cake day: June 28, 2013
About
Hey, I'm Wieland (/viːland/), I'm a computer vision researcher. You can check out my current projects on GitHub: https://github.com/w-m

---

X-Maps: Using a tiny laser projector and an event camera to estimate depth at 60 Hz on a laptop CPU. The algorithm uses only the timestamps of when the laser passes over the scene, not its color or intensity, so you can project any content you like. Fun for AR demos!

https://fraunhoferhhi.github.io/X-maps/

---

Self-Organizing Gaussian Grids: 3D data is awkward to compress. But there are plenty of solutions for compressing 2D data (images!). So let's organize our 3D data into a 2D grid, where grid neighbors are also close in 3D. That's a hard problem, but we can leverage novel assignment algorithms, parallelized on the GPU, to get the sorting done in a few seconds.

https://fraunhoferhhi.github.io/Self-Organizing-Gaussians/

---

https://www.3dgs.zip/ - resources on 3D Gaussian Splatting from our research group.

https://survey.3dgs.zip/ - a comparison of different compression methods for 3D Gaussian Splatting scenes.
