eoerl commented on Lossless LLM compression for efficient GPU inference via dynamic-length float   arxiv.org/abs/2504.11651... · Posted by u/CharlesW
Animats · 4 months ago
Once this weight format war settles down, hardware can be built to support it. Presumably you want matrix multiply hardware optimized for whatever weight format turns out to be reasonably optimal.
eoerl · 4 months ago
Optimization is post hoc here: you have to train first to be able to Huffman-encode, so it's not a pure format question
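
To make the "post hoc" point concrete: the code table depends on the trained weights' symbol statistics, so it can't exist before training. A minimal sketch (the choice of symbols, e.g. bfloat16 exponent bytes, is my assumption, not the paper's exact scheme):

```python
# Minimal sketch: build a Huffman code from the statistics of already-trained
# weights. The table depends on those statistics, hence "post hoc".
import heapq
from collections import Counter

def huffman_code(symbols):
    """Map each observed symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    # heap entries: (total count, unique tiebreaker, {symbol: code})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, codes1 = heapq.heappop(heap)
        n2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# e.g. exponent bytes of a trained weight tensor: after training their
# distribution is highly skewed, which is exactly what Huffman coding exploits
table = huffman_code([0x7E, 0x7E, 0x7E, 0x7F, 0x7D, 0x7E, 0x7F])
```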
eoerl commented on Romanian court annuls result of presidential election first round   bbc.com/news/articles/cn4... · Posted by u/vinni2
returningfory2 · 9 months ago
I appreciate the details, but ultimately I still don't buy it. The people who voted for this guy have agency and decided for themselves. Yes, they were likely influenced by a likely state-actor campaign. But they still have agency, they liked what they were being presented with, and made the final call themselves on who to vote for.
eoerl · 9 months ago
In a lot of countries there are rules, for instance limits on campaign spending or equal airtime for all candidates. I don't know whether that's the case in Romania, but it is entirely possible to annul an election even if people voted "freely". I know that typically doesn't apply to the US, but there's a world outside of it
eoerl commented on Tenstorrent unveils Grayskull, its RISC-V answer to GPUs   techradar.com/pro/firm-he... · Posted by u/Brajeshwar
paulmd · a year ago
H100 does have NVDEC and JPEG ingest accelerators

https://www.servethehome.com/wp-content/uploads/2023/10/NVID...

Tbh it’s mildly surprising they even removed NVENC considering the overall size of the chip (in absolute terms we are only talking about low-single-digit mm² savings), and the H100 is still advertised with features targeting VM graphics/visualization… remember they also still put a full graphics pipeline with ROPs/TMUs on the chip, just no actual display hardware.

eoerl · a year ago
These can also be used for machine learning, actually (see NVIDIA DALI for data loading, for instance)
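
A minimal sketch of what that looks like with DALI (paths and sizes are placeholders of mine); `device="mixed"` is what routes JPEG decoding to the GPU's hardware decode engines:

```python
# Sketch: a DALI pipeline that decodes JPEGs on the GPU's hardware
# JPEG/video engines instead of on the CPU.
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=64, num_threads=4, device_id=0)
def train_pipeline():
    jpegs, labels = fn.readers.file(file_root="/data/train", random_shuffle=True)
    # "mixed" = CPU input, GPU output: decoding runs on the hardware decoder
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, labels

pipe = train_pipeline()
pipe.build()
images, labels = pipe.run()
```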
eoerl commented on If you're interested in eye-tracking, I'm interested in funding you   twitter.com/paulg/status/... · Posted by u/pg
anymouse123456 · 2 years ago
They're currently in the $1,000-$3,000 range.
eoerl · 2 years ago
We (The Eye Tribe folks) sold one at $99 years ago. $1k-$3k is mostly lack of competition, I believe.
eoerl commented on Non-determinism in GPT-4 is caused by Sparse MoE   152334H.github.io/blog/no... · Posted by u/152334H
xyzzy_plugh · 2 years ago
> That said, I agree with n2d4 that it’s stupid to insult the authors. Talk is cheap and building is hard.

If your code sets an expectation of determinism, then it's sloppy not to call out where determinism doesn't hold. There's nothing difficult about writing a comment to the effect of "this function is non-deterministic. For deterministic results, use X".

The code is sloppy if the developers didn't consider determinism and offer nothing to consumers, or if the consumers writing software cannot know where non-determinism is introduced.

If that's somehow insulting then I'd say someone has very thin skin.

eoerl · 2 years ago
There are flags[1] for that indeed. It feels like half of the people commenting here don't know all that much about the topic they're commenting upon

1: https://pytorch.org/docs/stable/generated/torch.use_determin...
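
Concretely, flipping those flags looks like this (standard PyTorch API, not tied to any particular model):

```python
import torch

# Ask PyTorch to use deterministic kernels everywhere, and raise an
# error whenever an op only has a non-deterministic implementation.
# Some cuBLAS ops additionally require CUBLAS_WORKSPACE_CONFIG=:4096:8
# to be set in the environment.
torch.use_deterministic_algorithms(True)

# cuDNN-specific knobs: pick deterministic convolution algorithms and
# disable autotuning (which can select different kernels run to run).
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```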

eoerl commented on Google doesn’t want employees working remotely anymore   theverge.com/2023/6/7/237... · Posted by u/dlb007
thatsagreatcomm · 2 years ago
IMO you lose so much more by sacrificing spontaneous conversation & ideation that results. You also lose the ease of just walking over to someone to ask a question. You also lose an unbelievable amount for anyone who lacks experience - training is AWFUL remote. Not even close.

It's not perfect but a group of aligned people in the same physical working space will just dominate a similar group spread apart that has to use chats & zoom to communicate. Management has got to be seeing this, in various forms, across multiple business segments.

eoerl · 2 years ago
> It's not perfect but a group of aligned people in the same physical working space will just dominate a similar group spread apart that has to use chats & zoom to communicate. Management has got to be seeing this, in various forms, across multiple business segments.

There's no data on this; at the very least you could mention that it's only your personal impression?

IMO (and this is clearly a personal take) there are two competing effects:
- higher bandwidth and easier to align face to face
- more distractions, interruptions, more complicated to get things done

If you're in a business or position where you have no IP or nothing hard to do per se, you'll see the first one dominate. If you're somewhere with IP and competitive advantages through smarts, then I'd say (personal again) the second effect can come to dominate.

Google pulling a "no remote" move means to me that their competitive advantage in terms of engineering and smarts is not a priority, plus they're using the fact that the market swung back towards employers vs. employees. But there's no general conclusion that "this take is obviously so much better"; that's just intellectual laziness, I believe

eoerl commented on Stable Diffusion 2.0   stability.ai/blog/stable-... · Posted by u/hardmaru
corysama · 3 years ago
It’s my understanding that, amazingly enough, blending the models is done by literally performing a trivial linear blend of the raw numbers in the model files.

Someone even figured out they could get great compression of specialized model files by first subtracting the base model from the specialized model (using plain arithmetic) before zipping it. Of course, you need the same base file handy when you go to reverse the process.
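
A hedged sketch of both tricks as described (the helper names are hypothetical, not from any actual tool, and CPU state dicts are assumed):

```python
# Sketch of (1) a naive element-wise linear blend of two same-architecture
# checkpoints and (2) delta-against-base compression.
import zlib
import torch

def blend(state_a, state_b, alpha=0.5):
    """Element-wise linear interpolation of two state dicts."""
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

def compress_delta(base, specialized):
    """Subtracting the base leaves mostly near-zero values, which a generic
    compressor handles far better than the raw weights. Reversing it
    requires the same base file."""
    return {
        k: zlib.compress((specialized[k] - base[k]).numpy().tobytes())
        for k in base
    }
```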

eoerl · 3 years ago
It is not typically possible to blend models like that, since the training process is insensitive to the (lateral) ordering of units, as far as the model goes: two equally trained models can store equivalent features at different positions.
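
A toy illustration of that symmetry (my own sketch, assuming a plain two-layer MLP): permuting the hidden units leaves the function unchanged, so element-wise averaging of two independently trained models can mix up unrelated units.

```python
# Toy example: the same function, stored with two different weight layouts.
import torch

torch.manual_seed(0)
W1, W2 = torch.randn(4, 3), torch.randn(2, 4)
x = torch.randn(3)

# Permute the hidden units: rows of W1 and the matching columns of W2.
perm = torch.randperm(4)
y = W2 @ torch.relu(W1 @ x)
y_perm = W2[:, perm] @ torch.relu(W1[perm] @ x)

print(torch.allclose(y, y_perm))  # True: identical function, shuffled weights
```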
eoerl commented on Fast-stable-diffusion colabs, +25% speed increase and memory efficient   github.com/TheLastBen/fas... · Posted by u/DLeychIC
tgtweak · 3 years ago
any differences in output or they are identical with the same prompt and seed?
eoerl · 3 years ago
Identical outputs, up to float computation shenanigans (operations are not computed in the same order, strictly speaking)
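
(Floating-point addition isn't associative, so reordering a reduction can change the last bits; a one-liner example:)

```python
# Different summation orders round differently; that's all the
# "shenanigans" amount to.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)    # 0.6000000000000001 0.6
```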
eoerl commented on Fast-stable-diffusion colabs, +25% speed increase and memory efficient   github.com/TheLastBen/fas... · Posted by u/DLeychIC
bloaf · 3 years ago
These guys claim a 50% speedup with a similar approach:

https://www.reddit.com/r/StableDiffusion/comments/xmr3ic/spe...

eoerl · 3 years ago
Yep, same approach, but it arrived 3 days later and there's no mention of the [original PR](https://github.com/huggingface/diffusers/pull/532#issuecomme...), nice. Also, the kernels used in that case (upstream FlashAttention) are not compatible with all NVIDIA GPU generations, FYI (xformers' kernels cover a wider range and are generally faster, or simply pull in FlashAttention's)
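
For what it's worth, a hedged sketch of how the xformers path is typically enabled in diffusers today (the model id is just an example):

```python
# Sketch: turning on xformers memory-efficient attention in diffusers.
# Requires the xformers package to be installed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swaps the attention ops for xformers' memory-efficient kernels
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a photo of an astronaut riding a horse").images[0]
```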
eoerl commented on Fast-stable-diffusion colabs, +25% speed increase and memory efficient   github.com/TheLastBen/fas... · Posted by u/DLeychIC
whywhywhywhy · 3 years ago
Would be much better if these changes were a branch on the starting repo rather than entire refactors with many changes.
eoerl · 3 years ago
Did you even peek at the link? There's a PR on diffusers, and it's mentioned on the front page: https://github.com/huggingface/diffusers/pull/532#issuecomme...
