eoerl commented on Lossless LLM compression for efficient GPU inference via dynamic-length float   arxiv.org/abs/2504.11651... · Posted by u/CharlesW
Animats · 4 months ago
Once this weight format war settles down, hardware can be built to support it. Presumably you want matrix multiply hardware optimized for whatever weight format turns out to be reasonably optimal.
eoerl · 4 months ago
Optimization is post hoc here: you have to train first to be able to Huffman-encode, so it's not a pure format question
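
To make the "post hoc" point concrete: the code table depends on the trained weights' symbol statistics, so it can't exist before training. A minimal sketch (the choice of symbols, e.g. bfloat16 exponent bytes, is my assumption, not the paper's exact scheme):

```python
# Minimal sketch: build a Huffman code from the statistics of already-trained
# weights. The table depends on those statistics, hence "post hoc".
import heapq
from collections import Counter

def huffman_code(symbols):
    """Map each observed symbol to a prefix-free bit string."""
    freq = Counter(symbols)
    # heap entries: (total count, unique tiebreaker, {symbol: code})
    heap = [(n, i, {s: ""}) for i, (s, n) in enumerate(freq.items())]
    heapq.heapify(heap)
    tiebreak = len(heap)
    while len(heap) > 1:
        n1, _, codes1 = heapq.heappop(heap)
        n2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (n1 + n2, tiebreak, merged))
        tiebreak += 1
    return heap[0][2]

# e.g. exponent bytes of a trained weight tensor: after training their
# distribution is highly skewed, which is exactly what Huffman coding exploits
table = huffman_code([0x7E, 0x7E, 0x7E, 0x7F, 0x7D, 0x7E, 0x7F])
```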
eoerl commented on Romanian court annuls result of presidential election first round   bbc.com/news/articles/cn4... · Posted by u/vinni2
returningfory2 · 9 months ago
I appreciate the details, but ultimately I still don't buy it. The people who voted for this guy have agency and decided for themselves. Yes, they were likely influenced by a likely state-actor campaign. But they still have agency, they liked what they were being presented with, and made the final call themselves on who to vote for.
eoerl · 9 months ago
In a lot of countries there are rules, for instance limits on campaign spending or equal airtime for all candidates. I don't know whether that's the case in Romania, but it is entirely possible to annul an election even if people voted "freely". I know that typically doesn't apply to the US, but there's a world outside of it
eoerl commented on Tenstorrent unveils Grayskull, its RISC-V answer to GPUs   techradar.com/pro/firm-he... · Posted by u/Brajeshwar
paulmd · a year ago
H100 does have NVDEC and JPEG ingest accelerators

https://www.servethehome.com/wp-content/uploads/2023/10/NVID...

Tbh it’s mildly surprising they even removed NVENC considering the overall size of the chip (in absolute terms we are only talking about low-single-digit mm² savings), and the H100 is still advertised with features targeting VM graphics/visualization… remember they also still put a full graphics pipeline with ROPs/TMUs on the chip, just no actual display hardware.

eoerl · a year ago
These can also be used for machine learning, actually (see NVIDIA DALI for data loading, for instance)
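
A minimal sketch of what that looks like with DALI (paths and sizes are placeholders of mine); `device="mixed"` is what routes JPEG decoding to the GPU's hardware decode engines:

```python
# Sketch: a DALI pipeline that decodes JPEGs on the GPU's hardware
# JPEG/video engines instead of on the CPU.
from nvidia.dali import pipeline_def, fn, types

@pipeline_def(batch_size=64, num_threads=4, device_id=0)
def train_pipeline():
    jpegs, labels = fn.readers.file(file_root="/data/train", random_shuffle=True)
    # "mixed" = CPU input, GPU output: decoding runs on the hardware decoder
    images = fn.decoders.image(jpegs, device="mixed", output_type=types.RGB)
    images = fn.resize(images, resize_x=224, resize_y=224)
    return images, labels

pipe = train_pipeline()
pipe.build()
images, labels = pipe.run()
```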
eoerl commented on If you're interested in eye-tracking, I'm interested in funding you   twitter.com/paulg/status/... · Posted by u/pg
anymouse123456 · 2 years ago
They're currently in the $1,000-$3,000 range.
eoerl · 2 years ago
We (The Eye Tribe folks) sold one at $99 years ago. $1k-$3k is mostly lack of competition, I believe.
eoerl commented on Non-determinism in GPT-4 is caused by Sparse MoE   152334H.github.io/blog/no... · Posted by u/152334H
xyzzy_plugh · 2 years ago
> That said, I agree with n2d4 that it’s stupid to insult the authors. Talk is cheap and building is hard.

If your code sets an expectation of determinism, then it's sloppy not to call out where determinism doesn't hold. There's nothing difficult about writing a comment to the effect of "this function is non-deterministic. For deterministic results, use X".

The code is sloppy if the developers didn't consider determinism and offer nothing to consumers, or if the consumers writing software cannot know where non-determinism is introduced.

If that's somehow insulting then I'd say someone has very thin skin.

eoerl · 2 years ago
There are flags[1] for that indeed. It feels like half of the people commenting here don't know all that much about the topic they're commenting upon

1: https://pytorch.org/docs/stable/generated/torch.use_determin...
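
Concretely, flipping those flags looks like this (standard PyTorch API, not tied to any particular model):

```python
import torch

# Ask PyTorch to use deterministic kernels everywhere, and raise an
# error whenever an op only has a non-deterministic implementation.
# Some cuBLAS ops additionally require CUBLAS_WORKSPACE_CONFIG=:4096:8
# to be set in the environment.
torch.use_deterministic_algorithms(True)

# cuDNN-specific knobs: pick deterministic convolution algorithms and
# disable autotuning (which can select different kernels run to run).
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```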

eoerl commented on Google doesn’t want employees working remotely anymore   theverge.com/2023/6/7/237... · Posted by u/dlb007
thatsagreatcomm · 2 years ago
IMO you lose so much more by sacrificing spontaneous conversation & ideation that results. You also lose the ease of just walking over to someone to ask a question. You also lose an unbelievable amount for anyone who lacks experience - training is AWFUL remote. Not even close.

It's not perfect but a group of aligned people in the same physical working space will just dominate a similar group spread apart that has to use chats & zoom to communicate. Management has got to be seeing this, in various forms, across multiple business segments.

eoerl · 2 years ago
> It's not perfect but a group of aligned people in the same physical working space will just dominate a similar group spread apart that has to use chats & zoom to communicate. Management has got to be seeing this, in various forms, across multiple business segments.

There's no data on this; at the very least you could mention that it's only your personal impression?

IMO (and this is clearly a personal take) there are two competing effects:
- higher bandwidth and easier to align face to face
- more distractions, interruptions, more complicated to get things done

If you're in a business or position where you have no IP or nothing hard to do per se, you'll see the first one dominate. If you're somewhere with IP and competitive advantages through smarts, then I'd say (personal again) the second effect can come to dominate.

Google pulling a "no remote" move means to me that their competitive advantage in terms of engineering and smarts is not a priority, plus they're using the fact that the market swung back towards employers vs. employees. But there's no general conclusion that "this take is obviously so much better"; that's just intellectual laziness, I believe

eoerl commented on Stable Diffusion 2.0   stability.ai/blog/stable-... · Posted by u/hardmaru
corysama · 3 years ago
It’s my understanding that, amazingly enough, blending the models is done by literally performing a trivial linear blend of the raw numbers in the model files.

Someone even figured out they could get great compression of specialized model files by first subtracting the base model from the specialized model (using plain arithmetic) before zipping it. Of course, you need the same base file handy when you go to reverse the process.
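
A hedged sketch of both tricks as described (the helper names are hypothetical, not from any actual tool, and CPU state dicts are assumed):

```python
# Sketch of (1) a naive element-wise linear blend of two same-architecture
# checkpoints and (2) delta-against-base compression.
import zlib
import torch

def blend(state_a, state_b, alpha=0.5):
    """Element-wise linear interpolation of two state dicts."""
    return {k: alpha * state_a[k] + (1 - alpha) * state_b[k] for k in state_a}

def compress_delta(base, specialized):
    """Subtracting the base leaves mostly near-zero values, which a generic
    compressor handles far better than the raw weights. Reversing it
    requires the same base file."""
    return {
        k: zlib.compress((specialized[k] - base[k]).numpy().tobytes())
        for k in base
    }
```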

eoerl · 3 years ago
It is not typically possible to blend models like that, since the training process is insensitive to the (lateral) ordering of units, as far as the model goes: two equally trained models can store equivalent features at different positions.
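
A toy illustration of that symmetry (my own sketch, assuming a plain two-layer MLP): permuting the hidden units leaves the function unchanged, so element-wise averaging of two independently trained models can mix up unrelated units.

```python
# Toy example: the same function, stored with two different weight layouts.
import torch

torch.manual_seed(0)
W1, W2 = torch.randn(4, 3), torch.randn(2, 4)
x = torch.randn(3)

# Permute the hidden units: rows of W1 and the matching columns of W2.
perm = torch.randperm(4)
y = W2 @ torch.relu(W1 @ x)
y_perm = W2[:, perm] @ torch.relu(W1[perm] @ x)

print(torch.allclose(y, y_perm))  # True: identical function, shuffled weights
```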
eoerl commented on Fast-stable-diffusion colabs, +25% speed increase and memory efficient   github.com/TheLastBen/fas... · Posted by u/DLeychIC
tgtweak · 3 years ago
any differences in output or they are identical with the same prompt and seed?
eoerl · 3 years ago
Identical outputs, up to float computation shenanigans (operations are not computed in the same order, strictly speaking)
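
(Floating-point addition isn't associative, so reordering a reduction can change the last bits; a one-liner example:)

```python
# Different summation orders round differently; that's all the
# "shenanigans" amount to.
a = (0.1 + 0.2) + 0.3
b = 0.1 + (0.2 + 0.3)
print(a == b)  # False
print(a, b)    # 0.6000000000000001 0.6
```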
eoerl commented on Fast-stable-diffusion colabs, +25% speed increase and memory efficient   github.com/TheLastBen/fas... · Posted by u/DLeychIC
bloaf · 3 years ago
These guys claim a 50% speedup with a similar approach:

https://www.reddit.com/r/StableDiffusion/comments/xmr3ic/spe...

eoerl · 3 years ago
Yep, same approach, but it arrived 3 days later and there's no mention of the [original PR](https://github.com/huggingface/diffusers/pull/532#issuecomme...), nice. Also, the kernels used in that case (upstream FlashAttention) are not compatible with all NVIDIA GPU generations, FYI (xformers' kernels cover a wider range and are generally faster, or simply pull in FlashAttention's)
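
For what it's worth, a hedged sketch of how the xformers path is typically enabled in diffusers today (the model id is just an example):

```python
# Sketch: turning on xformers memory-efficient attention in diffusers.
# Requires the xformers package to be installed.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Swaps the attention ops for xformers' memory-efficient kernels
pipe.enable_xformers_memory_efficient_attention()

image = pipe("a photo of an astronaut riding a horse").images[0]
```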
eoerl commented on Fast-stable-diffusion colabs, +25% speed increase and memory efficient   github.com/TheLastBen/fas... · Posted by u/DLeychIC
whywhywhywhy · 3 years ago
Would be much better if these changes were a branch on the starting repo rather than entire refactors with many changes.
eoerl · 3 years ago
Did you even peek at the link? There's a PR on diffusers, and it's mentioned on the front page: https://github.com/huggingface/diffusers/pull/532#issuecomme...
