Readit News
atgctg commented on Transformers Without Normalization   jiachenzhu.github.io/DyT/... · Posted by u/hellollm
rryan · 9 months ago
RMSNorm is pretty insignificant in terms of the overall compute in a transformer though -- usually the reduction work can be fused with earlier or later operations.
atgctg · 9 months ago
The paper's Table 7 shows DyT reducing overall LLaMA 7B inference time by 7.8% and training time by 8.2%. That is not insignificant.
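For context on where the savings come from: the paper's DyT (Dynamic Tanh) replaces the norm layer with an element-wise tanh with a learnable scale, DyT(x) = γ · tanh(αx) + β, so there is no reduction over the hidden dimension at all. A minimal NumPy sketch (parameter names and defaults are mine, not from the paper's code):

```python
import numpy as np

def dyt(x, alpha=0.5, gamma=1.0, beta=0.0):
    """Dynamic Tanh (DyT): gamma * tanh(alpha * x) + beta, applied
    element-wise in place of RMSNorm/LayerNorm. Unlike a norm layer,
    no mean/variance reduction over the hidden dimension is needed."""
    return gamma * np.tanh(alpha * x) + beta

x = np.random.randn(4, 8)  # (tokens, hidden_dim)
y = dyt(x)                 # same shape as x, values bounded by tanh
```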
atgctg commented on An Interview with Daniel Gross and Nat Friedman About Models, Margins, and Moats   stratechery.com/2025/an-i... · Posted by u/feross
shishy · a year ago
Bummer this one is for paid subs only :(
atgctg · a year ago
You can currently get a free trial of Stratechery Plus through Asianometry:

https://stratechery.passport.online/member/plan/4ycW4SE71Cy6...

Source: https://substack.com/home/post/p-154928959

atgctg commented on Prompt Caching   docs.anthropic.com/en/doc... · Posted by u/fallinditch
bastawhiz · a year ago
That pricing is ridiculous. A token is essentially a 32 bit integer. Four bytes. A million tokens is 4MB. Imagine paying $1/hr for less than the storage of three floppies. That's two million times more expensive than the storage cost of standard S3 (720 hours × 256M tokens (1 GB) × $1 vs $0.09). Or 2000 times more expensive than the storage cost of Elasticache serverless.

(Yes, I realize it's probably more than 4MB, but it's still an outrageously high markup. They could do their own caching, not tell you they're doing it, and keep the difference and make even more money)

atgctg · a year ago
You have to store the KV cache, not the tokens. For Gemma 27B (probably slightly larger than Flash), this would be:

  Size of KV cache = 2 * (num_layers) * (num_kv_heads * dim_head) * seq_length * precision

  8-bit Gemma 27B KV cache = 2 * (46) * (16 * 144) * 1e6 * 1 byte ≈ 200 GB
Note that this doesn't account for further optimizations that Google might be using.

Formula: https://developer.nvidia.com/blog/mastering-llm-techniques-i...

Gemma 27B config: https://huggingface.co/google/gemma-2-27b/blob/main/config.j...
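The formula in code, as a back-of-the-envelope sketch (the function name is mine; the plugged-in numbers are the ones from the comment above, not an official calculator):

```python
def kv_cache_bytes(num_layers, num_kv_heads, dim_head, seq_length, precision_bytes):
    # 2x for keys and values: one (num_kv_heads * dim_head) vector
    # per layer per token, at the given precision.
    return 2 * num_layers * num_kv_heads * dim_head * seq_length * precision_bytes

# 8-bit Gemma 27B KV cache at a 1M-token context:
size = kv_cache_bytes(num_layers=46, num_kv_heads=16, dim_head=144,
                      seq_length=1_000_000, precision_bytes=1)
print(f"{size / 1e9:.0f} GB")  # -> 212 GB
```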

atgctg commented on GitHub Profile with a Custom Background   github.com/cloud11665... · Posted by u/flexagoon
atgctg · 2 years ago
Works using math CSS injection [1]:

    ```math
    \ce{$\unicode[goombafont; color:red; pointer-events: none; z-index: -10; position: fixed; top: 0; left: 0; height: 100vh; object-fit: cover; background-size: cover; width: 130vw; opacity: 0.5; background: url('https://github.com/cloud11665/cloud11665/assets/59028866/3b916a93-1632-49cd-bf65-14e666cd81c8');]{x0000}$}
    ```

[1]: https://raw.githubusercontent.com/cloud11665/cloud11665/mast...

atgctg commented on GPT-4o   openai.com/index/hello-gp... · Posted by u/Lealen
atgctg · 2 years ago
Tiktoken added support for GPT-4o: https://github.com/openai/tiktoken/commit/9d01e5670ff50eb74c...

It has an increased vocab size of 200k.
