(Yes, I realize it's probably more than 4MB, but it's still an outrageously high markup. They could do their own caching, not tell you they're doing it, keep the difference, and make even more money.)
Size of KV cache = 2 * (num_layers) * (num_kv_heads * dim_head) * seq_length * precision
8-bit Gemma 27B KV cache = 2 * (46) * (16 * 144) * 1e6 * 1 byte ≈ 200 GB
Note that this doesn't take into account further optimizations that Google might be using.

Formula: https://developer.nvidia.com/blog/mastering-llm-techniques-i...
Gemma 27B config: https://huggingface.co/google/gemma-2-27b/blob/main/config.j...
It has an increased vocab size of 200k.
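For reference, here is a small Python sketch of the same arithmetic. The layer/head numbers are the ones quoted above; treat them as assumptions and check the Gemma 27B config for the exact values:

```python
# Rough KV-cache size estimate, following the formula above:
# 2 (keys + values) * layers * (kv_heads * head_dim) * seq_length * bytes_per_value

def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_length: int, bytes_per_value: int) -> int:
    """One entry per layer, per KV head, per head dimension, per token, for both K and V."""
    return 2 * num_layers * num_kv_heads * head_dim * seq_length * bytes_per_value

# 8-bit Gemma 27B at a 1M-token context, using the numbers quoted above
size = kv_cache_bytes(num_layers=46, num_kv_heads=16, head_dim=144,
                      seq_length=1_000_000, bytes_per_value=1)
print(f"{size / 1e9:.0f} GB")  # ~212 GB, i.e. roughly 200 GB
```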