None of these modern recurrent architectures have a way to do this.
It sounds like what they call "Bamba-9B" is actually an 18B model quantised to 8 bits.
I thought we generally named models "nB" by their number of parameters and treated quantisation as a separate concern. Are there any other models that instead treat the name as an indication of memory requirements?
Is this an attempt to hide that it fares poorly vs other ~18B parameter models?
EDIT: no, I just misunderstood
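For anyone following the (mistaken) arithmetic behind that reading, here is a minimal sketch of the back-of-the-envelope weight-memory math in Python; the parameter counts and bit widths are illustrative, not taken from the paper:

    def weight_memory_gb(num_params, bits_per_param):
        # Approximate weight storage only; ignores activations, KV/state cache, and overhead.
        return num_params * bits_per_param / 8 / 1e9

    print(weight_memory_gb(18e9, 8))   # 18B params at int8 -> ~18 GB
    print(weight_memory_gb(9e9, 16))   # 9B params at fp16  -> ~18 GB (same footprint)
    print(weight_memory_gb(9e9, 32))   # 9B params at fp32  -> ~36 GB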
For example, you could never fill in the last chapter of any good book without having knowledge of every previous chapter. Not highly detailed knowledge, but still knowledge.
1. the laws of nature (i.e. how accurately the laws of physics permit measuring the system, and how strongly future states are determined by current states)
2. one's present understanding of the laws of nature
3. one's ability to measure the state of a system accurately and compute the predictions in practice
It strikes me as odd to include 2 and 3 in a definition of "entropy."
I can only hope at the end of the day their data doesn't end up in the wrong hands. It is their most valuable asset, and this is a way bigger deal than it seems.