"Count length of string s" -> LLM -> correct syntax for string-count in any programming language. This is the perfect context length for an LLM. But note that you don't "complete the line"; you tell the LLM what you want done in full (very isolated) context, instead of having it guess.
If you already know how to compute `len` on some arbitrary syntax soup, then the difference is just a minor annoyance: you have to jump back in your editor, add a function call and some punctuation, then jump back to where you were to add the closing punctuation. It's so fast you'd never bother with an LLM, so even though real and meaningful differences exist, the LLM point isn't relevant to the discussion.
If you don't know how to compute `len` on some arbitrary syntax soup, I don't see how crafting an ideal prompt in a "full (very isolated) context" is ever faster than tab-completing things that look like "count" or "len."
I've also found G_LIKELY and G_UNLIKELY in glib to be useful when writing some types of performance-critical code. It would be a fun experiment to compare the generated assembly with and without them.
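For the experiment above, here's a minimal sketch of the kind of code you'd compare. I'm assuming GCC/Clang and defining local LIKELY/UNLIKELY macros on top of `__builtin_expect` (which is what glib's G_LIKELY/G_UNLIKELY wrap) so the snippet doesn't depend on glib; the function itself is a hypothetical hot loop, not from any real codebase:

```c
#include <stddef.h>

/* Hints to the compiler (via GCC/Clang's __builtin_expect) about which
 * way a branch usually goes, so it can lay out the expected path as the
 * straight-line fall-through. glib's G_LIKELY/G_UNLIKELY do the same. */
#define LIKELY(x)   __builtin_expect(!!(x), 1)
#define UNLIKELY(x) __builtin_expect(!!(x), 0)

/* Hypothetical hot loop: negative values are assumed rare, so the
 * error check is hinted as the cold path. */
long sum_checked(const int *v, size_t n, int *error)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        if (UNLIKELY(v[i] < 0)) {   /* rare: cold path */
            *error = 1;
            return total;
        }
        total += v[i];              /* common: hot fall-through path */
    }
    return total;
}
```

Compiling this with and without the UNLIKELY hint (e.g. `gcc -O2 -S`) and diffing the assembly is exactly the comparison being suggested; the interesting part is whether the error-handling block gets moved out of the hot code sequence.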
- Table overflow mitigation: multi-leveled tables, not wasting space on 100% predicted branches, etc
- Table eviction: Rolling counts are impossible without extra space; do you accept the wasted space, use periodic flushing, exponential moving averages, etc.?
- Table initialization: When do you start caring about a branch (and wasting table space), how conservative are the initial parameters, etc
- Table overflow: What do you do when a branch doesn't fit in the table but should
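The table mechanics above can be made concrete with a toy software model. This is a sketch of the simplest scheme, a direct-mapped table of 2-bit saturating counters indexed by the low bits of the branch address; all names and sizes are made up for illustration, and real predictors are far more elaborate:

```c
#include <stdint.h>

/* Toy direct-mapped branch predictor: 2-bit saturating counters indexed
 * by the low bits of the branch PC. Counters start at 0 (the
 * "conservative initial parameters" question: predict not-taken until
 * trained). Two branches whose PCs share low bits alias into the same
 * counter -- the overflow/eviction problem, "solved" here by letting
 * entries silently clobber each other. */
#define TABLE_BITS 10
#define TABLE_SIZE (1u << TABLE_BITS)

static uint8_t table[TABLE_SIZE]; /* counter values 0..3; >= 2 means predict taken */

int predict(uint32_t pc)
{
    return table[pc & (TABLE_SIZE - 1)] >= 2;
}

void update(uint32_t pc, int taken)
{
    uint8_t *c = &table[pc & (TABLE_SIZE - 1)];
    if (taken && *c < 3)
        (*c)++;
    if (!taken && *c > 0)
        (*c)--;
}
```

The 2-bit saturation is what makes a 99%-taken branch keep predicting taken across the occasional not-taken outcome, instead of flip-flopping the way a 1-bit scheme would.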
As a rule of thumb, no extra information/context is used for branch prediction. If a program, over the course of a few thousand instructions, hits a branch X% of the time, then X% is the prediction you'll get. If you have context you want to use to influence the prediction, you need to manifest that context as additional lines of assembly the predictor can use in its lookup table.
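One way to "manifest context as additional lines of assembly" is to duplicate a branch site so each behavior gets its own address, and therefore its own predictor table entry. A hypothetical sketch (the function and its shape are mine, not from the comment): if both behaviors shared one loop with `if (mode)` inside it, calls with alternating modes would push a mixed outcome stream through a single branch PC; splitting the loop gives each mode a separate, near-100%-predictable branch:

```c
#include <stddef.h>

/* The "context" here is mode. Hoisting the mode check out of the loop
 * duplicates the loop branch: each copy sits at its own PC, so the
 * predictor keeps a separate table entry per behavior instead of one
 * entry that sees a mix of outcomes. */
long process_split(const int *v, size_t n, int mode)
{
    long acc = 0;
    if (mode) {
        for (size_t i = 0; i < n; i++)   /* branch PC #1: one behavior */
            acc += v[i] * 2;
    } else {
        for (size_t i = 0; i < n; i++)   /* branch PC #2: the other */
            acc += v[i];
    }
    return acc;
}
```

The trade-off is the one named in the list above: every duplicated branch site is another entry competing for table space.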
As another rule of thumb, if the hot path has more than a few thousand branches (on modern architectures, often just a few thousand branches taken less than 100% of the time), then you'll hit slow paths -- multi-leveled search, mispredicted branches, etc. You also want the assembly to generate the jump-if-not-equal in the right direction for that architecture; otherwise you'll get a 100% misprediction rate instead.
It's reasonably interesting, and given that it's hardware it's definitely clever, but it's not _that_ clever from a software perspective. Is there anything in particular you're curious about?