This paper shows an emergent world model in an LLM that was taught to play otello moves https://ar5iv.labs.arxiv.org/html/2210.13382
https://arxiv.org/pdf/2303.12712.pdf This paper discusses (among other things) how a GPT4 model navigated between rooms in a text adventure game and was able to create a map afterward. Literally building a model of the world as it was navigating and drawing a map of that afterwards
I disagree.
What is the "right" things is use-case dependent.
For UI it's glyph bases, kinda, more precise some good enough abstraction over render width. For which glyphs are not always good enough but also the best you can get without adding a ton of complexity.
But for pretty much every other use-case you want storage byte size.
I mean in the UI you care about the length of a string because there is limited width to render a strings.
But everywhere else you care about it because of (memory) resource limitations and costs in various ways. Weather that is for bandwidth cost, storage cost, number of network packages, efficient index-ability, etc. etc. In rare cases being able to type it, but then it's often us-ascii only, too.
It has distinct .chars .codes and .bytes that you can specify depending on the use case. And if you try to use .length is complains asking you to use one of the other options to clarify your intent.