Rather than inferring what models can do from how you imagine the architecture works, you can look at examples and counterexamples of their behaviour to see what capabilities they actually have.
One misconception is that predicting the next word means the model has no internal idea of the word after next. A simple disproof is that models correctly write 'an' instead of 'a' before words beginning with a vowel sound. The alternative explanation doesn't hold up: if a model emitted 'an' somewhat arbitrarily and only then picked a vowel-initial word to fit, that behaviour would be quite easy to detect (and exploit).
Models predict the next word, but they don't just predict the next word. They generate a great deal of internal information in service of that goal. Assuming that the output they express is the sum total of what they have computed, and limiting your estimate of their abilities accordingly, is a mistake. The output probability distribution is not what the model thinks; it is a reduction of what it thinks.
One of Andrej Karpathy's recent videos described how researchers showed that models do have an internal sense of not knowing an answer, but that fine-tuning on question answering did not give them the ability to express that knowledge. By finding which information the model did and didn't know, then fine-tuning it to say "I don't know" in the cases where it had no information, they got the model to generalise and express "I don't know" on its own.
It's the only chip manufacturer "left" in the US. The argument is national security: the US expects China to invade Taiwan, and that invasion would destroy TSMC in the process.
Whether this will happen or not can be debated, but this is what the government expects.