Are there any good resources for understanding models like this, specifically a "protein language model"? I have a basic grasp of how LLMs tokenize and encode natural language, but what does a protein language actually look like? An LLM can produce results that look correct but are actually incorrect, so how are proteins produced by this model validated? Are the outputs run through some other software to determine whether the proteins are valid?
I recently saw this about AlphaFold: https://elanapearl.github.io/blog/2024/the-illustrated-alpha....
I don't think it's going to answer all your questions, but it might still help!