Step 1 is to find an optimal starting word. My most insightful finding so far has came from trying to define a cost function for comparing potential starting words. It turns out that "% of potential guesses [not] eliminated" is an excellent loss function.
Importantly, the full set of all Acceptable Guesses is knowable. For any given (guess, answer) pair, the game provides feedback about each character (or digit). There are basically three pieces of potential feedback: (N)ot Used, (U)sed Elsewhere, (E)xact Match. For example, a single guess might produce the feedback "NUNNE". Each of these will eliminate some subset of Acceptable Guesses which means you can attribute a fixed value between 0.0 and 1.0 to any piece of feedback, and multiply those together to get the loss score of a given guess. Average that across all Potential Answers to get a cost score.
Not sure what my goal of this post is. Guess I just wanted to share that these games are as much fun to analyze as they are to play (if not more).
In particular, max entropy heuristic + beam search will derive the optimal (now proven) strategy in terms of minimizing expected number of guesses