It’s worth trying.
I personally don’t have the tools to do this, and I don’t have enough of a budget to run it as an experiment. Where would you go to find the right people to evaluate this concept? I’ve looked around Stack Overflow, Reddit, etc. but can’t seem to find anyone talking about this.
https://arxiv.org/abs/2210.03945
https://arxiv.org/abs/2202.00217
https://arxiv.org/abs/2201.10608
My take is that fine-tuning a language model to parse HTML shouldn't be terribly difficult, but you probably do need a machine with a decent GPU. The main limitation of current LLMs is the bounded attention window: BERT has a 512-token window and ChatGPT a 4096-token window, where a token is typically less than a word. There are models with much longer windows (e.g. Reformer), but they don't yet match the quality of the state-of-the-art models.
Practically, that means you can't feed a huge HTML document into the model without splitting it up first, and once you split it, the model loses the ability to see the document as a whole (for instance, matching up a <div> with its closing </div>).
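To make that splitting problem concrete, here's a minimal sketch of one way to chunk an HTML document without separating matching open/close tags: scan for tags, track nesting depth, and only cut at points where the depth returns to zero. The 4-characters-per-token estimate is a crude stand-in for a real tokenizer, and the regex-based tag scanner is an assumption for illustration (a production version would use a proper HTML parser and the target model's actual tokenizer).

```python
import re

# Void elements never get a closing tag, so they don't change nesting depth.
VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
        "link", "meta", "param", "source", "track", "wbr"}
TAG = re.compile(r"<(/?)([a-zA-Z][a-zA-Z0-9-]*)[^>]*?(/?)>")

def rough_token_count(text: str) -> int:
    # Crude heuristic: roughly 4 characters per token.
    return max(1, len(text) // 4)

def top_level_segments(html: str):
    """Yield (start, end) offsets of balanced top-level elements."""
    depth = 0
    seg_start = 0
    for m in TAG.finditer(html):
        closing, name, selfclose = m.group(1), m.group(2).lower(), m.group(3)
        if closing:
            depth = max(0, depth - 1)
            if depth == 0:
                # Depth back to zero: safe place to cut.
                yield (seg_start, m.end())
                seg_start = m.end()
        elif selfclose or name in VOID:
            pass  # <br/>, <img ...> etc. don't affect depth
        else:
            depth += 1
    if seg_start < len(html):
        yield (seg_start, len(html))

def chunk_html(html: str, max_tokens: int = 4096):
    """Greedily pack whole top-level elements into token-budget chunks."""
    chunks, current = [], ""
    for start, end in top_level_segments(html):
        seg = html[start:end]
        if current and rough_token_count(current + seg) > max_tokens:
            chunks.append(current)
            current = seg
        else:
            current += seg
    if current:
        chunks.append(current)
    return chunks
```

Note that a single top-level element larger than the budget still ends up as one oversize chunk; handling that case means recursing into its children, which is exactly where the "model can't see the whole document" problem starts to bite.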