Everything shown to me so far is a problem solvable with scripts and XPath template/creation logic. I've handled all of this for over 10 years with one script. When I see it finding every element and associating each one with the correct external label, then they have something. Otherwise I'm concluding that it's non-functional, and that this is a long-solved problem where ML is over-engineering.
"Keep in mind that Tarsier tags different types of elements differently to help your LLM identify what actions are performable on each element. Specifically:
[#ID]: text-insertable fields (e.g. textarea, input with textual type)
[@ID]: hyperlinks (<a> tags)
[$ID]: other interactable elements (e.g. button, select)
[ID]: plain text (if you pass tag_text_elements=True)"
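For anyone skimming, here is a rough sketch of how those four tag forms could be read back out of a tagged page string. This is not Tarsier's API, just a plain regex over the bracket notation quoted above. The names `find_tags`, `Tag`, and `KINDS`, the kind labels, and the sample string are all made up for illustration, and it assumes IDs are plain integers, as in [#4] and [#5].

```python
import re
from typing import List, NamedTuple

# Prefix character -> element kind, following the quoted tag legend.
# The kind labels themselves are hypothetical names chosen for this sketch.
KINDS = {
    "#": "text_input",    # [#ID]: text-insertable fields
    "@": "link",          # [@ID]: hyperlinks
    "$": "interactable",  # [$ID]: other interactable elements
    "": "plain_text",     # [ID]: plain text
}

# Optional prefix (#, @ or $) followed by an integer ID, inside brackets.
TAG_RE = re.compile(r"\[([#@$]?)(\d+)\]")


class Tag(NamedTuple):
    kind: str
    id: int


def find_tags(page_text: str) -> List[Tag]:
    """Return every tag found in page_text, in order of appearance."""
    return [
        Tag(KINDS[m.group(1)], int(m.group(2)))
        for m in TAG_RE.finditer(page_text)
    ]


if __name__ == "__main__":
    # Hypothetical tagged text, not real Tarsier output.
    sample = "[#4] Search flights\n[@12] Sign in [$3] Submit [7] Welcome"
    for tag in find_tags(sample):
        print(tag.kind, tag.id)
```

Note that the regex does not care whether a tag and its placeholder text share a line, so it only tells you which tags exist and what kind they are, not which label each one belongs to.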
Do you see the search boxes labeled [#4] and [#5] at the top? And before you say that the tag is on a different line from the placeholder text: yes, it is, and our agent is smart enough to handle that minor idiosyncrasy. Are you shocked? :)
Edit: I do not intend to come off as negative or disparaging - I already discussed this with some OS projects I work on as well as internally at work. You guys did something great, and I am just trying to point out gaps that could take it from great to unbelievable.