Regexes are indeed a perfectly fine answer *when you have the guarantee no corne...

goto11 · on Jan 31, 2019

The second line is not a corner case; it is simply not legal XHTML. You cannot have an unescaped < in an attribute value. You will of course need to take comments (and DTDs and CDATA sections) into consideration, but you can do that in a regex.
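
As a rough illustration of that last point, here is a minimal Python sketch of a token-level regex that consumes comments, CDATA sections, processing instructions and the DOCTYPE before it ever looks for element tags. The names `TOKEN` and `self_closing_tags` are made up for this example, and it assumes well-formed XHTML input (in particular, no unescaped < inside attribute values); the DOCTYPE branch is deliberately simplistic and would not cope with every internal subset.

```python
import re

# Alternation order matters: comments, CDATA, PIs and DOCTYPE are tried first,
# so anything tag-like inside them is swallowed and never reported as a tag.
# Assumption: the input is well-formed XHTML.
TOKEN = re.compile(
    r"""
    <!--.*?-->                                    # comment
  | <!\[CDATA\[.*?\]\]>                           # CDATA section
  | <\?.*?\?>                                     # processing instruction / XML declaration
  | <!DOCTYPE[^>\[]*(?:\[.*?\])?\s*>              # DOCTYPE (simplified internal subset)
  | <(?P<name>[A-Za-z_][\w:.-]*)                  # start of an element tag
    (?:\s+[\w:.-]+\s*=\s*(?:"[^"<]*"|'[^'<]*'))*  # attributes; no raw < in values
    \s*(?P<selfclose>/)?>                         # optional slash before the closing >
    """,
    re.VERBOSE | re.DOTALL,
)


def self_closing_tags(xhtml):
    """Return the names of self-closing element tags, in document order."""
    return [
        m.group("name")
        for m in TOKEN.finditer(xhtml)
        if m.group("selfclose")
    ]


doc = (
    '<!-- <br/> --><p class="a">x<br/>'
    '<img src="a.png" alt="1 > 0" /><![CDATA[ <hr/> ]]></p>'
)
print(self_closing_tags(doc))
```

The `<br/>` inside the comment and the `<hr/>` inside the CDATA section are skipped because those constructs are matched as whole tokens first; a `>` inside a quoted attribute value is fine because the attribute branch only stops at the closing quote.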

In any case, how would you use XPath or CSS to identify self-closing tags? They operate on the parsed tree, not at the token level, and the question is about identifying specific tokens.
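
To make the tree-versus-token point concrete, here is a small sketch using Python's standard `xml.etree.ElementTree` (chosen only for illustration; the helper `describe` is hypothetical). Once parsed, `<br/>` and `<br></br>` become indistinguishable nodes, so nothing that queries the tree can tell them apart.

```python
import xml.etree.ElementTree as ET

# One element written in self-closing form, one with explicit open/close tags.
root = ET.fromstring("<p><br/><br></br></p>")
short_form, long_form = list(root)


def describe(elem):
    """Everything the parsed tree records about an element node."""
    return (elem.tag, elem.attrib, elem.text, len(elem))


# The original spelling of the tag is not part of the tree.
same_in_tree = describe(short_form) == describe(long_form)
print(same_in_tree)
```

Both elements also serialize identically, since the serializer picks one spelling for every empty element regardless of how the source was written.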