I found it extremely helpful. The sheer emphasis of the reply made me very curio...

unlinkr · on Sept 9, 2019

The problem with the answer is that it is wrong. The question is about identifying start-tags in XHTML. This is a question of tokenization and can be solved with a regular expression. Indeed, most parsers use regular expressions for the tokenization stage. It is exactly the right tool for the job!

Furthermore, the asker specifically needs to distinguish between start tags and self-closing start tags. This is a token-level difference which is typically not exposed by XHTML parsers. So saying "use a parser" is less than helpful.
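
To make the token-level distinction concrete, here is a minimal sketch in Python. The pattern, the `TAG` name and the `start_tags` helper are my own illustration, not taken from the original question or answer. It assumes well-formed XHTML with quoted attribute values and does not skip comments or CDATA sections, which a complete tokenizer would need to handle.

```python
import re

# Hypothetical token pattern for a simplified XHTML start tag.
# Assumes quoted attribute values; comments and CDATA are not handled.
TAG = re.compile(
    r"""<
        (?P<name>[A-Za-z_][\w:.-]*)        # element name
        (?:\s+[A-Za-z_:][\w:.-]*\s*=\s*    # attribute name and equals sign
           (?:"[^"]*"|'[^']*'))*           # quoted attribute value
        \s*
        (?P<selfclose>/)?                  # optional slash: self-closing tag
        >""",
    re.VERBOSE,
)

def start_tags(text):
    """Return the names of start tags that are not self-closing."""
    return [m.group("name") for m in TAG.finditer(text)
            if not m.group("selfclose")]

sample = '<p><a href="foo"><br /><hr class="foo" /></a></p>'
print(start_tags(sample))
```

The named group `selfclose` is what carries the difference between `<a href="foo">` and `<br />`. End tags such as `</a>` never match, because the element name must follow `<` directly.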

I have elaborated a bit in a blog post: https://www.cargocultcode.com/solving-the-zalgo-regex/