The problem with the answer is that it is wrong. The question is about identifying start tags in XHTML. That is a tokenization problem, and it can be solved with a regular expression. Indeed, many parsers use regular expressions (or equivalent finite automata) for the tokenization stage. It is exactly the right tool for the job!
Furthermore, the asker specifically needs to distinguish between start tags and self-closing start tags. This is a token-level difference which is typically not exposed by XHTML parsers. So saying "use a parser" is less than helpful.
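To make the tokenization point concrete, here is a minimal sketch in Python of a token-level matcher that separates plain start tags from self-closing ones. The names `START_TAG` and `start_tags` are made up for illustration, and the pattern assumes well-formed XHTML with quoted attribute values. It does not skip comments or CDATA sections, so tag-like text inside those would also match.

```python
import re

# One start tag: "<", a name, zero or more quoted attributes, an optional "/", ">".
# The "selfclose" group captures "/" for self-closing tags and "" otherwise.
START_TAG = re.compile(
    r"""<(?P<name>[A-Za-z_][\w:.-]*)                    # tag name
        (?:\s+[\w:.-]+\s*=\s*(?:"[^"]*"|'[^']*'))*      # quoted attributes
        \s*(?P<selfclose>/?)>""",
    re.VERBOSE,
)


def start_tags(text):
    """Yield (name, is_self_closing) for each start tag found in text."""
    for match in START_TAG.finditer(text):
        yield match.group("name"), match.group("selfclose") == "/"


if __name__ == "__main__":
    sample = '<p><br /><a href="x">link</a><img src="a.png"/></p>'
    for name, is_self_closing in start_tags(sample):
        print(name, "self-closing" if is_self_closing else "start tag")
```

End tags such as `</a>` are ignored because the character after `<` must begin a tag name. Note that this only recognizes individual tokens; it makes no attempt to check that tags are nested or balanced.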
- I will never forget that regex can't parse XHTML
- the reason being, regex is insufficiently powerful
- when I first saw this post, I knew little about regex under the hood; it sent me down a wiki hole of FSMs, pushdown automata and Turing machines
- this misconception is apparently common enough to be madness-inducing to those who know better
- use a hecking xml parser instead
It almost reminds me of a Bill Nye sketch: teaching through a bit of non sequitur and absurdism.