Regexes are indeed a perfectly fine answer when you are guaranteed that no corner cases will show up in the content, and I have used, and still use, regexes to quickly extract data from well-known HTML/XML as a quick hack (curl|grep). Otherwise you're much better served by using a parser and selecting nodes with XPath/CSS selectors.
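A minimal Python sketch of the difference, assuming a small hypothetical XML document (`DOC`, `titles_by_regex` and `titles_by_parser` are illustrative names, not from any library). The regex quick hack is fooled by a commented-out element, while the standard-library ElementTree parser drops comments and selects nodes with its limited XPath subset:

```python
import re
import xml.etree.ElementTree as ET

# Hypothetical input: a small, well-known XML document with one corner case
# (a commented-out <title>).
DOC = """<catalog>
  <!-- <title>commented out</title> -->
  <book><title>Dune</title></book>
  <book><title>Emma</title></book>
</catalog>"""


def titles_by_regex(doc):
    # Quick hack: only safe when no corner cases (comments, CDATA, nesting) occur.
    return re.findall(r"<title>(.*?)</title>", doc)


def titles_by_parser(doc):
    # Parse into a tree (comments are discarded by default), then select
    # nodes with ElementTree's XPath subset.
    root = ET.fromstring(doc)
    return [el.text for el in root.findall(".//book/title")]


print(titles_by_regex(DOC))   # ['commented out', 'Dune', 'Emma']
print(titles_by_parser(DOC))  # ['Dune', 'Emma']
```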
The question doesn't specify whether the file to match against is unique/one-shot or a general case. Without that information, it's safest to assume it has to handle any input. The regex will get unwieldy since you have to account for corner cases like:
The second line is not a corner case; it is simply not legal XHTML. You cannot have an unescaped < in an attribute value. You will need to take comments (and DTDs and CDATA sections) into consideration, of course, but you can do that in a regex.
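As a rough illustration of handling comments, DTDs and CDATA in a regex, here is a minimal Python sketch. It assumes well-formed XHTML (quoted attribute values, no internal DTD subset) and treats the task as listing start tags that are not self-closing; `TOKEN` and `open_tags` are hypothetical names. The alternation consumes comments, CDATA sections, DOCTYPEs and processing instructions first, so tag-like text inside them is never reported:

```python
import re

# Order of the alternatives matters: non-element constructs are matched first
# so that a "<p>" inside a comment or CDATA section is skipped as a whole.
TOKEN = re.compile(
    r"""
      <!--.*?-->                      # comment
    | <!\[CDATA\[.*?\]\]>             # CDATA section
    | <![^>]*>                        # DOCTYPE (internal subsets not handled)
    | <\?.*?\?>                       # processing instruction
    | <(?P<name>[A-Za-z_][\w.:-]*)    # element start tag name
      (?P<attrs>(?:\s+[^\s=/>]+\s*=\s*(?:"[^"]*"|'[^']*'))*)
      \s*(?P<selfclose>/?)>           # optional "/" marks a self-closing tag
    """,
    re.VERBOSE | re.DOTALL,
)


def open_tags(doc):
    """Return names of start tags that are not self-closing."""
    return [
        m.group("name")
        for m in TOKEN.finditer(doc)
        if m.group("name") and not m.group("selfclose")
    ]


SAMPLE = """<!DOCTYPE html>
<html><!-- <p> in a comment -->
<body><p class="a>b">text<br/><![CDATA[<div>]]><hr class="x" /></p></body></html>"""

print(open_tags(SAMPLE))  # ['html', 'body', 'p']
```

Note that a quoted value may legally contain `>` (as in `class="a>b"`), which is why attribute values are matched as whole quoted strings rather than with `[^>]*`.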
In any case, how would you use XPath or CSS to identify self-closing tags? They operate on the parsed tree, not at the token level, and the question is about identifying specific tokens.
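By contrast, a tokenizer does see that distinction. A minimal sketch using Python's standard-library `html.parser`, which calls `handle_startendtag` for XHTML-style empty tags such as `<br/>` and `handle_starttag` for ordinary start tags (`TagClassifier` is a hypothetical name; note this parser is a lenient HTML tokenizer, not a validating XHTML parser):

```python
from html.parser import HTMLParser


class TagClassifier(HTMLParser):
    """Record start tags and self-closing tags as the tokenizer reports them."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.self_closing = []

    def handle_starttag(self, tag, attrs):
        # Ordinary start tag such as <p ...>.
        self.open_tags.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Called for empty tags ending in "/>". Overriding it stops the default
        # behaviour of forwarding to handle_starttag + handle_endtag.
        self.self_closing.append(tag)


p = TagClassifier()
p.feed('<p class="a>b">x<br/><!-- <div> --><img src="i.png" /></p>')
p.close()
print(p.open_tags)     # ['p']
print(p.self_closing)  # ['br', 'img']
```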