Regular expressions - I only want to match opening tags with a regex

Question

I am building a regular expression with which I only want to match malformed tags, such as: <p> *some text here, some other tags may be here as well but no ending 'p' tag* </p>

 <P>Affectionately Inscribed </P><P>TO </P><P>HENRY BULLAR, </P><P>(of the western circuit)<P>PREFACE</P>

From the text above I want to get the result <P>(of the western circuit)<P> and nothing else should be captured. I am using the following pattern, but it does not work:

<P>[^\(</P>\)]*<P>

Please help.

Answer 1

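For XML/HTML-style data, regular expressions are not always a good choice. In particular, attributes, case sensitivity, comments, and so on all make a big difference.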
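That said, for the narrow sample in the question a regex can be made to work. The pattern in the question fails because `[^\(</P>\)]` is a character class: it excludes the single characters `(`, `<`, `/`, `P`, `>` and `)` rather than the string `</P>`, so the parentheses in the sample text already stop it. Below is a minimal sketch in Python (not the asker's environment) that uses a negative lookahead instead. It assumes the tags carry no attributes, and the name `UNCLOSED_P` is made up for illustration.

```python
import re

# After an opening <p>, consume one character at a time, but only while
# neither another <p> nor a closing </p> starts at the current position.
# The match succeeds only if the next tag reached is a second opening <p>.
UNCLOSED_P = re.compile(r"<p>(?:(?!</?p>).)*<p>", re.IGNORECASE | re.DOTALL)

sample = (
    "<P>Affectionately Inscribed </P><P>TO </P><P>HENRY BULLAR, </P>"
    "<P>(of the western circuit)<P>PREFACE</P>"
)

matches = UNCLOSED_P.findall(sample)
print(matches)  # ['<P>(of the western circuit)<P>']
```

Even this version breaks as soon as a tag has an attribute (`<p class="x">`) or the markup sits inside a comment, which is exactly the kind of case described above.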

For XHTML, I would use XmlDocument/XDocument with an XPath query.

For "non-x" HTML, I would look at the HTML Agility Pack and take the same approach.
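To illustrate the parser-based idea without depending on a particular .NET library, here is a rough sketch using Python's standard `html.parser` module as a stand-in for the HTML Agility Pack. The class `UnclosedPFinder` and the function `find_unclosed_paragraphs` are hypothetical names introduced for this example. The sketch reports the text of every `<p>` that is still open when another `<p>` starts or when the input ends.

```python
from html.parser import HTMLParser


class UnclosedPFinder(HTMLParser):
    """Collects the text of <p> elements that are never closed."""

    def __init__(self):
        super().__init__()
        self.open_text = None  # text chunks of the currently open <p>, or None
        self.unclosed = []     # text of every <p> found without a closing tag

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names, so <P> and <p> both arrive as "p".
        if tag != "p":
            return
        if self.open_text is not None:
            # A new <p> began before the previous one was closed.
            self.unclosed.append("".join(self.open_text))
        self.open_text = []

    def handle_endtag(self, tag):
        if tag == "p":
            self.open_text = None

    def handle_data(self, data):
        if self.open_text is not None:
            self.open_text.append(data)


def find_unclosed_paragraphs(html):
    finder = UnclosedPFinder()
    finder.feed(html)
    finder.close()
    if finder.open_text is not None:
        # The last <p> was still open at the end of the input.
        finder.unclosed.append("".join(finder.open_text))
    return finder.unclosed


sample = (
    "<P>Affectionately Inscribed </P><P>TO </P><P>HENRY BULLAR, </P>"
    "<P>(of the western circuit)<P>PREFACE</P>"
)
print(find_unclosed_paragraphs(sample))  # ['(of the western circuit)']
```

Because the parser handles case and attributes itself, `<p class="x">` is treated the same as `<P>`, which the regex approach does not give for free.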