Question by cze*_*sio (score 2). Tags: html, c#, regex, tags, find
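I need to retrieve several div sections (with the specific class name "row") together with their content, and also find all anchor tags (link URLs) with the class "underline red bold". In short, I want to get sections like the following: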
<div class = "row ">
... (divs, tags ...)
<a class="underline red bold" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
and the set of URLs:
string[] urls = { "/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p" };
The whole page looks like this:
<html>
... lots of other stuff
<div class="row ">
<div class="photo">
<a rel="nofollow" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
<img alt="alt msg" src="/b/s/b9/03/b9038292d147a582add07ee1f0607827.jpg">
</a>
</div>
<div class="desc">
<div class="l1">
<div class="icons">
</div>
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td>
<div class="fleft">
<a class="underline red bold" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
Culture And Gender <br>Intimate Relation</a>
</div>
<div class="fleft">
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="l2">
<div>
</div>
<div>
<div class="but">
</div>
</div>
</div>
<div class="l3">
Long description
<a class="underlinepix_red no_wrap" rel="nofollow" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
more<img alt="" src="/b/img/arr_red_sm.gif">
</a>
</div>
</div>
</div>
<div class="omit"></div>
<div class="row ">
<div class="photo">
<a rel="nofollow" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534899,p">
<img alt="alt msg" src="/b/s/b9/03/b9038292d147a582add07ee1f06078222.jpg">
</a>
</div>
<div class="desc">
<div class="l1">
<div class="icons">
</div>
<table cellspacing="0" cellpadding="0" border="0">
<tbody>
<tr>
<td>
<div class="fleft">
<a class="underline red bold" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod5653489225,p">
Culture And Gender <br>Intimate Relation</a>
</div>
<div class="fleft">
</div>
</td>
</tr>
</tbody>
</table>
</div>
<div class="l2">
<div>
</div>
<div>
<div class="but">
</div>
</div>
</div>
<div class="l3">
Long description
<a class="underlinepix_red no_wrap" rel="nofollow" href="/searchClickThru?pid=prod56534895&q=&rpos=109181&rpp=10&_dyncharset=UTF-8&sort=&url=/culture-and-gender-intimate-relation-ksiazka,prod56534895,p">
more<img alt="" src="/b/img/arr_red_sm.gif">
</a>
</div>
</div>
</div>
Can someone help me create a suitable regex?
Answer by Jen*_*ens (score 15):
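Regular expressions are not suitable for this.
Because HTML is nested (each "row" div contains further divs, so a regex cannot easily tell which closing </div> ends the row), a regex that does what you ask would be very (very, very) long and complicated. Use an HTML parser instead.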
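A minimal sketch of the parser approach in C#, assuming the third-party HtmlAgilityPack NuGet package (the answer does not name a specific parser, so this choice is an assumption). The class `RowLinkExtractor` and its `Extract` method are hypothetical names. It selects every div whose class list contains `row` and, inside each one, the anchors whose class is exactly `underline red bold`:

```csharp
// Assumes the third-party HtmlAgilityPack NuGet package; it is not part of the .NET base library.
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper class, for illustration only.
public static class RowLinkExtractor
{
    // Matches a <div> whose class list contains the token "row",
    // so class="row " (note the trailing space in the page) still matches.
    const string RowXPath =
        "//div[contains(concat(' ', normalize-space(@class), ' '), ' row ')]";

    // Leading '.' makes the path relative, so the search stays inside the current row.
    const string LinkXPath = ".//a[@class='underline red bold']";

    // Returns one entry per "row" div: its inner HTML plus the hrefs of its title links.
    public static List<(string InnerHtml, string[] Urls)> Extract(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // tolerant parser: unclosed <img>, <br>, raw '&' are fine

        var result = new List<(string InnerHtml, string[] Urls)>();
        foreach (HtmlNode row in SelectOrEmpty(doc.DocumentNode, RowXPath))
        {
            string[] urls = SelectOrEmpty(row, LinkXPath)
                .Select(a => a.GetAttributeValue("href", string.Empty))
                .ToArray();
            result.Add((row.InnerHtml, urls));
        }
        return result;
    }

    // By default SelectNodes returns null (not an empty collection) when nothing matches.
    static IEnumerable<HtmlNode> SelectOrEmpty(HtmlNode node, string xpath) =>
        node.SelectNodes(xpath) ?? Enumerable.Empty<HtmlNode>();
}
```

Flattening the result with `RowLinkExtractor.Extract(pageHtml).SelectMany(r => r.Urls).ToArray()` gives a `string[] urls` like the one in the question. The "more" links (class `underlinepix_red no_wrap`) and the photo links (no class) are skipped because their class attribute does not equal `underline red bold`.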