I have the following HTML markup:
<div id="subcontent_l">
<p>
<a href="/membership-packages/"><img height="202" width="644" alt="" src="http://74.52.72.231/wp-content/uploads/2010/06/banner1.jpg" title="banner1" class="aligncenter size-full wp-image-299">
</a>
</p>
<p class="subc">Access to Guaranteed Healthcare Benefits</p>
<p><a href="http://74.52.72.231/join-now"><img height="37" width="166" alt="" src="http://74.52.72.231/wp-content/uploads/2010/09/jn2.jpg" title="jn" class="alignleft size-full wp-image-229"></a></p>
</div>
In the markup above, I want to find the anchor that wraps the image whose src ends with jn2.jpg. Once it is found, the output should be just that anchor.
The desired result would be:
<a href="http://74.52.72.231/join-now"><img height="37" width="166" alt="" src="http://74.52.72.231/wp-content/uploads/2010/09/jn2.jpg" title="jn" class="alignleft size-full wp-image-229"></a>
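I want to do this with a regular expression. I already have an expression that finds all the image tags inside anchors. My expression is /[^<]*<a.*href[\s]*=[\s]*("[^"]*").*[\s]*<img.*\/a>$
but it does not find the single anchor I want. Please help.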
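Regular expressions are not the right tool for this job. HTML is not a regular language. Use an HTML parser instead. Every self-respecting programming language offers HTML parsing tools and/or libraries. I don't know which programming language you are using, but if you are familiar with Java, I would recommend Jsoup. Here is an example that does what you want: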
String html = "<div id=\"subcontent_l\">"
+ "<p>"
+ "<a href=\"/membership-packages/\"><img height=\"202\" width=\"644\" alt=\"\" src=\"http://74.52.72.231/wp-content/uploads/2010/06/banner1.jpg\" title=\"banner1\" class=\"aligncenter size-full wp-image-299\">"
+ "</a>"
+ "</p>"
+ "<p class=\"subc\">Access to Guaranteed Healthcare Benefits</p>"
+ "<p><a href=\"http://74.52.72.231/join-now\"><img height=\"37\" width=\"166\" alt=\"\" src=\"http://74.52.72.231/wp-content/uploads/2010/09/jn2.jpg\" title=\"jn\" class=\"alignleft size-full wp-image-229\"></a></p>"
+ "</div>";
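// Needs imports: org.jsoup.Jsoup, org.jsoup.nodes.Document, org.jsoup.nodes.Element
Document document = Jsoup.parse(html);
// Select the <img> whose src ends with "jn2.jpg", then step up to its parent <a>.
Element link = document.select("img[src$=jn2.jpg]").first().parent();
System.out.println(link.outerHtml()); // Prints the desired result.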
Jsoup uses jQuery-like CSS selectors to select the elements of interest. For C#/.NET there is a port of Jsoup: NSoup. PHP has a similar library as well: phpQuery.
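To illustrate the point above about regular expressions, here is a minimal sketch using only `java.util.regex`. The pattern `<a.*<img.*/a>` is a simplified stand-in for the expression in the question (an assumption, not the asker's exact regex), and the class name `GreedyRegexDemo`, the method `firstMatch`, and the shortened markup are hypothetical. It shows how a greedy `.*` cannot tell where one anchor ends and the next begins.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyRegexDemo {
    // Simplified, hypothetical pattern: "an <a> followed somewhere by an <img> and a closing /a>".
    static final Pattern LINK_WITH_IMAGE = Pattern.compile("<a.*<img.*/a>");

    // Returns the first substring matched by the pattern, or null if there is none.
    static String firstMatch(String html) {
        Matcher matcher = LINK_WITH_IMAGE.matcher(html);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        // Shortened version of the markup from the question, on a single line.
        String html = "<p><a href=\"/membership-packages/\"><img src=\"banner1.jpg\"></a></p>"
                + "<p><a href=\"/join-now\"><img src=\"jn2.jpg\"></a></p>";

        // The greedy .* makes the match run from the first <a to the last /a>,
        // so both links end up in one match instead of only the jn2.jpg anchor.
        System.out.println(firstMatch(html));
    }
}
```

Making the quantifiers lazy or adding more lookarounds can patch this particular input, but each patch tends to break on the next variation in attributes, whitespace, or nesting, which is why selecting the element through a parser is the more robust route.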