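I was introduced to XPath today and it seems very powerful, but after a fair amount of searching I still haven't found out how to retrieve siblings (via following-sibling and preceding-sibling) when using contains: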
text = """
<html>
<head>
<title>This tag includes 'some_text'</title>
<h2>A h2 tag</h2>
</head>
</html>
"""
import lxml.html
doc = lxml.html.fromstring(text)
a = doc.xpath("//*[contains(text(),'some_text')]/following-sibling::*")
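This yields []. The result I expected, of course, is the h2 tag.
However, using *[contains(text(),'some_text')] on its own retrieves the title element as expected. Likewise, if I use //parent::* instead of the following-sibling axis (I think that is what it is called), that works too.
So how can I get the siblings in this situation?
Thanks in advance.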
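That is an odd HTML sample: an h2 does not belong inside head, so the HTML parser appears to move it out, which leaves title with no following sibling. With both elements inside body, the same query works: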
import lxml.etree  # a bare "import lxml" does not load the submodules
import lxml.html
text = """
<html>
<body>
<span>This tag includes 'some_text'</span>
<h2>A h2 tag</h2>
</body>
</html>
"""
doc = lxml.etree.fromstring(text, parser=lxml.etree.HTMLParser())
doc.xpath("//*[contains(text(),'some_text')]/following-sibling::*")
# [<Element h2 at 102eee100>]
doc = lxml.html.fromstring(text)
doc.xpath("//*[contains(text(),'some_text')]/following-sibling::*")
# [<Element h2 at 102f6f188>]
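To see why the original query returns an empty list, it can help to inspect where the HTML parser actually places each element. The following is only a rough sketch, reusing the question's markup; the variable names `title` and `h2` are my own, and the exact tree may depend on the libxml2 version in use.

```python
import lxml.html

# The markup from the question: an h2 placed inside head.
text = """
<html>
<head>
<title>This tag includes 'some_text'</title>
<h2>A h2 tag</h2>
</head>
</html>
"""

doc = lxml.html.fromstring(text)

# Look the two elements up independently of each other.
title = doc.xpath("//title")[0]
h2 = doc.xpath("//h2")[0]

# Print the parent of each element to see whether they are still siblings.
print("title parent:", title.getparent().tag)
print("h2 parent:", h2.getparent().tag)

# Siblings of title as seen by the following-sibling axis.
print("siblings of title:", title.xpath("following-sibling::*"))
```

If the two parents differ, the elements are no longer siblings in the parsed tree, and following-sibling has nothing to return.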
Update:
Here I don't use the HTML parser and its validation rules, and instead treat the input as arbitrary XML:
text = """
<html>
<head>
<title>This tag includes 'some_text'</title>
<h2>A h2 tag</h2>
</head>
</html>
"""
doc = lxml.etree.fromstring(text)
doc.xpath("//*[contains(text(),'some_text')]/following-sibling::*[1]")
# [<Element h2 at 102eeef70>]
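If you need to keep the HTML parser, one possible alternative (not part of the answer above, just a sketch) is the following axis, which selects nodes that come later in document order whether or not they share a parent. The names `query`, `html_hits` and `xml_hits` are made up for illustration.

```python
import lxml.etree
import lxml.html

text = """
<html>
<head>
<title>This tag includes 'some_text'</title>
<h2>A h2 tag</h2>
</head>
</html>
"""

# following:: matches later nodes in document order, not only siblings,
# so it should not matter whether the parser relocated the h2.
query = "//*[contains(text(),'some_text')]/following::h2[1]"

html_hits = lxml.html.fromstring(text).xpath(query)   # HTML parser
xml_hits = lxml.etree.fromstring(text).xpath(query)   # plain XML parser

print([el.tag for el in html_hits])
print([el.tag for el in xml_hits])
```

The trade-off is that following:: is much broader than following-sibling::, so it is worth naming the target element (h2 here) and limiting the match with [1] rather than using *.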