How do I find an element by its text with lxml?

Question

Suppose we have the following HTML:

<html>
    <body>
        <a href="/1234.html">TEXT A</a>
        <a href="/3243.html">TEXT B</a>
        <a href="/7445.html">TEXT C</a>
    </body>
</html>

How can I find the "a" element that contains "TEXT A"?

So far I have:

import lxml.html

root = lxml.html.document_fromstring(the_html_above)
e = root.find('.//a')

I tried:

e = root.find('.//a[@text="TEXT A"]')

But that does not work, because the "a" tag has no "text" attribute.

Is there a way to solve this that is similar to what I tried?

Answer 1

unu*_*tbu 40

You are very close. Use text()= rather than @text (which denotes an attribute).

e = root.xpath('.//a[text()="TEXT A"]')

Or, if you only know that the text contains "TEXT A",

e = root.xpath('.//a[contains(text(),"TEXT A")]')

Or, if you only know that the text starts with "TEXT A",

e = root.xpath('.//a[starts-with(text(),"TEXT A")]')
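
To make the difference between the three predicates concrete, here is a minimal sketch. The sample document and the `hrefs` helper are hypothetical; they are not part of the question and were chosen so that each predicate matches a different set of links.

```python
import lxml.html as LH

# Hypothetical sample: link texts chosen so each predicate matches a different set.
doc = LH.fromstring('''
<html><body>
    <a href="/1.html">TEXT A</a>
    <a href="/2.html">TEXT A (archived)</a>
    <a href="/3.html">See TEXT A</a>
    <a href="/4.html">TEXT B</a>
</body></html>''')

def hrefs(expr):
    """Return the href of every element matched by an XPath expression."""
    return [a.get('href') for a in doc.xpath(expr)]

exact = hrefs('.//a[text()="TEXT A"]')                 # text must be equal
containing = hrefs('.//a[contains(text(),"TEXT A")]')  # substring anywhere
prefixed = hrefs('.//a[starts-with(text(),"TEXT A")]') # substring at the start

print(exact)       # ['/1.html']
print(containing)  # ['/1.html', '/2.html', '/3.html']
print(prefixed)    # ['/1.html', '/2.html']
```

Note that inside `contains()` and `starts-with()`, `text()` is converted to a string from the element's first text node only, so text that follows a child element is not considered.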

For more information on the available string functions, see the documentation.
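
As an illustration of two other XPath 1.0 string functions, the sketch below uses `normalize-space()` to ignore surrounding whitespace and `translate()` for a case-insensitive comparison (XPath 1.0 has no `lower-case()`). The sample markup and the variable names `upper` and `lower` are assumptions made for this example; passing them as keyword arguments uses lxml's support for XPath variables.

```python
import lxml.html as LH

# Hypothetical sample: one link with padded text, one in lower case.
doc = LH.fromstring('''
<html><body>
    <a href="/1.html">
        TEXT A
    </a>
    <a href="/2.html">text a</a>
    <a href="/3.html">TEXT B</a>
</body></html>''')

# normalize-space() strips leading/trailing whitespace and collapses inner runs.
trimmed = doc.xpath('.//a[normalize-space(text())="TEXT A"]')

# translate() maps characters one-to-one; here it lower-cases ASCII letters.
UPPER = 'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
LOWER = 'abcdefghijklmnopqrstuvwxyz'
any_case = doc.xpath(
    './/a[translate(normalize-space(text()), $upper, $lower) = "text a"]',
    upper=UPPER, lower=LOWER)  # keyword arguments become XPath variables

print([a.get('href') for a in trimmed])   # ['/1.html']
print([a.get('href') for a in any_case])  # ['/1.html', '/2.html']
```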

For example,

import lxml.html as LH

text = '''\
<html>
    <body>
        <a href="/1234.html">TEXT A</a>
        <a href="/3243.html">TEXT B</a>
        <a href="/7445.html">TEXT C</a>
    </body>
</html>'''

root = LH.fromstring(text)
e = root.xpath('.//a[text()="TEXT A"]')
print(e)

yields

[<Element a at 0xb746d2cc>]
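
As the output shows, `xpath` returns a list of elements, not a single element. A small sketch of how one might unpack that result, assuming a cut-down version of the same HTML (the names `matches`, `link`, and `no_match` are illustrative):

```python
import lxml.html as LH

root = LH.fromstring(
    '<html><body><a href="/1234.html">TEXT A</a></body></html>')

matches = root.xpath('.//a[text()="TEXT A"]')
link = matches[0] if matches else None  # guard against an empty result

if link is not None:
    # tag name, leading text, and the href attribute of the matched element
    print(link.tag, link.text, link.get('href'))  # a TEXT A /1234.html

# When nothing matches, xpath returns an empty list and does not raise.
no_match = root.xpath('.//a[text()="TEXT Z"]')
print(no_match)  # []
```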

Change `find` to `xpath`. (4 upvotes)
Right. `find`/`findall` are simplified methods that do not support every kind of XPath expression. With the current version of lxml, `xpath` accepts XPath 1.0. (3 upvotes)
This gives me SyntaxError: invalid predicate. (2 upvotes)
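
The SyntaxError in the last comment most likely comes from passing the `text()` predicate to `find`, which only understands a limited path syntax. Whether a given lxml version accepts this predicate in `find` is not something this sketch assumes; it tries `find` first and falls back to `xpath`. The variable names are illustrative.

```python
import lxml.html as LH

root = LH.fromstring(
    '<html><body><a href="/1234.html">TEXT A</a></body></html>')
expr = './/a[text()="TEXT A"]'

try:
    # find() uses the simplified path language and may reject this predicate
    found = root.find(expr)
    used = 'find'
except SyntaxError:
    # xpath() evaluates full XPath 1.0, so the same expression works here
    matches = root.xpath(expr)
    found = matches[0] if matches else None
    used = 'xpath'

print(used, found.get('href'))
```

In practice it is simpler to call `xpath` directly whenever the expression uses XPath functions.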

Answer 2

Too*_*ink 7

Another approach that looks more straightforward to me:

import lxml.html

results = []
root = lxml.html.fromstring(the_html_above)
for tag in root.iter():
    # tag.text is None for elements without leading text
    if tag.text and "TEXT A" in tag.text:
        results.append(tag)
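
The loop above can be wrapped in a small helper. This is a sketch only: `find_by_text` and its `tag` parameter are hypothetical names, not part of lxml. It relies on `iter()` accepting an optional tag name and on `.text` being `None` when an element has no leading text.

```python
import lxml.html as LH

def find_by_text(root, needle, tag=None):
    """Collect elements whose leading text contains `needle`.

    `tag` optionally restricts the search to one tag name (assumed helper API).
    """
    return [
        el for el in root.iter(tag)
        # skip comments/processing instructions and treat missing text as ''
        if isinstance(el.tag, str) and needle in (el.text or '')
    ]

root = LH.fromstring('''
<html><body>
    <p><b>TEXT A in bold</b></p>
    <a href="/1234.html">TEXT A</a>
    <a href="/3243.html">TEXT B</a>
</body></html>''')

links = find_by_text(root, 'TEXT A', tag='a')
anything = find_by_text(root, 'TEXT A')

print([el.get('href') for el in links])  # ['/1234.html']
print([el.tag for el in anything])       # ['b', 'a']
```

Because `.text` only holds the text before an element's first child, this helper, like the `text()` predicate, does not see text nested deeper inside the element.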
