How do I select all child text, but not the tags, with an XPath selector in Selenium?

Question

nul*_*ull 2 html python selenium xpath selenium-webdriver

I have this HTML:

<div id="content">
    <h1>Title 1</h1><br><br>

    <h2>Sub-Title 1</h2>
    <br><br>
    Description 1.<br><br>Description 2.
    <br><br>

    <h2>Sub-Title 2</h2>
    <br><br>
    Description 1<br>Description 2<br>
    <br><br>

    <div class="infobox">
        <font style="color:#000000"><b>Information Title</b></font>
        <br><br>Long Information Text
    </div>
</div>

I want to get all the text inside <div id="content"> using Selenium's find_element_by_xpath function, excluding the contents of <div class="infobox">, so the expected result looks like this:

Title 1


Sub-Title 1


Description 1.

Description 2.


Sub-Title 2


Description 1
Description 2

I can get it with this expression in an online XPath tester:

//div[@id="content"]/descendant::text()[not(ancestor::div/@class="infobox")]

But if I pass the same expression to Selenium's find_element_by_xpath, I get selenium.common.exceptions.InvalidSelectorException.

result = driver.find_element_by_xpath('//div[@id="content"]/descendant::text()[not(ancestor::div/@class="infobox")]')

Answer 1

ale*_*cxe 5

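The XPath passed to find_element_by_xpath() must point to an element, not to a text node or an attribute. An expression ending in text() matches text nodes, which is why it raises InvalidSelectorException.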
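
If the original text-node XPath needs to be kept, one possible workaround is to evaluate it inside the browser instead of through a Selenium locator. The sketch below assumes `driver` is an already-open WebDriver and relies on the browser's own `document.evaluate`, which can return text nodes. The names `TEXT_NODES_JS`, `XPATH`, and `text_nodes` are made up for this illustration and are not part of Selenium.

```python
# JavaScript run inside the browser: document.evaluate may return text nodes,
# unlike Selenium's element locators, which accept elements only.
TEXT_NODES_JS = """
var result = document.evaluate(
    arguments[0], document, null,
    XPathResult.ORDERED_NODE_SNAPSHOT_TYPE, null);
var texts = [];
for (var i = 0; i < result.snapshotLength; i++) {
    texts.push(result.snapshotItem(i).textContent);
}
return texts;
"""

# The expression from the question, unchanged.
XPATH = ('//div[@id="content"]/descendant::text()'
         '[not(ancestor::div/@class="infobox")]')


def text_nodes(driver, xpath):
    """Return the stripped, non-empty text of every node matched by xpath.

    `driver` is assumed to be a Selenium WebDriver (hypothetical helper).
    """
    raw = driver.execute_script(TEXT_NODES_JS, xpath)
    # Drop whitespace-only nodes that sit between the tags.
    return [t.strip() for t in raw if t.strip()]


# Usage, with a live browser session:
#     for line in text_nodes(driver, XPATH):
#         print(line)
```

This returns the raw text nodes, so line breaks produced by <br> are not reproduced. Each matched node becomes one list entry instead.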

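The simplest approach here is to find the parent element, find the child element whose text you want to exclude, and then remove the child's text from the parent's text: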

parent = driver.find_element_by_id('content')
child = parent.find_element_by_class_name('infobox')
# Remove the infobox text from the parent's rendered text
print(parent.text.replace(child.text, ''))
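
One caveat with the string replacement is that str.replace removes every occurrence of the child's text, so it can over-delete if the same text also appears outside the infobox. A possible alternative, sketched here using only the standard library, is to parse the element's markup and skip the excluded subtree. It assumes the markup is obtained with something like `parent.get_attribute('outerHTML')`. The names `VOID_TAGS`, `TextExcluding`, and `text_excluding` are hypothetical helpers for this example.

```python
from html.parser import HTMLParser

# Void elements have no closing tag, so they must not change nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TextExcluding(HTMLParser):
    """Collect text nodes, skipping subtrees that carry excluded_class."""

    def __init__(self, excluded_class):
        super().__init__()
        self.excluded_class = excluded_class
        self.skip_depth = 0  # > 0 while inside an excluded subtree
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Enter an excluded subtree, or go one level deeper inside one.
        if self.skip_depth or self.excluded_class in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-blank text found outside excluded subtrees.
        if not self.skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_excluding(html, excluded_class):
    """Return the non-blank text chunks of html outside excluded_class."""
    parser = TextExcluding(excluded_class)
    parser.feed(html)
    parser.close()
    return parser.chunks


# Usage, with a live browser session (assumption: `parent` is a WebElement):
#     html = parent.get_attribute('outerHTML')
#     print('\n'.join(text_excluding(html, 'infobox')))
```

Unlike parent.text, this works on the raw markup, so it returns one chunk per text node and does not reproduce the blank lines that <br> renders in the browser.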

Archived:	10 years, 10 months ago
Views:	2,185
Last activity:	10 years, 10 months ago