Extracting the sibling nodes between two nodes with BeautifulSoup

Question

I have a document like this:

<p class="top">I don't want this</p>

<p>I want this</p>
<table>
   <!-- ... -->
</table>

<img ... />

<p> and all that stuff too</p>

<p class="end">But not this and nothing after it</p>

I want to extract everything between the p[class=top] and p[class=end] paragraphs.

Is there a good way to do this with BeautifulSoup?

Answer 1

Łuk*_*asz 8

The node.nextSibling attribute (spelled next_sibling in BeautifulSoup 4) is your solution:

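from bs4 import BeautifulSoup  # BeautifulSoup 4; the original answer used the BS3 import

soup = BeautifulSoup(html, 'html.parser')  # html holds the document from the question

# Start at the first sibling after <p class="top">, so the top node itself is skipped.
node = soup.find('p', {'class': 'top'}).next_sibling
while node is not None:  # also stops cleanly if <p class="end"> is never found
    # In BeautifulSoup 4, class is a list of values, hence the membership test.
    if getattr(node, 'name', None) == 'p' and 'end' in node.get('class', []):
        break
    # process node here
    node = node.next_sibling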

This compound condition makes sure you are reading the attributes of an HTML tag rather than those of a string node.
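For reference, here is a minimal, self-contained sketch of the same sibling walk written against BeautifulSoup 4, collecting the nodes into a list instead of processing them inline. It is one possible way to do it, not the answerer's own code. The helper name siblings_between, the img src value, and the extra trailing paragraph are hypothetical and added only for illustration. The sketch also assumes that only tags are wanted, so the whitespace strings between elements are dropped.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Sample document modelled on the question; the img src and the table
# contents are placeholders, since the original elides them.
HTML = """
<p class="top">I don't want this</p>
<p>I want this</p>
<table><tr><td>cell</td></tr></table>
<img src="example.png" />
<p> and all that stuff too</p>
<p class="end">But not this and nothing after it</p>
<p>Nor this</p>
"""


def siblings_between(soup, start_class="top", end_class="end"):
    """Return the Tag siblings strictly between p.start_class and p.end_class.

    Hypothetical helper: text nodes (including whitespace) are skipped.
    """
    start = soup.find("p", class_=start_class)
    if start is None:
        return []
    collected = []
    for node in start.next_siblings:
        if not isinstance(node, Tag):
            continue  # string node: it has no tag attributes to inspect
        # class is multi-valued in BeautifulSoup 4, so test membership.
        if node.name == "p" and end_class in node.get("class", []):
            break
        collected.append(node)
    return collected


soup = BeautifulSoup(HTML, "html.parser")
between = siblings_between(soup)
names = [tag.name for tag in between]
print(names)
```

Checking isinstance(node, Tag) plays the same role as the getattr guard in the answer: it keeps string nodes from being treated as tags. If the end paragraph is missing, the loop simply runs to the last sibling.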
