How to remove content in nested tags with BeautifulSoup?

alv*_*vas 3 html python nested beautifulsoup

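How can I remove the content inside nested tags with BeautifulSoup? These posts show the reverse, retrieving the content of nested tags: "How to get contents of nested tag using BeautifulSoup" and "BeautifulSoup: How do I extract all the <li>s from a list of <ul>s that contains some nested <ul>s?"

I tried .text, but it only strips the tags and keeps their content: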

>>> from bs4 import BeautifulSoup as bs
>>> html = "<foo>Something something <bar> blah blah</bar> something else</foo>"
>>> bs(html).find_all('foo')[0]
<foo>Something something <bar> blah blah</bar> something else</foo>
>>> bs(html).find_all('foo')[0].text
u'Something something  blah blah something else'

Desired output:

Something something something else

Alv*_*tes 5

You can check whether each child is a bs4.element.NavigableString:

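import bs4
from bs4 import BeautifulSoup as bs

html = "<foo>Something something <bar> blah blah</bar> something <bar2>GONE!</bar2> else</foo>"

def get_only_text(elem):
    # Yield only the strings that are direct children of elem, skipping nested tags.
    for item in elem.children:
        if isinstance(item, bs4.element.NavigableString):
            yield item

# Name the parser explicitly to avoid the "no parser specified" warning.
print(''.join(get_only_text(bs(html, 'html.parser').find_all('foo')[0])))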

Output:

Something something  something  else
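The same filtering can probably be done without a helper generator, since find_all accepts string=True together with recursive=False. The sketch below is an alternative, not part of the answer above: direct_text and collapse_spaces are hypothetical helper names, and it assumes a Beautiful Soup version that supports the string argument (4.4.0 or later). It also collapses the doubled spaces that are left behind where the nested tags used to be.

```python
from bs4 import BeautifulSoup

html = "<foo>Something something <bar> blah blah</bar> something <bar2>GONE!</bar2> else</foo>"

def direct_text(elem):
    """Join only the strings that are direct children of elem (hypothetical helper)."""
    # string=True matches text nodes only; recursive=False stops the search
    # from descending into nested tags such as <bar> and <bar2>.
    return "".join(elem.find_all(string=True, recursive=False))

def collapse_spaces(text):
    """Squeeze runs of whitespace into single spaces (hypothetical helper)."""
    return " ".join(text.split())

foo = BeautifulSoup(html, "html.parser").find("foo")
raw = direct_text(foo)
clean = collapse_spaces(raw)
print(clean)
```

Like the isinstance check, this also keeps HTML comments, because Comment is a subclass of NavigableString; filter those out separately if they matter.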