How to extract a value with BeautifulSoup from an element that has no class

Question

use*_*217 1 python parsing beautifulsoup html-parsing python-2.7

HTML code:

<td class="_480u">
    <div class="clearfix">
        <div>
            Female
        </div>
    </div>
</td>


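I want the value "Female" as the output.

I tried bs.findAll('div',{'class':'clearfix'}) and bs.findAll('tag',{'class':'_480u'}), but these classes appear all over my HTML, so the output is a very large list. I want to combine both conditions in one search, {td --> class = ".." and div --> class = ".."}, so that I get only Female as the output. How can I do that?

Thanks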

Answer 1

fal*_*tru 5

Use the stripped_strings attribute:

>>> from bs4 import BeautifulSoup
>>>
>>> html = '''<td class="_480u">
...     <div class="clearfix">
...         <div>
...             Female
...         </div>
...     </div>
... </td>'''
>>> soup = BeautifulSoup(html)
>>> print ' '.join(soup.find('div', {'class': 'clearfix'}).stripped_strings)
Female
>>> print ' '.join(soup.find('td', {'class': '_480u'}).stripped_strings)
Female

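The question asks how to require both the td class and the div class in a single search. A minimal sketch of two ways to do that, written for Python 3 with the built-in html.parser: a CSS selector via select_one, and a find call chained onto the result of another find. The second td with class "other" and the surrounding table are made up for illustration, so that the clearfix class occurs more than once, as the asker describes.

```python
from bs4 import BeautifulSoup

# Sample markup: the first cell comes from the question; the second cell
# ("other") is a hypothetical addition so that "clearfix" is not unique.
html = """
<table><tr>
<td class="_480u">
    <div class="clearfix">
        <div>
            Female
        </div>
    </div>
</td>
<td class="other">
    <div class="clearfix">
        <div>Unrelated</div>
    </div>
</td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# Option 1: a CSS selector that encodes the nesting
# td._480u -> div.clearfix -> div
inner = soup.select_one("td._480u > div.clearfix > div")
value = inner.get_text(strip=True)

# Option 2: narrow the search step by step; each find() only looks
# inside the tag it is called on.
td = soup.find("td", class_="_480u")
wrapper = td.find("div", class_="clearfix")
chained_value = " ".join(wrapper.stripped_strings)

print(value)
print(chained_value)
```

Both variants skip the clearfix div inside the unrelated cell, because the search is restricted to descendants of the td with class _480u. If that td class also repeats across the page, select or find_all would return one match per cell instead of a single value.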

Alternatively, specify the class as an empty string (or None) and use the string attribute:

>>> soup.find('div', {'class': ''}).string
u'\n            Female\n        '
>>> soup.find('div', {'class': ''}).string.strip()
u'Female'

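Whether an empty-string class filter matches a tag without a class attribute may depend on the Beautiful Soup version, so a more explicit variant is sketched here: pass find a function that accepts only div tags lacking a class attribute. The helper name is_classless_div is made up for this example, and the code assumes Python 3 with the built-in html.parser.

```python
from bs4 import BeautifulSoup

html = """<td class="_480u">
    <div class="clearfix">
        <div>
            Female
        </div>
    </div>
</td>"""

soup = BeautifulSoup(html, "html.parser")


def is_classless_div(tag):
    """Return True for <div> tags that carry no class attribute at all."""
    return tag.name == "div" and not tag.has_attr("class")


# find() calls the function on every tag and returns the first match.
classless = soup.find(is_classless_div)
text = classless.get_text(strip=True)

print(text)
```

In a full page this matches the first classless div anywhere in the document, so it is usually safer to call find on an already narrowed tag, such as the td with class _480u, rather than on the whole soup.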

Archived: 12 years, 5 months ago
Views: 2,296
Last activity: 12 years, 5 months ago