How do I extract text from HTML with beautifulsoup4?

jia*_* Ma 1 python beautifulsoup

The HTML looks like this:

<td class='Thistd'><a ><img /></a>Here is some text.</td>

I only want the string inside the <td>. I don't need the <a>...</a>. How can I do that?

My code:

from bs4 import BeautifulSoup
html = """<td class='Thistd'><a><img /></a>Here is some text.</td>"""

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')
tds = soup.find_all('td', {'class': 'Thistd'})
for td in tds:
    print(td)
    print('=============')

What I get is <td class='Thistd'><a ><img /></a>Here is some text.</td>

But I only need Here is some text.

The*_*nse 5

Code:

from bs4 import BeautifulSoup
html = """<td class='Thistd'><a ><img /></a>Here is some text.</td>"""

# Name the parser explicitly so the result does not depend on what is installed
soup = BeautifulSoup(html, 'html.parser')
tds = soup.find_all('td', {'class': 'Thistd'})
for td in tds:
    print(td.text)  # the only change you need to make
    print('=============')

Output:

Here is some text.
=============

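Note:

.text returns only the text of the given BS4 object, which in this case is the td tag. For more information, have a look at the official documentation.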
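
A related caveat to the note above: .text (like the equivalent get_text()) joins the text of every descendant, so it produces the desired output here only because the <a> holds no text of its own. The following is a minimal sketch, assuming a hypothetical variant of the HTML in which the <a> contains the string "link text", of one way to keep only the strings that are direct children of the <td>:

```python
from bs4 import BeautifulSoup

# Hypothetical variant of the question's HTML: the <a> now carries its own text.
html = "<td class='Thistd'><a>link text<img /></a>Here is some text.</td>"

soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", class_="Thistd")

# get_text() (same as .text) includes text from nested tags such as <a>.
all_text = td.get_text()

# string=True matches text nodes; recursive=False limits the search to
# direct children of <td>, so the text inside <a> is skipped.
direct_text = "".join(td.find_all(string=True, recursive=False))

print(all_text)     # link textHere is some text.
print(direct_text)  # Here is some text.
```

For the exact HTML in the question both approaches give the same result, so td.text remains the simplest choice there.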