jia*_* Ma 1 python beautifulsoup
The HTML looks like this:
<td class='Thistd'><a ><img /></a>Here is some text.</td>
I only want the string inside the <td>. I don't need the <a>...</a>. How can I do that?
My code:
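from bs4 import BeautifulSoup

html = """<td class='Thistd'><a><img /></a>Here is some text.</td>"""
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
tds = soup.find_all('td', {'class': 'Thistd'})
for td in tds:
    print(td)  # prints the whole tag, markup included
    print('=============')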
What I get is <td class='Thistd'><a ><img /></a>Here is some text.</td>
But I only need Here is some text.
Code:
from bs4 import BeautifulSoup

html = """<td class='Thistd'><a ><img /></a>Here is some text.</td>"""
# Name the parser explicitly so bs4 does not have to guess one
soup = BeautifulSoup(html, 'html.parser')
tds = soup.find_all('td', {'class': 'Thistd'})
for td in tds:
    print(td.text)  # the only change you need to make
    print('=============')
Output:
Here is some text.
=============
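Note:
Use .text to get only the text of the given bs4 object, which in this case is the td tag. .text collects the text of the tag and all of its descendants; here the <a> holds only an <img /> and no text, so the result is just the sentence. For more information, see the official documentation.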
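Because .text also gathers text from nested tags, it would include any text inside the <a> if there were some. The following is a minimal sketch, not part of the original answer, of one way to keep only the text that sits directly inside the <td>. The helper name direct_text and the "link text" added to the sample HTML are assumptions made for illustration.

```python
from bs4 import BeautifulSoup


def direct_text(tag):
    """Join only the text nodes that are direct children of `tag` (hypothetical helper)."""
    # string=True matches text nodes; recursive=False skips anything nested deeper
    return "".join(tag.find_all(string=True, recursive=False)).strip()


# Sample HTML modified for illustration: the <a> now contains text of its own
html = """<td class='Thistd'><a>link text<img /></a>Here is some text.</td>"""
soup = BeautifulSoup(html, "html.parser")
td = soup.find("td", {"class": "Thistd"})

all_text = td.text            # text of the td and every descendant
own_text = direct_text(td)    # text directly inside the td only

print(all_text)   # link textHere is some text.
print(own_text)   # Here is some text.
```

For the HTML in the question the two results are identical, so td.text is enough there; the difference only matters when the nested tags carry text of their own.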