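I'm working with HTML elements that contain child tags, and I want to "ignore" or remove those tags so that only their text remains. Right now, if I call .string on any element that contains a tag, all I get is None.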
import bs4

soup = bs4.BeautifulSoup("""
<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>
""", "html.parser")

main = soup.find(id='main')
for child in main.children:
    print(child.string)
Output:
This is a paragraph.
None
This is another paragraph.
I'd like the second line to be "This is a paragraph with a tag." How can I do that?
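A minimal sketch of why .string returns None here, assuming bs4 4.x with the stdlib "html.parser" backend: .string is only defined when a tag has a single child, while get_text() concatenates every string under the tag.

```python
from bs4 import BeautifulSoup

html = '<p>This is a paragraph <span class="test">with a tag</span>.</p>'
p = BeautifulSoup(html, "html.parser").p

# This <p> has three children: a string, a <span> tag, and another string,
# so there is no single string for .string to return.
print(len(p.contents))  # 3
print(p.string)         # None

# get_text() instead joins all descendant strings in document order.
print(p.get_text())     # This is a paragraph with a tag.
```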
for child in soup.find(id='main'):
    if isinstance(child, bs4.Tag):
        print(child.text)
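The isinstance check skips the bare newline strings between the <p> tags, and .text (the same as get_text()) joins every string inside each tag, so you get: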
This is a paragraph.
This is a paragraph with a tag.
This is another paragraph.
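If you actually want to remove the child tags from the tree, as the question asks, Tag.unwrap() replaces a tag with its own contents. This is a sketch assuming bs4 4.x; note that .string can still be None afterwards, because the <p> now holds several adjacent strings rather than one, so get_text() is used to read the result.

```python
from bs4 import BeautifulSoup

soup = BeautifulSoup("""
<div id="main">
<p>This is a paragraph.</p>
<p>This is a paragraph <span class="test">with a tag</span>.</p>
<p>This is another paragraph.</p>
</div>
""", "html.parser")
main = soup.find(id="main")

for p in main.find_all("p"):
    # find_all(True) matches every tag nested inside this <p>;
    # unwrap() removes each tag but leaves its text in place.
    for tag in p.find_all(True):
        tag.unwrap()

lines = [p.get_text() for p in main.find_all("p")]
for line in lines:
    print(line)
```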