Question by PSe*_*ode (score 12). Tags: html, python, beautifulsoup, python-2.7
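I'm trying to parse the text between <blockquote> tags. When I call soup.blockquote.get_text(), I get the result I want, but only for the first <blockquote> in the HTML file.
How do I find the next and subsequent <blockquote> tags in the file? Maybe I'm just tired, but I can't find it in the documentation.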
Example HTML file:
<html>
<head>header
</head>
<blockquote>I can get this text
</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next
</blockquote>
<p></p><strong>do not capture this</strong>
<blockquote>
capture this too but separately after "capture this next"
</blockquote>
</html>
Simple Python code:
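from bs4 import BeautifulSoup
# Parse example.html with Python's built-in parser; the file is closed afterwards
with open("example.html") as html_doc:
    soup = BeautifulSoup(html_doc, "html.parser")
# soup.blockquote is the FIRST <blockquote> only
print(soup.blockquote.get_text())
# how to get the next blockquote???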
Answer by fal*_*tru (score 17)
Use find_next_sibling (if the tags are not siblings, use find_next instead):
>>> html = '''
... <html>
... <head>header
... </head>
... <blockquote>blah blah
... </blockquote>
... <p>eiaoiefj</p>
... <blockquote>capture this next
... </blockquote>
... <p></p><strong>don't capture this</strong>
... <blockquote>
... capture this too but separately after "capture this next"
... </blockquote>
... </html>
... '''
>>> from bs4 import BeautifulSoup
>>> soup = BeautifulSoup(html, 'html.parser')
>>> quote1 = soup.blockquote
>>> quote1.text
u'blah blah\n'
>>> quote2 = quote1.find_next_sibling('blockquote')
>>> quote2.text
u'capture this next\n'
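When the blockquotes do not share a parent, find_next_sibling finds nothing and find_next is needed, as the parenthetical above says. A minimal sketch of that case, assuming bs4 4.x and the built-in "html.parser"; the nested markup is a hypothetical example, not taken from the question:

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second blockquote sits inside a <div>,
# so it is NOT a sibling of the first one.
html = """
<html><body>
<blockquote>first quote</blockquote>
<div><blockquote>nested quote</blockquote></div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")
first = soup.blockquote

# find_next_sibling only looks at tags with the same parent, so it finds nothing here.
print(first.find_next_sibling("blockquote"))  # None

# find_next searches everything after `first` in document order,
# including tags nested inside other tags.
second = first.find_next("blockquote")
print(second.get_text())  # nested quote
```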
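To collect every blockquote instead of stepping one at a time, find_all returns all matching tags in document order. This is not part of the original answer; it is a sketch assuming bs4 4.x, the built-in "html.parser", and the question's example HTML inlined as a string so it runs without example.html. get_text(strip=True) trims the surrounding newlines:

```python
from bs4 import BeautifulSoup

# Same markup as example.html in the question, inlined so the sketch runs without a file.
html_doc = """
<html>
<head>header
</head>
<blockquote>I can get this text
</blockquote>
<p>eiaoiefj</p>
<blockquote>trying to capture this next
</blockquote>
<p></p><strong>do not capture this</strong>
<blockquote>
capture this too but separately after "capture this next"
</blockquote>
</html>
"""

soup = BeautifulSoup(html_doc, "html.parser")

# find_all returns every <blockquote> in document order.
quotes = [bq.get_text(strip=True) for bq in soup.find_all("blockquote")]
for text in quotes:
    print(text)

# Equivalent: walk forward one blockquote at a time with find_next,
# stopping when it returns None.
walked = []
quote = soup.blockquote
while quote is not None:
    walked.append(quote.get_text(strip=True))
    quote = quote.find_next("blockquote")
```

The find_next loop yields the same list here; it is useful when you want to stop early or decide at each step whether to continue.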
Views: 17799