LAN*_*ark 4 python beautifulsoup web-scraping python-3.x
Using Python 3 and BeautifulSoup 4, I would like to extract text from an HTML page that is delineated only by the comment above it. An example:
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
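I have found several ways to extract a page's text or its comments, but nothing that does what I am looking for. Any help would be greatly appreciated.
You just need to iterate over all of the comments, check whether each one is among the entries you want, and then display the text of the element that follows it, like this: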
from bs4 import BeautifulSoup, Comment
html = """
<html>
<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>
</html>
"""
soup = BeautifulSoup(html, 'lxml')
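# Iterate over every comment node in the document
for comment in soup.find_all(string=lambda text: isinstance(text, Comment)):
    if comment in ['UNIQUE COMMENT', 'SECOND UNIQUE COMMENT']:
        # The node right after the comment is the text we want
        print(comment.next_element.strip())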
This would display the following:
I would like to get this text
I would also like to find this text
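The loop above assumes that a text node always follows each comment; if a tag followed instead, calling `.strip()` on it would fail. As a minimal sketch of one way to guard against that, the same idea can be wrapped in a helper that returns a dictionary. The name `text_after_comments` and its parameters are hypothetical, not part of BeautifulSoup, and the built-in `html.parser` backend is used here only to avoid depending on `lxml`.

```python
from bs4 import BeautifulSoup, Comment, NavigableString


def text_after_comments(html, wanted):
    """Map each wanted comment to the text node that directly follows it.

    Hypothetical helper for illustration; `wanted` is a set of comment strings.
    """
    soup = BeautifulSoup(html, "html.parser")
    found = {}
    for comment in soup.find_all(string=lambda s: isinstance(s, Comment)):
        label = comment.strip()
        if label not in wanted:
            continue
        nxt = comment.next_sibling
        # Accept only plain text; skip tags, other comments, or a missing sibling.
        if isinstance(nxt, NavigableString) and not isinstance(nxt, Comment):
            found[label] = nxt.strip()
    return found


sample = """<body>
<p>p tag text</p>
<!--UNIQUE COMMENT-->
I would like to get this text
<!--SECOND UNIQUE COMMENT-->
I would also like to find this text
</body>"""

result = text_after_comments(sample, {"UNIQUE COMMENT", "SECOND UNIQUE COMMENT"})
for label, text in result.items():
    print(label, "->", text)
```

Comments that are not followed directly by plain text are simply left out of the result, so the caller can check which labels are missing.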