Ren*_*ene 2 beautifulsoup web-scraping python-2.7
I am trying to get the text (the content without markup) of all the p elements inside a given div:
import requests
from bs4 import BeautifulSoup

def getArticle(url):
    url = 'http://www.bbc.com/news/business-34421804'
    result = requests.get(url)
    c = result.content
    soup = BeautifulSoup(c)
    article = []
    article = soup.find("div", {"class":"story-body__inner"}).findAll('p')
    for element in article:
        article = ''.join(element.findAll(text = True))
    return article
The problem is that this returns only the content of the last paragraph. However, if I just use print, the code works perfectly:
for element in article:
    print ''.join(element.findAll(text = True))
return
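I want to call this function from elsewhere, so I need it to return the text rather than just print it. I have searched Stack Overflow and googled a lot, but I did not find an answer, and I don't understand what the problem could be. I am using Python 2.7.9 and bs4. Thanks in advance!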
The following code should work:
import requests
from bs4 import BeautifulSoup

def getArticle(url):
    url = 'http://www.bbc.com/news/business-34421804'
    result = requests.get(url)
    c = result.content
    soup = BeautifulSoup(c)
    article_text = ''
    article = soup.find("div", {"class":"story-body__inner"}).findAll('p')
    for element in article:
        article_text += '\n' + ''.join(element.findAll(text = True))
    return article_text
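As a variation on the same idea, here is a minimal sketch that collects the paragraphs in a list and joins them once at the end. It is not from the original answer: the name `get_article_text`, the `SAMPLE_HTML` string, and the choice of `"html.parser"` are assumptions made so the example runs without network access. It uses `find_all` and `get_text`, the bs4 spellings of `findAll` and of joining the text nodes by hand.

```python
from bs4 import BeautifulSoup

# Hypothetical stand-in for the downloaded page, so no request is needed.
SAMPLE_HTML = """
<div class="story-body__inner">
  <p>First <b>paragraph</b>.</p>
  <p>Second paragraph.</p>
</div>
"""

def get_article_text(html):
    """Return the text of every <p> in the story div, one paragraph per line."""
    # Naming the parser explicitly is an assumption; it avoids bs4 guessing one.
    soup = BeautifulSoup(html, "html.parser")
    body = soup.find("div", {"class": "story-body__inner"})
    if body is None:
        # The div is missing, so there is nothing to extract.
        return ""
    # get_text() concatenates all text nodes inside each <p>, markup removed.
    paragraphs = [p.get_text() for p in body.find_all("p")]
    return "\n".join(paragraphs)

text = get_article_text(SAMPLE_HTML)
```

Passing the HTML in as an argument, instead of fetching a hard-coded URL inside the function, also makes the function easy to call from elsewhere with `requests.get(url).content`.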
There are several problems in your code -
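One problem that can be read directly off the question's code (not necessarily the one the answer went on to list) is that the loop rebinds `article` on every pass, so only the value from the last iteration survives. The sketch below reproduces that with plain strings; `last_only` and `all_joined` are hypothetical names used only for illustration.

```python
paragraphs = ["first", "second", "third"]

def last_only(items):
    """Mimics the question's loop: the same name is rebound on each pass."""
    result = []
    for item in items:
        result = item  # the previous value is discarded here
    return result

def all_joined(items):
    """Accumulates every item, then joins them once at the end."""
    collected = []
    for item in items:
        collected.append(item)
    return "\n".join(collected)

only_last = last_only(paragraphs)
everything = all_joined(paragraphs)
```

The `print` version appeared to work because it emitted each paragraph inside the loop, before the next pass replaced the value.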