Python BeautifulSoup: getting the text of all elements

Question

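I'm trying to extract the text of an article from a website using Python and BeautifulSoup, and I've run into a strange problem. I just want to print the text inside several p tags, which sit inside a div with the class dr_article. The code currently looks like this: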

from bs4 import BeautifulSoup

def getArticleText(webtext):
    soup = BeautifulSoup(webtext)
    divTag = soup.find_all("div", {"class":"dr_article"})
    for tag in divTag:
        pData = tag.find_all("p").text
        print pData


I get the following error:

Traceback (most recent call last):
  File "<pyshell#14>", line 1, in <module>
    execfile("word_rank/main.py")
  File "word_rank/main.py", line 7, in <module>
    articletext.getArticleText(webtext)
  File "word_rank\articletext.py", line 7, in getArticleText
    pData = tag.find_all("p").text
AttributeError: 'list' object has no attribute 'text'


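But when I select only the first element with [0] before .text, I don't get the error and it works as expected: it returns the text of the first element. To be precise, I changed my code to look like this: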

from bs4 import BeautifulSoup

def getArticleText(webtext):
    soup = BeautifulSoup(webtext)
    divTag = soup.find_all("div", {"class":"dr_article"})
    for tag in divTag:
        pData = tag.find_all("p")[0].text
        print pData


My question is: how do I get the text from all the elements at once? What should I change so that I get the text from every element instead of just one?

Answer 1

4d4*_*d4c 5

You are fetching all the elements, so find_all returns a list, and a list has no .text attribute. Try iterating over it:

from bs4 import BeautifulSoup

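def getArticleText(webtext):
    # Name the parser explicitly so the result does not depend on what is installed
    soup = BeautifulSoup(webtext, "html.parser")
    divTag = soup.find_all("div", {"class":"dr_article"})
    for tag in divTag:
        # find_all returns a list, so read .text from each element in turn
        for element in tag.find_all("p"):
            pData = element.text
            print(pData)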
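
If the goal is to have all the text available at once rather than printed line by line, the same loop can collect the paragraphs into a list. The following is only a minimal sketch, assuming Python 3 and the built-in "html.parser"; the function name get_article_paragraphs and the sample markup are made up for illustration and do not come from the question.

```python
from bs4 import BeautifulSoup


def get_article_paragraphs(webtext):
    """Return the text of every <p> inside each <div class="dr_article">."""
    soup = BeautifulSoup(webtext, "html.parser")
    paragraphs = []
    for div in soup.find_all("div", {"class": "dr_article"}):
        # find_all returns a list-like result, so read the text per element
        for p in div.find_all("p"):
            paragraphs.append(p.get_text())
    return paragraphs


# Hypothetical sample markup, used only to exercise the function
sample_html = """
<div class="dr_article"><p>First paragraph.</p><p>Second paragraph.</p></div>
<div class="other"><p>Not part of the article.</p></div>
"""

if __name__ == "__main__":
    # Join the collected paragraphs to get the whole article as one string
    print("\n".join(get_article_paragraphs(sample_html)))
```

Returning a list leaves it up to the caller whether to print the paragraphs, join them into one string, or process them further.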


Or you can select each element individually:

tag.find_all("p")[0].text
tag.find_all("p")[1].text
tag.find_all("p")[..].text
tag.find_all("p")[N - 1].text
tag.find_all("p")[N].text
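
When addressing paragraphs by position, keep in mind that a list of N elements has valid indexes 0 through N - 1, so it helps to look at the length of the list first. A small sketch, assuming Python 3 and made-up sample markup (the variable names are illustrative only):

```python
from bs4 import BeautifulSoup

# Hypothetical sample markup with three paragraphs
sample_html = '<div class="dr_article"><p>one</p><p>two</p><p>three</p></div>'

soup = BeautifulSoup(sample_html, "html.parser")
tag = soup.find("div", {"class": "dr_article"})
paragraphs = tag.find_all("p")

# Valid indexes run from 0 to len(paragraphs) - 1
indexed_text = {i: p.text for i, p in enumerate(paragraphs)}

# A negative index is a safe way to reach the last element
last_text = paragraphs[-1].text

print(indexed_text)
print(last_text)
```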


Asked:	12 years, 4 months ago
Viewed:	6585 times
Last active:	7 years, 2 months ago