Scraping with BeautifulSoup in Python using a for loop only returns the last result

Question


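I am trying to scrape data from a web page with BeautifulSoup and (eventually) write it out to a CSV file. As a first step, I am trying to get the text of the relevant table. I managed to do this once, but when I reran the code it no longer gave me the same output: when the for loop runs, it does not save all 12372 records, only the last one.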

An abbreviated version of my code is:

from bs4 import BeautifulSoup
# browser: a Selenium WebDriver that has already loaded the page (setup not shown)
BirthsSoup = BeautifulSoup(browser.page_source, features="html.parser")
print(BirthsSoup.prettify()) 
# this confirms that the soup has captured the page as I want it to

birthsTable = BirthsSoup.select('#t2 td')
# selects all the elements in the table I want

birthsLen = len(birthsTable)
# birthsLen: 12372

for i in range(birthsLen):
    print(birthsTable[i].prettify())
# this confirms that the beautifulsoup tag object correctly captured all of the table

for i in range(birthsLen):
    birthsText = birthsTable[i].getText()
# this was supposed to compile the text for every element in the table


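But the for loop only saves the text of the last (i.e. the 12372nd) element in the table. Do I need to do something else so that it saves each element as it loops through? I believe my earlier (desired) output contained the text of each element on its own line.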
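The symptom can be reproduced without BeautifulSoup. In this minimal sketch, plain strings stand in for the Tag objects (the `cells` list is hypothetical sample data, not the real table): assigning inside the loop rebinds the same name on every iteration, so only the last value survives, whereas appending to a list keeps every value.

```python
# Hypothetical stand-in for birthsTable: plain strings instead of bs4 Tag objects
cells = ["1990", "Alice", "1991", "Bob"]

# Pattern from the question: the name is rebound on each iteration
for i in range(len(cells)):
    birthsText = cells[i]
# After the loop only the final assignment remains
print(birthsText)  # prints: Bob

# Accumulating pattern: every value is kept in a list
allTexts = []
for cell in cells:
    allTexts.append(cell)
print(allTexts)  # prints: ['1990', 'Alice', '1991', 'Bob']
```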

This is my first time using Python, so I apologize if I have made an obvious mistake.

Answer 1

Mer*_*rig 6

You are overwriting your birthsText string on every iteration, so by the end only the last one is left. To fix this, create a list and append each row to it:

birthsLen = len(birthsTable)
birthsText = []  # collects the text of every cell

for i in range(birthsLen):
    # append instead of reassigning, so earlier values are kept
    birthsText.append(birthsTable[i].getText())


Or, more concisely:

birthsText = [line.getText() for line in birthsTable]

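Once the texts are in a list, the two goals mentioned in the question follow directly: one element per line, and (eventually) a CSV file. The sketch below uses hypothetical sample strings in place of the scraped cells and assumes the table has a fixed number of columns (`num_cols = 2` is an assumption; check it against the real table). Because `select('#t2 td')` flattens every cell into one list, the cells are regrouped into rows before writing.

```python
import csv
import io

# Hypothetical scraped cell texts, standing in for the birthsText list above
birthsText = ["1990", "Alice", "1991", "Bob"]

# One element per line, like the output described in the question
print("\n".join(birthsText))

# Assumption: every table row has exactly num_cols cells
num_cols = 2
rows = [birthsText[i:i + num_cols] for i in range(0, len(birthsText), num_cols)]

# Write to an in-memory buffer; for a real file use a hypothetical path such as
# open("births.csv", "w", newline="") instead of io.StringIO()
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerows(rows)
csv_text = buffer.getvalue()
print(csv_text)
```

If the column count is not fixed, selecting the `tr` elements of `#t2` first and then the `td` cells inside each row would avoid the regrouping step; that variant is not shown here.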

Archived:	6 years, 10 months ago
Views:	174
Last activity:	6 years, 10 months ago