python - How to get tbody from a table with Python Beautiful Soup?

Question

JPC*_*JPC 4 python beautifulsoup web-scraping

I am trying to scrape Year & Winners (the first and second columns) from the "List of finals" table (the second table) at http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals. I am using the code below:

import urllib2
from BeautifulSoup import BeautifulSoup

url = "http://www.samhsa.gov/data/NSDUH/2k10State/NSDUHsae2010/NSDUHsaeAppC2010.htm"
soup = BeautifulSoup(urllib2.urlopen(url).read())
soup.findAll('table')[0].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

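With the code above I am able to get the first and third columns. But when I use the same code with http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals, it cannot find tbody as an element of the table, even though I can see tbody when I inspect the element.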

url = "http://en.wikipedia.org/wiki/List_of_FIFA_World_Cup_finals"
soup = BeautifulSoup(urllib2.urlopen(url).read())

print soup.findAll('table')[2]

soup.findAll('table')[2].tbody.findAll('tr')
for row in soup.findAll('table')[0].tbody.findAll('tr'):
    first_column = row.findAll('th')[0].contents
    third_column = row.findAll('td')[2].contents
    print first_column, third_column

Here is the error I get:

'
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-150-fedd08c6da16> in <module>()
      7 # print soup.findAll('table')[2]
      8 
----> 9 soup.findAll('table')[2].tbody.findAll('tr')
     10 for row in soup.findAll('table')[0].tbody.findAll('tr'):
     11     first_column = row.findAll('th')[0].contents

AttributeError: 'NoneType' object has no attribute 'findAll'

'

Answer 1

Der*_*itz 7

If you are inspecting the page with your browser's inspect tool, the browser inserts the tbody tags itself.

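The page source may or may not contain them. If you really want to know, I suggest looking at the "view source" output instead of the inspector.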
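If you would rather check this from code than from the browser, something like the following might help. It is only a rough sketch in Python 3 using the standard library's `html.parser`; the `TagCounter` class, the `count_tag` helper and the `raw_source` string are made up for illustration. In practice `raw_source` would be the HTML you downloaded, before any browser has touched it.

```python
from html.parser import HTMLParser


class TagCounter(HTMLParser):
    """Counts how often each start tag appears in the raw markup."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case.
        self.counts[tag] = self.counts.get(tag, 0) + 1


def count_tag(html, tag):
    """Return how many times <tag> is opened in the given HTML string."""
    parser = TagCounter()
    parser.feed(html)
    parser.close()
    return parser.counts.get(tag, 0)


# Hypothetical stand-in for a downloaded page whose source has no <tbody>.
raw_source = "<table><tr><td>1930</td></tr></table>"

print(count_tag(raw_source, "tbody"))  # 0: the source itself has no tbody
print(count_tag(raw_source, "tr"))     # 1
```

A count of zero for `tbody` means the tag you see in the inspector was added by the browser, not by the page author.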

Either way, you don't need to go through tbody at all. Simply:

soup.findAll('table')[0].findAll('tr') should work.
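To illustrate, here is a minimal sketch of the same idea. Note the assumptions: it uses the newer `bs4` package (BeautifulSoup 4) on Python 3 rather than the BeautifulSoup 3 / Python 2 setup in the question, it names the `html.parser` backend explicitly, and the `HTML` string is invented sample markup standing in for a page whose source has no tbody.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: a table written without <tbody>, as many pages are.
HTML = """
<table>
  <tr><th>1930</th><td>Uruguay</td><td>4-2</td></tr>
  <tr><th>1934</th><td>Italy</td><td>2-1</td></tr>
</table>
"""

soup = BeautifulSoup(HTML, "html.parser")
table = soup.find_all("table")[0]

# html.parser does not add a <tbody>, so this attribute lookup yields None.
# Calling .find_all on it is what raises the AttributeError in the question.
print(table.tbody)  # None

# Searching for rows directly on the table works with or without <tbody>,
# because find_all searches all descendants.
rows = []
for row in table.find_all("tr"):
    year = row.find("th").get_text(strip=True)
    winner = row.find_all("td")[0].get_text(strip=True)
    rows.append((year, winner))

print(rows)  # [('1930', 'Uruguay'), ('1934', 'Italy')]
```

Because the search is recursive, the same loop also works on pages that do include tbody in their source.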
