Posts by QwE*_*y99

Python web scraping - how do I get content with Beautiful Soup when the page loads it via JS?

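I'm trying to scrape a table from a particular website using BeautifulSoup and urllib. My goal is to build a single list from all the data in that table. The same code works fine on tables from other websites, but on this site the table comes back as a NoneType object. Can anyone help me with this? I've looked for other answers online without much luck.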

Here is the code:

import urllib.request

from bs4 import BeautifulSoup

# Download the page and parse it with an explicit parser.
url = "http://www.teamrankings.com/ncaa-basketball/stat/free-throw-pct"
soup = BeautifulSoup(urllib.request.urlopen(url).read(), "html.parser")

# On this site, table comes back as None (the problem described above).
table = soup.find("table", attrs={'class': 'sortable'})

# Collect the text of every cell into a single flat list.
data = []
rows = table.find_all("tr")
for tr in rows:
    cols = tr.find_all("td")
    for td in cols:
        text = td.get_text()
        data.append(text)

print(data)
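
To make the symptom easier to reproduce without hitting the live site, here is a minimal sketch that runs the same lookup against two inline HTML strings. The function name `extract_table_cells`, both HTML snippets, and the team names and percentages in them are made up for illustration and are not taken from teamrankings.com. It assumes that a page whose table is built by JavaScript simply has no matching `<table>` in the markup the server returns, which would explain the NoneType result.

```python
from bs4 import BeautifulSoup


def extract_table_cells(html, css_class="sortable"):
    """Return the text of every <td> in the first table with the given class.

    Returns an empty list when no such table exists in the markup, instead
    of failing on a None result from find().
    """
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", attrs={"class": css_class})
    if table is None:
        return []
    return [td.get_text(strip=True) for td in table.find_all("td")]


# Hypothetical markup standing in for a page rendered on the server.
static_html = """
<table class="sortable">
  <tr><th>Team</th><th>FT%</th></tr>
  <tr><td>Team A</td><td>78.1%</td></tr>
  <tr><td>Team B</td><td>76.4%</td></tr>
</table>
"""

# Hypothetical markup for a page whose table is built later by a script.
js_html = '<div id="stats"></div><script>/* builds the table later */</script>'

static_cells = extract_table_cells(static_html)
js_cells = extract_table_cells(js_html)
print(static_cells)
print(js_cells)
```

If the second case matches what the real page returns, the data would have to come from whatever request the page's script makes, or from a tool that executes JavaScript, rather than from the initial HTML.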


python screen-scraping urllib beautifulsoup

QwE*_*y99

2015 04-21

5
Score

1
Answers

1874
Views