use*_*035 5 python beautifulsoup web-scraping
I'm new to web scraping and want to scrape player names and salaries from Spotrac for a university project. Here is what I have done so far.
import requests
from bs4 import BeautifulSoup
URL = 'https://www.spotrac.com/nfl/rankings/'
reqs = requests.get(URL)
soup = BeautifulSoup(reqs.text, 'lxml')
print("List of all the h1, h2, h3 :")
for my_tag in soup.find_all(class_="team-name"):
    print(my_tag.text)
for my_tag in soup.find_all(class_="info"):
    print(my_tag.text)
The output contains only 100 names, but the page has 1000 elements. Is there a reason for this?
To get all the names and other information, make an Ajax POST call to https://www.spotrac.com/nfl/rankings/:
import requests
from bs4 import BeautifulSoup
url = 'https://www.spotrac.com/nfl/rankings/'
data = {
    'ajax': 'true',
    'mobile': 'false'
}
# Send the form data with a POST request instead of a plain GET
soup = BeautifulSoup(requests.post(url, data=data).content, 'html.parser')
for h3 in soup.select('h3'):
    print(h3.text)  # player name
    print(h3.find_next(class_="rank-value").text)  # salary
    print('-' * 80)
Prints:
Dak Prescott
$31,409,000
--------------------------------------------------------------------------------
Russell Wilson
$31,000,000
--------------------------------------------------------------------------------
...all the way to
--------------------------------------------------------------------------------
Willie Gay Jr.
$958,372
--------------------------------------------------------------------------------
Jace Sternberger
$956,632
--------------------------------------------------------------------------------
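If you need the salaries as numbers rather than strings (for sorting or summing, say), something along these lines might work. It is only a sketch: `parse_salary` is a hypothetical helper name, and it assumes every `rank-value` text has the `$31,409,000` shape seen in the output above. The sample rows are copied from that output so the snippet runs without hitting the site.

```python
def parse_salary(text: str) -> int:
    """Convert a string such as '$31,409,000' to an integer.

    Assumes the '$' prefix and comma separators seen in the output above;
    any other format will raise ValueError.
    """
    cleaned = text.strip().replace('$', '').replace(',', '')
    return int(cleaned)


# Sample (name, salary) pairs copied from the printed output above.
rows = [
    ('Dak Prescott', '$31,409,000'),
    ('Russell Wilson', '$31,000,000'),
    ('Willie Gay Jr.', '$958,372'),
    ('Jace Sternberger', '$956,632'),
]

# Build a list of (name, numeric salary) tuples.
players = [(name, parse_salary(salary)) for name, salary in rows]

# With numeric values, sorting and aggregation become straightforward.
players.sort(key=lambda pair: pair[1], reverse=True)
total = sum(salary for _, salary in players)

for name, salary in players:
    print(f'{name}: {salary}')
```

In the scraping loop itself you could append `(h3.text, parse_salary(...))` to a list instead of printing, then process the list afterwards.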