Dee*_*Roy 4 python beautifulsoup web-scraping python-3.x
I found a cool Python script that scrapes player information from NFL rosters. However, I would like to add NFL Combine results to the data. An example for one player is below.
import urllib.request
from bs4 import BeautifulSoup
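# urlopen needs a full URL including the scheme
URL2 = 'http://www.nfl.com/player/deandrewwhite/2552657/combine'
# name the parser explicitly so BeautifulSoup does not have to guess one
soupCombine = BeautifulSoup(urllib.request.urlopen(URL2), 'html.parser')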
Combinestats = soupCombine.find_all("div", attrs = {"class": "tp-title"})
Combinestats[0].contents
This produces:
['3 Cone Drill', <span class="tp-results">6.97 secs</span>]
How can I get the following from Combinestats[0].contents?
DrillName = '3 Cone Drill'
DrillResult = 6.97
For reference, here are the items in Combinestats.
for ii in range(len(Combinestats)):
    print(Combinestats[ii].contents)
['3 Cone Drill', <span class="tp-results">6.97 secs</span>]
['40 Yard Dash', <span class="tp-results">4.44 Secs</span>]
['Broad Jump', <span class="tp-results">118.0 inches</span>]
['20 Yard Shuttle', <span class="tp-results">4.18 secs</span>]
['Vertical Jump', <span class="tp-results">34.5 inches</span>]
Just use a list comprehension.
resultSet = soup.find_all("div", attrs = {"class": "tp-title"})
stats = [
    (i.contents[0], i.contents[1].text) for i in resultSet
]
Or, a for loop.
stats = []
for i in resultSet:
    # append takes a single argument, so pass the pair as one tuple
    stats.append((i.contents[0], i.contents[1].text))
print(stats)
[
('40 Yard Dash', '4.44 Secs'),
('3 Cone Drill', '6.97 secs'),
('Broad Jump', '118.0 inches'),
('20 Yard Shuttle', '4.18 secs'),
('Vertical Jump', '34.5 inches')
]
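The question asks for DrillResult as a number (6.97), while the tuples above hold strings such as '6.97 secs'. One possible way to finish the job is sketched below. It assumes every result string starts with a number followed by a unit; parse_result is a hypothetical helper name, and the sample list simply repeats the output shown above so the snippet runs without a network request.

```python
import re

# Sample (name, result) pairs, copied from the printed output above.
stats = [
    ('40 Yard Dash', '4.44 Secs'),
    ('3 Cone Drill', '6.97 secs'),
    ('Broad Jump', '118.0 inches'),
    ('20 Yard Shuttle', '4.18 secs'),
    ('Vertical Jump', '34.5 inches'),
]


def parse_result(text):
    """Return the leading number in text as a float, or None if there is none."""
    match = re.match(r'\s*(\d+(?:\.\d+)?)', text)
    return float(match.group(1)) if match else None


# Map each drill name to its numeric result (unit dropped).
results = {name: parse_result(value) for name, value in stats}

DrillName = '3 Cone Drill'
DrillResult = results[DrillName]
print(DrillName, DrillResult)
```

A dictionary keyed by drill name makes lookups independent of the order in which the page lists the drills. If the units matter, they would need to be kept separately, since this sketch discards them.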