edy*_*y13 8 python selenium highcharts
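I am trying to scrape data from Highcharts charts. I have looked at similar questions, but I don't understand how execute_script works or how to inspect the page's JavaScript from my browser. Here is my code so far: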
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
# Core settings
chrome_path = r"C:\Users\X\Y\chromedriver_win32\chromedriver.exe"
driver = webdriver.Chrome(chrome_path)
driver.implicitly_wait(15)
stats_url = 'https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/'
driver.get(stats_url)
driver.find_element_by_link_text('by Source').click()
driver.find_element_by_id('custom-date-range').click()
year = driver.find_element_by_id('date-range-start')
year.click()
for i in range(5):  # goes back 5 years
    year.send_keys(Keys.ARROW_DOWN)
driver.find_element_by_id('date-range-submit').click()
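I want to scrape the "Downloads" data from the chart (not just for this page, but for many pages). The CSV file that the site generates automatically does not update when I use the custom search option, so the only way seems to be scraping the data from the chart. How can I do that?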
Mozilla provides a simple REST API for retrieving the statistics, so you don't need Selenium.
With the requests module:
import requests

url = "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20170823-20171023.json"
data = requests.get(url).json()
To select a range, just update the dates in the URL.
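As a minimal sketch of "updating the dates in the URL", the path can be assembled from date objects. The helper name `build_stats_url` and its `metric` and `fmt` parameters are my own for illustration; only the URL pattern itself comes from the example above.

```python
from datetime import date

# Base of the statistics endpoint quoted above; the add-on slug is a parameter.
BASE = "https://addons.mozilla.org/en-US/firefox/addon/{slug}/statistics/"


def build_stats_url(slug, start, end, metric="downloads", fmt="json"):
    """Hypothetical helper: build the statistics URL for a date range."""
    # Dates go into the path as YYYYMMDD, as in the example URL.
    return (BASE.format(slug=slug)
            + f"{metric}-day-{start:%Y%m%d}-{end:%Y%m%d}.{fmt}")


url = build_stats_url("adblock-plus", date(2017, 8, 23), date(2017, 10, 23))
print(url)
```

The printed URL is the same one used in the requests example, so the result can be passed straight to `requests.get`.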
However, if you still want to scrape the chart with Selenium:
dates = driver.execute_script("return Highcharts.charts[0].series[0].xData")
users = driver.execute_script("return Highcharts.charts[0].series[0].yData")
downloads = driver.execute_script("return Highcharts.charts[0].series[1].yData")
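The lists returned by execute_script are plain Python lists, so they can be zipped into rows. This is a rough sketch that assumes the chart uses a datetime x axis, in which case Highcharts stores the x values as milliseconds since the Unix epoch. The helper `to_rows` and the sample numbers are made up for illustration.

```python
from datetime import datetime, timezone


def to_rows(x_ms, y_values):
    """Pair millisecond timestamps with values, skipping empty (None) points."""
    rows = []
    for ms, value in zip(x_ms, y_values):
        if value is None:
            continue
        # Convert JavaScript milliseconds to a UTC calendar date.
        day = datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date()
        rows.append((day.isoformat(), value))
    return rows


# Made-up sample values standing in for the lists returned by execute_script.
sample_dates = [1508716800000, 1508803200000]
sample_downloads = [100, None]
print(to_rows(sample_dates, sample_downloads))  # [('2017-10-23', 100)]
```

With a live driver you would call `to_rows(dates, downloads)` on the lists from the snippet above instead of the sample data.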
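I noticed one thing.
It seems that:
"the CSV file that the site generates automatically does not update when I use the custom search option".
But that is not actually the case. It does update, but the maximum "custom date range" appears to be 1 year.
For example, if you set the range from 2013-09-23 to 2017-10-23, the resulting .csv (and .json) contains at most 1 year of data (in this example, from 22/10/2016 to 21/10/2017).
You can see this more clearly if you play with the "extremes" of the range.
For example: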
https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20131023-20141023.json
{"date": "2014-10-23", "count": 212730, "end": "2014-10-23"}
{"date": "2013-10-24", "count": 163094, "end": "2013-10-24"}
If you change it to:
https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20131023-20141024.json
{"date": "2014-10-24", "count": 215105, "end": "2014-10-24"}
{"date": "2013-10-25", "count": 168018, "end": "2013-10-25"}
Or:
https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-20131022-20141023.json
the result is again:
{"date": "2014-10-23", "count": 212730, "end": "2014-10-23"}
{"date": "2013-10-24", "count": 163094, "end": "2013-10-24"}
So, to get the data for the last 5 years, you can do:
import subprocess

interestedYears = 5
year = 1
today = "2017-10-23"
tokenDataToday = today.split("-")
dateEnd = tokenDataToday[0] + tokenDataToday[1] + tokenDataToday[2]
url = "https://addons.mozilla.org/en-US/firefox/addon/adblock-plus/statistics/downloads-day-"
# Each request returns at most one year of data, so fetch one-year windows going back in time
while year <= interestedYears:
    yearStart = str(int(tokenDataToday[0]) - year)
    dateStart = yearStart + tokenDataToday[1] + tokenDataToday[2]
    #print("dateStart: " + dateStart)
    #print("dateEnd: " + dateEnd)
    tmpUrl = url + dateStart + "-" + dateEnd + ".csv"
    # Download the CSV for this window with curl (-O keeps the remote file name)
    cmd = 'curl -O ' + tmpUrl
    print(cmd)
    args = cmd.split()
    process = subprocess.Popen(args, shell=False, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    stdout, stderr = process.communicate()
    # The start of this window becomes the end of the next, older one
    dateEnd = dateStart
    year = year + 1
    print("-----------------------------")
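The same one-year windowing can also be sketched with date objects instead of string slicing, which avoids building the dates by hand. The function name `year_windows` is hypothetical, and the 29 February fallback is my own assumption about how to handle a leap-day end date.

```python
from datetime import date


def year_windows(end, years):
    """Return (start, end) date pairs, newest first, each spanning one year."""
    windows = []
    for _ in range(years):
        try:
            start = end.replace(year=end.year - 1)
        except ValueError:
            # 29 February has no counterpart in a non-leap year.
            start = end.replace(year=end.year - 1, day=28)
        windows.append((start, end))
        # The start of this window becomes the end of the next, older one.
        end = start
    return windows


for start, end in year_windows(date(2017, 10, 23), 5):
    print(f"downloads-day-{start:%Y%m%d}-{end:%Y%m%d}.csv")
```

Each printed name is the tail of one of the URLs the script above downloads, so the windows can be fed to curl or to requests equally well.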