Scraping a specific table with Selenium

Question

Lui*_*ruz 3 python selenium xpath web-scraping

I am trying to scrape a table that sits inside a div on a page.

So far, this is basically my attempt:

# NOTE: Download the chromedriver driver
# Then move exe file on C:\Python27\Scripts
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
import time
import sys

driver = webdriver.Chrome()
driver.implicitly_wait(10)

URL_start = "http://www.google.us/trends/explore?"
date = '&date=today%203-m' # Last 90 days
location = "&geo=US"
symbol = sys.argv[1]
query = 'q='+symbol
URL = URL_start+query+date+location

driver.get(URL)

table = driver.find_element_by_xpath('//div[@class="line-chart"]/table/tbody')

print table.text

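If I run the script with an argument such as "stackoverflow", I should be able to scrape this page: https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow

Apparently my XPath does not work: the program prints nothing, the output is just blank.

I basically need the values from the chart shown on that page. The values (and their dates) are inside a table on the page.

Could you help me find the correct XPath for the table so that I can retrieve these values with Selenium in Python?

Thanks in advance!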

Answer 1

Piy*_*ush 5

You can use the following XPath:

//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr
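
To see why the extra `div` steps matter, here is a minimal sketch that runs both paths against a made-up, heavily simplified stand-in for the chart markup. The HTML fragment, the sample dates and values, and the helper name `extract_rows` are all assumptions for illustration; the real page is more complex. It uses the standard library's `xml.etree.ElementTree`, which only supports a subset of XPath and needs the path to start with `.//` instead of `//`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified stand-in for the chart markup (not the real page).
SAMPLE = """
<html><body>
  <div class="line-chart">
    <div>
      <div>
        <div>
          <div>
            <table>
              <tbody>
                <tr><td>Jan 1</td><td>75</td></tr>
                <tr><td>Jan 2</td><td>80</td></tr>
              </tbody>
            </table>
          </div>
        </div>
      </div>
      <div>legend</div>
    </div>
  </div>
</body></html>
"""

# Path from the question: expects the table as a direct child of the chart div.
QUESTION_XPATH = './/div[@class="line-chart"]/table/tbody'
# Path from the answer: walks through the nested wrapper divs first.
ANSWER_XPATH = './/div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr'


def extract_rows(markup, xpath=ANSWER_XPATH):
    """Return (date, value) tuples for every two-cell row matched by xpath."""
    root = ET.fromstring(markup.strip())
    rows = []
    for tr in root.findall(xpath):
        cells = [td.text for td in tr.findall("td")]
        if len(cells) == 2:
            rows.append(tuple(cells))
    return rows


root = ET.fromstring(SAMPLE.strip())
print(root.find(QUESTION_XPATH))  # None: the table is not a direct child
print(extract_rows(SAMPLE))       # [('Jan 1', '75'), ('Jan 2', '80')]
```

In this fragment the shorter path matches nothing because `/` only descends one level at a time, while the longer path reaches the rows.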

Here is my refined answer, with a few changes to your code; it now works.

# NOTE: chromedriver must be installed and reachable on your PATH
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.implicitly_wait(20)  # wait up to 20 seconds for elements to appear

# URL construction from the question, disabled here in favour of a fixed URL:
# URL_start = "http://www.google.us/trends/explore?"
# date = '&date=today%203-m' # Last 90 days
# location = "&geo=US"
# symbol = sys.argv[1]
# query = 'q='+symbol
# URL = URL_start+query+date+location
driver.get("https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow")

# Selenium 4 removed find_elements_by_xpath; use find_elements(By.XPATH, ...) instead
table_trs = driver.find_elements(By.XPATH, '//div[@class="line-chart"]/div/div[1]/div/div/table/tbody/tr')

for tr in table_trs:
    td = tr.find_elements(By.XPATH, ".//td")
    # Keep only rows with exactly two cells and print them tab-separated
    if len(td) == 2:
        print(td[0].get_attribute("innerHTML") + "\t" + td[1].get_attribute("innerHTML"))

driver.quit()

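
If you want the search term to come from the command line again, as in the question, one option is to build the query string with `urllib.parse` rather than by concatenation, so that characters such as spaces or `+` are escaped. This is a sketch only: the function name `build_trends_url` and its defaults are hypothetical, and the parameter names are simply those visible in the URL above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://www.google.us/trends/explore"


def build_trends_url(symbol, date="today 3-m", geo="US"):
    """Build the explore URL; quote_via=quote encodes spaces as %20, not '+'."""
    params = {"date": date, "geo": geo, "q": symbol}
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


print(build_trends_url("stackoverflow"))
# https://www.google.us/trends/explore?date=today%203-m&geo=US&q=stackoverflow
```

In the script you would then call `driver.get(build_trends_url(sys.argv[1]))` after importing `sys`.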