gkz*_*gkz 1 python selenium xpath web-scraping python-3.x
I want to get the URLs of the listings on an Airbnb search results page using Python, Selenium, and Firefox, but my program does not work.
The error is as follows:
Original exception was:
Traceback (most recent call last):
File "pages.py", line 19, in <module>
for links in driver.find_element_by_xpath('//div[contains(@id, "listing-")]//a[contains(@href, "rooms")]'):
TypeError: 'FirefoxWebElement' object is not iterable
Here is my code:
from selenium import webdriver
from selenium.webdriver import FirefoxOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
test_url = 'https://www.airbnb.jp/s/%E6%97%A5%E6%9C%AC%E6%B2%96%E7%B8%84%E7%9C%8C/homes?refinement_paths%5B%5D=%2Fhomes&query=%E6%97%A5%E6%9C%AC%E6%B2%96%E7%B8%84%E7%9C%8C&price_min=15000&allow_override%5B%5D=&checkin=2018-07-07&checkout=2018-07-08&place_id=ChIJ51ur7mJw9TQR79H9hnJhuzU&s_tag=z4scstF7'
opts = FirefoxOptions()
opts.add_argument("--headless")
driver = webdriver.Firefox(firefox_options=opts)
driver.get(test_url)
driver.implicitly_wait(30)
for links in driver.find_element_by_xpath('//div[contains(@id, "listing-")]//a[contains(@href, "rooms")]'):
    listing_url = links.get_attribute('href')
    print(listing_url)
driver.quit()
I also tried changing my code as shown below. (The error message is the same as with the first version.)
from selenium import webdriver
from selenium.webdriver import FirefoxOptions
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.common.exceptions import TimeoutException
test_url = 'https://www.airbnb.jp/s/%E6%97%A5%E6%9C%AC%E6%B2%96%E7%B8%84%E7%9C%8C/homes?refinement_paths%5B%5D=%2Fhomes&query=%E6%97%A5%E6%9C%AC%E6%B2%96%E7%B8%84%E7%9C%8C&price_min=15000&allow_override%5B%5D=&checkin=2018-07-07&checkout=2018-07-08&place_id=ChIJ51ur7mJw9TQR79H9hnJhuzU&s_tag=z4scstF7'
opts = FirefoxOptions()
opts.add_argument("--headless")
driver = webdriver.Firefox(firefox_options=opts)
driver.get(test_url)
driver.implicitly_wait(30)
links = driver.find_element_by_xpath('//a[contains(@href, "rooms")]')
for link in links:
    listing_url = link.get_attribute('href')
    print(listing_url)
driver.quit()
If you have time, I would appreciate a reply. Thank you.
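You need to use find_elements_by_xpath, which returns a list of elements,
not find_element_by_xpath, which returns only a single element. A single WebElement cannot be looped over, which is why the for statement raises TypeError: 'FirefoxWebElement' object is not iterable.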
...
links = driver.find_elements_by_xpath('//div[contains(@id, "listing-")]//a[contains(@href, "rooms")]')
for link in links:
    # each item in the list is a WebElement; read its href attribute
    print(link.get_attribute('href'))
...
Output
https://www.airbnb.jp/rooms/7793811?location=%E6%97%A5%E6%9C%AC%E6%B2%96%E7%B8%84%E7%9C%8C&check_in=2018-07-07&check_out=2018-07-08
https://www.airbnb.jp/rooms/7793811?location=%E6%97%A5%E6%9C%AC%E6%B2%96%E7%B8%84%E7%9C%8C&check_in=2018-07-07&check_out=2018-07-08
...
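The output above prints the same listing URL twice, presumably because more than one anchor in each listing block points at the same room. A minimal sketch of one way to collect each href only once is shown below. The helper name `collect_hrefs` is hypothetical and not part of Selenium. It also assumes the Selenium 4 style `find_elements(by, value)` call, since recent Selenium 4 releases no longer provide the `find_elements_by_xpath` shortcut.

```python
def collect_hrefs(driver, xpath):
    """Return the href of every element matching xpath, in page order, without duplicates.

    collect_hrefs is a hypothetical helper, not a Selenium API.
    """
    # find_elements (plural) returns a list, so it is safe to iterate.
    # "xpath" is the value of selenium.webdriver.common.by.By.XPATH.
    elements = driver.find_elements("xpath", xpath)

    hrefs = []
    for element in elements:
        href = element.get_attribute("href")
        # skip anchors without an href and URLs that were already collected
        if href and href not in hrefs:
            hrefs.append(href)
    return hrefs
```

With the driver from the question, this could be called as `collect_hrefs(driver, '//div[contains(@id, "listing-")]//a[contains(@href, "rooms")]')` and the returned list printed line by line. Passing `By.XPATH` instead of the literal string is equivalent and is the more common spelling.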