Kar*_*ngh 5 html python selenium screen-scraping beautifulsoup
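I'm trying to scrape content from the Zillow website.
For example: https://www.zillow.com/homedetails/689-Luis-Munoz-Marin-Blvd-APT-508-Jersey-City-NJ-07310/108625724_zpid/
The problem is that I can't get the contents of the price history and tax history sections. I think they are populated by JavaScript as the page loads, so I tried Selenium, but I still can't get them. Below is what I tried.
Code: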
# soup is the BeautifulSoup object built from the page HTML
phistory = soup.find("div", {"id": "hdp-price-history"})
print(phistory)
HTML:
<div class="loading yui3-widget yui3-async-block yui3-complaintstable yui3-hdppricehistory yui3-hdppricehistory-content" id="hdp-price-history">
<div class="zsg-content-section zsg-loading-spinner_lg"></div>
</div>
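This is the outermost element, but the only thing inside it is the loading-spinner placeholder. I also tried soup.find_all("table", class_="zsg-table yui3-toggle-content-minimized"), which didn't return anything either.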
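To see why both lookups come back empty, you can parse the snippet above on its own. This is a minimal sketch, assuming the HTML string below (copied from the question) is what BeautifulSoup received: the container exists and still carries the "loading" class, but its only child is the spinner, so there is no <table> to find until the page's JavaScript has filled it in.

```python
from bs4 import BeautifulSoup

# The price-history container as captured in the question, before the
# page's JavaScript has replaced the spinner with the real table.
RAW_HTML = """
<div class="loading yui3-widget yui3-async-block yui3-complaintstable yui3-hdppricehistory yui3-hdppricehistory-content" id="hdp-price-history">
<div class="zsg-content-section zsg-loading-spinner_lg"></div>
</div>
"""

soup = BeautifulSoup(RAW_HTML, "html.parser")
phistory = soup.find("div", {"id": "hdp-price-history"})

# "class" is multi-valued, so BeautifulSoup returns it as a list.
print("loading" in phistory["class"])                         # True
# The only child element is the spinner placeholder.
print([child["class"] for child in phistory.find_all("div")])
# No <table> exists yet, so the question's find_all() is empty.
print(soup.find_all("table", class_="zsg-table yui3-toggle-content-minimized"))  # []
```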
You can try waiting until the required <table> has been generated and is visible:
from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait as wait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Chrome()  # any WebDriver works; Chrome is just an example
driver.get("https://www.zillow.com/homedetails/689-Luis-Munoz-Marin-Blvd-APT-508-Jersey-City-NJ-07310/108625724_zpid/")
# Block for up to 10 seconds until the price-history table is rendered and visible
table = wait(driver, 10).until(EC.visibility_of_element_located((By.XPATH, '//div[@id="hdp-price-history"]//table')))
print(table.text)
Output:
DATE EVENT PRICE $/SQFT SOURCE
05/03/17 Listed for sale $750,000+159% $534 KELLER WILLIAM...
06/15/11 Sold $290,000-38.3% $206 Public Record
10/14/05 Sold $470,000 $334 Public Record
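If you'd rather keep your BeautifulSoup code, one option is to let Selenium do the waiting and then give BeautifulSoup the rendered markup, for example table.get_attribute("outerHTML") or driver.page_source. The sketch below assumes the rendered table uses a header row of <th> cells followed by <td> rows; SAMPLE_HTML is a hypothetical stand-in reconstructed from the output above, not Zillow's actual markup, and parse_price_history is a hypothetical helper name.

```python
from bs4 import BeautifulSoup


def parse_price_history(table_html):
    """Turn a rendered price-history <table> into a list of row dicts.

    Hypothetical helper: assumes one header row of <th> cells followed by
    data rows of <td> cells.
    """
    soup = BeautifulSoup(table_html, "html.parser")
    headers = [th.get_text(strip=True) for th in soup.find_all("th")]
    rows = []
    for tr in soup.find_all("tr"):
        cells = [td.get_text(" ", strip=True) for td in tr.find_all("td")]
        if cells:  # the header row has no <td> cells, so it is skipped
            rows.append(dict(zip(headers, cells)))
    return rows


# Hypothetical markup built from the printed output, for illustration only.
SAMPLE_HTML = """
<table>
  <tr><th>DATE</th><th>EVENT</th><th>PRICE</th><th>$/SQFT</th><th>SOURCE</th></tr>
  <tr><td>06/15/11</td><td>Sold</td><td><span>$290,000</span></td><td>$206</td><td>Public Record</td></tr>
  <tr><td>10/14/05</td><td>Sold</td><td><span>$470,000</span></td><td>$334</td><td>Public Record</td></tr>
</table>
"""

# With a live driver, pass table.get_attribute("outerHTML") instead.
for row in parse_price_history(SAMPLE_HTML):
    print(row["DATE"], row["EVENT"], row["PRICE"])
```

The key point is the ordering: BeautifulSoup only sees what it is handed, so it has to parse the HTML captured after the wait succeeds, not the initial page source.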
You can also extract values without BeautifulSoup, for example:
# Price in the first <span> after the "Listed for sale" cell
print(table.find_element(By.XPATH, './/td[text()="Listed for sale"]/following::span').text)
Output:
$750,000
Or:
# Price in the first <span> after the first "Sold" cell
print(table.find_element(By.XPATH, './/td[text()="Sold"]/following::span').text)
Output:
$290,000
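Note that a single-element lookup returns only the first match, so the "Sold" query gives the 2011 sale and skips the 2005 one. If you need every row, another BeautifulSoup-free option is to split table.text yourself. This is a minimal sketch, assuming the text keeps the one-row-per-line layout shown in the output earlier; parse_rows and ROW_RE are illustrative names, not part of Selenium.

```python
import re

# Matches "MM/DD/YY <event> $<price>" at the start of a row; the lazy (.+?)
# stops the event name at the first dollar amount, and [\d,]+ stops the price
# before any "+159%" / "-38.3%" change suffix.
ROW_RE = re.compile(r"^(\d{2}/\d{2}/\d{2})\s+(.+?)\s+(\$[\d,]+)")


def parse_rows(table_text):
    """Return (date, event, price) tuples from the price-history table text."""
    rows = []
    for line in table_text.splitlines():
        match = ROW_RE.match(line.strip())
        if match:  # the "DATE EVENT PRICE ..." header line does not match
            rows.append(match.groups())
    return rows


# The text printed by print(table.text) earlier.
TABLE_TEXT = """DATE EVENT PRICE $/SQFT SOURCE
05/03/17 Listed for sale $750,000+159% $534 KELLER WILLIAM...
06/15/11 Sold $290,000-38.3% $206 Public Record
10/14/05 Sold $470,000 $334 Public Record"""

rows = parse_rows(TABLE_TEXT)
sold_prices = [price for _, event, price in rows if event == "Sold"]
print(sold_prices)  # ['$290,000', '$470,000']
```

With a live driver you would call parse_rows(table.text) directly after the wait.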