Posts by Eri*_*hoi

Python Selenium - get href value

I am trying to copy the href value from a website. The HTML looks like this:

<p class="sc-eYdvao kvdWiq">
 <a href="https://www.iproperty.com.my/property/setia-eco-park/sale-1653165/">Shah Alam Setia Eco Park, Setia Eco Park
 </a>
</p>


I tried driver.find_elements_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href"), but it returned 'list' object has no attribute 'get_attribute'. Using driver.find_element_by_css_selector(".sc-eYdvao.kvdWiq").get_attribute("href") returned None. However, I can't use XPath, because the site has 20+ hrefs and I need to copy all of them; using XPath would copy only one.

In case it helps, all 20+ hrefs fall under the same class, namely sc-eYdvao kvdWiq.

Ultimately, I want to copy all 20+ hrefs and export them to a CSV file.

Any help is appreciated.
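One possible approach, sketched under assumptions rather than checked against the live site: in the HTML above the class sc-eYdvao kvdWiq sits on the <p> element, which has no href of its own, so the selector can descend to the <a> child and read the attribute from every match. The helper names collect_hrefs and write_hrefs_csv are hypothetical, and the "css selector" string is the value behind Selenium's By.CSS_SELECTOR constant.

```python
import csv

# The class is on the <p>; the href lives on the <a> inside it.
LINK_SELECTOR = ".sc-eYdvao.kvdWiq a"


def collect_hrefs(driver, selector=LINK_SELECTOR):
    """Return the href of every element matching `selector` (hypothetical helper)."""
    # find_elements returns a list, so get_attribute is called per element,
    # not on the list itself. "css selector" equals By.CSS_SELECTOR.
    elements = driver.find_elements("css selector", selector)
    return [element.get_attribute("href") for element in elements]


def write_hrefs_csv(hrefs, path):
    """Write one href per row, under a single header column."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["href"])
        writer.writerows([href] for href in hrefs)
```

With a live driver that has already loaded the page, write_hrefs_csv(collect_hrefs(driver), "hrefs.csv") would be the intended usage; the output file name is only an example.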

python selenium xpath css-selectors webdriverwait

Eri*_*hoi

2020 07-10

28
votes

2
answers

50k
views

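Tabula: extracting tables by area coordinates

We have the option of extracting tables from a PDF document by specifying PDF coordinates. For Windows users, getting the coordinates means uploading the PDF file to the Tabula web page, exporting a script that contains the coordinates, and then entering those coordinates into your code. Mac users can simply use the Preview app and its crop inspector. I am wondering whether any third-party program or plugin offers this for Windows users. I think it would be handy in the following cases:

When you have no internet access.
I think the Preview app would be more accurate, since I have run into inaccurate coordinates generated by the Tabula web page.

If anyone could point me to something like this, it would be much appreciated. Thanks a lot.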
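As an offline workaround sketch, not a replacement for a dedicated tool: if the page is rendered to an image at a known DPI and the table region is measured in pixels from the top-left corner in any image viewer, the rectangle can be converted to PDF points, which is the unit tabula-py's area option expects by default (top, left, bottom, right). The function name pixels_to_points_area is hypothetical, and the sketch assumes the image covers the full, unrotated page.

```python
def pixels_to_points_area(top, left, bottom, right, dpi):
    """Convert a pixel rectangle on a rendered page image to PDF points.

    PDF coordinates use 72 points per inch, so one pixel at `dpi`
    dots per inch corresponds to 72 / dpi points.
    """
    if dpi <= 0:
        raise ValueError("dpi must be positive")
    scale = 72.0 / dpi
    # Order matches tabula's area convention: top, left, bottom, right.
    return [top * scale, left * scale, bottom * scale, right * scale]
```

The resulting list could then be passed as area=... to tabula.read_pdf together with the pages argument; the pixel measurements themselves still have to come from the viewer.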

python pdf tabula

Eri*_*hoi

2017 08-02

6
votes

4
answers

5167
views

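Trying to scrape a table from Selenium's results using Pandas

I am trying to scrape a table from a JavaScript website using Pandas. To do this, I use Selenium to first reach the page I want. I am able to print the table as text (as shown in the commented-out part of the script), but I would also like to have the table in Pandas. My script is attached below; I hope someone can help me solve this.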

import time
from selenium import webdriver
import pandas as pd

chrome_path = r"Path to chrome driver"
driver = webdriver.Chrome(chrome_path)
url = 'http://www.bursamalaysia.com/market/securities/equities/prices/#/?filter=BS02'

page = driver.get(url)
time.sleep(2)


driver.find_element_by_xpath('//*[@id="bursa_boards"]/option[2]').click()


driver.find_element_by_xpath('//*[@id="bursa_sectors"]/option[11]').click()
time.sleep(2)

driver.find_element_by_xpath('//*[@id="bm_equity_price_search"]').click()
time.sleep(5)

target = driver.find_elements_by_id('bm_equities_prices_table')
##for data in target:
##    print (data.text)

for data in target:
    dfs = pd.read_html(target,match = '+')
for df in dfs:
    print (df)

Running the script above, I get the following error:

Traceback (most recent call last):
  File "E:\Coding\Python\BS_Bursa Properties\Selenium_Pandas_Bursa Properties.py", line 29, in <module> …
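A hedged sketch of one likely adjustment, without knowing the rest of the traceback: the script passes target, a list of WebElement objects, to pd.read_html, which expects HTML text, a URL, or a file-like object. Reading each element's outerHTML and handing that to pandas is one way around this. The helper names elements_to_html and html_to_dataframes are hypothetical, and the sketch assumes the element with id bm_equities_prices_table is, or contains, a <table>.

```python
from io import StringIO


def elements_to_html(elements):
    """Return the outerHTML string of each WebElement-like object."""
    return [element.get_attribute("outerHTML") for element in elements]


def html_to_dataframes(html_fragments):
    """Parse every HTML fragment with pandas.read_html and collect the tables."""
    # Imported lazily; read_html also needs a parser such as lxml
    # or bs4 + html5lib to be installed.
    import pandas as pd

    frames = []
    for html in html_fragments:
        # StringIO wraps the literal HTML in a file-like object for read_html.
        frames.extend(pd.read_html(StringIO(html)))
    return frames
```

In the script above this would be called as html_to_dataframes(elements_to_html(target)). The match argument is left out here; pandas treats it as a regular expression, so a literal plus sign would need escaping.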


javascript python selenium

Eri*_*hoi

2017 07-30

4
votes

2
answers

10k
views

Tag statistics

python ×3

selenium ×2

css-selectors ×1

javascript ×1

pdf ×1

tabula ×1

webdriverwait ×1

xpath ×1
