Vin*_*nce · score 0 · tags: python, web-scraping, python-3.x, selenium-webdriver
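How can I find elements by class name without getting duplicated output? I want to scrape two classes, `hdrlnk` and `result-price`. The code I wrote looks like this: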
```python
x = driver.find_elements_by_class_name(['hdrlnk','result-price'])
```
It gave me an error. I then tried this code instead:
```python
x = driver.find_elements_by_class_name('hdrlnk')
y = driver.find_elements_by_class_name('result-price')
for xs in x:
    for ys in y:
        print(xs.text + ys.text)
```
But I got output like this:
```text
sony 5 disc cd changer$40
sony 5 disc cd changer$70
sony 5 disc cd changer$70
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$190
sony 5 disc cd changer$10
```
The part of the HTML structure I'm trying to scrape:
```html
<p class="result-info">
  <span class="icon icon-star" role="button" title="save this post in your favorites list">
    <span class="screen-reader-text">favorite this post</span>
  </span>
  <time class="result-date" datetime="2019-11-07 18:20" title="Thu 07 Nov 06:20:56 PM">Nov 7</time>
  <a href="https://vancouver.craigslist.org/rch/ele/d/chandeliers/7015824686.html" data-id="7015824686" class="result-title hdrlnk">CHANDELIERS</a>
  <span class="result-meta">
    <span class="result-price">$800</span>
    <span class="result-hood"> (Richmond)</span>
    <span class="result-tags">
      <span class="pictag">pic</span>
    </span>
    <span class="banish icon icon-trash" role="button">
      <span class="screen-reader-text">hide this posting</span>
    </span>
    <span class="unbanish icon icon-trash red" role="button" aria-hidden="true"></span>
    <a href="#" class="restore-link">
      <span class="restore-narrow-text">restore</span>
      <span class="restore-wide-text">restore this posting</span>
    </a>
  </span>
</p>
```
The first element (the title) is repeated, while the second element (the price) comes out correctly. How can I fix this?
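The repeated first title comes straight from the nested loop: for each title, the inner loop runs over every price, so the output is every title combined with every price (a Cartesian product), not one pair per listing. When the two lists line up one-to-one, `zip()` pairs them by position instead. A minimal sketch, with plain strings standing in for the elements' `.text` values (the sample titles and prices are illustrative, not scraped):

```python
# Plain strings stand in for the .text of the scraped elements.
titles = ["sony 5 disc cd changer", "CHANDELIERS"]
prices = ["$40", "$800"]

# Nested loops: every title is combined with every price.
nested = [t + p for t in titles for p in prices]

# zip(): the i-th title is combined with the i-th price only.
paired = [t + p for t, p in zip(titles, prices)]

print(nested)
# ['sony 5 disc cd changer$40', 'sony 5 disc cd changer$800', 'CHANDELIERS$40', 'CHANDELIERS$800']
print(paired)
# ['sony 5 disc cd changer$40', 'CHANDELIERS$800']
```

Applied to the question's variables this would be `for xs, ys in zip(x, y): print(xs.text + ys.text)`. The catch is that `zip()` assumes every listing has exactly one price; a single listing without a price shifts every later pair, which is why looking up the price within each listing is more robust.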
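`.find_elements_by_class_name()` accepts only a single class name, so passing a list fails. I suggest using a CSS selector for this job instead. Note that the price is not inside the `a.hdrlnk` link but in the sibling `span.result-meta`, so a descendant selector such as `.hdrlnk .result-price` matches nothing; scope it to the listing container instead, e.g. `.result-info .result-price`. The code would look like this:

```python
prices = driver.find_elements_by_css_selector('.result-info .result-price')
print([price.text for price in prices])
```

This prints every price on the page. If you also want the titles, you need a bit more code: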
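```python
for heading in driver.find_elements_by_css_selector('.hdrlnk'):
    print(heading.text)
    # Look only in this listing's own result-meta span; a bare following:: axis
    # would also match the prices of every later listing on the page.
    for price in heading.find_elements_by_xpath('./following-sibling::span[@class="result-meta"]/span[@class="result-price"]'):
        print('  ' + price.text)
```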
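In the markup from the question, the price is not inside the `a.hdrlnk` link; it sits in `span.result-meta`, a later sibling of the link inside the same `p.result-info`. An unrestricted `./following::span[...]` axis, by contrast, matches every later price in the whole document, so the first title would be printed together with all of them. To check a per-listing lookup without starting a browser, here is a sketch that parses a trimmed copy of the question's snippet with the standard library's `xml.etree.ElementTree`. This is a swapped-in parser, not Selenium, and it only works because the trimmed snippet is well-formed XML; `listings()` is a hypothetical helper name, and the second listing is hypothetical too, added only to show a missing price.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's markup. The second listing is hypothetical,
# added only to show a listing that has no price.
HTML = """
<div>
  <p class="result-info">
    <a href="https://vancouver.craigslist.org/rch/ele/d/chandeliers/7015824686.html"
       data-id="7015824686" class="result-title hdrlnk">CHANDELIERS</a>
    <span class="result-meta">
      <span class="result-price">$800</span>
      <span class="result-hood"> (Richmond)</span>
    </span>
  </p>
  <p class="result-info">
    <a href="#" class="result-title hdrlnk">HYPOTHETICAL NO-PRICE LISTING</a>
    <span class="result-meta"></span>
  </p>
</div>
"""


def listings(markup):
    """Return (title, price) pairs, looking up each price inside its own listing."""
    root = ET.fromstring(markup.strip())
    pairs = []
    for row in root.findall(".//p[@class='result-info']"):
        title = row.find("a[@class='result-title hdrlnk']")
        # The price lives in this row's span.result-meta, not inside the <a> link.
        price = row.find("span[@class='result-meta']/span[@class='result-price']")
        pairs.append((title.text, price.text if price is not None else None))
    return pairs


print(listings(HTML))
# [('CHANDELIERS', '$800'), ('HYPOTHETICAL NO-PRICE LISTING', None)]
```

Because each lookup starts from its own `p.result-info`, a listing without a price yields `None` instead of borrowing the next listing's price.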
See the documentation for all the options for locating elements.
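The `find_elements_by_*` helpers used in this answer were deprecated in Selenium 4 and removed in Selenium 4.3, so on current versions the same lookups go through `find_elements(by, value)`, which also exists in Selenium 3. A minimal per-listing sketch in that style, assuming the markup shown in the question (each listing is a `p.result-info` containing exactly one `.hdrlnk` and at most one `.result-price`). `scrape_listings` is a hypothetical name, and the locator is written as the string `"css selector"`, which is the value of `By.CSS_SELECTOR`, only so the sketch needs no imports:

```python
# Same value as selenium.webdriver.common.by.By.CSS_SELECTOR.
CSS = "css selector"


def scrape_listings(driver):
    """Return one (title, price) pair per listing; price is None when missing."""
    results = []
    for row in driver.find_elements(CSS, ".result-info"):
        # Both lookups are scoped to this row, so titles and prices never mix.
        title = row.find_element(CSS, ".hdrlnk").text
        # find_elements (plural) returns an empty list instead of raising.
        prices = row.find_elements(CSS, ".result-price")
        results.append((title, prices[0].text if prices else None))
    return results
```

With a driver that already has the results page open: `for title, price in scrape_listings(driver): print(title, price)`. In ordinary code you would write `from selenium.webdriver.common.by import By` and pass `By.CSS_SELECTOR` instead of the string constant.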
CSS selector references:
- W3C reference
- Selenium Tips: CSS Selectors
- Taming Advanced CSS Selectors
Views: 9464