Getting prices from Walmart with rvest

Tun*_*ung 1 r css-selectors web-scraping rvest

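I am trying to get prices and stock availability for a few Walmart stores using the rvest package, with help from the Selector Gadget extension. I can get the store address, but not the price or the stock status. Any suggestions would be much appreciated!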

Here is what I have done so far:

    library(dplyr)
    library(rvest)

    url <- read_html("http://www.walmart.com/store/25/search?query=50636282")

    selector_name<-".cs-secondary-copy"
    fnames <- html_nodes(x = url, css = selector_name) %>%
      html_text()
    fnames

    price <- html_nodes(x = url, css = ".sup") %>%
      html_text() %>% 
      as.numeric()
    price

    stock <- html_nodes(x = url, css = ".stockStatus-unavailable") %>%
      html_text()
    stock

Output:

    > fnames
    [1] "4820 S Clark St, Mexico, MO 65265"                   "Item availability is updated every day at midnight."
    > price
    numeric(0)
    > stock
    character(0)

Relevant markup from Selector Gadget:

    <span class="cs-secondary-copy">4820 S Clark St, Mexico, MO 65265</span>

      <div class="csTile">

      <div class="csTile-img">
      <a href="/ip/Virgin-Mobile-LG-Tribute-5-Prepaid-Smartphone/50636282">
      <img class="js-cs-image-link" id="43A657WDTF0J" src="https://i5.walmartimages.com/asr/51a2cea5-abe4-4a03-9711-b995cb7e215f_1.fd7b362cc57347042f4b518ff05de7ec.jpeg?odnHeight=180&amp;odnWidth=180&amp;odnBg=ffffff" alt="Virgin Mobile LG Tribute 5 Prepaid Smartphone" width="144" height="144">
      </a>
      </div>

      <div class="csTile-content">
      <div class="csTile-stockStatus">
      <strong class="stockStatus-unavailable">
      Out of Stock
    </strong>
      </div>
      <div class="price-display csTile-price">
      <p class="csTile-disclaimer">Store Price</p>
      <span class="sup">$</span>15<span class="currency-delimiter">.</span><span class="sup">00</span>
      </div>

      <p class="csTile-heading js-cstile-heading"><span>
      Virgin Mobile LG Tribute 5 Prepaid Smartphone
    </span><div class="js-truncate-disclosure-arrow truncate-disclosure-arrow"></div></p>
      <div class="csTile-rating">
      <span class="stars stars-small">
      <i class="star star-rated"></i><i class="star star-rated"></i><i class="star star-rated"></i><i class="star star-rated"></i><i class="star star-partial"></i><span class="visuallyhidden">4.5 stars</span>
      <span class="visuallyhidden">Average rating: 4.4375 stars</span>
      <span class="stars-reviews stars-reviews--grey">16
    <span class="visuallyhidden">ratings</span>
      </span>
      </span>
      </div>
      <a class="btn btn-inverse l-margin-top js-cs-product-link" id="43A657WDTF0J" href="/ip/Virgin-Mobile-LG-Tribute-5-Prepaid-Smartphone/50636282">
      Buy online
    </a>
      </div>

      </div>

Jon*_*oll 6

[Update, May 2018:] Walmart has released an API that may cover what this question needs: https://medium.com/@kyleake/how-to-extract-data-from-walmart-open-api-efd01a2f91e0. It looks like prices may be returned from some endpoints.

That said, because the API requires you to register, scraping via rvest is probably still against the terms.


https://help.walmart.com/app/answers/detail/a_id/8#2

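You are prohibited from:

  • violating or attempting to violate the security of the Walmart Sites;
  • using any device, software, or routine to interfere or attempt to interfere with the proper working of the Walmart Sites; or
  • using or attempting to use any engine, software, tool, agent, or other device or mechanism (other than the search mechanisms provided by Walmart or other third-party web browsers) to navigate or search the Walmart Sites.

(Emphasis mine.)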
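On the selectors themselves, here is a minimal sketch that runs against a trimmed copy of the markup quoted in the question, parsed from a string rather than fetched from the site. In that markup the price is split across two `.sup` spans and a bare text node, so `.sup` alone can only ever return "$" and "00". Reading the whole `.csTile-price` container and extracting the number with a regular expression is one way around that. The names `tile_html`, `page`, `sup_text`, `price_text`, `price`, and `stock` are illustrative, not from the question. This only helps if the tile is actually present in the document that `read_html()` receives. The empty `numeric(0)` and `character(0)` results suggest it may not be, possibly because the page fills it in with JavaScript, which `read_html()` does not execute.

```r
library(rvest)

# A trimmed copy of the tile markup quoted in the question, parsed from a
# string so the example runs offline (an assumption for illustration only).
tile_html <- '
<div class="csTile-content">
  <div class="csTile-stockStatus">
    <strong class="stockStatus-unavailable">
      Out of Stock
    </strong>
  </div>
  <div class="price-display csTile-price">
    <p class="csTile-disclaimer">Store Price</p>
    <span class="sup">$</span>15<span class="currency-delimiter">.</span><span class="sup">00</span>
  </div>
</div>'

page <- read_html(tile_html)

# ".sup" matches only the "$" and "00" fragments, so as.numeric() on them
# cannot yield the full price.
sup_text <- page %>% html_nodes(".sup") %>% html_text()

# Take the text of the whole price container instead, then pull out the
# first run of the form digits.digits.
price_text <- page %>% html_nodes(".csTile-price") %>% html_text()
price <- as.numeric(regmatches(price_text, regexpr("[0-9]+\\.[0-9]+", price_text)))

# trim = TRUE strips the newlines and indentation around the label.
stock <- page %>%
  html_nodes(".stockStatus-unavailable") %>%
  html_text(trim = TRUE)

price
stock
```

To check whether a selector matched anything in the fetched page at all, `length(html_nodes(page, ".csTile-price"))` returning 0 means the element is not in the downloaded HTML, and no change of selector will recover it.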