Progress bar in a map function in R - web scraping

fab*_*tto 2 r web-scraping progress-bar rvest purrr

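While web scraping, I have been trying to include a progress bar in a map function.

First, I collect all the links, which returns a result within a few seconds.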

library(rvest)
library(dplyr)
library(stringr)
library(purrr)

news_america_mg_01 <- paste0("https://www.americamineiro.com.br/paginas/page/", 
                                 seq(from = 1, to = 4)) %>% 
  map(. %>% 
        read_html() %>% 
        html_nodes(".gdlr-blog-title a") %>% 
        html_attr("href") %>% 
        as.data.frame())

Second, and this is where I want to include the progress bar, I extract information from each of the links collected from the site.

news_america_mg_02 <- news_america_mg_01 %>%
  map(. %>% 

        #Title
        mutate(title = map_chr(., ~ read_html(.x) %>%
                                          html_node("h1.gdlr-blog-title.entry-title") %>%
                                          html_text()),
               #Date
               data = map_chr(., ~ read_html(.x) %>%
                                        html_node(".gdlr-info .updated a") %>%
                                        html_text()),
               #Text
               text = map_chr(., ~ read_html(.x) %>%
                                 html_node(".size-large+ p") %>%
                                 html_text())))

Thanks in advance!

Jef*_*ker 5

Create a wrapper around purrr::map_chr() using one of the progress bar options. Credit: a post by James Atkins.

map_chr_progress <- function(.x, .f, ...) {
  .f <- purrr::as_mapper(.f, ...)
  # One tick per element of .x; force = TRUE draws the bar even in non-interactive sessions
  pb <- progress::progress_bar$new(
    total = length(.x),
    format = " [:bar] :current/:total (:percent) eta: :eta",
    force = TRUE
  )

  # Wrap .f so that every call advances the bar before doing the real work
  f <- function(...) {
    pb$tick()
    .f(...)
  }
  # map_chr() takes no .id argument, so only ... is forwarded
  purrr::map_chr(.x, f, ...)
}

Then you can use it in your dplyr chain.
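As a rough illustration of that last step, here is a minimal sketch that runs without network access. The wrapper is repeated (without the `.id` argument, which `map_chr()` does not take) so the block stands on its own. The `fake_title()` helper, the `links` tibble, and the example.com URLs are hypothetical stand-ins and not part of the original question; in real use the mapped function would be the `read_html() %>% html_node() %>% html_text()` pipeline.

```r
library(dplyr)
library(purrr)

# Wrapper from the answer, repeated so this sketch runs on its own
map_chr_progress <- function(.x, .f, ...) {
  .f <- purrr::as_mapper(.f, ...)
  pb <- progress::progress_bar$new(
    total = length(.x),
    format = " [:bar] :current/:total (:percent) eta: :eta",
    force = TRUE
  )
  f <- function(...) {
    pb$tick()
    .f(...)
  }
  purrr::map_chr(.x, f, ...)
}

# Hypothetical stand-in for the scraping step: it sleeps briefly to mimic
# a slow request and returns the last path segment of the URL
fake_title <- function(url) {
  Sys.sleep(0.1)
  basename(url)
}

# Hypothetical links; in the question these come from news_america_mg_01
links <- tibble(link = paste0("https://example.com/news/", c("a", "b", "c")))

# The bar advances once per link while the new column is computed
news <- links %>%
  mutate(title = map_chr_progress(link, fake_title))
```

For the real scrape, `fake_title` could be swapped for something like `~ read_html(.x) %>% html_node("h1.gdlr-blog-title.entry-title") %>% html_text()`, applied to whichever column holds the URLs. If a recent purrr (1.0.0 or later) is available, the built-in `.progress = TRUE` argument of `map_chr()` may make the wrapper unnecessary.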