Filling in and submitting a search with rvest in R

Lau*_*ura 7 r rvest

I am learning how to fill in and submit forms with rvest in R, and I got stuck when I tried to search for the ggplot tag on Stack Overflow. Here is my code:

library(rvest)

url <- "https://stackoverflow.com/questions"

(session <- html_session(url))

(form <- html_form(session)[[2]])
(filled_form <- set_values(form, tagQuery = "ggplot"))
searched <- submit_form(session, filled_form)

I get this error:

Submitting with '<unnamed>'
Error in parse_url(url) : length(url) == 1 is not TRUE

Following this question (rvest error on form submission), I tried several ways to fix it, without success:

filled_form$fields[[13]]$name<-"submit"
filled_form$fields[[14]]$name<-"submit"
filled_form$fields[[13]]$type<-"button"
filled_form$fields[[14]]$type<-"button"

Any help would be appreciated.

Wal*_*ldi 2

The search query lives in html_form(session)[[1]]. The error occurs because this form has no submit button:

<form> 'search' (GET /search)
  <input text> 'q': 
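A quick way to confirm that diagnosis is to scan the form's fields for a submit-type input. The following is only a sketch: `has_submit_field` is a hypothetical helper, and it assumes the form is a list with a `fields` element whose entries each carry a `type`. The `search_form` object is a hand-built stand-in for the printed form above, not the real parsed object.

```r
# Sketch: report whether a parsed form contains a submit-type field.
# Assumption: `form$fields` is a list of fields, each a list with a `type` entry.
has_submit_field <- function(form) {
  types <- vapply(
    form$fields,
    function(field) if (is.null(field$type)) NA_character_ else tolower(field$type),
    character(1)
  )
  any(types == "submit", na.rm = TRUE)
}

# Hand-built stand-in for html_form(session)[[1]]: a single text input named 'q'
search_form <- list(
  name = "search",
  method = "GET",
  url = "/search",
  fields = list(q = list(name = "q", type = "text", value = ""))
)

has_submit_field(search_form)
```

On a form like this one, with only a text input, the check comes back FALSE, which matches the missing-button explanation.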

Adding a fake submit button to the form seems to work as a workaround. The form then prints as:

<form> 'search' (GET /search)
  <input text> 'q': 
  <input submit> '': 

The full code sequence is:

library(rvest)

url <- "https://stackoverflow.com/questions"
(session <- html_session(url))

# The search form is the first form on the page
(form <- html_form(session)[[1]])

# The form has no submit button, so build a fake one
fake_submit_button <- list(name = NULL,
                           type = "submit",
                           value = NULL,
                           checked = NULL,
                           disabled = NULL,
                           readonly = NULL,
                           required = FALSE)
attr(fake_submit_button, "class") <- "input"

# Attach the fake button, fill in the query, and submit
form[["fields"]][["submit"]] <- fake_submit_button
(filled_form <- set_values(form, q = "ggplot"))

searched <- submit_form(session, filled_form)

The problem is that the response is a captcha page:

searched$url
[1] "https://stackoverflow.com/nocaptcha?s=7291e7e6-9b8b-4b5f-bd1c-0f6890c23573"
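If you want a script to notice this redirect instead of silently parsing the captcha page, a simple check on the returned URL may be enough. This is a sketch only: `is_captcha_url` is a hypothetical helper, and treating the "/nocaptcha" path (taken from the URL shown above) as the sole marker is an assumption.

```r
# Sketch: flag a response URL that looks like the captcha page.
# Assumption: the "/nocaptcha" path seen in searched$url identifies it.
is_captcha_url <- function(url) {
  grepl("/nocaptcha", url, fixed = TRUE)
}

is_captcha_url("https://stackoverflow.com/nocaptcha?s=7291e7e6-9b8b-4b5f-bd1c-0f6890c23573")
is_captcha_url("https://stackoverflow.com/search?q=ggplot")
```

In a real run you would pass searched$url and stop, or fall back to another approach, when the check is TRUE.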


You won't be able to handle this with rvest, but after clicking through the captcha manually you get the query you were looking for:

https://stackoverflow.com/search?q=ggplot

It may be easier to use the approach from my other answer:

read_html(paste0('https://stackoverflow.com/search?tab=newest&q=',search))
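The one-liner above relies on a `search` variable that is not defined in the snippet. A slightly fuller sketch might look like the following. Here `build_search_url` is a hypothetical helper, and URL-encoding the query with utils::URLencode is my addition so that terms with spaces or brackets stay valid.

```r
# Sketch: build the search URL directly instead of submitting the form.
# `build_search_url` is a hypothetical helper, not part of rvest.
build_search_url <- function(query, tab = "newest") {
  paste0(
    "https://stackoverflow.com/search?tab=", tab,
    "&q=", utils::URLencode(query, reserved = TRUE)
  )
}

search <- "ggplot"
search_url <- build_search_url(search)
search_url

# Fetching the page needs network access, so it is left commented out here:
# page <- rvest::read_html(search_url)
```

Keeping the URL construction separate from the request makes it easy to inspect the address before fetching anything.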