Posts by Bla*_*las

Handling 404 and other crawling errors with tryCatch and rvest

When retrieving h1 titles with rvest, I sometimes run into 404 pages. This stops the process and returns this error:

Error in open.connection(x, "rb") : HTTP error 404.

See the example below:

Data<-data.frame(Pages=c(
"http://boingboing.net/2016/06/16/spam-king-sanford-wallace.html",
"http://boingboing.net/2016/06/16/omg-the-japanese-trump-commer.html",
"http://boingboing.net/2016/06/16/omar-mateen-posted-to-facebook.html",
"http://boingboing.net/2016/06/16/omar-mateen-posted-to-facdddebook.html"))

Code used to retrieve the h1:

library (rvest)
sapply(Data$Pages, function(url){
 url %>%
 as.character() %>% 
 read_html() %>% 
 html_nodes('h1') %>% 
 html_text()
 })

Is there a way to include an argument that ignores the error and lets the process continue?
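One common approach, sketched here under the assumption that an `NA` for a failed page is acceptable, is to wrap each `read_html()` call in `tryCatch()` so the error is caught and the loop moves on to the next URL. The helper name `get_h1` and the `" | "` separator for pages with several h1 nodes are illustrative choices, not part of the question.

```r
library(rvest)

# Hypothetical helper: return the h1 text of one page, or NA when the page
# cannot be read (e.g. read_html() raising "HTTP error 404").
get_h1 <- function(url) {
  tryCatch(
    {
      page <- read_html(as.character(url))
      h1 <- html_text(html_nodes(page, "h1"))
      # Collapse several h1 nodes into one string so each page yields one value
      if (length(h1) == 0) NA_character_ else paste(h1, collapse = " | ")
    },
    error = function(e) {
      # Report the failure but keep going
      message("Skipping ", url, ": ", conditionMessage(e))
      NA_character_
    }
  )
}

# Usage with the question's data frame; pages that fail become NA:
# Data$H1 <- vapply(as.character(Data$Pages), get_h1, character(1),
#                   USE.NAMES = FALSE)
```

Using `vapply()` with `character(1)` guarantees one string per URL, so the result can be stored directly as a column of `Data`.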

r try-catch rvest

8 votes, 1 answer, 9017 views

Shiny ggplot brush limits with dates on the x-axis

I'm trying to replicate this example from the RStudio site:

Gallery/Plot interaction - zoom

The problem is that I need to use dates (as.Date with %d/%m/%Y) on the x-axis, and when I zoom the plot I get this error:

Invalid input: date_trans works with objects of class Date only

library(shiny)
library(ggplot2)
library(scales)
library(grid)
library(DT)

ui <- fluidPage(
  fluidRow(
    column(width = 12, class = "well", h4("Left plot controls right plot"),
      fluidRow(
        column(width = 6, plotOutput("plot1", height = 300,
          brush = brushOpts(id = "plot2_brush", resetOnNew = TRUE))),
        column(width = 6, plotOutput("plot2", height = 300,
          click = "plot_click",
          dblclick = dblclickOpts(id = "plot_dblclick"))),
        fluidRow(column(width = 12, dataTableOutput("selected_rows")))
      )
    )
  )
)

Date <- c("01/01/2014","01/01/2014","01/01/2014","01/01/2014")
Sevdow <- …
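With a Date x-axis, the brush still reports `xmin`/`xmax` as plain numbers (days since 1970-01-01), and passing those numbers back as limits to a date scale is a likely source of the `date_trans` error above. A minimal sketch, assuming the zoom is applied with `coord_cartesian(xlim = ...)` as in the RStudio example, converts the limits back to `Date` first. The helper `brush_date_limits` and the data frame `df` in the comments are hypothetical.

```r
# Hypothetical helper: convert the numeric xmin/xmax reported by a Shiny brush
# on a Date x-axis (days since 1970-01-01) back into Date objects.
brush_date_limits <- function(brush) {
  if (is.null(brush)) return(NULL)  # no active brush: show the full range
  as.Date(c(brush$xmin, brush$xmax), origin = "1970-01-01")
}

# Sketch of use inside the server, assuming `ranges <- reactiveValues(x = NULL)`
# and a hypothetical data frame `df` whose Date column was parsed with
# as.Date(Date, "%d/%m/%Y"):
#
# observe({
#   ranges$x <- brush_date_limits(input$plot2_brush)
# })
# output$plot2 <- renderPlot({
#   ggplot(df, aes(Date, Sevdow)) + geom_line() +
#     coord_cartesian(xlim = ranges$x)
# })

# Example: a brush spanning January 2014 (16071 = 2014-01-01)
lims <- brush_date_limits(list(xmin = 16071, xmax = 16101))
print(lims)
```

Keeping the conversion in one small function means the same helper can be reused for every plot that shares the date axis.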

r ggplot2 shiny

5 votes, 1 answer, 1906 views

Filtering rows of a data frame with a Shiny text input and dplyr

I'm trying to use a text input widget in a Shiny app to filter rows of a data frame, but I can't get it to work.

Dataset

df1<-data.frame (Name=c("Carlos","Pete","Carlos","Carlos","Carlos","Pete","Pete","Pete","Pete","Homer"),Sales=(as.integer(c("3","4","7","6","4","9","1","2","1","9"))))

UI

shinyUI(fluidPage(
titlePanel("Sales trends"),titlePanel("People score"),

sidebarLayout(sidebarPanel(

  textInput("text", label = h3("Text input"), value = "Enter text..."),

  numericInput("obs", "Number of observations to view:", 3),

  helpText("Note: while the data view will show only the specified",
           "number of observations, the summary will still be based",
           "on the full dataset."),

  submitButton("Update View")
),

mainPanel(
  h4("Volume: Total sales"),
  verbatimTextOutput("volume"),

  h4("Top people"),
  tableOutput("view")
))))

Server

library(shiny)
library (dplyr)
df1<-data.frame (Name=c("Carlos","Pete","Carlos","Carlos","Carlos","Pete","Pete","Pete","Pete","Homer"),Sales=(as.integer(c("3","4","7","6","4","9","1","2","1","9"))))
shinyServer(function(input, output) {
output$value <- renderPrint({ input$text })
datasetInput <- reactive({
switch(input$dataset,df1%>% filter(Name …
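The server above switches on `input$dataset`, which the UI never defines. A minimal sketch, assuming the goal is to keep the rows whose `Name` equals the typed text, filters on `input$text` directly. The helper `filter_by_name`, treating the default `"Enter text..."` value as "no filter", and summing `Sales` for the "Volume" output are all assumptions.

```r
library(dplyr)

df1 <- data.frame(
  Name  = c("Carlos", "Pete", "Carlos", "Carlos", "Carlos",
            "Pete", "Pete", "Pete", "Pete", "Homer"),
  Sales = c(3L, 4L, 7L, 6L, 4L, 9L, 1L, 2L, 1L, 9L),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep rows whose Name matches the typed text.
# An empty box (or the UI's default value) returns the whole data frame.
filter_by_name <- function(df, text) {
  if (is.null(text)) return(df)
  text <- trimws(text)
  if (text == "" || text == "Enter text...") return(df)
  df %>% filter(Name == text)
}

# Sketch of the server, assuming the UI from the question
# (textInput "text", numericInput "obs", outputs "volume" and "view"):
#
# shinyServer(function(input, output) {
#   datasetInput <- reactive(filter_by_name(df1, input$text))
#   output$volume <- renderPrint(sum(datasetInput()$Sales))
#   output$view <- renderTable(head(datasetInput(), n = input$obs))
# })

filter_by_name(df1, "Pete")
```

Because the UI uses `submitButton()`, the reactive only re-runs when "Update View" is clicked.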

r shiny dplyr shinydashboard

3 votes, 1 answer, 5671 views

Extracting the meta description from web pages with R

Hi, I'm trying to retrieve the meta descriptions of these web pages from their page source:

Data<-data.frame(Pages=c(
"http://boingboing.net/2016/06/16/spam-king-sanford-wallace.html", 
"http://boingboing.net/2016/06/16/omg-the-japanese-trump-commer.html",
"http://boingboing.net/2016/06/16/omar-mateen-posted-to-facebook.html"))

Desired output

Data$Meta_Description<-data.frame(Extracted=c(
"Sanford Wallace gets 2.5 years in prison for 27 million Facebook", 
"OMG, this Japanese Trump Commercial is everything",
"Omar Mateen posted to Facebook during Orlando mass shooting"))

I tried to do this with httr, but I couldn't get it into the desired output format or extract the field from the content retrieved with GET.

library (httr)
resp<-GET ("http://boingboing.net/2016/06/16/spam-king-sanford-wallace.html")
str(resp)
List of 10
$ url        : chr "http://boingboing.net/2016/06/16/spam-king-sanford-wallace.html"
$ status_code: int 200
$ headers    :List of 22
..$ server                     : chr "Apache/2.2"

The field I need to extract from the source comes after this string:

<meta itemprop="description" content="

Like this:

<meta itemprop="description" content="&#039;Spam King&#039; 
Sanford Wallace gets 2.5 years in prison for …
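Rather than searching the raw source for the string above, one option is to parse the page with rvest and read the `content` attribute of the matching `<meta>` tag. This is a sketch, not the only approach; the helper name `get_meta_description` is hypothetical, and it returns a plain character column instead of the nested `data.frame` shown in the desired output.

```r
library(rvest)

# Hypothetical helper: return the content attribute of
# <meta itemprop="description"> for one page (NA if the tag is absent).
get_meta_description <- function(url) {
  page <- read_html(as.character(url))
  node <- html_node(page, 'meta[itemprop="description"]')
  # The HTML parser has already decoded entities such as &#039; into '
  html_attr(node, "content")
}

# Usage with the question's data frame (one character value per page):
# Data$Meta_Description <- vapply(as.character(Data$Pages),
#                                 get_meta_description, character(1),
#                                 USE.NAMES = FALSE)
```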

r httr rvest

2 votes, 1 answer, 1600 views

write.csv with Sys.time in the file name

Every day I need to generate many CSV files and store them in a folder, with the processing time in each file name.

I tried to append the system time to the file name, but couldn't get it to work with paste0:

write.csv(output, paste0("C://Users/My Computer/dir", Sys.time(), ".csv"))

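Is it possible to include the system time in the file name, or would it be better for users of these files to find and read them by modification date?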
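A likely cause is that `Sys.time()` pastes as something like `2016-06-16 14:03:22`, and colons are not allowed in Windows file names; the `paste0()` call above also joins `dir` and the time without a `/`. A minimal sketch that formats the time without colons and builds the path with `file.path()`; the helper name `timestamped_csv_path` and the `output_` prefix are hypothetical, and `tempdir()` stands in for the real folder.

```r
# Hypothetical helper: build "<dir>/<prefix>_YYYY-MM-DD_HH-MM-SS.csv".
# Colons are replaced with "-" because Windows forbids ":" in file names.
timestamped_csv_path <- function(dir, prefix = "output", time = Sys.time()) {
  stamp <- format(time, "%Y-%m-%d_%H-%M-%S")
  file.path(dir, paste0(prefix, "_", stamp, ".csv"))
}

# Example: write to a temporary folder (replace with "C:/Users/My Computer/dir")
output <- data.frame(x = 1:3)
path <- timestamped_csv_path(tempdir())
write.csv(output, path, row.names = FALSE)
print(basename(path))
```

The year-month-day ordering also makes the files sort chronologically by name, so users do not have to rely on modification dates.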

r

1 vote, 1 answer, 3105 views

Tags

r ×5

rvest ×2

shiny ×2

dplyr ×1

ggplot2 ×1

httr ×1

shinydashboard ×1

try-catch ×1