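I am trying to convert a PDF into a data frame, but because the column headers repeat on every page (and the last page carries notes), I am finding it hard to think of a suitable way to get it into a data frame while also handling any dynamic issues.
I have read it into R as an object as follows: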
library(pdftools)
library(dplyr)
library(tidyverse)
temps <- tempfile(fileext = ".pdf")
download.file("https://www.dmo.gov.uk/dmo_static_reports/Gilt%20Operations.pdf", destfile = temps, mode="wb")
I thought I would do something like:
ops <- pdf_text(temps) %>%
readr::read_lines()
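I was then thinking of explicitly removing the unnecessary lines and converting the result to a data frame. But given the issues above, I don't think this will hold up in the long term.
Does anyone have suggestions on the best solution to this problem?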
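One way to make the "remove unnecessary lines" step less brittle is to keep only lines that look like data rows, instead of deleting known headers and notes by position. The sketch below is only an illustration under stated assumptions: `pages` is a hypothetical stand-in for the output of `pdf_text()`, the column names and values are made up, and it assumes that every data row starts with a date such as `01-Jan-2024` and that columns are separated by two or more spaces. The real layout of the PDF may differ.

```r
# Hypothetical stand-in for pdf_text(temps): one string per page.
# The header repeats on each page and the last page ends with a note.
pages <- c(
  "Date         Gilt    Amount\n01-Jan-2024  Gilt A  100\n02-Jan-2024  Gilt B  200",
  "Date         Gilt    Amount\n03-Jan-2024  Gilt C  300\nNote: figures are illustrative"
)

# Split every page into lines, trim whitespace and drop empty lines.
lines <- trimws(unlist(strsplit(pages, "\n", fixed = TRUE)))
lines <- lines[nzchar(lines)]

# Assumption: the first non-empty line is the column header.
header <- strsplit(lines[1], "[[:space:]]{2,}")[[1]]

# Keep only lines that start with a date. Repeated headers and trailing
# notes do not match, so they are dropped wherever they appear.
is_row <- grepl("^[0-9]{2}-[A-Za-z]{3}-[0-9]{4}", lines)

# Assumption: columns are separated by runs of two or more spaces.
fields <- strsplit(lines[is_row], "[[:space:]]{2,}")
ops_df <- as.data.frame(do.call(rbind, fields), stringsAsFactors = FALSE)
names(ops_df) <- header
ops_df$Amount <- as.numeric(ops_df$Amount)

print(ops_df)
```

Filtering on what a data row looks like means the code should keep working if the number of pages or the length of the notes changes, as long as the row pattern itself stays stable.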
I have an xts df in the following format:
structure(c("May 2022", "Jun 2022", "Jul 2022", "Aug 2022", "Sep 2022",
"Oct 2022", "Nov 2022", "Dec 2022", " 3035.199", " 5500.000",
"11568.750", " 2510.000", " 6999.999", "21792.149", " 9750.000",
" 5624.999", " 2250.000", " 4136.975", " 6525.500", " 2771.875",
" 4637.500", "16273.499", " 6000.000", " 4494.649", " 2500.000",
" 0.000", " 3029.000", " 2803.500", " 0.000", "14481.250",
" 4374.998", " 4062.498", " 0.000", " 3075.000", " 6939.249",
" 1500.000", " 4183.157", " 5769.000", " 3559.500", " 3250.000"
), …
I currently have a reactable object stored as an object in some code. I would like to convert that object to a ggplot, but whatever I do I get variations of the same error. Using blastula's add_ggplot function, I get:
Error in UseMethod("grid.draw") :
no applicable method for 'grid.draw' applied to an object of class "c('reactable', 'htmlwidget')"
Using ggplotify's as.ggplot function, I get:
Error in UseMethod("as.grob") :
no applicable method for 'as.grob' applied to an object of class "c('reactable', 'htmlwidget')"
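Does anyone have suggestions on how to achieve the desired result?
Edit: in response to a question, which I probably should have addressed originally: the reactable is derived from a very ordinary data frame.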
df <- structure(list(Date = c("2019-02-09", "2019-02-09", "2019-02-09",
"2019-02-09", "2019-02-09", "2019-02-09", "2020-02-09", "2020-02-09",
"2020-02-09", "2020-02-09", "2021-02-09", "2021-02-09", "2021-02-09",
"2021-02-09"), Type = c("HUF", "HAD", "WOK", "STR", "HUF", "HAD",
"WOK", "STR", "HUF", "HAD"), Value = c(12L, …
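
Both errors point the same way: a reactable is an htmlwidget (HTML and JavaScript), not a grid graphic, so there is no `grid.draw` or `as.grob` method for it. One possible workaround, sketched below, is to skip the widget and draw the underlying data frame as a grid table, which `ggplotify::as.ggplot()` can wrap. This is a sketch under assumptions, not a confirmed fix: `example_df` is a hypothetical data frame with made-up values that only mirrors the column names above, the gridExtra and ggplotify packages are assumed to be installed, and the reactable's own styling and interactivity are lost.

```r
library(gridExtra)  # tableGrob(): renders a data frame as a grid object
library(ggplotify)  # as.ggplot(): wraps a grob in a ggplot object

# Hypothetical data with the same column names as the question's df;
# the values are invented for illustration.
example_df <- data.frame(
  Date  = c("2019-02-09", "2020-02-09", "2021-02-09"),
  Type  = c("HUF", "HAD", "WOK"),
  Value = c(12L, 7L, 30L),
  stringsAsFactors = FALSE
)

# Build a static table grob from the data frame (not from the reactable).
table_grob <- gridExtra::tableGrob(example_df, rows = NULL)

# Wrap the grob so that it can be used wherever a ggplot is expected.
table_plot <- ggplotify::as.ggplot(table_grob)

print(class(table_plot))
```

The resulting `table_plot` could then be passed to a function that expects a ggplot, such as blastula's add_ggplot. If the reactable's appearance has to be preserved, a different route would be to save the widget to HTML with `htmlwidgets::saveWidget()` and capture it as an image, for example with `webshot2::webshot()`, and embed that image instead.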