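For some reason, the links in my R Markdown (Rmd) document are not formatted in blue. Knitting the simple Rmd below to PDF leaves the link text black, and you only notice that it is a link when hovering over it. Knitting it to HTML makes the link blue. Of course I could use a LaTeX package, but I would like to understand why this happens.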
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
loaded via a namespace (and not attached):
knitr_1.15
RStudio 1.0.44
---
title: "R Notebook"
output:
  pdf_document: default
  html_notebook: default
---
```{r, echo=F}
# tex / pandoc options for pdf creation
x <- Sys.getenv("PATH")
y <- paste(x, "E:\\miktex\\miktex\\bin", sep=";")
Sys.setenv(PATH = y)
```
[link](www.rstudio.com)
Can I define a "fill" value for NAs in a dplyr join? For example, can I specify in the join that all NA values should be 1?
library(dplyr)
# Build the columns directly: cbind() would first coerce 0.9 and 1.1 to character
lookup <- data.frame(rate = c("USD", "MYR"), value = c(0.9, 1.1),
                     stringsAsFactors = FALSE)
fx <- data.frame(rate = c("USD", "MYR", "USD", "MYR", "XXX", "YYY"),
                 stringsAsFactors = FALSE)
left_join(x = fx, y = lookup, by = "rate")
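The code above creates NAs for the values "XXX" and "YYY". In my case I am joining a large number of columns and there will be many mismatches, all of which should get the same value. I know I can do this in several steps, but can it all be done in one go? Thanks!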
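A minimal base-R sketch of the one-step idea, assuming the looked-up columns are numeric, that no non-key column names are shared between the two tables, and that every NA in the looked-up columns should get the same fill value (this also fills NAs that were already present in `lookup`). `left_join_fill` is a hypothetical helper, not a dplyr function; within dplyr, piping the join into `mutate()` with `coalesce()` does the same for named columns.

```r
# Hypothetical helper: left join x to y, then replace every NA in the columns
# that came from y with a single fill value.
left_join_fill <- function(x, y, by, fill) {
  out <- merge(x, y, by = by, all.x = TRUE, sort = FALSE)
  added <- setdiff(names(y), by)  # columns contributed by the lookup table
  out[added] <- lapply(out[added], function(col) {
    col[is.na(col)] <- fill
    col
  })
  out
}

lookup <- data.frame(rate = c("USD", "MYR"), value = c(0.9, 1.1),
                     stringsAsFactors = FALSE)
fx <- data.frame(rate = c("USD", "MYR", "USD", "MYR", "XXX", "YYY"),
                 stringsAsFactors = FALSE)

res <- left_join_fill(fx, lookup, by = "rate", fill = 1)
# merge() does not promise the row order of fx, so inspect rows by key
res[order(res$rate), ]
```

Because the fill is applied to all columns from `lookup` at once, the number of joined columns does not change the amount of code.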
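I am exploring different ways of wrapping an aggregation function (though it could really be any kind of function) with data.table (a dplyr example is provided as well), and I am wondering about best practices for functional programming/metaprogramming.
The basic application is aggregating a table flexibly, i.e. parameterizing the variables to aggregate, the dimensions to aggregate by, the corresponding result variable names for both, and the aggregation function. I have implemented (almost) the same function in three data.table ways and one dplyr way: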
Libraries
library(data.table)
library(dplyr)
Data
n_size <- 1*10^6
sample_metrics <- sample(seq(from = 1, to = 100, by = 1), n_size, replace = TRUE)
sample_dimensions <- sample(letters[10:12], n_size, replace = TRUE)
df <-
  data.frame(
    a = sample_metrics,
    b = sample_metrics,
    c = sample_dimensions,
    d = sample_dimensions,
    x = sample_metrics,
    y = sample_dimensions,
    stringsAsFactors = FALSE)
dt <- as.data.table(df)
Implementations
1. fn_dt_agg1
fn_dt_agg1 <-
function(dt, metric, metric_name, dimension, dimension_name) { …
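When comparing the data.table and dplyr wrappers, a package-free baseline can help check that they all return the same numbers. The sketch below is a hypothetical `fn_base_agg` (not one of the implementations above) built on base R's `aggregate()`; it takes the metric and dimension as column-name strings, renames the output columns, and assumes a single metric and a single dimension.

```r
# Hypothetical base-R baseline: aggregate one metric column by one dimension
# column, both passed as strings, and rename the result columns.
fn_base_agg <- function(df, metric, metric_name, dimension, dimension_name,
                        agg_fun = sum) {
  groups <- setNames(list(df[[dimension]]), dimension_name)
  out <- aggregate(df[[metric]], by = groups, FUN = agg_fun)
  names(out)[2] <- metric_name  # aggregate() names the value column "x"
  out
}

# Small example in the shape of the sample data above
toy <- data.frame(a = c(1, 2, 3, 4), c = c("j", "k", "j", "l"),
                  stringsAsFactors = FALSE)
res <- fn_base_agg(toy, metric = "a", metric_name = "a_sum",
                   dimension = "c", dimension_name = "c_dim")
res
```

Passing `agg_fun = mean` swaps the aggregation function without touching the rest of the call, which is the same parameterization the wrappers aim for.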
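Maybe I'm missing something obvious, but I am trying to flatten a named list of named lists (possibly nested even deeper) in R into one flat list. purrr and rlist seem to have tools for this. How can I make the names of the sublists become prefixes of the names in the flattened result list, e.g. list1.blist.a, in purrr? My actual list is nested more deeply, with varying numbers of levels and duplicate names on different levels. At the end I run purrr::map_df(final_list, bind_rows), which seems to drop all duplicate names (and even if it did not, I would not know which branch the original duplicates came from). I can do this with rlist, but I was hoping to find a tidyverse solution (nothing against rlist, but many people already have the tidyverse installed).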
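Edit:
Also note that rlist::list.flatten() always removes all levels except the top one, while purrr::flatten() removes one level at a time, which may sometimes be what you need. You can nest purrr::map(.x, .f = rlist::list.flatten) as often as needed to achieve the same thing, but it is cumbersome and neither pretty nor readable.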
alist <- list(list1 = list(a = 1, b = 2, blist = list(a = 3, b = 4)),
              list2 = list(a = 1, b = 2, blist = list(a = 3, b = 4)))
str(alist)
List of 2
$ list1:List of 3
..$ a : num 1 …
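A minimal base-R sketch of the naming scheme (not a purrr solution), assuming every level of the list is named: a hypothetical recursive helper `flatten_named()` joins the names along the path with `.`, so duplicate leaf names stay distinguishable by their branch. When all leaves are atomic values of one type, `as.list(unlist(alist))` produces the same dotted names, but `unlist()` coerces mixed types to a common type, which the recursion avoids.

```r
# Hypothetical helper: flatten a nested named list into one flat list whose
# names record the full path, e.g. "list1.blist.a".
flatten_named <- function(x, prefix = NULL, sep = ".") {
  out <- list()
  for (i in seq_along(x)) {
    path <- if (is.null(prefix)) names(x)[i] else paste(prefix, names(x)[i], sep = sep)
    if (is.list(x[[i]])) {
      # Sublist: recurse, passing the path built so far as the new prefix
      out <- c(out, flatten_named(x[[i]], prefix = path, sep = sep))
    } else {
      # Leaf: keep the value unchanged under its full path name
      out <- c(out, setNames(list(x[[i]]), path))
    }
  }
  out
}

alist <- list(list1 = list(a = 1, b = 2, blist = list(a = 3, b = 4)),
              list2 = list(a = 1, b = 2, blist = list(a = 3, b = 4)))
flat <- flatten_named(alist)
names(flat)
```

Because leaves are appended with `c()` rather than assigned by name, repeated path names are kept rather than overwritten.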
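I want to create dynamic ylim values in ggplot, so that the limits argument refers to values supplied by dplyr through the pipe. To illustrate the problem, here is the working (non-generic) code that I want to turn into generic code (which currently does not work).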
require(dplyr)
require(scales)
require(ggplot2)
x <- data.frame(name = c("A","B","C"),
                value = c(2,4,6))
Working non-generic code:
arrange(x[1:2, ], value) %>%
  ggplot(data=., aes(x=factor(name), y=value)) +
  geom_bar(stat="identity") +
  scale_y_continuous(labels=comma,
                     limits=c(0,max(arrange(x[1:2, ], value)$value) * 1.1))
Generic code that does not work (the call cannot find value):
arrange(x[1:2, ], value) %>%
  ggplot(data=., aes(x=factor(name), y=value)) +
  geom_bar(stat="identity") +
  scale_y_continuous(labels=comma,
                     limits=c(0,max(value) * 1.1))
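So the question is: is there any way to set the limits generically, i.e. so that the part after arrange() always stays the same? I need to produce many identical graphs with a different x each time, i.e. with different limits. Thanks!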
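One way to keep everything after `arrange()` identical, assuming a ggplot2 version whose continuous scales accept a function for `limits` (current versions document this; check your installed version): the function receives the automatic limits computed from whatever data reaches the plot, so the code never has to name the piped-in data. `headroom_limits` is a hypothetical helper. A magrittr alternative is to wrap the plotting code in braces, `{ ggplot(., ...) + scale_y_continuous(limits = c(0, max(.$value) * 1.1)) }`, so that `.` refers to the piped data frame inside the whole expression.

```r
library(ggplot2)

x <- data.frame(name = c("A", "B", "C"), value = c(2, 4, 6))

# Hypothetical helper: ggplot2 calls a limits function with the automatic
# (data-derived) limits; keep 0 as the floor and add 10% headroom on top.
headroom_limits <- function(lims, headroom = 1.1) c(0, lims[2] * headroom)

# Base-R stand-in for arrange(x[1:2, ], value)
sub <- x[1:2, ]
sub <- sub[order(sub$value), ]

p <- ggplot(sub, aes(x = factor(name), y = value)) +
  geom_bar(stat = "identity") +
  scale_y_continuous(labels = scales::comma, limits = headroom_limits)
# print(p) to draw the plot
```

Since the scale definition no longer depends on the data, the same `scale_y_continuous(...)` line can be reused unchanged for every subset.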
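I am trying to use conditional lead/lag functions with ifelse in a dplyr pipe, but I get an error. However, the same approach outside the pipe seems to work. What am I missing?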
require(dplyr)
Data:
test <- data.frame(a = c("b","b","b","b","b","b",
                         "m","m","m","m","m","m",
                         "s","s","s","s","s","s"),
                   b = replicate(1,n=18),
                   stringsAsFactors=FALSE)
dplyr pipe:
test %>%
  mutate(delta = ifelse(a == "s", b + lag(b, n = 2*6),
                        ifelse(a == "m", b + lag(b, n = 1*6), 0)))
# Error: could not convert second argument to an integer. type=LANGSXP, length = 3
Without the pipe it works:
test$delta <- ifelse(test$a == "s", test$b + lag(test$b, n = 2*6),
                     ifelse(test$a == "m", test$b + lag(test$b, n = 1*6), 0))
I found some indications that dplyr lead …
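The error mentions `LANGSXP`, R's type for an unevaluated call, which suggests the pipe version received `2*6` as an expression rather than a number; passing a precomputed integer such as `n = 12L` may be worth trying, though that diagnosis is an assumption. Independently of the cause, the intended computation can be checked in base R. The sketch below uses a hypothetical `shift_down()` helper in place of `dplyr::lag()` (it pads the front with `NA`, like `lag()`) and keeps the nested `ifelse()` logic from the question. With the question's data, every `"m"` and `"s"` row gets `1 + 1 = 2` and every `"b"` row gets 0.

```r
# Hypothetical helper mimicking dplyr::lag(): shift x down by n positions,
# padding the start with NA and keeping the original length.
shift_down <- function(x, n) {
  c(rep(NA, min(n, length(x))), head(x, -n))
}

test <- data.frame(a = rep(c("b", "m", "s"), each = 6),
                   b = rep(1, 18),
                   stringsAsFactors = FALSE)

# Same nested ifelse() as in the question, using the base-R shift
test$delta <- with(test, ifelse(a == "s", b + shift_down(b, 2 * 6),
                         ifelse(a == "m", b + shift_down(b, 1 * 6), 0)))
test$delta
```

Note that `ifelse()` evaluates both shifted vectors in full; the leading `NA`s only matter for rows whose branch is actually selected.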
I upgraded to RStudio 1.0.44, and it seems that knitr::opts_knit$set(root.dir = path), where path is my directory, no longer works as before. It throws a message:
The working directory was changed to /... inside a notebook chunk. The working
directory will be reset when the chunk is finished running. Use the knitr
root.dir option in the setup chunk to change the the working directory for
notebook chunks.
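This message now appears with every subsequent command. Note that I have not knitted the Rmd yet; I am only running commands. Setting the working directory directly on the command line via setwd() makes getwd() return the correct path, but loading a file with a relative path (./...) again produces the message above. The exact same Rmd works fine in RStudio 0.99.896. What am I missing?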
sessionInfo()
R version 3.3.0 (2016-05-03)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
other attached packages:
[1] scales_0.4.0 ggplot2_2.1.0 xtable_1.8-2 data.table_1.9.6
[5] dplyr_0.4.3 …
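I tried to follow the instructions in "R to Oracle Database Connectivity: Use ROracle for both Performance and Scalability" to simply connect to an Oracle database through the DBI and ROracle packages.
When I test the connection through Windows 7 > ODBC Data Source Administrator (32-bit), the connection succeeds. It uses the Oracle client OraClient11g_home1 installed in C:\oracle\Client112_32. The ORACLE_HOME environment variable is set to C:\oracle\Client112_32.
I guess it might be related to some 32-bit/64-bit issue? But even after quite a bit of research I have not found a solution. I also tried running the same code in 32-bit R, but it failed as well. By the way, connecting through SQL Developer also succeeds.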
drv <- DBI::dbDriver("Oracle")
#>Error: Couldn't find driver Oracle. Looked in:
#>* global namespace
#>* in package called Oracle
#>* in package called ROracle
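I am trying to get the sum over some of the columns of a given row of a matrix in R. However, I do not want to sum the entire row, only a specified set of columns, in this case all columns above the diagonal. I have tried the sum and rowSums functions, but they either give me strange results or error messages. To illustrate, see the example code for an 8x8 matrix below. For the first row I need the sum of the row except item [1,1], for the second row the sum except items [2,1] and [2,2], and so on.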
m1 <- matrix(c(0.2834803,0.6398198,0.0766999,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,
0.0000000,0.1101746,0.6354086,0.2544168,0.0000000,0.0000000,0.0000000,0.0000000,
0.0000000,0.0000000,0.0548145,0.9451855,0.0000000,0.0000000,0.0000000,0.0000000,
0.0000000,0.0000000,0.0000000,0.3614786,0.6385214,0.0000000,0.0000000,0.0000000,
0.0000000,0.0000000,0.0000000,0.0000000,0.5594658,0.4405342,0.0000000,0.0000000,
0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.7490395,0.2509605,0.0000000,
0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.5834363,0.4165637,
0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,0.0000000,1.0000000),
8, 8, byrow = TRUE,
dimnames = list(c("iAAA", "iAA", "iA", "iBBB", "iBB", "iB", "iCCC", "iD"),
c("iAAA_p", "iAA_p", "iA_p", "iBBB_p", "iBB_p", "iB_p", "iCCC_p", "iD_p")))
I have tried the following:
rowSums(m1[1, 2:8]) --> Error in rowSums(m1[1, 2:8]) :
'x' must be an array of at least two dimensions
Or:
sum(m1[1,2]:m1[1,8]) --> wrong result of 0.6398198 (which is item [1,2])
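As far as I understand, rowSums needs an array rather than a vector (although I'm not sure why). But I do not understand why the second approach using sum does not work. Ideally there would be some way to sum only the columns of a row that lie above the diagonal.
Many thanks!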
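Two things explain the observed behaviour. `m1[1, 2:8]` drops to a plain vector, so `rowSums()` has no second dimension to work on (`sum()` on that vector, or indexing with `drop = FALSE`, avoids this). And `m1[1,2]:m1[1,8]` is the `:` operator building a sequence from 0.6398198 towards 0 in steps of 1, which contains only 0.6398198, so `sum()` returns that single value. For sums strictly above the diagonal, multiplying by the logical mask from `upper.tri()` zeroes everything else. The sketch uses a small hypothetical 3x3 matrix so the expected sums are easy to check by hand; the same call applies to `m1`.

```r
# Small stand-in for m1 (hypothetical values)
m <- matrix(1:9, nrow = 3, byrow = TRUE)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6
# [3,]    7    8    9

# upper.tri() is TRUE strictly above the diagonal; multiplying by it zeroes
# the diagonal and everything below it before summing each row.
above_diag <- rowSums(m * upper.tri(m))
above_diag  # 5 6 0

# Single-row versions of the same idea
sum(m[1, -1])                      # row 1 without [1,1]
sum(m[2, seq_len(ncol(m)) > 2])    # row 2 without [2,1] and [2,2]

# Why sum(m1[1,2]:m1[1,8]) misbehaves: ':' builds a sequence, not a column range
0.6398198:0
```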
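The problem seems completely trivial, but I cannot figure out why it does not work. I just want to replace a character value that contains a "+" operator with the plain value, without the "+". For some reason, the gsub() and sub() functions replace the number but keep the operator. Any hints on how to overcome this? Many thanks!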
data <- c(1,2,3,4,"5+")
gsub(pattern="5+",replacement="5",x=data)
#[1] "1" "2" "3" "4" "5+"
gsub(pattern="5+",replacement="",x=data)
#[1] "1" "2" "3" "4" "+"
R 3.0.2
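The pattern is interpreted as a regular expression, in which `+` means "one or more of the preceding character", so `"5+"` matches only the `5` and leaves the literal plus sign in place. Either using `fixed = TRUE` or escaping the plus makes it literal:

```r
data <- c(1, 2, 3, 4, "5+")  # c() coerces everything to character

# fixed = TRUE: match the literal string "5+"
gsub(pattern = "5+", replacement = "5", x = data, fixed = TRUE)
# [1] "1" "2" "3" "4" "5"

# Regex with the '+' escaped: remove a literal plus sign anywhere
gsub(pattern = "\\+", replacement = "", x = data)
# [1] "1" "2" "3" "4" "5"
```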
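It seems that one of my machines produces wrong results for the seq function, while another machine and the online r-fiddle (http://www.r-fiddle.org) interpreter give the expected results. On the problematic machine the following happens: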
seq(from = 1, to = 1.1, by = 0.01)
[1] 1.0 1.0 1.0 1.0 1.0 1.0 1.1 1.1 1.1 1.1 1.1
Changing the command slightly returns the expected result:
seq(from = 0.99, to = 1.1, by = 0.01)
[1] 0.99 1.00 1.01 1.02 1.03 1.04 1.05 1.06 1.07 1.08 1.09 1.10
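The wrong results appear as soon as I go past the "1" threshold, e.g. also for from = 2.95 to = 3.1. I don't know how to track this down, because I cannot reproduce the problem on my other machine or in the fiddle. The problem persists even after restarting the PC.
R version 3.1.3 (2015-03-09)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1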
locale:
[1] LC_COLLATE=German_Germany.1252  LC_CTYPE=German_Germany.1252
[3] LC_MONETARY=German_Germany.1252 LC_NUMERIC=C
[5] LC_TIME=German_Germany.1252
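Both outputs shown are consistent with the problem machine printing numbers with fewer significant digits than the default, for example because `options(digits = 2)` is set in an `.Rprofile`; that is a hypothesis to check, not a confirmed cause. With 2 significant digits, R uses only as many decimals as the elements need: 1.00 to 1.10 need at most one decimal, so they print as `1.0` or `1.1`, whereas `0.99` needs two decimals, so the second sequence prints normally. The stored values are unaffected, which is easy to confirm:

```r
getOption("digits")  # 7 by default; worth comparing on both machines

x <- seq(from = 1, to = 1.1, by = 0.01)

old <- options(digits = 2)                     # reproduce the suspected setting
print(x)                                       # printed with one decimal
print(seq(from = 0.99, to = 1.1, by = 0.01))   # printed with two decimals
options(old)                                   # restore the previous setting

# The values themselves are fine; only the display changed
print(x, digits = 7)
all.equal(x, 1 + (0:10) / 100)
```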
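I want to loop over a list of numeric data frames and use a for loop to create plots of specific columns of each data frame. I have working code, but the result is strange: I expect only two plots to be created, but R creates four, and I do not understand why, especially since when I use print instead of plot, it prints the values I expect. Below is a small example from a bigger dataset. Any ideas are much appreciated. Many thanks!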
# Create data
a <- c(1,2,3,4,5)
b <- c(6,7,8,9,10)
c <- c(0,0,0,1,0)
d <- c(1,2,3,4,5,6,7,8,9,10)
e <- c(11,12,13,14,15,16,17,18,19,20)
f <- c(0,0,0,0,0,0,0,1,0,0)
# Create data frames
df1 <- data.frame(cbind(a,b,c))
df2 <- data.frame(cbind(d,e,f))
names(df2) <- c("a","b","c")
# Create list of data frames
l <- list(df1,df2)
# Create titles for plots
titlenames <- c("Graph 1","Graph 2")
# Loop over list of data frames and create plots
for (i in l) {
  for (j in titlenames) {
    plot(x = (i$a[i$c == 0]), y = (i$b[i$c == 0]), main = "", xlab = "", ylab = "")
    title(main = paste(j))
  }
}
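The nested loops run the inner loop once for every element of the outer one, so 2 data frames × 2 titles give 4 plots. Looping over a single shared index pairs each data frame with its own title instead. In the sketch below, `pdf(NULL)` (a device that writes no file) and the `n_plots` counter are only there so it can run non-interactively and be checked; drop them in an interactive session.

```r
df1 <- data.frame(a = c(1, 2, 3, 4, 5), b = c(6, 7, 8, 9, 10),
                  c = c(0, 0, 0, 1, 0))
df2 <- data.frame(a = 1:10, b = 11:20,
                  c = c(0, 0, 0, 0, 0, 0, 0, 1, 0, 0))
l <- list(df1, df2)
titlenames <- c("Graph 1", "Graph 2")

pdf(NULL)      # null device: plots are drawn but not saved
n_plots <- 0
# One index k selects both the data frame and its title
for (k in seq_along(l)) {
  d <- l[[k]]
  keep <- d$c == 0
  plot(x = d$a[keep], y = d$b[keep], main = titlenames[k], xlab = "", ylab = "")
  n_plots <- n_plots + 1
}
invisible(dev.off())
n_plots
```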
This may be a silly question, but can I define a variable as a range so that the following returns TRUE?
range1 <- range(0.001-0.002)
0.0015 %in% range1
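`range(0.001-0.002)` evaluates the subtraction first, so it is the range of the single number -0.001, and `%in%` tests exact membership in a set of values, not whether a number lies between two bounds. A minimal sketch that stores the two bounds and compares against them with a hypothetical `in_range()` helper; `dplyr::between()` and data.table's `between()` / `%between%` offer the same inclusive comparison.

```r
range1 <- c(0.001, 0.002)  # lower and upper bound

# Hypothetical helper: TRUE where x lies within [r[1], r[2]] (bounds included)
in_range <- function(x, r) x >= r[1] & x <= r[2]

in_range(0.0015, range1)   # TRUE
in_range(0.003, range1)    # FALSE

# What the original line actually computed
range(0.001 - 0.002)       # -0.001 -0.001
```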