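I'm trying to install R and RStudio on the local drive of my work computer rather than in the organization's network folder, because anything that runs over the network is very slow. During installation, the destination path shows my local C: drive. However, when I install a new package, the default path shown is my network drive, with no option to change it: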
.libPaths()
[1] "\\\\The library/path/I/don't/want"
[2] "C:/Program Files/R/R-3.2.1/library"
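I'm running Windows 7 Professional. How can I remove library path [1] and make path [2] the primary path for all of the base packages and every new package I install?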
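One possible approach, sketched under the assumption that the network entry comes from the per-user library (R_LIBS_USER) and not from a site library: calling `.libPaths()` with a local folder resets the search path to that folder plus the site and base libraries. The folder name below is hypothetical, and a temporary directory stands in for a real local path.

```r
# Hypothetical local library folder; a temporary directory stands in for
# something like "C:/R/library" on the real machine.
local_lib <- file.path(tempdir(), "local-R-library")
dir.create(local_lib, recursive = TRUE, showWarnings = FALSE)

# .libPaths(new) resets the search path to `new` plus the site and base
# libraries, so a per-user library on the network share is no longer listed.
.libPaths(local_lib)

# The first entry is where install.packages() puts new packages by default.
print(.libPaths())
```

This only lasts for the current session. To make it stick, the same call can go in an `.Rprofile`, or `R_LIBS_USER` can be pointed at the local folder in an `.Renviron` file.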
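Using the sample data below, I'm trying to create a new variable "Den" (value "0" or "1") based on the values of three condition variables (Denial1, Denial2 and Denial3).
I want "0" if any of the three condition variables is "0", and "1" only if EACH condition variable that has a value (i.e., is not NA) has the value "1".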
DF <- structure(list(Denial1 = NA_real_, Denial2 = 1, Denial3 = NA_real_,
Den = NA), .Names = c("Denial1", "Denial2", "Denial3", "Den"
), row.names = 1L, class = "data.frame")
I have tried the following two commands, both of which leave "Den" as a missing value (NA):
DF$Den<-ifelse (DF$Denial1 < 1 | DF$Denial2 < 1 | DF$Denial3 < 1, "0", "1")
DF$Den<-ifelse(DF$Denial1 < 1,"0", ifelse (DF$Denial2 < 1,"0", ifelse(DF$Denial3 < 1,"0", "1")))
Can someone demonstrate how to do this?
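A minimal sketch of one way this could be done in base R. Both attempts return NA because a comparison such as `NA < 1` is itself NA, so the idea is to count zeros with `na.rm = TRUE` instead. The extra rows in `DF` are made up for illustration, and leaving "Den" as NA when all three variables are NA is an assumption, since the question does not cover that case.

```r
# First row matches the sample data; the other rows are made up.
DF <- data.frame(
  Denial1 = c(NA, 1, 0, NA),
  Denial2 = c(1, 1, 1, NA),
  Denial3 = c(NA, 1, NA, NA)
)
cols <- c("Denial1", "Denial2", "Denial3")

# TRUE when at least one non-missing condition variable equals 0.
any_zero <- rowSums(DF[cols] == 0, na.rm = TRUE) > 0

# TRUE when all three condition variables are missing (assumed to stay NA).
all_na <- rowSums(!is.na(DF[cols])) == 0

DF$Den <- ifelse(all_na, NA, ifelse(any_zero, "0", "1"))
print(DF)
```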
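With the following example data frame, I want to draw a stratified random sample (e.g., 40%) of the IDs in "ID" from each level of the factor "Cohort":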
data<-structure(list(Cohort = c(1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L), ID = structure(1:20, .Label = c("a1 ",
"a2", "a3", "a4", "a5", "a6", "a7", "a8", "a9", "b10", "b11",
"b12", "b13", "b14", "b15", "b16", "b17", "b18", "b19", "b20"
), class = "factor")), .Names = c("Cohort", "ID"), class = "data.frame", row.names = c(NA,
-20L))
I only know how to draw a random set of rows using the following:
library(dplyr)
data %>%
group_by(Cohort) %>%
sample_n(size = 10)
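But my actual data are longitudinal, so I have multiple cases with the same ID within each cohort and several cohorts of different sizes, and therefore I need to select a proportion of the unique IDs. Any assistance would be appreciated.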
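A rough sketch of one way to sample at the ID level in base R: reduce the data to unique Cohort/ID pairs, sample 40% of the IDs within each cohort, then keep every row belonging to a sampled ID. The data frame `long` is made up to mimic longitudinal data, and the sketch assumes each ID belongs to a single cohort.

```r
set.seed(42)  # only to make the sketch reproducible

# Made-up longitudinal data: two rows per ID, cohorts of different sizes.
long <- data.frame(
  Cohort = rep(c(1L, 2L), times = c(10, 20)),
  ID     = rep(c(paste0("a", 1:5), paste0("b", 1:10)), each = 2),
  stringsAsFactors = FALSE
)

# One row per unique Cohort/ID pair.
ids <- unique(long[c("Cohort", "ID")])

# Sample 40% of the unique IDs within each cohort (rounded to a whole number).
picked <- unlist(lapply(split(ids$ID, ids$Cohort), function(x) {
  x[sample.int(length(x), size = round(0.4 * length(x)))]
}))

# Keep every longitudinal row that belongs to a sampled ID.
sampled <- long[long$ID %in% picked, ]
print(sampled)
```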
ID<-c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C')
WK<-c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5)
NumSuccess<-c(0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 3)
Data<-data.frame(ID, WK, NumSuccess)
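I'm trying to create a subset data.frame "Data2" containing the value of "NumSuccess" that corresponds to the maximum value of "WK" within each "ID". The resulting data.frame should look like this: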
ID<-c('A','B','C')
WK<-c(3, 3, 5)
NumSuccess<-c(2, 1, 3)
Data2<-data.frame(ID, WK, NumSuccess)
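For the subset described above, a minimal base R sketch could use `ave()` to compute the maximum "WK" per "ID" and keep only the rows that reach it. If an ID had ties for its maximum week, this would keep all of them.

```r
ID <- c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C')
WK <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5)
NumSuccess <- c(0, 0, 2, 0, 0, 1, 0, 0, 0, 0, 3)
Data <- data.frame(ID, WK, NumSuccess)

# ave() returns, for every row, the maximum WK within that row's ID.
max_wk <- ave(Data$WK, Data$ID, FUN = max)

# Keep the rows whose WK equals the per-ID maximum.
Data2 <- Data[Data$WK == max_wk, ]
rownames(Data2) <- NULL
print(Data2)
```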
I'm trying to create a logical variable by comparing one string with two or more other strings in a data.table, and I need to ignore NAs.
Sample data for D2:
structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1",
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1",
"char1", "char1")), row.names = c(NA, -3L), class = c("data.table",
"data.frame"), .internal.selfref = <pointer: 0x0000015eb1261ef0>)
I tried the following suggested solution:
D2[, Match := apply(sapply(.SD, `==`, D2[, "var1"]), 1, any), .SDcols =
c("var2", "var3")]
The result for a003 is TRUE, whereas it should be FALSE because var1 and var3 do not match:
structure(list(ID = c("a001", "a002", "a003"), var1 = c("char1",
"char1", "char2"), var2 = c("char1", NA, "char2"), var3 = c("char1", …
How can I create a new variable "CountWK" that is the count of values in "WK" up to the first instance of "1" in "Performance", grouped by "ID"?
ID<-c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C')
WK<-c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5)
Performance<-c(0,1,1,0,1,0,0,1,0,1,1)
Data<-data.frame(ID, WK, Performance)
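So for ID "A", "CountWK" would be "2", for "B" it would be "2", and for "C" it would be "2". In every other row, that is, every row except the one containing the first instance of "1" in "Performance", the value of "CountWK" would be NA.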
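One way this could be sketched in base R, assuming the rows are already sorted by "WK" within each "ID": compute each row's position within its ID, find the position of the first "1" in "Performance", and fill "CountWK" only where the two agree.

```r
ID <- c('A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C', 'C', 'C')
WK <- c(1, 2, 3, 1, 2, 3, 1, 2, 3, 4, 5)
Performance <- c(0, 1, 1, 0, 1, 0, 0, 1, 0, 1, 1)
Data <- data.frame(ID, WK, Performance)

# Position of each row within its ID (rows assumed sorted by WK within ID).
pos <- ave(Data$WK, Data$ID, FUN = seq_along)

# Position of the first Performance == 1 within each ID (NA if there is none).
first <- ave(Data$Performance, Data$ID, FUN = function(p) match(1, p))

# Number of weeks counted up to the first 1; NA on every other row.
Data$CountWK <- ifelse(!is.na(first) & pos == first, pos, NA)
print(Data)
```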
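I have a data file containing numeric values in three columns and two grouping variables (ID and Group), from which I need to compute a single maximum value by ID and Group: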
structure(list(ID = structure(c(1L, 1L, 1L, 2L), .Label = c("a1",
"a2"), class = "factor"), Group = structure(c(1L, 1L, 2L, 2L), .Label =
c("abc",
"def"), class = "factor"), Score1 = c(10L, 0L, 0L, 5L), Score2 = c(0L,
0L, 5L, 10L), Score3 = c(0L, 11L, 2L, 11L)), class = "data.frame", row.names =
c(NA,
-4L))
The result I'm trying to get is:
structure(list(ID = structure(c(1L, 1L, 2L), .Label = c("a1",
"a2"), class = "factor"), Group = structure(c(1L, 2L, 2L), .Label = c("abc",
"def"), …
I want to create a new variable "Count" that is the count of unique values of a factor "Period", grouped by the variable "ID". The following data contain a column with the "Count" values I want:
samp <- structure(list(ID = structure(c(1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L
), .Label = c("a", "b"), class = "factor"), Period = c(1.1, 1.1,
1.2, 1.3, 1.2, 1.3, 1.5, 1.5), Count = c(1L, 1L, 2L, 3L, 1L,
2L, 3L, 3L)), .Names = c("ID", "Period", "Count"), class = "data.frame", row.names = c(NA,
-8L))
I tried using mutate with Count = 1:length(Period), but it creates a cumulative count of every value of "Period", whereas I want a cumulative count of the unique values only. Here is what I tried:
library(plyr)
samp1<-ddply(samp, .(ID, Period), mutate, Count = 1:length(Period))
Can anyone provide the correct function?
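A minimal base R sketch of a running count of unique values: within each "ID", `!duplicated(Period)` flags the first appearance of each period and `cumsum()` turns those flags into a cumulative count. The data are rebuilt here without the "Count" column so the result can be compared with the desired values.

```r
samp <- data.frame(
  ID     = rep(c("a", "b"), each = 4),
  Period = c(1.1, 1.1, 1.2, 1.3, 1.2, 1.3, 1.5, 1.5)
)

# Within each ID, add 1 every time a Period value appears for the first time.
samp$Count <- ave(samp$Period, samp$ID,
                  FUN = function(p) cumsum(!duplicated(p)))
print(samp)
```

The same expression should also work inside a grouped `mutate()`, e.g. `Count = cumsum(!duplicated(Period))` after grouping by ID.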
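I created a grouped boxplot and added three specific geom_hline layers to the plot. However, I'd like to set the hline colors by fill=factor(Training.Location) rather than trying to match the colors manually with a palette. Is there a way to do this?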
ggplot(aes(x=factor(CumDes),y=Mn_Handle), data=NH_C) +
geom_boxplot( aes(fill=factor(Training.Location))) +
geom_point( aes(color=factor(Training.Location)),
position=position_dodge(width=0.75) ) +
theme(axis.ticks = element_blank(), axis.text.x = element_blank()) +
coord_cartesian(ylim = c(0, 2000)) +
geom_hline(yintercept=432, linetype="dashed", lwd=1.2) +
geom_hline(yintercept=583, linetype="dashed", lwd=1.2) +
geom_hline(yintercept=439, linetype="dashed", lwd=1.2)
Run Code Online (Sandbox Code Playgroud) r ×9