ifelse function by group in R

BIN*_*BIN 4 if-statement r duplicates data.table

I have a data set:

ID <- c(1,1,2,2,2,2,3,3,3,3,3,4,4,4)
Eval <- c("A","A","B","B","A","A","A","A","B","B","A","A","A","B")
med <- c("c","d","k","k","h","h","c","d","h","h","h","c","h","k")
df <- data.frame(ID,Eval,med)
> df
    ID Eval med
 1   1    A   c
 2   1    A   d
 3   2    B   k
 4   2    B   k
 5   2    A   h
 6   2    A   h
 7   3    A   c
 8   3    A   d
 9   3    B   h
 10  3    B   h
 11  3    A   h
 12  4    A   c
 13  4    A   h
 14  4    B   k

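I am trying to create variables x and y, grouped by ID and Eval. For each ID, if Eval = A and med = "h" or "k", I set x = 1, otherwise x = 0; if Eval = B and med = "h" or "k", I set y = 1, otherwise y = 0. The approach below gives me the answer, but I don't like it; it doesn't seem very good.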

library(data.table)
df <- data.table(df)
setDT(df)[, count := uniqueN(med) , by = .(ID,Eval)]
setDT(df)[Eval == "A", x:= ifelse(count == 1 & med %in% c("k","h"),1,0), by=ID]
setDT(df)[Eval == "B", y:= ifelse(count == 1 & med %in% c("k","h"),1,0), by=ID]


     ID Eval med count  x  y
 1:  1    A   c     2  0 NA
 2:  1    A   d     2  0 NA
 3:  2    B   k     1 NA  1
 4:  2    B   k     1 NA  1
 5:  2    A   h     1  1 NA
 6:  2    A   h     1  1 NA
 7:  3    A   c     3  0 NA
 8:  3    A   d     3  0 NA
 9:  3    B   h     1 NA  1
10:  3    B   h     1 NA  1
11:  3    A   h     3  0 NA
12:  4    A   c     2  0 NA
13:  4    A   h     2  0 NA
14:  4    B   k     1 NA  1

Then I need to collapse the rows to get one row per unique ID. I don't know how to collapse the rows. Any ideas?

Desired output:

 ID x y
 1  0 0
 2  1 1
 3  0 1
 4  0 1

akr*_*run 6

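We create the 'x' and 'y' variables grouped by 'ID', avoiding the NA elements by directly coercing the logical vector to binary (as.integer). This relies on the count column already created in the question.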

df[, x := as.integer(Eval == "A" & count ==1 & med %in% c("h", "k")) , by = ID]

and similarly for 'y'

df[, y := as.integer(Eval == "B" & count ==1 & med %in% c("h", "k")) , by = ID]

Then summarise with any after grouping by 'ID'

df[, lapply(.SD, function(x) as.integer(any(x))) , ID, .SDcols = x:y]
#   ID x y
#1:  1 0 0
#2:  2 1 1
#3:  3 0 1
#4:  4 0 1
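
For readers who want to run the three steps above from scratch, here is a self-contained sketch. It assumes the data.table package is installed, rebuilds the question's data and its count column, and uses the illustrative name res for the result. The by = ID on the assignments is left out here because the condition is evaluated row by row, so grouping should not change the outcome.

```r
library(data.table)

# Rebuild the example data from the question
df <- data.table(
  ID   = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4),
  Eval = c("A","A","B","B","A","A","A","A","B","B","A","A","A","B"),
  med  = c("c","d","k","k","h","h","c","d","h","h","h","c","h","k")
)

# Number of distinct med values within each ID/Eval group (as in the question)
df[, count := uniqueN(med), by = .(ID, Eval)]

# Row-level flags; as.integer() turns TRUE/FALSE into 1/0, so no NA appears
df[, x := as.integer(Eval == "A" & count == 1 & med %in% c("h", "k"))]
df[, y := as.integer(Eval == "B" & count == 1 & med %in% c("h", "k"))]

# Collapse to one row per ID: 1 if any row of that ID is flagged
# (res is an illustrative name, not part of the original answer)
res <- df[, lapply(.SD, function(v) as.integer(any(v))),
          by = ID, .SDcols = c("x", "y")]
print(res)
```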

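If we need a compact approach, instead of assigning (:=), we summarise the output grouped by 'ID' and 'Eval' based on the conditions. Then, grouped by 'ID', we check whether there are any TRUE values in 'x' or 'y' by looping over the columns specified in .SDcols.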

setDT(df)[,  if(any(uniqueN(med)==1 & med %in% c("h", "k"))) {
        .(x= Eval=="A", y= Eval == "B") } else .(x=FALSE, y=FALSE),
     by = .(ID, Eval)][, lapply(.SD, any) , by = ID, .SDcols = x:y]
#  ID     x     y
#1:  1 FALSE FALSE
#2:  2  TRUE  TRUE
#3:  3 FALSE  TRUE
#4:  4 FALSE  TRUE

If needed, we can convert to binary in the same way as shown in the first solution.
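
As a sketch of that conversion, the block below reruns the compact chain on the question's data and then turns the logical columns into 0/1 in place. It assumes data.table is installed; the names out and cols are illustrative and not part of the original answer.

```r
library(data.table)

# Rebuild the example data from the question
df <- data.table(
  ID   = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4),
  Eval = c("A","A","B","B","A","A","A","A","B","B","A","A","A","B"),
  med  = c("c","d","k","k","h","h","c","d","h","h","h","c","h","k")
)

# Compact approach: one flag row per ID/Eval group, then collapse per ID
out <- df[, if (any(uniqueN(med) == 1 & med %in% c("h", "k"))) {
            .(x = Eval == "A", y = Eval == "B")
          } else .(x = FALSE, y = FALSE),
          by = .(ID, Eval)][, lapply(.SD, any), by = ID, .SDcols = c("x", "y")]

# Convert the logical columns to integer 0/1 by reference
cols <- c("x", "y")
out[, (cols) := lapply(.SD, as.integer), .SDcols = cols]
print(out)
```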


Fra*_*ank 5

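The OP's goal...

"I try to create variables x and y, grouped by ID and Eval. For each ID, if Eval = A and med = "h" or "k", I set x = 1, otherwise x = 0; if Eval = B and med = "h" or "k", I set y = 1, otherwise y = 0. [...] Then I need to collapse the rows to get unique ID"

can be simplified to...

For each ID and Eval, flag if all med values are h or all med values are k.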

setDT(df) # only do this once
df[, all(med=="k") | all(med=="h"), by=.(ID,Eval)][, dcast(.SD, ID ~ Eval, value.var="V1", fun.aggregate=any)]

   ID     A     B
1:  1 FALSE FALSE
2:  2  TRUE  TRUE
3:  3 FALSE  TRUE
4:  4 FALSE  TRUE

To see what dcast is doing, read ?dcast and try running the first part on its own: df[, all(med=="k") | all(med=="h"), by=.(ID,Eval)].
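
A sketch of that two-step inspection follows, assuming data.table is installed and the question's data is rebuilt first. The names flags and wide are illustrative. The unnamed result of the grouped expression gets the default column name V1, which is passed to dcast explicitly here.

```r
library(data.table)

# Rebuild the example data from the question
df <- data.table(
  ID   = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4),
  Eval = c("A","A","B","B","A","A","A","A","B","B","A","A","A","B"),
  med  = c("c","d","k","k","h","h","c","d","h","h","h","c","h","k")
)

# Step 1: one logical flag per ID/Eval group (long format, column V1)
flags <- df[, all(med == "k") | all(med == "h"), by = .(ID, Eval)]
print(flags)

# Step 2: reshape to one row per ID and one column per Eval value
wide <- dcast(flags, ID ~ Eval, value.var = "V1", fun.aggregate = any)
print(wide)
```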

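Changing this to use x and y instead of A and B is straightforward but ill-advised (unnecessary renaming can be confusing and creates extra work when new Eval values appear); the same goes for switching to 1/0 instead of TRUE/FALSE (the values captured really are boolean).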
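
If the OP's exact layout is still wanted despite those caveats, a minimal sketch could look like this. It assumes data.table is installed; the column name flag and the table name wide are illustrative. The mapping of A to x and B to y is hard-coded, which is the extra maintenance cost noted above.

```r
library(data.table)

# Rebuild the example data from the question
df <- data.table(
  ID   = c(1,1,2,2,2,2,3,3,3,3,3,4,4,4),
  Eval = c("A","A","B","B","A","A","A","A","B","B","A","A","A","B"),
  med  = c("c","d","k","k","h","h","c","d","h","h","h","c","h","k")
)

# Flag each ID/Eval group, then spread Eval values into columns
wide <- dcast(
  df[, .(flag = all(med == "k") | all(med == "h")), by = .(ID, Eval)],
  ID ~ Eval, value.var = "flag", fun.aggregate = any
)

# Rename A/B to x/y (hard-coded mapping) and convert TRUE/FALSE to 1/0
setnames(wide, c("A", "B"), c("x", "y"))
wide[, c("x", "y") := lapply(.SD, as.integer), .SDcols = c("x", "y")]
print(wide)
```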