I have a sparse data table like this:
library(data.table)
data = data.table(
var1 = c("a","",""),
var2 = c("","","c"),
var3 = c("a","b",""),
var4 = c("","b","")
)
   var1 var2 var3 var4
1:    a         a
2:              b    b
3:         c
I want to add a column containing a string of zeros and ones that indicates which variables are present in each row, like this:
   var1 var2 var3 var4  concat
1:    a         a      1|0|1|0
2:              b    b 0|0|1|1
3:         c           0|1|0|0
I can achieve this with the following command:
data[, concat := paste(
as.integer(var1 != ""),
as.integer(var2 != ""),
as.integer(var3 != ""),
as.integer(var4 != ""),
sep = "|")]
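However, if I had hundreds of variables, I would rather build the required expression programmatically, perhaps from paste0("var",1:4) or at least from a vector of column names. Any suggestions?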
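One possible approach, sketched here under the assumption that "present" means "not an empty string" exactly as in the command above: select the columns through `.SDcols`, turn each one into a 0/1 integer vector, and hand the resulting list to `paste` via `do.call`. The name `cols` is introduced only for illustration.

```r
library(data.table)

data <- data.table(
  var1 = c("a", "", ""),
  var2 = c("", "", "c"),
  var3 = c("a", "b", ""),
  var4 = c("", "b", "")
)

# Vector of column names to inspect (illustrative name, not from the question).
cols <- paste0("var", 1:4)

# For every selected column build a 0/1 integer vector, then paste the
# vectors together element-wise with "|" as the separator.
data[, concat := do.call(
  paste,
  c(lapply(.SD, function(x) as.integer(x != "")), sep = "|")
), .SDcols = cols]

print(data)
```

Because `cols` is an ordinary character vector, it could equally be computed from `names(data)`, for example with a pattern match, instead of `paste0`.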
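I am learning the syntax for manipulating data.table variables. I can do simple things, but my understanding is not thorough enough for more complex tasks. For example, I want to transform the data below so that each row has one distinct "type" value, with separate columns generated from the values of "subtype", and with unique values collapsed when multiple rows share the same "type/subtype" combination.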
Given the input data:
data = data.frame(
var1 = c("a","b","c","b","d","e","f"),
var2 = c("aa","bb","cc","dd","ee","ee","ff"),
subtype = c("1","2","2","2","1","1","2"),
type = c("A","A","A","A","B","B","B")
)
  var1 var2 subtype type
1    a   aa       1    A
2    b   bb       2    A
3    c   cc       2    A
4    b   dd       2    A
5    d   ee       1    B
6    e   ee       1    B
7    f   ff       2    B
I want to obtain:
  1.var1 1.var2 2.var1 2.var2     2.type
A "a"    "aa"   "b|c"  "bb|cc|dd" "A"
B "d|e"  "ee"   "f"    "ff"       "B"
With data frames, I can achieve this using the following code:
data.derived = do.call(
rbind, …
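For comparison, a data.table version of this reshaping might look like the sketch below. It is not a reconstruction of the truncated code above. It assumes that `dcast` with several `value.var` columns and a collapsing aggregate is an acceptable route. The names `dt`, `collapse_unique` and `wide` are hypothetical. The resulting column names follow data.table's own naming rather than the `1.var1` style shown in the desired output, and the sketch does not produce the stray `2.type` column.

```r
library(data.table)

dt <- data.table(
  var1    = c("a", "b", "c", "b", "d", "e", "f"),
  var2    = c("aa", "bb", "cc", "dd", "ee", "ee", "ff"),
  subtype = c("1", "2", "2", "2", "1", "1", "2"),
  type    = c("A", "A", "A", "A", "B", "B", "B")
)

# Hypothetical helper: collapse the unique values of a vector into one string.
collapse_unique <- function(x) paste(unique(x), collapse = "|")

# One row per type, one column per (value column, subtype) pair; rows that
# share a type/subtype combination are merged by collapse_unique.
wide <- dcast(
  dt,
  type ~ subtype,
  value.var = c("var1", "var2"),
  fun.aggregate = collapse_unique
)

print(wide)
```

Supplying an aggregate function is what lets `dcast` accept several rows per type/subtype cell; without one it would fall back to counting rows.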