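I'm learning the syntax for manipulating data.table variables. I can do simple things, but my understanding isn't thorough enough for more complex tasks. For example, I want to transform the following data so that each row holds one distinct "type" value, separate columns are generated from the values of "subtype", and the unique values are collapsed when several rows share the same type/subtype combination.
Given the input data: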
data = data.frame(
  var1    = c("a", "b", "c", "b", "d", "e", "f"),
  var2    = c("aa", "bb", "cc", "dd", "ee", "ee", "ff"),
  subtype = c("1", "2", "2", "2", "1", "1", "2"),
  type    = c("A", "A", "A", "A", "B", "B", "B")
)
  var1 var2 subtype type
1    a   aa       1    A
2    b   bb       2    A
3    c   cc       2    A
4    b   dd       2    A
5    d   ee       1    B
6    e   ee       1    B
7    f   ff       2    B
I want to get:
  1.var1 1.var2 2.var1 2.var2     2.type
A "a"    "aa"   "b|c"  "bb|cc|dd" "A"
B "d|e"  "ee"   "f"    "ff"       "B"
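With data frames, I can achieve this with the following code:
data.derived = do.call(
  rbind,                                # stack one named vector per type into a matrix
  lapply(
    split(data, list(data$type)),       # one group per type
    function(x) {
      do.call(
        c,                              # concatenate the per-subtype pieces into one named vector
        lapply(
          split(x, list(x$subtype)),    # one group per subtype within this type
          function(y) {
            # collapse the unique values of each variable into a single string
            result = c(
              var1 = paste(unique(y$var1), collapse = "|"),
              var2 = paste(unique(y$var2), collapse = "|")
            )
            # the type column is only attached to the subtype "2" block
            if (as.character(y$subtype[1]) == "2") {
              result = c(result, type = as.character(y$type[1]))
            }
            result
          }
        )
      )
    }
  )
)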
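How can I do the same thing with data.table?
Your expected result makes it clear that you are reshaping the data from long to wide format, with the values of subtype spread across the columns, so you need dcast from data.table. And since you want to aggregate the values of var1 and var2 into a single string, you need a custom aggregation function that pastes and collapses them: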
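library(data.table)
setDT(data)  # convert the data.frame to a data.table in place
# long -> wide: one row per type, one column per value.var/subtype pair;
# cells with several rows are collapsed by the custom aggregation function
dcast(data, type ~ subtype, value.var = c("var1", "var2"),
      fun.aggregate = function(v) paste0(unique(v), collapse = "|"))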
#    type var1_function_1 var1_function_2 var2_function_1 var2_function_2
# 1:    A               a             b|c              aa        bb|cc|dd
# 2:    B             d|e               f              ee              ff
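For comparison, here is a minimal sketch of a two-step variant that maps more directly onto the nested split() calls in the question. First, the unique values are collapsed per type/subtype pair with `by`. Then dcast only has to spread subtype across the columns, so no aggregation function is needed. The final setnames()/setcolorder() step, which renames the columns to the question's "1.var1" style, is an optional extra added here for illustration, and the object names `collapsed` and `wide` are arbitrary.

```r
library(data.table)

# Same input as in the question; data.table() keeps strings as character.
data <- data.table(
  var1    = c("a", "b", "c", "b", "d", "e", "f"),
  var2    = c("aa", "bb", "cc", "dd", "ee", "ee", "ff"),
  subtype = c("1", "2", "2", "2", "1", "1", "2"),
  type    = c("A", "A", "A", "A", "B", "B", "B")
)

# Step 1: one row per type/subtype pair, with the unique values of var1 and
# var2 pasted together (this replaces the nested split()/lapply() calls).
collapsed <- data[, lapply(.SD, function(v) paste(unique(v), collapse = "|")),
                  by = .(type, subtype), .SDcols = c("var1", "var2")]

# Step 2: every type/subtype cell now holds a single row, so dcast only has
# to spread subtype across the columns; no fun.aggregate is required.
wide <- dcast(collapsed, type ~ subtype, value.var = c("var1", "var2"))

# Optional: rename "var1_1" -> "1.var1" etc. to mirror the question's layout
# ("type" has no underscore, so sub() leaves it unchanged).
setnames(wide, sub("^(.+)_([^_]+)$", "\\2.\\1", names(wide)))
setcolorder(wide, c("type", "1.var1", "1.var2", "2.var1", "2.var2"))
print(wide)
```

In this version, the type key column takes the place of both the row names and the "2.type" column from the question's base-R result.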