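I have data describing the interactions of one individual (player 1) with two other people (player 2 and player 3). Each row describes a unique combination of players, but I want to analyze the player 1/player 2 pairs and the player 1/player 3 pairs separately. To do that, I imagine some kind of stacking in which I melt the descriptive variables for the second and third players while keeping player 1's data in every row. To complicate things, I have multiple descriptive variables for each person.
Here is a small subset of the data to work with (I actually have many more descriptive variables for players 2 and 3 that I want to stack/melt):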
p1_id <- c(1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1021, 1032, 1032, 1032, 1032, 1032, 1032)
p1_age <- c(53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 53, 45, 45, 45, 45, 45)
p2_id <- c(14372, 15022, 9072, 15052, 2161, 18381, 15032, 14451, 16322, 11142, 8182, 1131, 7092, 4071, 16191, 18142, 4222, 11052, 2202, 16151)
p2_money <- c(4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 10, 0, 0, 10, 0, 6, 6, 4, 6, 6)
p2_age <- c(50, 33, 56, 23, 29, 26, 28, 34, 20, 41, 34, 45, 23, 35, 25, 30, 40, 41, 45, 28)
p3_id <- c(5151, 16181, 5182, 18462, 7231, 14372, 3052, 14532, 4152, 15012, 19212, 9062, 9032, 18351, 14461, 16291, 17102, 10102, 7051, 16282)
p3_money <- c(4, 2, 2, 2, 2, 2, 2, 2, 2, 2, 0, 10, 10, 0, 10, 6, 6, 4, 6, 4)
p3_age <- c(30, 29, 22, 22, 43, 50, 23, 32, 31, 46, 36, 36, 21, 27, 49, 38, 40, 48, 26, 32)
df <- data.frame(p1_id, p1_age, p2_id, p2_money, p2_age, p3_id, p3_money, p3_age)
The data frame:
p1_id p1_age p2_id p2_money p2_age p3_id p3_money p3_age
1 1021 53 14372 4 50 5151 4 30
2 1021 53 15022 2 33 16181 2 29
3 1021 53 9072 2 56 5182 2 22
4 1021 53 15052 2 23 18462 2 22
5 1021 53 2161 2 29 7231 2 43
6 1021 53 18381 2 26 14372 2 50
7 1021 53 15032 2 28 3052 2 23
8 1021 53 14451 2 34 14532 2 32
9 1021 53 16322 2 20 4152 2 31
10 1021 53 11142 2 41 15012 2 46
11 1021 53 8182 10 34 19212 0 36
12 1021 53 1131 0 45 9062 10 36
13 1021 53 7092 0 23 9032 10 21
14 1021 53 4071 10 35 18351 0 27
15 1032 53 16191 0 25 14461 10 49
16 1032 45 18142 6 30 16291 6 38
17 1032 45 4222 6 40 17102 6 40
18 1032 45 11052 4 41 10102 4 48
19 1032 45 2202 6 45 7051 6 26
20 1032 45 16151 6 28 16282 4 32
In case the description above is too confusing, this is how I would like the restructured data to look:
row p1_id p1_age p23_id p23_money p23_age
1 1021 53 14372 4 50
2 1021 53 15022 2 33
3 1021 53 9072 2 56
4 1021 53 15052 2 23
5 1021 53 2161 2 29
6 1021 53 18381 2 26
7 1021 53 15032 2 28
8 1021 53 14451 2 34
9 1021 53 16322 2 20
10 1021 53 11142 2 41
11 1021 53 8182 10 34
12 1021 53 1131 0 45
13 1021 53 7092 0 23
14 1021 53 4071 10 35
15 1032 53 16191 0 25
16 1032 45 18142 6 30
17 1032 45 4222 6 40
18 1032 45 11052 4 41
19 1032 45 2202 6 45
20 1032 45 16151 6 28
21 1021 53 5151 4 30
22 1021 53 16181 2 29
23 1021 53 5182 2 22
24 1021 53 18462 2 22
25 1021 53 7231 2 43
26 1021 53 14372 2 50
27 1021 53 3052 2 23
28 1021 53 14532 2 32
29 1021 53 4152 2 31
30 1021 53 15012 2 46
31 1021 53 19212 0 36
32 1021 53 9062 10 36
33 1021 53 9032 10 21
34 1021 53 18351 0 27
35 1032 53 14461 10 49
36 1032 45 16291 6 38
37 1032 45 17102 6 40
38 1032 45 10102 4 48
39 1032 45 7051 6 26
40 1032 45 16282 4 32
Thanks for your help!
This is easy to do if you modify the column names as follows:
names(df) <- gsub("(.*)_(.*)", "\\2\\.\\1", names(df))
names(df)
# [1] "id.p1" "age.p1" "id.p2" "money.p2"
# [5] "age.p2" "id.p3" "money.p3" "age.p3"
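To see what the regular expression is doing, here is a small sketch on a few standalone names (the vectors `old` and `new` are only for illustration). The two capture groups are swapped and joined with a dot, which gives the `variable.time` layout that reshape() splits on by default (its `sep` argument defaults to ".").

```r
# A few names in the original "person_variable" layout
old <- c("p1_id", "p2_money", "p3_age")

# Swap the two captured parts and join them with a literal dot
new <- gsub("(.*)_(.*)", "\\2.\\1", old)
print(new)
# [1] "id.p1"    "money.p2" "age.p3"
```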
Next, use the "row.names" of your data.frame as the "idvar" in base R's reshape().
reshape(df, direction = "long", idvar = "row.names",
timevar = "person", varying = 3:8)
# id.p1 age.p1 person id money age row.names
# 1.p2 1021 53 p2 14372 4 50 1
# 2.p2 1021 53 p2 15022 2 33 2
# 3.p2 1021 53 p2 9072 2 56 3
# 4.p2 1021 53 p2 15052 2 23 4
# 5.p2 1021 53 p2 2161 2 29 5
# 6.p2 1021 53 p2 18381 2 26 6
# 7.p2 1021 53 p2 15032 2 28 7
# 8.p2 1021 53 p2 14451 2 34 8
# 9.p2 1021 53 p2 16322 2 20 9
# 10.p2 1021 53 p2 11142 2 41 10
# 11.p2 1021 53 p2 8182 10 34 11
# 12.p2 1021 53 p2 1131 0 45 12
# 13.p2 1021 53 p2 7092 0 23 13
# 14.p2 1021 53 p2 4071 10 35 14
# 15.p2 1032 53 p2 16191 0 25 15
# 16.p2 1032 45 p2 18142 6 30 16
# 17.p2 1032 45 p2 4222 6 40 17
# 18.p2 1032 45 p2 11052 4 41 18
# 19.p2 1032 45 p2 2202 6 45 19
# 20.p2 1032 45 p2 16151 6 28 20
# 1.p3 1021 53 p3 5151 4 30 1
# 2.p3 1021 53 p3 16181 2 29 2
# 3.p3 1021 53 p3 5182 2 22 3
# 4.p3 1021 53 p3 18462 2 22 4
# 5.p3 1021 53 p3 7231 2 43 5
# 6.p3 1021 53 p3 14372 2 50 6
# 7.p3 1021 53 p3 3052 2 23 7
# 8.p3 1021 53 p3 14532 2 32 8
# 9.p3 1021 53 p3 4152 2 31 9
# 10.p3 1021 53 p3 15012 2 46 10
# 11.p3 1021 53 p3 19212 0 36 11
# 12.p3 1021 53 p3 9062 10 36 12
# 13.p3 1021 53 p3 9032 10 21 13
# 14.p3 1021 53 p3 18351 0 27 14
# 15.p3 1032 53 p3 14461 10 49 15
# 16.p3 1032 45 p3 16291 6 38 16
# 17.p3 1032 45 p3 17102 6 40 17
# 18.p3 1032 45 p3 10102 4 48 18
# 19.p3 1032 45 p3 7051 6 26 19
# 20.p3 1032 45 p3 16282 4 32 20
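If renaming the columns is not convenient, reshape() can also be told explicitly which columns belong together. The following is only a sketch and was not part of the answer as posted: it uses a three-row subset of the data (called `toy` here), and the output names `p23_id`, `p23_money`, `p23_age` and the id column `row` are chosen for illustration to match the layout requested in the question.

```r
# Three-row subset of the example data, with the original column names
toy <- data.frame(
  p1_id = c(1021, 1021, 1021), p1_age = c(53, 53, 53),
  p2_id = c(14372, 15022, 9072), p2_money = c(4, 2, 2), p2_age = c(50, 33, 56),
  p3_id = c(5151, 16181, 5182), p3_money = c(4, 2, 2), p3_age = c(30, 29, 22)
)

# Each list element groups the columns that should be stacked into one
# output column; v.names gives the names of those output columns.
long <- reshape(
  toy, direction = "long",
  varying = list(c("p2_id", "p3_id"),
                 c("p2_money", "p3_money"),
                 c("p2_age", "p3_age")),
  v.names = c("p23_id", "p23_money", "p23_age"),
  timevar = "person", times = c("p2", "p3"),
  idvar = "row"  # not an existing column, so reshape() creates it
)
long
```

With more descriptive variables per player, the `varying` list grows by one element per variable, so for wide data the renaming approach is likely less typing.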
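Here is an alternative using melt() and dcast() from "reshape2". Hopefully someone more proficient with the "reshape2" package (or perhaps with "plyr") can suggest a more concise solution than the one below. This solution involves melt()ing the data into a long form, using colsplit() (from "reshape2") to generate a couple of new columns, and then using dcast() to get the desired form. This is what it looks like: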
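library(reshape2)
# Assumes df still has its original column names (p1_id, p2_money, ...)
df$id <- 1:nrow(df)
df2 <- melt(df, id.vars = c("id", "p1_id", "p1_age"))
# Drop "variable" (column 4) and replace it with "person" and "var" columns
df2 <- cbind(df2[-4],
             colsplit(df2$variable, "_", c("person", "var")))
head(df2)
# One row per original row and person, one column per descriptive variable
out <- dcast(df2, id + p1_id + p1_age + person ~ var, value.var = "value")
list(head(out), tail(out))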
# [[1]]
# id p1_id p1_age person age id money
# 1 1 1021 53 p2 50 14372 4
# 2 1 1021 53 p3 30 5151 4
# 3 2 1021 53 p2 33 15022 2
# 4 2 1021 53 p3 29 16181 2
# 5 3 1021 53 p2 56 9072 2
# 6 3 1021 53 p3 22 5182 2
#
# [[2]]
# id p1_id p1_age person age id money
# 35 18 1032 45 p2 41 11052 4
# 36 18 1032 45 p3 48 10102 4
# 37 19 1032 45 p2 45 2202 6
# 38 19 1032 45 p3 26 7051 6
# 39 20 1032 45 p2 28 16151 6
# 40 20 1032 45 p3 32 16282 4
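So, basically, whichever approach you take, it looks like you will have to do some preprocessing on your data.frame to make it friendlier to this kind of transformation.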
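As one more illustration of that kind of preprocessing, here is a minimal base R sketch that renames each partner's columns to a shared prefix and stacks the pieces with rbind(). It was not part of the answer above; the helper `stack_one()`, the three-row subset `toy`, and the `p23_` prefix are hypothetical names used only for this example.

```r
# Three-row subset of the example data, with the original column names
toy <- data.frame(
  p1_id = c(1021, 1021, 1021), p1_age = c(53, 53, 53),
  p2_id = c(14372, 15022, 9072), p2_money = c(4, 2, 2), p2_age = c(50, 33, 56),
  p3_id = c(5151, 16181, 5182), p3_money = c(4, 2, 2), p3_age = c(30, 29, 22)
)

# Hypothetical helper: keep the p1 columns plus one partner's columns,
# and rename that partner's prefix to "p23".
stack_one <- function(data, prefix) {
  keep <- c(grep("^p1_", names(data), value = TRUE),
            grep(paste0("^", prefix, "_"), names(data), value = TRUE))
  part <- data[keep]
  names(part) <- sub(paste0("^", prefix, "_"), "p23_", names(part))
  part
}

# Stack the player 2 rows on top of the player 3 rows
stacked <- rbind(stack_one(toy, "p2"), stack_one(toy, "p3"))
rownames(stacked) <- NULL
stacked
```

Because the helper selects columns by prefix, any additional descriptive variables for players 2 and 3 should be picked up without listing them, provided both players have the same set of variables.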