I have a data frame with 2 grouping variables, 1 time variable, and one dependent variable. For example:
name <- c("a", "a", "a", "a", "a", "a","a", "a", "a", "b", "b", "b","b", "b", "b","b", "b", "b")
class <- c("c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3","c1", "c1", "c1", "c2", "c2", "c2", "c3", "c3", "c3")
year <- c("2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008", "2010", "2009", "2008")
value <- c(100, 33, 80, 90, 80, 100, 100, 90, 80, 90, 80, 100, 100, 90, 80, 99, 80, 100)
df <- data.frame(name, class, year, value)
df
I would like to apply the "diff" function within each combination of "class" and "name".
My desired output should look like this:
  name class year value.1
1    a    c1 2010     -67
2    a    c1 2009      47
3    b    c1 2010     -10
4    b    c1 2009      20
...
I tried
aggregate(value~name + class, data=df, FUN="diff")
This does not produce the result I am looking for on my large dataset. Thank you very much in advance!
Sebatian
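The plyr package will be your friend. The function ddply takes a data.frame, applies a function to each defined subset, and then returns a single data.frame with all the pieces recombined.
The simplest solution is to use summarize with diff(value) for each combination of (class, name):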
library(plyr)
ddply(df, .(class, name), summarize, diff(value))
   class name ..1
1     c1    a -67
2     c1    a  47
3     c1    b -10
4     c1    b  20
5     c2    a -10
6     c2    a  20
7     c2    b -10
8     c2    b -10
9     c3    a -10
10    c3    a -10
11    c3    b -19
12    c3    b  20
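To get the years in the result as well, it is a bit more involved. diff returns one element fewer than its input, so the last year of each group is dropped with head(year, -1):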
ddply(df, .(class, name), summarize, year=head(year, -1), value=diff(value))
   class name year value
1     c1    a 2010   -67
2     c1    a 2009    47
3     c1    b 2010   -10
4     c1    b 2009    20
5     c2    a 2010   -10
6     c2    a 2009    20
7     c2    b 2010   -10
8     c2    b 2009   -10
9     c3    a 2010   -10
10    c3    a 2009   -10
11    c3    b 2010   -19
12    c3    b 2009    20
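If plyr is not available, the same split-apply-combine idea can be sketched in base R with split, lapply, and do.call(rbind, ...). This is only an illustration, not part of the original answer: the names pieces, diffs, and result are made up for the example, and the data frame is rebuilt with rep() on the assumption that it matches the one in the question. Like the ddply version, it relies on the rows within each group already being in the desired year order.

```r
# Rebuild the example data from the question (assumed identical to df above).
df <- data.frame(
  name  = rep(c("a", "b"), each = 9),
  class = rep(rep(c("c1", "c2", "c3"), each = 3), times = 2),
  year  = rep(c("2010", "2009", "2008"), times = 6),
  value = c(100, 33, 80, 90, 80, 100, 100, 90, 80,
            90, 80, 100, 100, 90, 80, 99, 80, 100)
)

# Split: one piece per (name, class) combination that actually occurs.
pieces <- split(df, list(df$name, df$class), drop = TRUE)

# Apply: difference the values within each piece. diff() returns one
# element fewer than its input, so the last row's labels are dropped.
diffs <- lapply(pieces, function(d) {
  data.frame(
    class = head(d$class, -1),
    name  = head(d$name, -1),
    year  = head(d$year, -1),
    value = diff(d$value)
  )
})

# Combine: stack the pieces back into a single data frame.
result <- do.call(rbind, diffs)
rownames(result) <- NULL
result
```

For large datasets the plyr call is more compact; the base R version mainly makes the three steps explicit.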