Calculating differences within a data frame

luc*_*ano 1 r mean

I have a data frame that looks like this:

set.seed(50)
data.frame(distance=c(rep("long", 5), rep("short", 5)),
           year=rep(2002:2006),
           mean.length=rnorm(10))

   distance year mean.length
1      long 2002  0.54966989
2      long 2003 -0.84160374
3      long 2004  0.03299794
4      long 2005  0.52414971
5      long 2006 -1.72760411
6     short 2002 -0.27786453
7     short 2003  0.36082844
8     short 2004 -0.59091244
9     short 2005  0.97559055
10    short 2006 -1.44574995

For each year, I need to calculate the difference in mean.length between long and short. What is the fastest way to do this?

Adr*_*ian 5

Here is one way to do it using plyr:

set.seed(50)
df <- data.frame(distance=c(rep("long", 5),rep("short", 5)),
                 year=rep(2002:2006),
                 mean.length=rnorm(10))

library(plyr)
aggregation.fn <- function(df) {
  data.frame(year=df$year[1],
             diff=(df$mean.length[df$distance == "long"] -
                   df$mean.length[df$distance == "short"]))}
new.df <- ddply(df, "year", aggregation.fn)

which gives you:

> new.df
  year       diff
1 2002  0.8275344
2 2003 -1.2024322
3 2004  0.6239104
4 2005 -0.4514408
5 2006 -0.2818542
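If plyr is not available, roughly the same table can be sketched in base R by splitting the rows by distance and joining them on year. This is not part of the original answer; the names `long.df`, `short.df`, `merged` and `base.df` are illustrative only.

```r
set.seed(50)
df <- data.frame(distance = c(rep("long", 5), rep("short", 5)),
                 year = rep(2002:2006, 2),
                 mean.length = rnorm(10))

# Split by distance, keeping only the columns needed for the join
long.df  <- df[df$distance == "long",  c("year", "mean.length")]
short.df <- df[df$distance == "short", c("year", "mean.length")]

# Join on year; the suffixes label which group each value came from
merged <- merge(long.df, short.df, by = "year",
                suffixes = c(".long", ".short"))

# Difference long minus short, one row per year
base.df <- data.frame(year = merged$year,
                      diff = merged$mean.length.long - merged$mean.length.short)
print(base.df)
```

Unlike an approach that relies on row order, the join matches rows by year explicitly, so it should still work if the input is not sorted.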

A second way:

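# Sort so each year's "long" row sits directly above its "short" row
df <- df[order(df$year, df$distance), ]
n <- nrow(df)
# TRUE on the first row of each year (the "long" row after sorting)
df$new.year <- c(TRUE, df$year[2:n] != df$year[1:(n-1)])
# -diff() gives row i minus row i+1, i.e. long minus short on those rows
df$diff <- c(-diff(df$mean.length), NA)
df$diff[!df$new.year] <- NA
new.df.2 <- df[!is.na(df$diff), c("year", "diff")]

all(new.df.2 == new.df)  # TRUE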
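The second approach depends on how `diff()` pairs neighbouring rows, which is easy to misread. A minimal sketch on made-up values (the vector `x` and the mask `first.of.year` are hypothetical, not taken from the data above) shows why only every other result is kept:

```r
# Hypothetical toy values, already ordered as long, short within each year
x <- c(5, 2,    # year A: long = 5, short = 2
       7, 10)   # year B: long = 7, short = 10

# diff(x) returns x[i + 1] - x[i]; negating it gives x[i] - x[i + 1]
step <- c(-diff(x), NA)

# Only positions 1 and 3 (the first row of each year) pair long with short;
# position 2 mixes two different years and must be discarded
first.of.year <- c(TRUE, FALSE, TRUE, FALSE)
step[!first.of.year] <- NA
print(step)
```

This only works when every year has exactly one long row followed by one short row; a missing row would silently pair values from different years.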