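Is there a simple way (i.e., without using a "for" loop) to do the following?
I have several data frames and I want to summarise them with a plyr operation. In this example I have two data frames, east and west, and I want to summarise each of them by country, totalling spend and trials.
Here are the sample data frames: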
west <- data.frame(
  spend = sample(50:100, 50, replace = TRUE),
  trials = sample(100:200, 50, replace = TRUE),
  country = sample(c("usa", "canada", "uk"), 50, replace = TRUE)
)
east <- data.frame(
  spend = sample(50:100, 50, replace = TRUE),
  trials = sample(100:200, 50, replace = TRUE),
  country = sample(c("china", "japan", "skorea"), 50, replace = TRUE)
)
And a combined list of the two data frames:
combined <- c(west,east)
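What I would like to do is run a ddply-type operation on both data frames at once and get the output as a list (that at least seems the simplest). For example, if I were running it on just one data frame, it would look like this: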
country.df <- ddply(west, .(country), summarise,
spend = sum(spend),
trials = sum(trials)
)
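But I want to do this at scale. I tried using similar syntax in the llply arguments, but that does not work (I have a feeling I am missing something really obvious):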
countries.list <- llply(combined, .(country), summarise,
spend = sum(spend),
trials = sum(trials)
)
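This returns the error: "Error in FUN(X[[1L]], ...) : attempt to apply non-function"
I can think of a way to do it by writing a function and then passing it to an apply call. But it seems like llply should be able to handle this "out of the box", since it is a fairly straightforward use of the tool.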
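A minimal sketch of that workaround, assuming plyr is installed and that the data frames are stored with list() (rather than c()) so that each list element is a whole data frame. The helper name summarise_by_country and the fixed seed are hypothetical, introduced only for illustration:

```r
library(plyr)

set.seed(42)  # assumption: fixed seed so the sketch is reproducible
west <- data.frame(
  spend = sample(50:100, 50, replace = TRUE),
  trials = sample(100:200, 50, replace = TRUE),
  country = sample(c("usa", "canada", "uk"), 50, replace = TRUE)
)
east <- data.frame(
  spend = sample(50:100, 50, replace = TRUE),
  trials = sample(100:200, 50, replace = TRUE),
  country = sample(c("china", "japan", "skorea"), 50, replace = TRUE)
)

# Hypothetical helper: the single-data-frame ddply call wrapped in a function
summarise_by_country <- function(df) {
  ddply(df, .(country), summarise,
        spend = sum(spend),
        trials = sum(trials))
}

# list() keeps each data frame intact as one element
combined_list <- list(west = west, east = east)

# llply applies the helper to each data frame and returns a named list
countries_list <- llply(combined_list, summarise_by_country)
```

Each element of countries_list should then be a per-country summary for one of the input data frames.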
What am I missing here?
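Here is another solution using dplyr, which is a highly optimised version of plyr for data frames. The dplyr syntax is very intuitive and, IMHO, much more readable than plyr. It would not be an exaggeration to say that it reads more like poetry (at least to me :)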
combined <- list(west = west, east = east)
library(dplyr)
lapply(combined, function(dat) {
  # %>% replaces the older %.% operator, which was removed from dplyr
  dat %>%
    group_by(country) %>%
    summarise(
      trials = sum(trials),
      spend = sum(spend)
    ) %>%
    mutate(
      status = ifelse(trials < 1000, "Good", "Bad")
    )
})
Edit: for completeness, here is a data.table solution. Note that for large data frames, dplyr and data.table will eat plyr's lunch :)
library(data.table)
lapply(combined, function(dat) {
  data.table(dat)[
    , list(trials = sum(trials), spend = sum(spend)), by = country][
    , status := ifelse(trials < 1000, "Good", "Bad")]
})
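Update 2: here is a more concise version of the dplyr solution
# chain() was removed from dplyr; a pipeline that starts with `.`
# builds a function, which plays the same role here
lapply(combined, . %>%
  group_by(country) %>%
  summarise(trials = sum(trials), spend = sum(spend)) %>%
  mutate(status = ifelse(trials < 1000, "Good", "Bad"))
)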
I would do it like this:
combined <- list(east, west)
lapply(combined, ddply, .(country), summarise, spend = sum(spend),
trials = sum(trials))
# [[1]]
# country spend trials
# 1 china 1572 2976
# 2 japan 1075 1989
# 3 skorea 1262 2526
#
# [[2]]
# country spend trials
# 1 canada 1459 3117
# 2 uk 910 1967
# 3 usa 1248 2660
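One detail in the question seems worth checking: c(west, east) does not build a list of two data frames. A data frame is itself a list of columns, so c() concatenates the columns, whereas list() keeps each data frame intact as one element. In addition, the second parameter of llply is .fun, so .(country) ends up where a function is expected, which is consistent with the "attempt to apply non-function" error. A small base-R sketch of the c() versus list() difference, using made-up toy values rather than the question's random data:

```r
# Toy stand-ins for the question's data frames (values are illustrative only)
west <- data.frame(spend = c(60, 70), trials = c(110, 120),
                   country = c("usa", "uk"))
east <- data.frame(spend = c(80, 90), trials = c(130, 140),
                   country = c("china", "japan"))

# c() treats each data frame as a list of columns and concatenates them
flattened <- c(west, east)
length(flattened)   # 6: three columns from each data frame
names(flattened)    # "spend" "trials" "country" "spend" "trials" "country"

# list() keeps each data frame as a single element
nested <- list(west = west, east = east)
length(nested)                  # 2
sapply(nested, is.data.frame)   # TRUE for both elements
```

With the flattened version, any function applied over the list receives a single column vector at a time, so a per-country summary cannot work regardless of how the remaining arguments are passed.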