use*_*897 5 r aggregation plyr
I have a dataset in R of students' weekly allowances by class, as shown below:
Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220
How can I summarize the results by group (Year/Class) to get the sum and the % (by group)? Getting the sum seems easy with ddply, but I can't get the % by group part.
It works for the sum:
summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance))
But it does not work for the percentage-by-group part:
summary <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance),
Allow_Pct=Allowance/sum(Allowance))
The desired result should look like this:
Year Class Sum_Allow Allow_Pct
2013 Freshman 210 26%
2013 Junior 250 31%
2013 Sophomore 350 43%
2014 Freshman 360 40%
2014 Junior 330 36%
2014 Sophomore 220 24%
I tried ddply from the plyr package, but please let me know of any approach that might work.
Here is a possible solution using the data.table package (assuming your data is called df)
library(data.table)
setDT(df)[, list(Sum_Allow = sum(Allowance)), keyby = list(Year, Class)][,
Allow_Pct := paste0(round(Sum_Allow/sum(Sum_Allow), 2)*100, "%"), by = Year][]
# Year Class Sum_Allow Allow_Pct
# 1: 2013 Freshman 210 26%
# 2: 2013 Junior 250 31%
# 3: 2013 Sophomore 350 43%
# 4: 2014 Freshman 360 40%
# 5: 2014 Junior 330 36%
# 6: 2014 Sophomore 220 24%
Credit to @rawr, here is a possible base R solution
df2 <- aggregate(Allowance ~ Class + Year, df, sum)
transform(df2, Allow_pct = ave(Allowance, Year, FUN = function(x) paste0(round(x/sum(x), 2)*100, "%")))
# Class Year Allowance Allow_pct
# 1 Freshman 2013 210 26%
# 2 Junior 2013 250 31%
# 3 Sophomore 2013 350 43%
# 4 Freshman 2014 360 40%
# 5 Junior 2014 330 36%
# 6 Sophomore 2014 220 24%
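
A variation on the base R approach above, as a minimal sketch: it keeps the percentage as a numeric column and builds the "%" string only for display, so the numbers can still be sorted or summed later. The column names `Pct` and `Pct_label` are hypothetical and not part of the original answer; the data frame is rebuilt here from the question's ten rows so the block runs on its own.

```r
# Same ten rows as in the question (ID column omitted, it is not needed here).
df <- data.frame(
  Year = rep(c(2013, 2014), each = 5),
  Class = c("Freshman", "Freshman", "Sophomore", "Sophomore", "Junior",
            "Junior", "Junior", "Freshman", "Freshman", "Sophomore"),
  Allowance = c(100, 110, 150, 200, 250, 100, 230, 110, 250, 220)
)

# Step 1: total allowance per Class within each Year.
df2 <- aggregate(Allowance ~ Class + Year, df, sum)

# Step 2: numeric share of the yearly total (0-100), computed per Year.
df2$Pct <- ave(df2$Allowance, df2$Year, FUN = function(x) x / sum(x) * 100)

# Step 3: format for display only; Pct stays numeric.
df2$Pct_label <- sprintf("%.0f%%", df2$Pct)
df2$Pct_label
# "26%" "31%" "43%" "40%" "36%" "24%"
```

Keeping the numeric column separate avoids having `ave()` return character values, which silently turns the whole result into strings.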
You can do this in two steps
my_data <- read.table(header = TRUE,
text = "Year ID Class Allowance
2013 123 Freshman 100
2013 234 Freshman 110
2013 345 Sophomore 150
2013 456 Sophomore 200
2013 567 Junior 250
2014 678 Junior 100
2014 789 Junior 230
2014 890 Freshman 110
2014 891 Freshman 250
2014 892 Sophomore 220")
library(plyr)
(summ <- ddply(my_data, .(Year, Class), summarize, Sum_Allow=sum(Allowance)))
# Year Class Sum_Allow
# 1 2013 Freshman 210
# 2 2013 Junior 250
# 3 2013 Sophomore 350
# 4 2014 Freshman 360
# 5 2014 Junior 330
# 6 2014 Sophomore 220
ddply(summ, .(Year), mutate, Allow_pct = Sum_Allow / sum(Sum_Allow) * 100)
# Year Class Sum_Allow Allow_pct
# 1 2013 Freshman 210 25.92593
# 2 2013 Junior 250 30.86420
# 3 2013 Sophomore 350 43.20988
# 4 2014 Freshman 360 39.56044
# 5 2014 Junior 330 36.26374
# 6 2014 Sophomore 220 24.17582
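
Why two steps seem to be needed can be sketched in base R (an illustration, not part of the original answer): inside a single (Year, Class) group the sum is one number, while `Allowance / sum(Allowance)` has one value per student and measures each student's share of the class rather than the class's share of the year. The names `grp`, `class_tot` and `year_share` are hypothetical.

```r
# Same ten rows as in the question (ID column omitted).
my_data <- data.frame(
  Year = rep(c(2013, 2014), each = 5),
  Class = c("Freshman", "Freshman", "Sophomore", "Sophomore", "Junior",
            "Junior", "Junior", "Freshman", "Freshman", "Sophomore"),
  Allowance = c(100, 110, 150, 200, 250, 100, 230, 110, 250, 220)
)

# Look at one (Year, Class) group the way a grouped summary would see it.
grp <- subset(my_data, Year == 2013 & Class == "Freshman")
sum(grp$Allowance)                   # one value: the class total (210)
grp$Allowance / sum(grp$Allowance)   # two values: each student's share of the class

# The share of the year needs the yearly total, which is defined per Year,
# so the class totals are computed first and then divided row-wise by year.
class_tot <- tapply(my_data$Allowance, list(my_data$Year, my_data$Class), sum)
year_share <- prop.table(class_tot, margin = 1) * 100

# Sanity check: the shares within each year add up to 100.
rowSums(year_share)
```

The `prop.table(..., margin = 1)` call plays the same role as the second `ddply` call grouped by Year alone.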
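I don't know whether this happens to anyone else, but when I ran the original attempt, R crashed instead of throwing a warning. The same happens if I mistype "Allow" instead of "allow": it crashes. I really hate that; Hadley, please fix
Base R forever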