Suppose I have the following example dataset:
df1 =
ID Group_Type Units
1 A 10
2 A 12
3 A 17
4 B 6
5 B 9
6 D 23
7 D 16
8 D 21
9 G 40
10 G 31
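Group_Type can be any letter of the English alphabet from A to Z. Is there a way to detect groups A, B, D, and G (or whichever groups are present) in one pass, average Units within each group, and assign the whole result to a matrix? I think it would look something like this: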
[,1]
[1,] 13
[2,] 7.5
[3,] 20
[4,] 35.5
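([1,] = A, and so on; [,1] = the mean for each group)
I know how to do each of these tasks separately, but I don't know how to combine them into one manageable piece of code. I have recently used table, unlist, and grep to pick words out of a data frame, but I can't see how to get beyond that.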
This assumes 'df1' is a 'data.frame'. If it is a 'matrix' (which I suspect), convert it first:
df1 <- as.data.frame(df1, stringsAsFactors=FALSE)
df1$Units <- as.numeric(df1$Units)
Using dplyr:
library(dplyr)
df1 %>%
group_by(Group_Type) %>%
summarise(Units=mean(Units))
# Group_Type Units
#1 A 13.0
#2 B 7.5
#3 D 20.0
#4 G 35.5
Or using base R:
aggregate(Units~Group_Type, df1, FUN=mean, na.action=NULL)
# Group_Type Units
#1 A 13.0
#2 B 7.5
#3 D 20.0
#4 G 35.5
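If the result is needed as a one-column matrix, as in the question, the group means can be reshaped afterwards. The following is only a minimal sketch using base R's tapply and matrix; the names means and res are made up for illustration, and df1 is rebuilt here from the sample data so the snippet runs on its own.

```r
# Rebuild the sample data from the question (assumed to be a data.frame)
df1 <- data.frame(
  ID = 1:10,
  Group_Type = c("A", "A", "A", "B", "B", "D", "D", "D", "G", "G"),
  Units = c(10, 12, 17, 6, 9, 23, 16, 21, 40, 31),
  stringsAsFactors = FALSE
)

# Mean of Units within each Group_Type; the result is named by group
means <- tapply(df1$Units, df1$Group_Type, mean)

# Reshape into a one-column matrix, keeping the group letters as row names
res <- matrix(means, ncol = 1, dimnames = list(names(means), NULL))
res
```

Keeping the group letters as row names means a row can be looked up by group, e.g. res["G", 1], rather than by position.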
Or data.table:
library(data.table)
setDT(df1)[, list(Units=mean(Units)), Group_Type]
# Group_Type Units
#1: A 13.0
#2: B 7.5
#3: D 20.0
#4: G 35.5
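For dplyr, data.table, and aggregate, you can pass na.rm=TRUE to mean to drop NA values from the calculation, i.e. mean(Units, na.rm=TRUE) for dplyr/data.table, and aggregate(..., FUN=mean, na.rm=TRUE, na.action=NULL) for aggregate.
Or sqldf. Here avg ignores NA/NULL values by default: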
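To make the effect of na.rm concrete, here is a small sketch with aggregate, assuming one value in group A is missing. The names keep_na and drop_na are made up for illustration, and df1 is rebuilt from the sample data so the snippet is self-contained.

```r
# Sample data from the question, with one Units value in group A set to NA
df1 <- data.frame(
  ID = 1:10,
  Group_Type = c("A", "A", "A", "B", "B", "D", "D", "D", "G", "G"),
  Units = c(10, 12, 17, 6, 9, 23, 16, 21, 40, 31),
  stringsAsFactors = FALSE
)
df1$Units[3] <- NA

# na.action=NULL keeps the NA row, so mean() returns NA for group A
keep_na <- aggregate(Units ~ Group_Type, df1, FUN = mean, na.action = NULL)

# na.rm=TRUE is passed on to mean(), which then averages only 10 and 12
drop_na <- aggregate(Units ~ Group_Type, df1, FUN = mean,
                     na.rm = TRUE, na.action = NULL)

keep_na$Units[1]  # NA
drop_na$Units[1]  # 11
```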
library(sqldf)
sqldf('select Group_Type,
avg(Units) as Units
from df1
group by Group_Type',
method = "raw")
# Group_Type Units
#1 A 13.0
#2 B 7.5
#3 D 20.0
#4 G 35.5
Suppose there is a single missing value in 'Units' for some 'Group_Type', and you want the output for that group to be NA:
df1$Units[3] <- NA
sqldf('select Group_Type,
case when count(Units) = count(*)
then avg(Units)
else null
end as Units
from df1
group by Group_Type',
method="raw")
# Group_Type Units
#1 A <NA>
#2 B 7.5
#3 D 20.0
#4 G 35.5