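I'm drawing a blank. I have a list of 51 groups of data, produced by splitting a data frame, and I want to take the mean of Height for each group.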
print(dataset)
$`1`
ID Species Plant Height
1 A 1 42.7
2 A 1 32.5
$`2`
ID Species Plant Height
3 A 2 43.5
4 A 2 54.3
5 A 2 45.7
...
...
...
$`51`
ID Species Plant Height
134 A 51 52.5
135 A 51 61.2
I know how to run each one individually, but there are 51 split pieces, so that would take me a long time.
I thought
mean(dataset[,4])
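might work, but it says I have an incorrect number of dimensions. I now understand why this is incorrect, but I don't know how to average the heights for all of the groups.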
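Here dataset is a list, which has no second dimension, so dataset[,4] fails. We can loop over the list elements with lapply/sapply/vapply etc. and get the mean of the "Height" column. With vapply, we can specify the class and length of the output (numeric(1)). This is useful for debugging.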
vapply(dataset, function(x) mean(x[,4], na.rm=TRUE), numeric(1))
#       1        2       51
#37.60000 47.83333 56.85000
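As a minimal sketch of the other looping functions mentioned above, the same means can be computed with sapply or lapply, selecting the column by name instead of by position. The compact `dataset` below is a hand-built copy of the three example groups, and the names `means`, `means_list` and `res` are only illustrative. The last call shows what the vapply type check buys you: a function that returns more than one value per element is rejected with an error.

```r
# Hand-built copy of the three example groups (assumption: same values as the example data)
dataset <- list(
  `1`  = data.frame(ID = 1:2, Species = "A", Plant = 1L, Height = c(42.7, 32.5)),
  `2`  = data.frame(ID = 3:5, Species = "A", Plant = 2L, Height = c(43.5, 54.3, 45.7)),
  `51` = data.frame(ID = 134:135, Species = "A", Plant = 51L, Height = c(52.5, 61.2))
)

# sapply simplifies the result to a named numeric vector
means <- sapply(dataset, function(x) mean(x$Height, na.rm = TRUE))

# lapply always returns a list, one element per group
means_list <- lapply(dataset, function(x) mean(x[["Height"]], na.rm = TRUE))

# vapply checks every result against numeric(1); returning the whole
# Height column (length 2 or 3) triggers an error, captured here as text
res <- tryCatch(
  vapply(dataset, function(x) x$Height, numeric(1)),
  error = function(e) conditionMessage(e)
)
```

Selecting by name (x$Height) keeps working if the column order changes, whereas x[,4] depends on Height being the fourth column.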
Another option (if all the data.frames in the list have the same column names and number of columns) is to use rbindlist from data.table with the option idcol = TRUE to generate a single data.table. The '.id' column shows the names of the list elements. We group by '.id' and get the mean of Height.
library(data.table)
rbindlist(dataset, idcol=TRUE)[, list(Mean=mean(Height, na.rm=TRUE)), by = .id]
# .id Mean
#1: 1 37.60000
#2: 2 47.83333
#3: 51 56.85000
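A similar option to the one above is unnest from library(tidyr) with an ".id" column. It returns a single dataset, which we group by ".id" and summarise to get the mean of 'Height'. Note that this relies on unnest accepting a plain list, as tidyr did when this was written; newer tidyr releases may expect a data frame instead.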
library(tidyr)
library(dplyr)
unnest(dataset, .id) %>%
group_by(.id) %>%
summarise(Mean= mean(Height, na.rm=TRUE))
# .id Mean
#1 1 37.60000
#2 2 47.83333
#3 51 56.85000
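If unnest does not accept a plain list in your tidyr version, one possible substitute is bind_rows from dplyr, whose .id argument adds a column holding the list names. This is a sketch under that assumption, not the approach the answer itself used; the compact `dataset` is a hand-built copy of the three example groups, and `combined` and `result` are illustrative names.

```r
library(dplyr)

# Hand-built copy of the three example groups (assumption: same values as the example data)
dataset <- list(
  `1`  = data.frame(ID = 1:2, Species = "A", Plant = 1L, Height = c(42.7, 32.5)),
  `2`  = data.frame(ID = 3:5, Species = "A", Plant = 2L, Height = c(43.5, 54.3, 45.7)),
  `51` = data.frame(ID = 134:135, Species = "A", Plant = 51L, Height = c(52.5, 61.2))
)

# Stack the list into one data frame; ".id" stores the name of the source element
combined <- bind_rows(dataset, .id = ".id")

# Group by the source element and average Height within each group
result <- combined %>%
  group_by(.id) %>%
  summarise(Mean = mean(Height, na.rm = TRUE))
```

The `.id` column produced by bind_rows is character, so the groups sort as text ("1", "2", "51") rather than as numbers.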
The syntax with plyr (after loading library(plyr)) is
df1 <- unnest(dataset, .id)
ddply(df1, .(.id), summarise, Mean=mean(Height, na.rm=TRUE))
# .id Mean
#1 1 37.60000
#2 2 47.83333
#3 51 56.85000
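For completeness, the same combine-then-group idea can be sketched in base R without any packages. The column name `group` and the object names are illustrative choices, and the compact `dataset` is a hand-built copy of the three example groups.

```r
# Hand-built copy of the three example groups (assumption: same values as the example data)
dataset <- list(
  `1`  = data.frame(ID = 1:2, Species = "A", Plant = 1L, Height = c(42.7, 32.5)),
  `2`  = data.frame(ID = 3:5, Species = "A", Plant = 2L, Height = c(43.5, 54.3, 45.7)),
  `51` = data.frame(ID = 134:135, Species = "A", Plant = 51L, Height = c(52.5, 61.2))
)

# Stack all groups into one data frame
combined <- do.call(rbind, dataset)

# Record which list element each row came from (hypothetical column name "group")
combined$group <- rep(names(dataset), vapply(dataset, nrow, integer(1)))

# Mean Height per group as a named array
means_by_group <- tapply(combined$Height, combined$group, mean, na.rm = TRUE)

# The same result as a data frame with one row per group
means_df <- aggregate(Height ~ group, data = combined, FUN = mean)
```

Like rbindlist, this assumes every data.frame in the list has the same columns.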
The example data used above:
dataset <- structure(list(`1` = structure(list(ID = 1:2, Species = c("A",
"A"), Plant = c(1L, 1L), Height = c(42.7, 32.5)), .Names = c("ID",
"Species", "Plant", "Height"), class = "data.frame", row.names = c(NA,
-2L)), `2` = structure(list(ID = 3:5, Species = c("A", "A", "A"
), Plant = c(2L, 2L, 2L), Height = c(43.5, 54.3, 45.7)), .Names = c("ID",
"Species", "Plant", "Height"), class = "data.frame", row.names = c(NA,
-3L)), `51` = structure(list(ID = 134:135, Species = c("A", "A"
), Plant = c(51L, 51L), Height = c(52.5, 61.2)), .Names = c("ID",
"Species", "Plant", "Height"), class = "data.frame", row.names = c(NA,
-2L))), .Names = c("1", "2", "51"))