Operating on all split datasets

Ada*_*dam 0 r

I'm drawing a blank here. I have a set of 51 split datasets from a data frame, and I want to take the mean of the Height in each set.

print(dataset)
$`1`
ID   Species  Plant   Height 
1      A        1      42.7
2      A        1      32.5

$`2`
ID   Species  Plant   Height 
3      A        2      43.5
4      A        2      54.3
5      A        2      45.7

...

...

...

$`51`
ID   Species  Plant   Height
134     A       51     52.5
135     A       51     61.2 

I know how to run each one individually, but with 51 split parts that would take me a long time.

I thought

mean(dataset[,4])

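might work, but it says I have an incorrect number of dimensions. I now understand why that is wrong, but I don't know how to average the heights for all of them.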

akr*_*run 5

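dataset is a list. We can use lapply/sapply/vapply etc. to loop through the list elements and get the mean of the "Height" column. With vapply, we can specify the class and length of the output (numeric(1)). This is useful for debugging.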

vapply(dataset, function(x) mean(x[,4], na.rm=TRUE), numeric(1))
#     1        2       51 
#37.60000 47.83333 56.85000 
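
As a minimal sketch of how the three looping functions mentioned above differ, the following uses a small stand-in list with the same shape as the question's data (only two groups, and only the ID and Height columns, which is an assumption made for brevity). It selects the column by name rather than by position, and shows how vapply stops with an error when a result does not match the declared numeric(1) template.

```r
# Small stand-in for the split list (hypothetical, trimmed to two groups)
dataset <- list(
  `1` = data.frame(ID = 1:2, Height = c(42.7, 32.5)),
  `2` = data.frame(ID = 3:5, Height = c(43.5, 54.3, 45.7))
)

# sapply simplifies to a named numeric vector when every result has length 1
means_s <- sapply(dataset, function(x) mean(x$Height, na.rm = TRUE))

# lapply always returns a list with one element per group
means_l <- lapply(dataset, function(x) mean(x$Height, na.rm = TRUE))

# vapply checks each result against the template; here the function returns
# the whole Height column (length 2 or 3), so vapply raises an error
res <- tryCatch(
  vapply(dataset, function(x) x$Height, numeric(1)),
  error = function(e) "vapply error"
)
```

The early failure in the last call is the debugging benefit: sapply would silently return a list in that situation instead of a vector.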

Or another option (if we have the same column names/number of columns for the data.frames in the list) is to use rbindlist from data.table with the option idcol = TRUE to generate a single data.table. The '.id' column shows the names of the list elements. We group by '.id' and get the mean of the Height.

library(data.table)
rbindlist(dataset, idcol=TRUE)[, list(Mean=mean(Height, na.rm=TRUE)), by = .id]
#   .id     Mean
#1:   1 37.60000
#2:   2 47.83333
#3:  51 56.85000
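
For readers without data.table installed, the same bind-then-group idea can be sketched in base R. This is not part of the original answer; the column name `group` and the trimmed two-group stand-in list are assumptions made for illustration.

```r
# Small stand-in for the split list (hypothetical, trimmed to two groups)
dataset <- list(
  `1` = data.frame(ID = 1:2, Plant = c(1L, 1L), Height = c(42.7, 32.5)),
  `2` = data.frame(ID = 3:5, Plant = c(2L, 2L, 2L), Height = c(43.5, 54.3, 45.7))
)

# Stack the pieces into one data frame (requires identical column names)
combined <- do.call(rbind, dataset)

# Record which list element each row came from
combined$group <- rep(names(dataset), vapply(dataset, nrow, integer(1)))

# One mean per group; the result has columns `group` and `Height`
group_means <- aggregate(Height ~ group, data = combined, FUN = mean)
```

Like rbindlist, this only works cleanly when every data frame in the list has the same columns.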

Or a similar option to the above is unnest from library(tidyr), which returns a single dataset with an '.id' column. We group by '.id' and summarise to get the mean of 'Height'.

library(tidyr)
library(dplyr)
unnest(dataset, .id) %>%
          group_by(.id) %>% 
          summarise(Mean= mean(Height, na.rm=TRUE))
# .id     Mean
#1   1 37.60000
#2   2 47.83333
#3  51 56.85000
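
Note that recent tidyr releases may no longer accept a plain list in unnest(). A sketch of the same pipeline using dplyr::bind_rows, whose .id argument names the column that holds the list element names, could look like this (the two-group stand-in list is an assumption made for brevity):

```r
library(dplyr)

# Small stand-in for the split list (hypothetical, trimmed to two groups)
dataset <- list(
  `1` = data.frame(ID = 1:2, Height = c(42.7, 32.5)),
  `2` = data.frame(ID = 3:5, Height = c(43.5, 54.3, 45.7))
)

# Stack the list into one data frame; ".id" holds the list element names
out <- bind_rows(dataset, .id = ".id") %>%
  group_by(.id) %>%
  summarise(Mean = mean(Height, na.rm = TRUE))
```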

Using plyr syntax

library(plyr)
df1 <- unnest(dataset, .id)
ddply(df1, .(.id), summarise, Mean=mean(Height, na.rm=TRUE))
# .id     Mean
#1   1 37.60000
#2   2 47.83333
#3  51 56.85000

Data

dataset <- structure(list(`1` = structure(list(ID = 1:2, Species = c("A", 
"A"), Plant = c(1L, 1L), Height = c(42.7, 32.5)), .Names = c("ID", 
"Species", "Plant", "Height"), class = "data.frame", row.names = c(NA, 
-2L)), `2` = structure(list(ID = 3:5, Species = c("A", "A", "A"
), Plant = c(2L, 2L, 2L), Height = c(43.5, 54.3, 45.7)), .Names = c("ID", 
"Species", "Plant", "Height"), class = "data.frame", row.names = c(NA, 
-3L)), `51` = structure(list(ID = 134:135, Species = c("A", "A"
), Plant = c(51L, 51L), Height = c(52.5, 61.2)), .Names = c("ID", 
"Species", "Plant", "Height"), class = "data.frame", row.names = c(NA, 
-2L))), .Names = c("1", "2", "51"))