Joh*_*ish 4 grouping r data-manipulation dataframe
我有这个数据...
Scientificname Level Zone levelmean
<chr> <int> <chr> <dbl>
1 Acanthostracion polygonius 3 B 0.135
2 Acanthostracion quadricornis 1 B 0.286
3 Acanthostracion quadricornis 1 D 0.228
4 Acanthostracion quadricornis 2 B 0.212
5 Acanthostracion quadricornis 2 D 0.181
6 Acanthostracion quadricornis 3 B 0.247
7 Acanthostracion quadricornis 3 D 0.222
8 Acanthostracion quadricornis 4 B 0.151
9 Acanthostracion quadricornis 4 D 0.202
10 Acanthostracion spp. 2 B 0.225
11 Achirus lineatus 1 B 0.204
12 Achirus lineatus 1 D 0.202
13 Achirus lineatus 2 B 0.219
14 Achirus lineatus 2 D 0.181
15 Achirus lineatus 3 B 0.145
16 Achirus lineatus 3 D 0.172
17 Achirus lineatus 4 B 0.135
18 Achirus lineatus 4 D 0.142
Run Code Online (Sandbox Code Playgroud)
structure(list(Scientificname = c("Acanthostracion polygonius",
"Acanthostracion quadricornis", "Acanthostracion quadricornis",
"Acanthostracion quadricornis", "Acanthostracion quadricornis",
"Acanthostracion quadricornis", "Acanthostracion quadricornis",
"Acanthostracion quadricornis", "Acanthostracion quadricornis",
"Acanthostracion spp.", "Achirus lineatus", "Achirus lineatus",
"Achirus lineatus", "Achirus lineatus", "Achirus lineatus", "Achirus lineatus",
"Achirus lineatus", "Achirus lineatus"), Level = c(3L, 1L, 1L,
2L, 2L, 3L, 3L, 4L, 4L, 2L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L),
Zone = c("B", "B", "D", "B", "D", "B", "D", "B", "D", "B",
"B", "D", "B", "D", "B", "D", "B", "D"), levelmean = c(0.134916351861846,
0.286175876741544, 0.228368580556262, 0.21169261421555, 0.181497972824247,
0.247241190981072, 0.221534021013127, 0.151406128200516,
0.201513319317781, 0.224860586436409, 0.204040161766372,
0.201884774621553, 0.219239071775499, 0.18121539764963, 0.144981540016618,
0.172393116267914, 0.134916351861846, 0.141662169454938)), row.names = c(NA,
-18L), groups = structure(list(Scientificname = c("Acanthostracion polygonius",
"Acanthostracion quadricornis", "Acanthostracion quadricornis",
"Acanthostracion quadricornis", "Acanthostracion quadricornis",
"Acanthostracion spp.", "Achirus lineatus", "Achirus lineatus",
"Achirus lineatus", "Achirus lineatus"), Level = c(3L, 1L, 2L,
3L, 4L, 2L, 1L, 2L, 3L, 4L), .rows = structure(list(1L, 2:3,
4:5, 6:7, 8:9, 10L, 11:12, 13:14, 15:16, 17:18), ptype = integer(0), class = c("vctrs_list_of",
"vctrs_vctr", "list"))), row.names = c(NA, -10L), class = c("tbl_df",
"tbl", "data.frame"), .drop = TRUE), class = c("grouped_df",
"tbl_df", "tbl", "data.frame"))
Run Code Online (Sandbox Code Playgroud)
这些是物种名称、排放水平(4 最低,1 最高)和 CPUE(每单位努力的渔获量,即捕获的鱼数)。
我想要做的是获得可量化的放电敏感性测量。所以我能想到的唯一方法是计算每个区域每个物种的每对排放水平的值之间的差异。例如,对于 B 区的 Acanthostracion quadricornis,我将取组 1 和 2、1 和 3、1 和 4、2 和 3、2 和 4 以及 3 和 4 之间的差异,然后取所有这些值的平均值。
它变得更加复杂,因为我只想对每个区域至少出现 2 个级别的物种执行此操作。此外,我有大约 130 个物种,它们在每个区域出现的级别各不相同。
我的理想输出是...
Scientificname Zone Sensitivity
1 Acanthostracion quadricornis B 0.06367512
2 Acanthostracion quadricornis D 0.02399275
3 Achirus lineatus B 0.05164523
4 Achirus lineatus D 0.03447407
Run Code Online (Sandbox Code Playgroud)
理想输出中的值可能已四舍五入。
我们可以将 with 重塑为“宽”格式,pivot_wider然后用于combn获取成对差异并采用meanwithrowMeans
library(dplyr)
library(tidyr)
tmp <- df1 %>%
filter(n() > 1) %>%
ungroup %>%
pivot_wider(names_from = Level, values_from = levelmean, values_fill = 0)
Sensitivity <- rowMeans(do.call(cbind, combn(tmp[-(1:2)], 2,
FUN = function(x) abs(x[1]-x[2]), simplify = FALSE)))
out <- tmp %>%
select(1:2) %>%
mutate(Sensitivity = Sensitivity)
Run Code Online (Sandbox Code Playgroud)
-输出
out
# A tibble: 4 x 3
Scientificname Zone Sensitivity
<chr> <chr> <dbl>
1 Acanthostracion quadricornis B 0.0733
2 Acanthostracion quadricornis D 0.0268
3 Achirus lineatus B 0.0520
4 Achirus lineatus D 0.0316
Run Code Online (Sandbox Code Playgroud)
或者没有重塑
library(purrr)
df1 %>%
filter(n() > 1) %>%
ungroup %>%
nest_by(Scientificname, Zone) %>%
ungroup %>%
transmute(Scientificname, Zone,
Sensitivity = map_dbl(data,
~ mean(abs(combn(.x$levelmean, 2, FUN = \(x) x[1]- x[2])))))
Run Code Online (Sandbox Code Playgroud)
-输出
# A tibble: 4 x 3
Scientificname Zone Sensitivity
<chr> <chr> <dbl>
1 Acanthostracion quadricornis B 0.0733
2 Acanthostracion quadricornis D 0.0268
3 Achirus lineatus B 0.0520
4 Achirus lineatus D 0.0316
Run Code Online (Sandbox Code Playgroud)
这是适合您的基本 R解决方案:
do.call(rbind, lapply(split(df, ~ Scientificname + Zone, drop = TRUE), function(x) {
if(nrow(x) >= 2) {
combs <- as.data.frame(t(combn(x$Level, m = 2)))
x[["Sensitivity"]] <-
mean(abs(mapply(function(a, b) {
x$levelmean[x$Level == a] - x$levelmean[x$Level == b]
}, combs$V1, combs$V2)))
head(x, 1)[, -which(names(x) %in% c("Level", "levelmean"))]
}
}))
# A tibble: 4 x 3
# Groups: Scientificname [2]
Scientificname Zone Sensitivity
<chr> <chr> <dbl>
1 Acanthostracion quadricornis B 0.0733
2 Achirus lineatus B 0.0520
3 Acanthostracion quadricornis D 0.0268
4 Achirus lineatus D 0.0316
Run Code Online (Sandbox Code Playgroud)
一个data.table选项使用combn
setDT(df)[
,
.SD[.N > 1],
.(Scientificname, Zone)
][
,
.(Sensitivity = mean(abs(combn(levelmean,2,diff))))
,
.(Scientificname, Zone)
]
Run Code Online (Sandbox Code Playgroud)
或者更短的(因为df已经是一个分组的 data.frame)
setDT(df %>%
filter(n() > 1))[
,
.(Sensitivity = mean(abs(combn(levelmean,2,diff))))
,
.(Scientificname, Zone)
]
Run Code Online (Sandbox Code Playgroud)
给
Scientificname Zone Sensitivity
1: Acanthostracion quadricornis B 0.07330964
2: Acanthostracion quadricornis D 0.02677209
3: Achirus lineatus B 0.05200446
4: Achirus lineatus D 0.03158168
Run Code Online (Sandbox Code Playgroud)