Suppose you are given a summary cross-tabulation like this:
kdat <- data.frame(positive = c(8, 4), negative = c(3, 6),
                   row.names = c("positive", "negative"))
kdat
#>          positive negative
#> positive        8        3
#> negative        4        6
Now you want to calculate Cohen's kappa, a statistic for measuring agreement between two raters. With the data in this format, you can use psych::cohen.kappa:
psych::cohen.kappa(kdat)$kappa
#> Warning in any(abs(bounds)): coercing argument of type 'double' to logical
#> [1] 0.3287671
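As a sanity check on that value, kappa can also be computed by hand from the table with the usual formula (observed agreement minus chance agreement, divided by one minus chance agreement). This is only a minimal base R sketch; the names m, p_obs, p_exp and kappa_manual are made up for illustration and are not part of either package.

```r
kdat <- data.frame(positive = c(8, 4), negative = c(3, 6),
                   row.names = c("positive", "negative"))

m <- as.matrix(kdat)  # 2 x 2 table of counts
n <- sum(m)           # total number of rated cases

# Observed agreement: share of cases on the diagonal
p_obs <- sum(diag(m)) / n

# Agreement expected by chance, from the row and column margins
p_exp <- sum(rowSums(m) * colSums(m)) / n^2

kappa_manual <- (p_obs - p_exp) / (1 - p_exp)
kappa_manual
#> [1] 0.3287671
```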
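This irks me because I prefer my data long and skinny, so that I can use irr::kappa2 and similar functions, which I prefer for whatever reason. So I put together this function to reformat my data: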
longify_xtab <- function(x) {
  nm <- names(x)
  # Convert to table
  x_tab <- as.table(as.matrix(x))
  # In case there are no row names (required for the conversion),
  # reuse the column names; this assumes a square table
  rownames(x_tab) <- nm
  # Use the table method to get a data frame with a Freq column
  x_df <- as.data.frame(x_tab)
  # Repeat each label combination Freq times, dropping the Freq column
  data.frame(lapply(x_df[seq_len(ncol(x_df) - 1)], function(col) {
    rep(col, x_df$Freq)
  }))
}
The function returns the following format:
longify_xtab(kdat)
#> Var1 Var2
#> 1 positive positive
#> 2 positive positive
#> 3 positive positive
#> 4 positive positive
#> 5 positive positive
#> 6 positive positive
#> 7 positive positive
#> 8 positive positive
#> 9 negative positive
#> 10 negative positive
#> 11 negative positive
#> 12 negative positive
#> 13 positive negative
#> 14 positive negative
#> 15 positive negative
#> 16 negative negative
#> 17 negative negative
#> 18 negative negative
#> 19 negative negative
#> 20 negative negative
#> 21 negative negative
...which you can then use to calculate kappa via irr::kappa2:
irr::kappa2(longify_xtab(kdat))$value
#> [1] 0.3287671
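My question is:
Is there a better way to do this (in base R or with a package)? It strikes me as a relatively simple problem, but in trying to solve it I found it surprisingly tricky, at least for me.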
kdat <- data.frame(positive = c(8, 4),
                   negative = c(3, 6),
                   row.names = c("positive", "negative"))
library(tidyverse)
kdat %>%
  rownames_to_column() %>%                     # set row names as a variable
  gather(rowname2, value, -rowname) %>%        # reshape to one row per cell
  rowwise() %>%                                # for every row
  mutate(value = list(seq_len(value))) %>%     # one index per case (seq_len() also handles zero counts)
  unnest(value) %>%                            # unnest the counter
  select(-value)                               # remove the counts
# # A tibble: 21 x 2
# rowname rowname2
# <chr> <chr>
# 1 positive positive
# 2 positive positive
# 3 positive positive
# 4 positive positive
# 5 positive positive
# 6 positive positive
# 7 positive positive
# 8 positive positive
# 9 negative positive
# 10 negative positive
# # ... with 11 more rows
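The rowwise/unnest steps above can probably be shortened with tidyr::uncount(), which repeats each row according to a count column. This is a sketch under the assumption that a reasonably recent tidyr (1.0 or later) is installed; the column names rater1, rater2 and n are invented for illustration, and the row order differs from the gather() version, which should not affect kappa.

```r
library(tidyverse)

kdat <- data.frame(positive = c(8, 4),
                   negative = c(3, 6),
                   row.names = c("positive", "negative"))

long <- kdat %>%
  rownames_to_column("rater1") %>%   # row labels become a column
  pivot_longer(-rater1,              # one row per cell of the table
               names_to = "rater2",
               values_to = "n") %>%
  uncount(n)                         # repeat each row n times, drop n

nrow(long)
#> [1] 21
```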
Here is some public-domain code from http://www.cookbook-r.com/Manipulating_data/Converting_between_data_frames_and_contingency_tables/ that I use to do exactly what you're asking.
# Convert from data frame of counts to data frame of cases.
# `countcol` is the name of the column containing the counts
countsToCases <- function(x, countcol = "Freq") {
  # Get the row indices to pull from x
  idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
  # Drop count column
  x[[countcol]] <- NULL
  # Get the rows from x
  x[idx, ]
}
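The function expects a data frame of counts, so the wide table from the question has to be converted first. A possible way to wire it up is sketched below; the as.table()/as.data.frame() step and the names counts and cases are my own assumptions rather than part of the cookbook code, and countsToCases is repeated only so the snippet runs on its own.

```r
kdat <- data.frame(positive = c(8, 4), negative = c(3, 6),
                   row.names = c("positive", "negative"))

# Copied from the answer above so this snippet is self-contained
countsToCases <- function(x, countcol = "Freq") {
  idx <- rep.int(seq_len(nrow(x)), x[[countcol]])
  x[[countcol]] <- NULL
  x[idx, ]
}

# Wide table -> data frame of counts with columns Var1, Var2, Freq
counts <- as.data.frame(as.table(as.matrix(kdat)))

# Counts -> one row per rated case
cases <- countsToCases(counts)
rownames(cases) <- NULL  # tidy up the repeated row indices

nrow(cases)
#> [1] 21

# Optional: only runs if the irr package is installed
if (requireNamespace("irr", quietly = TRUE)) {
  print(irr::kappa2(cases)$value)
}
```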