Is anyone familiar with how to implement main path analysis (Hummon and Doreian 1989) in R with igraph?
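Below is the example from the original Hummon and Doreian article. It traces the citations among 40 journal articles on DNA. The arrows point forward in time (information "flows" from older articles to newer ones).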
library(igraph)

# Each row is a citation arc; arcs point forward in time (from the cited to the citing article)
dna_edges <- data.frame(
  from = c(1,2,3,3,3,5,6,9,12,12,15,15,10,11,11,13,14,14,14,16,16,17,19,19,19,19,19,20,20,20,20,24,24,21,21,23,22,26,27,29,30,31,31,32,32,32,33,33,35,35,36,36,36),
  to   = c(8,18,4,5,21,12,9,12,15,29,29,22,17,13,20,20,16,20,31,17,20,34,20,24,25,21,25,31,22,30,22,28,37,22,32,27,27,27,32,32,40,32,40,36,38,33,32,35,38,39,38,39,40))

dna_g <- graph_from_data_frame(dna_edges, directed = TRUE)

# One layer per article number
plot(dna_g,
     layout = layout_with_sugiyama(dna_g,
                                   layers = as.numeric(V(dna_g)$name))$layout)

Liu et al. (2019) explain that in a citation network a node can be one of three types:
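- source: a node with no incoming arcs, i.e. an article that is cited but cites no other article in the network;
- sink: a node with no outgoing arcs, i.e. an article that cites others but is not itself cited;
- intermediate: a node with both incoming and outgoing arcs.

So in this example ten articles are "sources" and another ten are "sinks":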
dna_sources <- V(dna_g)$name[which(degree(dna_g, mode = "in") == 0)]   # sources
dna_sources
# [1] "1"  "2"  "3"  "6"  "10" "11" "14" "19" "23" "26"
dna_sinks <- V(dna_g)$name[which(degree(dna_g, mode = "out") == 0)]    # sinks
dna_sinks
# [1] "8"  "18" "4"  "34" "25" "28" "37" "40" "38" "39"

The main path is the most heavily used path leading from the sources to the sinks. Search path count (SPC) is one way of weighting the citation links to find it.
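"The SPC of a citation link is the number of times the link is traversed if one runs through all possible citation chains from all sources to all sinks in the citation network. To find the SPC of a particular link, one has to enumerate all possible citation chains emanating from all sources and terminating at all sinks" (Liu et al. 2019: 381).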
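Enumerating every chain is not the only way to obtain these numbers. Assuming the network is acyclic, the number of chains through an arc is the number of ways to reach its tail from some source times the number of ways to continue from its head to some sink. In the notation below (introduced here, not taken from Liu et al.), N^-(v) counts the paths from any source to v and N^+(v) counts the paths from v to any sink:

```latex
N^-(v) =
\begin{cases}
  1 & \text{if } v \text{ is a source},\\
  \sum_{(u,v) \in E} N^-(u) & \text{otherwise,}
\end{cases}
\qquad
N^+(v) =
\begin{cases}
  1 & \text{if } v \text{ is a sink},\\
  \sum_{(v,w) \in E} N^+(w) & \text{otherwise,}
\end{cases}
\qquad
\mathrm{SPC}(u,v) = N^-(u)\,N^+(v).
```

For example, in the Wikipedia figure transcribed in the answer below, B is a source, so N^-(B) = 1, and five chains run from D to a sink (D→F→H→K, D→F→I→L, D→F→I→M→N, D→I→L, D→I→M→N), giving SPC(B, D) = 1 × 5 = 5, the value shown for that edge.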
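So, to proceed, one would need to (i) pick a source-sink pair, (ii) find all paths connecting these two nodes, adding a weight of +1 to every edge each time a path traverses it, and (iii) repeat this for all the other source-sink pairs.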
Any ideas on how to carry out steps (i) to (iii)?
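The function below implements SPC. It works on the line graph of g, in which each edge of g becomes a vertex (vertex i of the line graph is edge i of g): every path from an edge leaving a source to an edge entering a sink is one citation chain, and counting how often each line-graph vertex appears on those paths gives the SPC of the corresponding edge.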
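spc <- function(g) {
  # Each edge of g becomes a vertex of the line graph (vertex i = edge i)
  linegraph <- make_line_graph(g)
  # Edges leaving a source of g / edges entering a sink of g
  source_edges <- V(linegraph)[degree(linegraph, mode = "in") == 0]
  sink_edges <- V(linegraph)[degree(linegraph, mode = "out") == 0]
  # Every source-edge -> sink-edge path is one source -> sink citation chain
  traversed <- unlist(
    lapply(
      source_edges,
      all_simple_paths,
      graph = linegraph,
      to = sink_edges,
      mode = "out"))
  # An edge running directly from a source to a sink is a chain on its own,
  # but all_simple_paths() returns no one-vertex paths, so add it here
  direct <- which(degree(linegraph, mode = "all") == 0)
  # Count traversals per edge; nbins keeps one entry for every edge of g
  tabulate(c(traversed, direct), nbins = ecount(g))
}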
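As a cross-check that needs no line graph, steps (i) to (iii) from the question can also be followed literally on the original graph. This is a minimal sketch, not part of the answer's method: spc_bruteforce() and wiki are hypothetical names, and wiki rebuilds the Wikipedia example used further down (same edges, same edge order) with a plain data.frame. For each source it collects every simple path to any sink with all_simple_paths(), converts each vertex path to edge ids with E(g, path = p), and adds 1 to every traversed edge. It assumes the graph has no parallel edges, because E(g, path = p) maps each consecutive vertex pair to a single edge. The DNA edge list in the question lists 19→25 and 20→22 twice, so for that data the line-graph spc() above, which keeps parallel edges apart, is the safer choice.

```r
library(igraph)

# Hypothetical helper: steps (i)-(iii) on the original graph.
# Assumes g has no parallel edges.
spc_bruteforce <- function(g) {
  sources <- which(degree(g, mode = "in") == 0)
  sinks <- which(degree(g, mode = "out") == 0)
  counts <- numeric(ecount(g))
  # (iii): repeat for every source
  for (s in sources) {
    # (i) + (ii): all simple paths from this source to any of the sinks
    for (p in all_simple_paths(g, from = s, to = sinks, mode = "out")) {
      # Edge ids along the path; add 1 to each traversed edge
      eids <- as.integer(E(g, path = p))
      counts[eids] <- counts[eids] + 1
    }
  }
  counts
}

# Wikipedia example, same edges and edge order as wikipedia_g
wiki <- graph_from_data_frame(
  data.frame(
    from = c("A","B","B","B","C","C","D","D","J","E","F","F","G","I","I","H","M"),
    to   = c("C","C","D","J","E","H","F","I","M","G","H","I","H","L","M","K","N")),
  directed = TRUE)

spc_bruteforce(wiki)
# [1] 2 2 5 1 2 2 3 2 1 2 1 2 2 2 2 5 3
```

The result agrees with the expected_spc column of the Wikipedia figure. Like spc(), it enumerates every chain, so its running time grows with the number of source-to-sink paths.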
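The following function finds the main path. Note that if several paths share the same maximal total SPC, there are other main paths as well; the function returns the first one it finds.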
main_search <- function(g) {
  linegraph <- make_line_graph(g)
  # Store each edge's SPC on the corresponding line-graph vertex
  V(linegraph)$spc <- spc(g)
  source_edges <- V(linegraph)[degree(linegraph, mode = "in") == 0]
  sink_edges <- V(linegraph)[degree(linegraph, mode = "out") == 0]
  # All source -> sink chains, each given as a sequence of edges of g
  paths <- unlist(
    lapply(
      source_edges,
      all_simple_paths,
      graph = linegraph,
      to = sink_edges,
      mode = "out"),
    recursive = FALSE)
  # Total SPC along each chain
  path_lengths <- unlist(lapply(paths, function(x) sum(x$spc)))
  # Flag the edges of the first chain with the largest total SPC
  vertex_attr(linegraph, "main_path") <- 0
  vertex_attr(
    linegraph,
    "main_path",
    paths[[which(path_lengths == max(path_lengths))[[1]]]]) <- 1
  V(linegraph)$main_path
}
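main_search() scores every chain one by one. If the network is acyclic, a path with the same maximal total SPC (the global main path) can instead be found as a maximum-weight source-to-sink path in a single pass over a topological order. The block below sketches that standard dynamic-programming technique; it is not what main_search() does. global_main_path(), wiki and wiki_spc are hypothetical names, and the SPC values are passed in directly (they are the expected_spc values transcribed from the Wikipedia figure further down) so that the block runs on its own. Ties are broken by taking the first maximum, so, like main_search(), it returns only one main path when several are equally heavy.

```r
library(igraph)

# Hypothetical helper: maximum-total-weight source-to-sink path in a DAG.
# w holds one weight (e.g. SPC) per edge, in edge order; returns a 0/1 flag per edge.
global_main_path <- function(g, w) {
  stopifnot(is_dag(g), length(w) == ecount(g))
  el <- as_edgelist(g, names = FALSE)
  best <- rep(-Inf, vcount(g))        # best total weight of a source-to-v path
  via <- rep(NA_integer_, vcount(g))  # last edge on that best path
  for (v in as.integer(topo_sort(g, mode = "out"))) {
    incoming <- which(el[, 2] == v)
    if (length(incoming) == 0) {
      best[v] <- 0                    # v is a source
    } else {
      totals <- best[el[incoming, 1]] + w[incoming]
      k <- which.max(totals)          # first maximum wins on ties
      best[v] <- totals[k]
      via[v] <- incoming[k]
    }
  }
  # Finish at the sink with the largest total, then walk back to its source
  sinks <- which(degree(g, mode = "out") == 0)
  v <- sinks[which.max(best[sinks])]
  on_path <- integer(ecount(g))
  while (!is.na(via[v])) {
    on_path[via[v]] <- 1L
    v <- el[via[v], 1]
  }
  on_path
}

# Wikipedia example, same edges and edge order as wikipedia_g
wiki <- graph_from_data_frame(
  data.frame(
    from = c("A","B","B","B","C","C","D","D","J","E","F","F","G","I","I","H","M"),
    to   = c("C","C","D","J","E","H","F","I","M","G","H","I","H","L","M","K","N")),
  directed = TRUE)
wiki_spc <- c(2, 2, 5, 1, 2, 2, 3, 2, 1, 2, 1, 2, 2, 2, 2, 5, 3)

global_main_path(wiki, wiki_spc)
# [1] 0 0 1 0 0 0 1 0 0 0 0 1 0 0 1 0 1
```

The flagged edges are B→D, D→F, F→I, I→M and M→N (total SPC 15), the same path as the expected_main_path column.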
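The Wikipedia article on main path analysis has a figure in which every edge is labelled with its SPC value; a copy of that figure is shown above. I transcribed the figure into R, including the expected SPC values and the (global) main path.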
library(igraph)
library(tibble)

wikipedia_g <- graph_from_data_frame(
  tibble::tribble(
    ~from, ~to, ~expected_spc, ~expected_main_path,
    "A",   "C", 2, 0,
    "B",   "C", 2, 0,
    "B",   "D", 5, 1,
    "B",   "J", 1, 0,
    "C",   "E", 2, 0,
    "C",   "H", 2, 0,
    "D",   "F", 3, 1,
    "D",   "I", 2, 0,
    "J",   "M", 1, 0,
    "E",   "G", 2, 0,
    "F",   "H", 1, 0,
    "F",   "I", 2, 1,
    "G",   "H", 2, 0,
    "I",   "L", 2, 0,
    "I",   "M", 2, 1,
    "H",   "K", 5, 0,
    "M",   "N", 3, 1),
  directed = TRUE)
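Every value returned by spc() should equal the corresponding expected_spc value, and it does. Likewise, the expected_main_path values should match the output of main_search(), which they also do.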
all(E(wikipedia_g)$expected_spc == spc(wikipedia_g))
# TRUE
all(E(wikipedia_g)$expected_main_path == main_search(wikipedia_g))
# TRUE
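spc() enumerates every chain, so its running time grows with the number of source-to-sink paths, which can grow exponentially with the size of the network. For an acyclic network the same counts can be obtained without enumeration: the SPC of an edge u→v equals the number of paths from any source to u times the number of paths from v to any sink, and both numbers follow from one pass over a topological order in each direction. The sketch below swaps in this path-counting technique for enumeration; spc_path_count() and wiki are hypothetical names, and wiki rebuilds wikipedia_g with the same edge order. It stops on cyclic graphs: the DNA edge list in the question, as transcribed, contains both 32→33 and 33→32, so is_dag(dna_g) is FALSE and the enumeration-based spc() is the one to use there.

```r
library(igraph)

# Hypothetical helper: SPC by counting paths instead of enumerating them (DAGs only).
spc_path_count <- function(g) {
  stopifnot(is_dag(g))
  ord <- as.integer(topo_sort(g, mode = "out"))
  from_sources <- numeric(vcount(g))  # number of paths from any source to v
  to_sinks <- numeric(vcount(g))      # number of paths from v to any sink
  for (v in ord) {
    preds <- as.integer(neighbors(g, v, mode = "in"))
    from_sources[v] <- if (length(preds) == 0) 1 else sum(from_sources[preds])
  }
  for (v in rev(ord)) {
    succs <- as.integer(neighbors(g, v, mode = "out"))
    to_sinks[v] <- if (length(succs) == 0) 1 else sum(to_sinks[succs])
  }
  # SPC of edge u -> v = (paths from sources into u) * (paths from v to sinks)
  el <- as_edgelist(g, names = FALSE)
  from_sources[el[, 1]] * to_sinks[el[, 2]]
}

# Wikipedia example, same edges and edge order as wikipedia_g
wiki <- graph_from_data_frame(
  data.frame(
    from = c("A","B","B","B","C","C","D","D","J","E","F","F","G","I","I","H","M"),
    to   = c("C","C","D","J","E","H","F","I","M","G","H","I","H","L","M","K","N")),
  directed = TRUE)

spc_path_count(wiki)
# [1] 2 2 5 1 2 2 3 2 1 2 1 2 2 2 2 5 3
```

The result matches expected_spc, while the work done is proportional to the number of edges rather than the number of chains.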