Main path analysis of a citation network with igraph in R

Raf*_*ael 3 r igraph

Is anyone familiar with a way to implement main path analysis (Hummon and Doreian 1989) in R with igraph?


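The following example comes from the original Hummon and Doreian article. It traces the citations among 40 journal articles about DNA. The arrows point forward in time (information "flows" from older articles to newer ones).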

library(igraph)

# Citation edges: "from" is the cited (older) article, "to" the citing (newer) one
dna_edges <- data.frame(
  from = c(1,2,3,3,3,5,6,9,12,12,15,15,10,11,11,13,14,14,14,16,16,17,19,19,19,19,19,20,20,20,20,24,24,21,21,23,22,26,27,29,30,31,31,32,32,32,33,33,35,35,36,36,36),
  to   = c(8,18,4,5,21,12,9,12,15,29,29,22,17,13,20,20,16,20,31,17,20,34,20,24,25,21,25,31,22,30,22,28,37,22,32,27,27,27,32,32,40,32,40,36,38,33,32,35,38,39,38,39,40))

dna_g <- graph_from_data_frame(dna_edges, directed = TRUE)

# Sugiyama layout, using each article's number as its layer
plot(dna_g,
     layout = layout_with_sugiyama(dna_g,
                                   layers = as.numeric(V(dna_g)$name))$layout)
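SPC and main path searches assume the citation network is acyclic, since an article can normally only cite older ones. A quick check with igraph's is_dag() and which_mutual() on the edge list exactly as transcribed above is sketched below. As far as I can tell it reports that the graph is not acyclic, because the list contains both 32 → 33 and 33 → 32 (edges 46 and 47); that may be a transcription slip worth checking against the original figure.

```r
library(igraph)

# Edge list copied from the question (cited -> citing)
dna_edges <- data.frame(
  from = c(1,2,3,3,3,5,6,9,12,12,15,15,10,11,11,13,14,14,14,16,16,17,19,19,19,19,19,20,20,20,20,24,24,21,21,23,22,26,27,29,30,31,31,32,32,32,33,33,35,35,36,36,36),
  to   = c(8,18,4,5,21,12,9,12,15,29,29,22,17,13,20,20,16,20,31,17,20,34,20,24,25,21,25,31,22,30,22,28,37,22,32,27,27,27,32,32,40,32,40,36,38,33,32,35,38,39,38,39,40))
dna_g <- graph_from_data_frame(dna_edges, directed = TRUE)

is_dag(dna_g)                         # FALSE: the network contains a cycle
mutual <- which(which_mutual(dna_g))  # ids of links whose reverse link also exists
ends(dna_g, mutual)                   # the pair 32 -> 33 and 33 -> 32
```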


Liu et al. (2019) explain that in a citation network a node can be one of three types:

  1. Sources: cited, but do not cite anyone
  2. Sinks: cite others, but are never cited
  3. Intermediates: both cite and are cited
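These roles can be read directly off each node's in- and out-degree, provided edges point from the cited article to the citing one, as in the DNA example. The helper node_type below is hypothetical, and it adds a fourth label, "isolate", for articles with no links at all, which fit none of the three categories.

```r
library(igraph)

# Hypothetical helper: role of each article, assuming edges point from the
# cited (older) article to the citing (newer) one
node_type <- function(g) {
  d_in <- degree(g, mode = "in")    # links flowing in = references made
  d_out <- degree(g, mode = "out")  # links flowing out = citations received
  type <- ifelse(d_in == 0 & d_out > 0, "source",
          ifelse(d_in > 0 & d_out == 0, "sink",
          ifelse(d_in > 0 & d_out > 0, "intermediate", "isolate")))
  setNames(type, V(g)$name)
}

# Toy network: a -> b -> c and a -> c, plus an unconnected article d
toy <- graph_from_data_frame(
  data.frame(from = c("a", "b", "a"), to = c("b", "c", "c")),
  directed = TRUE,
  vertices = data.frame(name = c("a", "b", "c", "d")))
node_type(toy)
# a: "source", b: "intermediate", c: "sink", d: "isolate"
```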

So in this example there are ten articles that are "sources" and another ten that are "sinks":

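# Sources: in-degree 0, i.e. articles that cite nothing in the network
dna_sources <- V(dna_g)$name[which(degree(dna_g, mode = "in") == 0)]
dna_sources
[1] "1"  "2"  "3"  "6"  "10" "11" "14" "19" "23" "26"
# Sinks: out-degree 0, i.e. articles that are never cited
dna_sinks <- V(dna_g)$name[which(degree(dna_g, mode = "out") == 0)]
dna_sinks
[1] "8"  "18" "4"  "34" "25" "28" "37" "40" "38" "39"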

The main path is the most-used path connecting sources to sinks. Search path count (SPC) is one method for finding it.


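"A citation link's SPC is the number of times the link is traversed if one runs over all possible citation chains from all sources to all sinks in the citation network. To find the SPC of a specific link, one needs to enumerate all possible citation chains, emanating from all sources and terminating at all sinks" (Liu et al. 2019: 381).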
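The quoted definition can be turned into a counting identity. This is a standard property of acyclic graphs, not something stated in the quote: every source-to-sink chain through the link $u \to v$ splits uniquely into a chain from some source to $u$, the link itself, and a chain from $v$ to some sink. Writing $N^{-}(u)$ for the number of chains from any source to $u$ and $N^{+}(v)$ for the number of chains from $v$ to any sink:

$$
\begin{aligned}
\mathrm{SPC}(u \to v) &= N^{-}(u)\, N^{+}(v),\\
N^{-}(u) &= \begin{cases} 1 & \text{if } u \text{ is a source},\\ \sum_{w \to u} N^{-}(w) & \text{otherwise},\end{cases}\\
N^{+}(v) &= \begin{cases} 1 & \text{if } v \text{ is a sink},\\ \sum_{v \to w} N^{+}(w) & \text{otherwise}.\end{cases}
\end{aligned}
$$

For example, in a network with links $a \to b$, $a \to c$, $b \to c$ and $c \to d$ ($a$ is the only source, $d$ the only sink), $N^{-}(c) = 2$ and $N^{+}(d) = 1$, so $\mathrm{SPC}(c \to d) = 2$, matching the two chains $a \to b \to c \to d$ and $a \to c \to d$.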


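So, to proceed, one needs to (i) choose a source-sink pair, (ii) find all paths connecting those two nodes, adding +1 to the weight of each edge every time a path crosses it, and (iii) repeat for every other source-sink pair.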
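Steps (i) to (iii) can be written almost literally with igraph's all_simple_paths(), which lists every simple path between two vertices, and E(g, path = ...), which returns the edges along a vertex path. The function name spc_bruteforce is hypothetical. Because it enumerates every path, its running time can grow exponentially, so this is only a sketch for small networks.

```r
library(igraph)

# Hypothetical helper: SPC by literal enumeration, following steps (i)-(iii).
# Assumes edges point from cited to citing article (information flow).
spc_bruteforce <- function(g) {
  sources <- V(g)[degree(g, mode = "in") == 0]
  sinks <- V(g)[degree(g, mode = "out") == 0]
  counts <- numeric(ecount(g))          # one counter per citation link
  for (s in sources) {                  # (i) pick a source ...
    for (t in sinks) {                  # ... and a sink; (iii) repeat for all pairs
      # (ii) every path from s to t adds +1 to each link it crosses
      for (path in all_simple_paths(g, from = s, to = t, mode = "out")) {
        ids <- as.integer(E(g, path = path))
        counts[ids] <- counts[ids] + 1
      }
    }
  }
  counts
}

# Toy network: links a->b, a->c, b->c, c->d
toy <- graph_from_data_frame(
  data.frame(from = c("a", "a", "b", "c"), to = c("b", "c", "c", "d")),
  directed = TRUE)
spc_bruteforce(toy)
# 1 1 1 2
```

Looping over explicit source-sink pairs also counts a link that runs straight from a source to a sink, since that is a two-vertex path between distinct vertices.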


Any ideas on how to carry out steps (i) to (iii)?


Tim*_*Tim 5

SPC

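The function below implements SPC. It works on the line graph of g, in which every citation link becomes a vertex, so the number of source-to-sink paths passing through a line-graph vertex is the SPC of the corresponding link.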

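library(igraph)

spc <- function(g) {
  # Each vertex of the line graph is one citation link (edge) of g,
  # with the same id as in E(g)
  linegraph <- make_line_graph(g)
  # Links leaving a source article, and links entering a sink article
  source_edges <- V(linegraph)[degree(linegraph, mode = "in") == 0]
  sink_edges <- V(linegraph)[degree(linegraph, mode = "out") == 0]
  # Enumerate all source-to-sink paths and count how often each link occurs
  counts <- tabulate(
    as.integer(unlist(
      lapply(
        source_edges,
        all_simple_paths,
        graph = linegraph,
        to = sink_edges,
        mode = "out"))),
    nbins = ecount(g))
  # all_simple_paths() does not return one-vertex paths, so a link running
  # straight from a source to a sink has to be counted separately
  direct <- intersect(as.integer(source_edges), as.integer(sink_edges))
  counts[direct] <- counts[direct] + 1
  counts
}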
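Enumerating every path is exponential in the worst case. A swapped-in technique (not the answer's method) computes the same numbers in time linear in the size of the graph, using the identity SPC(u → v) = N⁻(u) · N⁺(v), where N⁻(u) counts paths from any source to u and N⁺(v) counts paths from v to any sink. The sketch below, with the hypothetical name spc_dp, assumes an acyclic graph and stops otherwise; counts are stored as doubles, so they stay exact only up to 2^53.

```r
library(igraph)

# Hypothetical helper: SPC for every link via path counting, no enumeration.
# Requires an acyclic graph; edges point from cited to citing article.
spc_dp <- function(g) {
  stopifnot(is_dag(g))
  ord <- as.integer(topo_sort(g, mode = "out"))
  n_minus <- numeric(vcount(g))  # paths from any source to each article
  n_plus <- numeric(vcount(g))   # paths from each article to any sink
  for (v in ord) {               # forward pass: predecessors come first
    pred <- as.integer(neighbors(g, v, mode = "in"))
    n_minus[v] <- if (length(pred) == 0) 1 else sum(n_minus[pred])
  }
  for (v in rev(ord)) {          # backward pass: successors come first
    succ <- as.integer(neighbors(g, v, mode = "out"))
    n_plus[v] <- if (length(succ) == 0) 1 else sum(n_plus[succ])
  }
  el <- ends(g, E(g), names = FALSE)  # tail and head id of each link
  n_minus[el[, 1]] * n_plus[el[, 2]]
}

# Toy network: links a->b, a->c, b->c, c->d
toy <- graph_from_data_frame(
  data.frame(from = c("a", "a", "b", "c"), to = c("b", "c", "c", "d")),
  directed = TRUE)
spc_dp(toy)
# 1 1 1 2
```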

Main path search

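The following function finds a main path: the source-to-sink path whose SPC values add up to the largest total. Note that if several paths share the same maximal total SPC, there are other main paths as well; the function returns the first one it finds.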

main_search <- function(g) {
  linegraph <- make_line_graph(g)
  # Store each link's SPC on the corresponding line-graph vertex
  V(linegraph)$spc <- spc(g)
  source_edges <- V(linegraph)[degree(linegraph, mode = "in") == 0]
  sink_edges <- V(linegraph)[degree(linegraph, mode = "out") == 0]
  # Enumerate every source-to-sink path as a sequence of links
  paths <- unlist(
    lapply(
      source_edges,
      all_simple_paths,
      graph = linegraph,
      to = sink_edges,
      mode = "out"),
    recursive = FALSE)
  # Total SPC along each path
  path_lengths <- unlist(lapply(paths, function (x) sum(x$spc)))
  # Flag the links on the first path with the maximal total SPC
  vertex_attr(linegraph, "main_path") <- 0
  vertex_attr(
    linegraph,
    "main_path",
    paths[[which.max(path_lengths)]]) <- 1
  V(linegraph)$main_path
}
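main_search() also enumerates every path. If a single global main path is all that is needed, a swapped-in alternative is the classic longest-path dynamic program on an acyclic graph: process articles in topological order, keep for each one the largest total SPC of any chain reaching it from a source, then walk back from the best-scoring sink. The name main_path_dp is hypothetical; it expects SPC values in E(g) order (from spc() or an equivalent) and, when several paths tie, may return a different one than main_search().

```r
library(igraph)

# Hypothetical helper: global main path by dynamic programming.
# g must be acyclic; spc_values holds one SPC value per edge, in E(g) order.
# Returns 1 for links on the chosen path and 0 elsewhere.
main_path_dp <- function(g, spc_values) {
  stopifnot(is_dag(g), length(spc_values) == ecount(g))
  el <- ends(g, E(g), names = FALSE)             # tail and head id of each link
  # best[v]: largest total SPC of a path from some source to v
  best <- ifelse(degree(g, mode = "in") == 0, 0, -Inf)
  back <- rep(NA_integer_, vcount(g))            # last link on that best path
  for (v in as.integer(topo_sort(g, mode = "out"))) {
    for (e in as.integer(incident(g, v, mode = "in"))) {
      score <- best[el[e, 1]] + spc_values[e]
      if (score > best[v]) {
        best[v] <- score
        back[v] <- e
      }
    }
  }
  # Best-scoring sink, then follow the stored links back to a source
  sinks <- which(degree(g, mode = "out") == 0 & degree(g, mode = "in") > 0)
  v <- sinks[which.max(best[sinks])]
  on_path <- numeric(ecount(g))
  while (!is.na(back[v])) {
    on_path[back[v]] <- 1
    v <- el[back[v], 1]
  }
  on_path
}

# Toy network a->b, a->c, b->c, c->d, whose SPC values are 1, 1, 1, 2
toy <- graph_from_data_frame(
  data.frame(from = c("a", "a", "b", "c"), to = c("b", "c", "c", "d")),
  directed = TRUE)
main_path_dp(toy, c(1, 1, 1, 2))
# 1 0 1 1  (the path a -> b -> c -> d, total SPC 4)
```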

Test

Wikipedia main path figure

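The Wikipedia article on main path analysis has a figure in which every edge is labelled with its SPC value; a copy of that figure is shown above. I transcribed this figure into R, including the expected SPC values and the (global) main path.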

library(igraph)
library(tibble)

wikipedia_g <- graph_from_data_frame(
  tibble::tribble(
    ~from, ~to, ~expected_spc, ~expected_main_path,
    "A", "C", 2, 0,
    "B", "C", 2, 0,
    "B", "D", 5, 1,
    "B", "J", 1, 0,
    "C", "E", 2, 0,
    "C", "H", 2, 0,
    "D", "F", 3, 1,
    "D", "I", 2, 0,
    "J", "M", 1, 0,
    "E", "G", 2, 0,
    "F", "H", 1, 0,
    "F", "I", 2, 1,
    "G", "H", 2, 0,
    "I", "L", 2, 0,
    "I", "M", 2, 1,
    "H", "K", 5, 0,
    "M", "N", 3, 1),
  directed = TRUE)

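Every value returned by spc should equal the corresponding expected_spc value, and that is the case. Likewise, the expected_main_path values should match the output of main_search, and they do.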

all(E(wikipedia_g)$expected_spc == spc(wikipedia_g))
# TRUE
all(E(wikipedia_g)$expected_main_path == main_search(wikipedia_g))
# TRUE