Why isn't the node with the last index over-represented in random networks generated with igraph?

Question

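I am using igraph through its R interface to generate random directed networks (Erdős-Rényi) with a constant number of nodes n and edges m, using the function sample_gnm.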

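To make sure I understand the algorithm being used, I checked the C source code, even though I have no experience with C. As far as I understand the C code, there is an if statement that should cause the node with index n to be over-represented among the nodes receiving directed edges.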

Here is the actual code: https://github.com/igraph/igraph/blob/7d4be976481356fa673772e6e7c30b637ea8dd52/src/games.c#L734-L736, and here is how I understand the C code, written as pseudocode:

# Maximum number of edges a directed network with n nodes and no loops can have
maxEdges := n*(n-1)

s := uniformly sample m integers from [1, maxEdges] without replacement

for (i = 1; i <= m; i++) {

  # Split s[i] into a source node nodeA and a target node nodeB
  nodeA := floor((s[i] - 1) / (n - 1)) + 1
  nodeB := s[i] - ((nodeA - 1) * (n - 1))

  # Since we do not allow loops, if nodeA == nodeB, assign n to nodeB
  if (nodeA == nodeB) {
    nodeB := n
  }

}
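
For reference, the pseudocode above can be turned into a small runnable Python sketch. It is not igraph's implementation: it assumes that the linked C code splits each sampled index into a quotient and a remainder by n - 1 (as the answer's "mod n − 1" implies), it uses 0-based node IDs as in C (so the last node is n - 1), and the function name sample_gnm_directed_no_loops is hypothetical.

```python
import random


def sample_gnm_directed_no_loops(n, m, rng=random):
    """Return m distinct directed edges (from, to) on nodes 0..n-1, without loops.

    Hypothetical sketch of the index arithmetic only; assumes the C code
    divides the sampled index by n - 1.
    """
    max_edges = n * (n - 1)                      # ordered pairs (a, b) with a != b
    edges = []
    for idx in rng.sample(range(max_edges), m):  # m distinct indices, no replacement
        frm, to = divmod(idx, n - 1)             # source 0..n-1, target 0..n-2
        if frm == to:                            # this index would encode a loop ...
            to = n - 1                           # ... so redirect it to the last node
        edges.append((frm, to))
    return edges


# Usage: 8 random edges on 5 nodes, with a fixed seed for reproducibility
print(sample_gnm_directed_no_loops(5, 8, random.Random(1)))
```

Asking for all n(n - 1) edges returns every ordered pair of distinct nodes exactly once, which is a quick way to check that the mapping has no gaps or duplicates.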

However, I also ran a simulation in R to check that this is indeed the case:

library(igraph)

testFun = function(n, m) {

  # Generate the network
  g = sample_gnm(n, m, directed = TRUE, loops = FALSE)

  # Find the "to" node IDs
  toEdgename = ends(g, E(g))[, 2]

  return(toEdgename)

}

# Create 1000 random networks and get the "to" node name for each edge
spam = replicate(1000, testFun(100, 9000))
# Plot the histogram
hist(sapply(1:ncol(spam), 
            # Count the percent of times the index 100 appeared per simulation
            function(ii) sum(spam[, ii] == 100) / 9000), 
     100)
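
The same experiment can be approximated without igraph by applying the index arithmetic directly. This is a rough Python sketch, assuming the mapping matches igraph's C code (divisor n - 1, 0-based IDs, so the last node is n - 1); fraction_to_last is a hypothetical helper, and it uses 100 replicates instead of 1000 to keep it fast.

```python
import random
from statistics import mean


def fraction_to_last(n, m, rng):
    """Fraction of m sampled edges whose target is the last node, n - 1 (0-based)."""
    hits = 0
    for idx in rng.sample(range(n * (n - 1)), m):
        frm, to = divmod(idx, n - 1)   # source and provisional target (never n - 1)
        if frm == to:
            to = n - 1                 # loop slot redirected to the last node
        if to == n - 1:
            hits += 1
    return hits / m


# Same settings as the R experiment: n = 100 nodes, m = 9000 edges per network
rng = random.Random(42)
fractions = [fraction_to_last(100, 9000, rng) for _ in range(100)]
print(mean(fractions))  # close to 1/n = 0.01
```

Under this mapping the target of each edge is the last node with probability exactly 1/n, which matches the absence of bias in the R simulation.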

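To my surprise, this does not produce any noticeable bias, which must mean that I do not understand what the C code is doing. Can anyone help me understand why this C code does not cause index n to be over-represented?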

Answer 1

Kon*_*lph 3

The reason is that nodeB in your pseudocode can never be n (or, in the C code, it can never be no_of_nodes - 1). (However, nodeA can be n!)

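In fact, in the C code the target index is the sampled value taken modulo n − 1, and values modulo n − 1 lie in [0, n − 1[ (note that the upper bound is excluded), so before the if statement the target can never be the last node. The if statement only fills that gap: for each of the other n − 1 source nodes, exactly one of its n − 1 possible values would produce a loop, and that one is redirected to the last node. As a result, every ordered pair of distinct nodes corresponds to exactly one sampled value, and the last node receives edges exactly as often as any other node.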
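
The argument can be checked exhaustively for small n. This is a minimal Python sketch, assuming the C code computes the source and target as the quotient and remainder of the sampled index divided by n - 1 (0-based IDs); index_to_pair and check_uniform are hypothetical helpers. It confirms that every ordered pair of distinct nodes is produced by exactly one index, so every node, including the last, receives exactly n - 1 of the n(n - 1) possible edges.

```python
from collections import Counter


def index_to_pair(idx, n):
    """Map a sampled index in [0, n*(n-1)) to an ordered pair (from, to) with from != to."""
    frm, to = divmod(idx, n - 1)   # to lies in [0, n - 1[: never the last node here
    if frm == to:
        to = n - 1                 # the only way an edge can end at node n - 1
    return frm, to


def check_uniform(n):
    """True if every ordered pair is hit exactly once and all in-degrees equal n - 1."""
    pairs = [index_to_pair(idx, n) for idx in range(n * (n - 1))]
    expected = {(a, b) for a in range(n) for b in range(n) if a != b}
    in_degree = Counter(to for _, to in pairs)
    return (
        len(set(pairs)) == len(pairs)
        and set(pairs) == expected
        and len(in_degree) == n
        and set(in_degree.values()) == {n - 1}
    )


print(all(check_uniform(n) for n in range(2, 20)))  # True
```

Because sample_gnm draws m distinct indices uniformly, a one-to-one mapping from indices to ordered pairs means every possible edge is equally likely, so no node is favoured.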

Archived: 6 years, 10 months ago
Views: 64
Last activity: 6 years, 10 months ago