Adding ties in a network based on node attribute (weight)

wak*_*ake 5 simulation network-programming r igraph

I am simulating network change over time using igraph in r and am looking for an efficient and scalable way to code this for use in business.

The main drivers of network change are:

  • New nodes
  • New ties
  • New node weights

In the first stage, in the network of 100 nodes 10% are randomly connected. The node weights are also assigned at random. The network is undirected. There are 100 stages.

In each of the following stages:

  • Ten (10) new nodes occur randomly and are added to the model. They are unconnected in this stage.
  • The node weights of these new nodes are assigned at random.
  • The new ties between two nodes in time t+1 are a probabilistic function of the network distance between these nodes in the network and the node weight at previous stage (time t). Nodes at greater network distance are less likely to connect than nodes nodes at shorter distance. The decay function is exponential.
  • Nodes with greater weight attract more ties than those with smaller weights. The relationship between node weight and increased probability of tie-formation should be super-linear.
  • In each step, 10% of the total existing ties is added as a function what the previous point.
  • The network ties and nodes from previous stages are carried over (i.e. the networks are cumulative).
  • At each stage, the node weight can change randomly up to 10% of its current weight (i.e. a weight of 1 can change to {0.9-1.1} in t+1)
  • At each stage, the network needs to be saved.

How can this be written?

Edit: these networks will be examined on a number of graph-level characteristics at a later stage


This is what I have now, but doesn't include the node weights. How do we include this efficiently?

# number of nodes and ties to start with
n = 100
p = 0.1
r = -2


# build random network
net1 <- erdos.renyi.game(n, p, "gnp", directed = F)
#plot(net1)
write_graph(net1, paste0("D://network_sim_0.dl"), format="pajek")


for(i in seq(1,100,1)){

print(i) 
time <- proc.time()

net1 <- read_graph(paste0("D://network_sim_",i-1,".dl"), format="pajek")  

# how many will we build in next stage?
new_ties <- round(0.1*ecount(net1), 0)  # 10% of those in net1

# add 10 new nodes
net2 <- add_vertices(net1, 10)

# get network distance for each dyad in net1 + the new nodes
spel <- data.table::melt(shortest.paths(net2))
names(spel) <- c("node_i", "node_j", "distance")

# replace inf with max observed value + 1
spel$distance[which(!is.finite(spel$distance))] <- max(spel$distance[is.finite(spel$distance)]) +1

# assign a probability (?) with a exponential decay function. Smallest distance == greatest prob.
spel$prob <- -0.5 * spel$distance^r   # is this what I need?
#hist(spel$prob, freq=T, xlab="Probability of tie-formation")
#hist(spel$distance, freq=T, xlab="Network Distance")

# lets sample new ties from this probability
spel$index <- seq_along(spel$prob)
to_build <- subset(spel, index %in% sample(spel$index, size = new_ties, prob=spel$prob))
net2 <- add_edges(net2, as.numeric(unlist(str_split(paste(to_build$node_i, to_build$node_j), " "))))

# save the network
write_graph(net2, paste0("D://network_sim_",i,".dl"), format="pajek")

print(proc.time()-time)
}

    
Run Code Online (Sandbox Code Playgroud)

mus*_*ben 4

据我所知,我将尝试回答这个问题。

我做了几个假设;我应该澄清它们。

首先,节点权重将遵循什么分布?

如果您正在对自然发生的事件进行建模,则节点权重很可能遵循正态分布。然而,如果事件是面向社交的并且其他社交机制影响事件或事件流行度,则节点权重可能遵循不同的分布——很可能是功率分布。

主要来说,这可能适用于与客户相关的行为。因此,考虑为节点权重建模的随机分布对您来说是有益的。

对于以下示例,我使用正态分布来定义每个节点的正态分布值。在每次迭代结束时,我让节点权重更改为 %10 {.9,1.10}。

其次,平局形成的概率函数是什么?

我们有两个用于做出决策的输入:距离权重和节点权重。因此,我们将使用这两个输入创建一个函数并定义概率权重。我的理解是,距离越小,可能性就越高。然后节点权重越大,可能性也越高。

这可能不是最好的解决方案,但我做了以下操作:

首先,计算距离的衰减函数,并将其称为距离权重。然后,我获取节点权重并使用距离和节点权重创建一个超线性函数。

因此,您可以使用一些参数,看看是否得到您想要的结果。

顺便说一句,我没有改变你的大部分代码。另外,我并没有过多关注处理时间。仍有改进的空间。

library(scales)
library(stringr)
library(igraph)

# number of nodes and ties to start with
n <- 100
p <- 0.2
number_of_simulation <- 100

new_nodes <- 15 ## new nodes for each iteration


## Parameters ##

## How much distance will be weighted? 
## Exponential decay parameter
beta_distance_weight <- -.4

## probability function parameters for the distance and node weights 

impact_of_distances <- 0.3 ## how important is the distance weights?
impact_of_nodes <- 0.7     ## how important is the node weights?
power_base  <- 5.5         ## how important is having a high score? Prefential attachment or super-linear function

# build random network
net1 <- erdos.renyi.game(n, p, "gnp", directed = F)

# Assign normally distributed random weights
V(net1)$weight <- rnorm(vcount(net1))

graph_list <- list(net1)

for(i in seq(1,number_of_simulation,1)){

print(i) 
time <- proc.time()

net1 <- graph_list[[i]]

# how many will we build in next stage?
new_ties <- round(0.1*ecount(net1), 0)  # 10% of those in net1
# add 10 new nodes
net2 <- add_vertices(net1, new_nodes)

## Add random weights to new nodes from a normal distribution
V(net2)$weight[is.na(V(net2)$weight)] <- rnorm(new_nodes)

# get network distance for each dyad in net1 + the new nodes
spel <- reshape2::melt(shortest.paths(net2))
names(spel) <- c("node_i", "node_j", "distance")

# replace inf with max observed value + 1
spel$distance[which(!is.finite(spel$distance))] <- max(spel$distance[is.finite(spel$distance)]) +1

# Do not select nodes if they are self-looped or have already link
spel <- spel[!spel$distance %in% c(0,1) , ]

# Assign distance weights for each dyads
spel$distance_weight <- exp(beta_distance_weight*spel$distance)  

#hist(spel$distance_weight, freq=T, xlab="Probability of tie-formation")
#hist(spel$distance, freq=T, xlab="Network Distance")

## Get the node weights for merging the data with the distances 
node_weights <- data.frame(id= 1:vcount(net2),node_weight=V(net2)$weight)
spel <- merge(spel,node_weights,by.x='node_j',by.y='id')

## probability is the function of distince and node weight
spel$prob <- power_base^((impact_of_distances * spel$distance_weight) + (impact_of_nodes * spel$node_weight))
spel <- spel[order(spel$prob, decreasing = T),]

# lets sample new ties from this probability with a beta distribution 
spel$index <- seq_along(spel$prob)
to_build <- subset(spel, index %in% sample(spel$index, new_ties, p = 1/spel$index ))
net2 <- add_edges(net2, as.numeric(unlist(str_split(paste(to_build$node_i, to_build$node_j), " "))))

# change in the weights up to %10 
V(net2)$weight <- V(net2)$weight*rescale(rnorm(vcount(net2)), to = c(0.9, 1.1))

graph_list[[i+1]] <- net2

print(proc.time()-time)
}
Run Code Online (Sandbox Code Playgroud)

要获取结果或将图表写入 Pajek,您可以使用以下命令:

lapply(seq_along(graph_list),function(x) write_graph(graph_list[[x]], paste0("network_sim_",x,".dl"), format="pajek"))
Run Code Online (Sandbox Code Playgroud)

编辑

要更改节点权重,可以使用以下语法。

library(scales)
library(stringr)
library(igraph)

# number of nodes and ties to start with
n <- 100
p <- 0.2
number_of_simulation <- 100

new_nodes <- 10 ## new nodes for each iteration


## Parameters ##

## How much distance will be weighted? 
## Exponential decay parameter
beta_distance_weight <- -.4

## Node weights for power-law dist 
power_law_parameter <- -.08
## probability function parameters for the distance and node weights 

impact_of_distances <- 0.3 ## how important is the distance weights?
impact_of_nodes <- 0.7     ## how important is the node weights?
power_base  <- 5.5         ## how important is having a high score? Prefential attachment or super-linear function

# build random network
net1 <- erdos.renyi.game(n, p, "gnp", directed = F)

## MADE A CHANGE HERE 
# Assign normally distributed random weights
V(net1)$weight <- runif(vcount(net1))^power_law_parameter

graph_list <- list(net1)

for(i in seq(1,number_of_simulation,1)){

print(i) 
time <- proc.time()

net1 <- graph_list[[i]]

# how many will we build in next stage?
new_ties <- round(0.1*ecount(net1), 0)  # 10% of those in net1
# add 10 new nodes
net2 <- add_vertices(net1, new_nodes)

## Add random weights to new nodes from a normal distribution
V(net2)$weight[is.na(V(net2)$weight)] <- runif(new_nodes)^power_law_parameter

# get network distance for each dyad in net1 + the new nodes
spel <- reshape2::melt(shortest.paths(net2))
names(spel) <- c("node_i", "node_j", "distance")

# replace inf with max observed value + 1
spel$distance[which(!is.finite(spel$distance))] <- max(spel$distance[is.finite(spel$distance)]) + 2

# Do not select nodes if they are self-looped or have already link
spel <- spel[!spel$distance %in% c(0,1) , ]

# Assign distance weights for each dyads
spel$distance_weight <- exp(beta_distance_weight*spel$distance)  

#hist(spel$distance_weight, freq=T, xlab="Probability of tie-formation")
#hist(spel$distance, freq=T, xlab="Network Distance")

## Get the node weights for merging the data with the distances 
node_weights <- data.frame(id= 1:vcount(net2),node_weight=V(net2)$weight)
spel <- merge(spel,node_weights,by.x='node_j',by.y='id')

## probability is the function of distince and node weight
spel$prob <- power_base^((impact_of_distances * spel$distance_weight) + (impact_of_nodes * spel$node_weight))
spel <- spel[order(spel$prob, decreasing = T),]

# lets sample new ties from this probability with a beta distribution 
spel$index <- seq_along(spel$prob)
to_build <- subset(spel, index %in% sample(spel$index, new_ties, p = 1/spel$index ))
net2 <- add_edges(net2, as.numeric(unlist(str_split(paste(to_build$node_i, to_build$node_j), " "))))

# change in the weights up to %10 
V(net2)$weight <- V(net2)$weight*rescale(rnorm(vcount(net2)), to = c(0.9, 1.1))

graph_list[[i+1]] <- net2

print(proc.time()-time)
}
Run Code Online (Sandbox Code Playgroud)

结果

因此,为了验证代码是否有效,我检查了有限节点的少量迭代:4 个节点的 10 次迭代。对于每次迭代,我添加了 3 个新节点和一个新关系。

我使用三种不同的设置进行了模拟。

第一个设置仅关注距离的权重函数:节点越接近,它们之间形成新关系的可能性就越大。

第二个设置仅关注节点的权重函数:节点的权重越多,与它们形成新关系的可能性就越大。

第三个设置重点关注距离和节点的权重函数:节点的权重越多且距离越近,与它们形成新关系的可能性就越大。

请观察网络行为,每种设置如何提供不同的结果。

  1. 只有距离才重要

在此输入图像描述

  1. 只有节点权重很重要 在此输入图像描述

  2. 节点权重和距离都很重要

在此输入图像描述

  • 您可以更改重量分布。我现在将编辑幂律分布。 (2认同)