How to visualize a large network in R?

Question

maj*_*jom 56 visualization r graph social-networking graph-visualization

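Network visualizations are becoming common in science and in practice. But as networks grow in size, common visualizations become less useful: there are simply too many nodes/vertices and links/edges. Often, visualization efforts end up producing "hairballs".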

Some new approaches have been proposed to overcome this problem, for example:

Edge bundling:
- http://vis.stanford.edu/papers/divided-edge-bundling or
- https://gephi.org/tag/edge-bundling/
Hierarchical edge bundling:
- http://graphics.cs.illinois.edu/sites/graphics.dev.engr.illinois.edu/files/edgebundles.pdf
Group attributes layout:
- http://wiki.cytoscape.org/Cytoscape_3/UserManual
- How to make grouped layout in igraph?

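I am sure there are many more approaches. Thus, my question is: how can the hairball issue be overcome, i.e. how can large networks be visualized using R?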

Here is some code that simulates an example network:

# Load packages
lapply(c("devtools", "sna", "intergraph", "igraph", "network"), install.packages)
library(devtools)
devtools::install_github("ggobi/ggally")
lapply(c("sna", "intergraph", "GGally", "igraph", "network"),
       require, character.only = TRUE)

# Set up data
set.seed(123)
g <- barabasi.game(1000)

# Plot data
g.plot <- ggnet(g, mode = "fruchtermanreingold")
g.plot

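This question is related to "Visualizing Undirected Graph That's Too Large for GraphViz?". However, here I am not looking for general software recommendations but for concrete examples (using the data provided above) of techniques that help produce a good visualization of a large network in R (comparable to the examples in this thread: "R: Scatterplot with too many points").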

Answer 1

wjr*_*jrl 16

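Another way to visualize very large networks is BioFabric (www.BioFabric.org), which uses horizontal lines instead of points to represent nodes. Edges are then shown as vertical line segments. For a quick D3 demo of the technique, see: http://www.biofabric.org/gallery/pages/SuperQuickBioFabric.html.

BioFabric is a Java application, but a simple R version is available at: https://github.com/wjrl/RBioFabric.

Here is a snippet of R code: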

 # You need 'devtools':
 install.packages("devtools")
 library(devtools)

 # you need igraph:
 install.packages("igraph")
 library(igraph)

 # install and load 'RBioFabric' from GitHub
 install_github('wjrl/RBioFabric')
 library(RBioFabric)

 #
 # This is the example provided in the question:
 #

 set.seed(123)
 bfGraph = barabasi.game(1000)

 # This example has 1000 nodes, just like the provided example, but it 
 # adds 6 edges in each step, making for an interesting shape; play
 # around with different values.

 # bfGraph = barabasi.game(1000, m=6, directed=FALSE)

 # Plot it up! For best results, make the PDF in the same
 # aspect ratio as the network, though a little extra height
 # covers the top labels. Given the size of the network,
 # a PDF width of 100 gives us good resolution.

 height <- vcount(bfGraph)
 width <- ecount(bfGraph)
 aspect <- height / width
 plotWidth <- 100.0
 plotHeight <- plotWidth * (aspect * 1.2)
 pdf("myBioFabricOutput.pdf", width=plotWidth, height=plotHeight)
 bioFabric(bfGraph)
 dev.off()

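Here is a shot of the BioFabric version of the data provided by the question asker, though networks created with values of m > 1 are more interesting. The inset detail shows a close-up of the upper-left corner of the network. Node BF4 is the highest-degree node in the network, and the default layout is a breadth-first search of the network (ignoring edge directions) starting from that node, with neighboring nodes traversed in order of decreasing node degree. Note that we can immediately see, for example, that about 60% of node BF4's neighbors are degree 1. We can also see from the strict 45-degree lower edge that this 1000-node network has 999 edges, and is therefore a tree.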

BioFabric presentation of the example data
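The two observations read off the picture (the network is a tree, and a share of the hub's neighbors have degree 1) can also be checked numerically. The following is only a sketch using igraph, not part of RBioFabric; `sample_pa()` is the current igraph name for `barabasi.game()`, and the variable names are made up for illustration. The exact hub and share depend on the igraph and R versions, so no output is quoted here.

```r
library(igraph)

set.seed(123)
g <- sample_pa(1000)  # same generator as barabasi.game(1000)

# A connected graph with n vertices and n - 1 edges is a tree.
is_tree_like <- is_connected(g, mode = "weak") && ecount(g) == vcount(g) - 1

# Highest-degree node (ignoring edge direction) and the share of its
# neighbours that have degree 1.
deg <- degree(g, mode = "all")
hub <- which.max(deg)
nbrs <- as.integer(neighbors(g, hub, mode = "all"))
leaf_share <- mean(deg[nbrs] == 1)

print(is_tree_like)
print(hub)
print(round(leaf_share, 2))
```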

Full disclosure: BioFabric is a tool that I wrote.

Answer 2

Vin*_*tut 10

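This is an interesting question; I didn't know most of the tools you listed, so thanks. You can add HivePlot to the list. It is a deterministic method that consists of projecting nodes onto a fixed number of axes (usually 2 or 3). Take a look at the linked page; there are many visual examples.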

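It works better if your dataset has a categorical node attribute, so that you can use it to select the axis a node goes on. For instance, when studying the social network of a university: one axis for students, another for teachers, and a third for administrative staff. But of course it can also work with a discretized numerical attribute (e.g. young, middle-aged, and old people on their respective axes).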

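Then you need another attribute, and this time it has to be numerical (or at least ordinal). It is used to determine the position of the node on its axis. You can also use a topological measure, such as degree or transitivity (clustering coefficient).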
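To make the two choices concrete (a categorical attribute picks the axis, a numeric one picks the position), here is a rough sketch in base graphics plus igraph. It is not the HivePlot software: the `role` attribute is hypothetical and assigned at random because the simulated network has no node attributes, edges are drawn as straight lines, and edges between nodes on the same axis are not treated specially as a real hive plot would.

```r
library(igraph)

set.seed(123)
g <- sample_pa(1000, directed = FALSE)

# Hypothetical categorical attribute: the simulated graph has none, so
# roles are assigned at random purely for illustration.
roles <- c("student", "teacher", "staff")
V(g)$role <- sample(roles, vcount(g), replace = TRUE)

# Axis: one per role, spread evenly around the circle.
axis_id <- match(V(g)$role, roles)
angle <- 2 * pi * (axis_id - 1) / length(roles) + pi / 2

# Position on the axis: degree rank rescaled to [0.1, 1].
deg <- degree(g)
radius <- 0.1 + 0.9 * (rank(deg, ties.method = "first") - 1) / (vcount(g) - 1)

xy <- cbind(x = radius * cos(angle), y = radius * sin(angle))

# Draw nodes, then straight semi-transparent edges on top.
el <- as_edgelist(g, names = FALSE)
plot(xy, pch = 16, cex = 0.4, asp = 1, axes = FALSE, xlab = "", ylab = "",
     col = c("steelblue", "firebrick", "darkgreen")[axis_id])
segments(xy[el[, 1], "x"], xy[el[, 1], "y"],
         xy[el[, 2], "x"], xy[el[, 2], "y"],
         col = adjustcolor("black", alpha.f = 0.1))
```

Because every coordinate is computed from node attributes alone, the same attributes always give the same picture, with no random layout step involved.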

How to build a hive plot: http://www.hiveplot.net/img/hiveplot-undirected-01.png

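The fact that the method is deterministic is interesting, because it makes it possible to compare different networks representing distinct (but comparable) systems. For instance, you can compare two universities (provided you use the same attributes/measures to determine axes and positions). It also makes it possible to describe the same network in various ways, by choosing different combinations of attributes/measures to generate the visualization. This is actually the recommended way of visualizing a network, thanks to a so-called hive panel.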

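The page I mentioned at the beginning of this post lists several pieces of software able to generate these hive plots, including implementations in Java and R.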

Answer 3

Jac*_*b H 7

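I have been dealing with this problem recently and, as a result, came up with another solution: collapse the graph by communities/clusters. This approach is similar to the third option outlined by the OP above. As a word of warning, it works best with undirected graphs. For example: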

library(igraph)

set.seed(123)
g <- barabasi.game(1000) %>%
  as.undirected()

#Choose your favorite algorithm to find communities.  The algorithm below is great for large networks but only works with undirected graphs
c_g <- fastgreedy.community(g)

#Collapse the graph by communities.  This insight is due to this post http://stackoverflow.com/questions/35000554/collapsing-graph-by-clusters-in-igraph/35000823#35000823

res_g <- simplify(contract(g, membership(c_g)))

The result of this process is the figure below, where the vertices' names represent community membership.

plot(res_g, margin = -.5)

The above is clearly better than this hideous mess:

plot(g, margin = -.5)


To link the communities back to the original vertices, you will need something akin to the following:

mem <- data.frame(vertices = 1:vcount(g), member = as.numeric(membership(c_g)))

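IMO this is a nice approach for two reasons. First, it can in theory deal with graphs of any size, because the process of finding communities can be repeated on the collapsed graph. Second, adopting an interactive approach would yield very readable results. For example, one can imagine the user being able to click on a vertex in the collapsed graph to expand that community, revealing all of its original vertices.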
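One way to keep more information in the collapsed picture is to scale each community vertex by the number of original vertices it contains and each edge by the number of original edges it replaces. This is only a sketch of that idea, not part of the answer's original code; it uses the current igraph names `sample_pa()` and `cluster_fast_greedy()`, and the `weight` and `size` attributes are introduced here for illustration.

```r
library(igraph)

set.seed(123)
g <- sample_pa(1000, directed = FALSE)
comm <- cluster_fast_greedy(g)

# Give every edge a count of 1 so that merged edges can be summed.
E(g)$weight <- 1
collapsed <- contract(g, membership(comm), vertex.attr.comb = "ignore")
collapsed <- simplify(collapsed, remove.loops = TRUE,
                      edge.attr.comb = list(weight = "sum", "ignore"))

# Vertex i of the collapsed graph is community i.
V(collapsed)$size <- as.numeric(sizes(comm))

plot(collapsed,
     vertex.size = 3 * sqrt(V(collapsed)$size),
     vertex.label = seq_len(vcount(collapsed)),
     edge.width = sqrt(E(collapsed)$weight))
```

Since `collapsed` is again a simple undirected graph, the same community detection step could in principle be applied to it once more if it is still too dense to read.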

Answer 4

小智 5

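I looked around and found no good solution. My approach is to remove the nodes and play with edge transparency. It is more of a design solution than a technical one, but I have been able to plot Gephi-like networks with up to 50,000 edges on my laptop without much complication.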

With your example:

plot(simplify(g),
     vertex.size = 0.01,
     edge.arrow.size = 0.001,
     vertex.label.cex = 0.75,
     vertex.label.color = "black",
     vertex.frame.color = adjustcolor("white", alpha.f = 0),
     vertex.color = adjustcolor("white", alpha.f = 0),
     edge.color = adjustcolor(1, alpha.f = 0.15),
     display.isolates = FALSE,
     vertex.label = ifelse(page_rank(g)$vector > 0.1, "important nodes", NA))

Example of a Twitter mentions network with 30,000 edges:
