我在 R 中模拟了一些图网络数据(~10,000 个观察值),并尝试使用 R 中的 visNetwork 库将其可视化。 然而,数据非常混乱,很难进行可视化分析(我明白在现实生活中,网络数据是旨在使用图查询语言进行分析)。
就目前而言,我可以做些什么来改进我创建的图网络的可视化(这样我就可以探索一些相互叠加的链接和节点)?
可以使用诸如“networkD3”和“diagrammeR”之类的库来更好地可视化该网络吗?
我在下面附上了我的可重现代码:
library(igraph)
library(dplyr)
library(visNetwork)
#create file from which to sample from
x5 <- sample(1:10000, 10000, replace=T)
#convert to data frame
x5 = as.data.frame(x5)
#create first file (take a random sample from the created file)
a = sample_n(x5, 9000)
#create second file (take a random sample from the created file)
b = sample_n(x5, 9000)
#combine
c = cbind(a,b)
#create dataframe
c = data.frame(c)
#rename column names
colnames(c) <- c("a","b")
graph …Run Code Online (Sandbox Code Playgroud) 我有如下所示的数据:https : //imgur.com/a/1hOsFpF
第一个数据集是标准格式数据集,其中包含人员及其财务属性的列表。
第二个数据集包含这些人之间的“关系”——他们互相支付了多少,以及他们彼此欠了多少。
我有兴趣了解更多关于网络和基于图的聚类 - 但我试图更好地了解什么类型的情况需要基于网络的聚类,即我不想在不需要的地方使用图聚类(避免“方钉圆孔"类型情况)。
使用 R,首先我创建了一些假数据:
library(corrr)
library(dplyr)
library(igraph)
library(visNetwork)
library(stats)
# create first data set
Personal_Information <- data.frame(
"name" = c("John", "Jack", "Jason", "Jim", "Julian", "Jack", "Jake", "Joseph"),
"age" = c("41","33","24","66","21","66","29", "50"),
"salary" = c("50000","20000","18000","66000","77000","0","55000","40000"),
"debt" = c("10000","5000","4000","0","20000","5000","0","1000"
)
Personal_Information$age = as.numeric(Personal_Information$age)
Personal_Information$salary = as.numeric(Personal_Information$salary)
Personal_Information$debt = as.numeric(Personal_Information$debt)
create second data set
Relationship_Information <-data.frame(
"name_a" = c("John","John","John","Jack","Jack","Jack","Jason","Jason","Jim","Jim","Jim","Julian","Jake","Joseph","Joseph"),
"name_b" = c("Jack", "Jason", "Joseph", "John", "Julian","Jim","Jim", "Joseph", "Jack", "Julian", "John", "Joseph", "John", "Jim", "John"),
"how_much_they_owe_each_other" = …Run Code Online (Sandbox Code Playgroud) 我正在使用 R。使用教程,我能够创建一个统计模型并为一些输出生成可视化图:
#load libraries
library(survival)
library(dplyr)
library(ranger)
library(data.table)
library(ggplot2)
#use the built in "lung" data set
#remove missing values (dataset is called "a")
a <- na.omit(lung)
#create id variable
a$ID <- seq_along(a[,1])
#create test set with only the first 3 rows
new <- a[1:3,]
#create a training set by removing first three rows
a <- a[-c(1:3),]
#fit survival model (random survival forest)
r_fit <- ranger(Surv(time,status) ~ age + sex + ph.ecog + ph.karno + pat.karno + meal.cal + …Run Code Online (Sandbox Code Playgroud) 使用 plotly 库,我在 R 中绘制了以下图:
library(dplyr)
library(ggplot2)
library(plotly)
set.seed(123)
df <- data.frame(var1 = rnorm(1000,10,10),
var2 = rnorm(1000,5,5))
df <- df %>% mutate(var3 = ifelse(var1 <= 5 & var2 <= 5, "a", ifelse(var1 <= 10 & var2 <= 10, "b", "c")))
plot = df %>%
ggplot() + geom_point(aes(x=var1, y= var2, color= var3))
ggplotly(plot)
Run Code Online (Sandbox Code Playgroud)
这是一个简单的散点图 - 生成两个随机变量,然后点的颜色由某些标准决定(例如,如果 var1 和 var2 在某些范围之间)。
从这里,我还可以总结统计数据:
df$var3 = as.factor(df$var3)
summary = df %>%
group_by(var3) %>%
summarize(Mean_var1 = mean(var1), Mean_var2 = mean(var2), count=n())
# A tibble: 3 …Run Code Online (Sandbox Code Playgroud) 我在 R 中使用 ggplot2 库。
假设我有一个看起来像这样的图表:
library(ggplot2)
ggplot(work) + geom_line(aes(x = var1, y = var2, group = 1)) +
theme(axis.text.x = element_text(angle = 90)) +
ggtitle("sample graph")
Run Code Online (Sandbox Code Playgroud)
有没有办法直接在这个图中添加第二行?
例如
ggplot(work) + geom_line(aes(x = var1, y = var2, group = 1)) +
geom_line(aes(x = var1, y = mean(var2), group = 1)) +
theme(axis.text.x = element_text(angle = 90)) +
ggtitle("Sample graph")
Run Code Online (Sandbox Code Playgroud)
谢谢