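My goal is to make a multi-level Sankey diagram in R using the googleVis package. The output should look similar to this:
I have created some dummy data in R: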
set.seed(1)
source <- sample(c("North","South","East","West"),100,replace=T)
mid <- sample(c("North ","South ","East ","West "),100,replace=T)
destination <- sample(c("North","South","East","West"),100,replace=T) # N.B. It is important to have a space after the second set of destinations to avoid a cycle
dummy <- rep(1,100) # For aggregation
dat <- data.frame(source,mid,destination,dummy)
aggdat <- aggregate(dummy~source+mid+destination,dat,sum)
If I only have a source and a destination, I can build a Sankey diagram with two variables, but not one with a middle point:
aggdat <- aggregate(dummy~source+destination,dat,sum)
library(googleVis)
p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")
plot(p)
This code produces the following:
How should I modify
p <- gvisSankey(aggdat,from="source",to="destination",weight="dummy")
so that it also accepts the mid variable?
The gvisSankey function does accept intermediate levels directly. These levels have to be encoded in the underlying data.
source <- sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace=T)
mid <- sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace=T)
destination <- sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace=T)
dummy <- rep(1,100) # For aggregation
dat <- data.frame(source, mid, destination, dummy) # Rebuild dat from the renamed vectors
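The renaming above matters because a Sankey chart cannot draw links that loop back to an earlier node. A minimal sketch of a check for this, assuming only base R; the helper name has_shared_labels is hypothetical and not part of googleVis:

```r
# Node labels per stage, as used in the renamed vectors above
renamed <- list(
  source      = c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"),
  mid         = c("NorthMid", "SouthMid", "EastMid", "WestMid"),
  destination = c("NorthDes", "SouthDes", "EastDes", "WestDes")
)

# Labels from the question: source and destination share the same names
original <- list(
  source      = c("North", "South", "East", "West"),
  mid         = c("North ", "South ", "East ", "West "),
  destination = c("North", "South", "East", "West")
)

# Hypothetical helper: TRUE if any label is used in more than one stage
# (assumes labels are unique within each stage)
has_shared_labels <- function(stages) {
  any(duplicated(unlist(stages, use.names = FALSE)))
}

has_shared_labels(renamed)   # FALSE
has_shared_labels(original)  # TRUE
```

With the labels from the question, a flow such as North to "North " to North would return to its starting node, which is why each stage gets its own suffix here.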
Now we reshape the original data:
library(dplyr)
datSM <- dat %>%
group_by(source, mid) %>%
summarise(toMid = sum(dummy) ) %>%
ungroup()
The data frame datSM summarises the number of units going from Source to Mid.
datMD <- dat %>%
group_by(mid, destination) %>%
summarise(toDes = sum(dummy) ) %>%
ungroup()
The data frame datMD summarises the number of units going from Mid to Destination. It will be appended to the final data frame. Both data frames must be ungrouped and have the same column names.
colnames(datSM) <- colnames(datMD) <- c("From", "To", "Dummy")
Because datMD is appended last, gvisSankey will automatically recognise the middle step.
datVis <- rbind(datSM, datMD)
p <- gvisSankey(datVis, from="From", to="To", weight="Dummy") # weight must match the column name set above
plot(p)
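The same reshaping can be sketched without dplyr and for any number of stages. This is only an illustration under stated assumptions: make_sankey_edges is a hypothetical helper (not a googleVis function), and the column name Weight is chosen here for the example.

```r
# Hypothetical helper (not part of googleVis): stack the links between
# consecutive stage columns into one From/To/Weight edge list.
make_sankey_edges <- function(df, stages, weight) {
  pairs <- lapply(seq_len(length(stages) - 1), function(i) {
    # Sum the weight for every (stage i, stage i + 1) combination
    agg <- aggregate(df[[weight]],
                     by = list(From = df[[stages[i]]], To = df[[stages[i + 1]]]),
                     FUN = sum)
    names(agg)[3] <- "Weight"
    agg
  })
  do.call(rbind, pairs)
}

set.seed(1)
dat <- data.frame(
  source      = sample(c("NorthSrc", "SouthSrc", "EastSrc", "WestSrc"), 100, replace = TRUE),
  mid         = sample(c("NorthMid", "SouthMid", "EastMid", "WestMid"), 100, replace = TRUE),
  destination = sample(c("NorthDes", "SouthDes", "EastDes", "WestDes"), 100, replace = TRUE),
  dummy       = 1
)

edges <- make_sankey_edges(dat, c("source", "mid", "destination"), "dummy")

# Assumes googleVis is installed; plotting opens a browser window.
# library(googleVis)
# plot(gvisSankey(edges, from = "From", to = "To", weight = "Weight"))
```

Passing a longer vector of stage names would add further levels in the same way, provided the node labels stay distinct across stages.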