Ben*_*Ben 5 r data-visualization bar-chart ggplot2 r-grid
In a Facebook research post, I found these nice bar charts that are connected by lines to indicate rank changes:

https://research.fb.com/do-jobs-run-in-families/
I would like to create them with ggplot2. The bar chart part is easy:
library(ggplot2)
library(ggpubr)

state1 <- data.frame(state = c(rep("ALABAMA", 3), rep("CALIFORNIA", 3)),
                     value = c(61, 94, 27, 10, 30, 77),
                     type = rep(c("state", "local", "fed"), 2),
                     cumSum = c(rep(182, 3), rep(117, 3)))
state2 <- data.frame(state = c(rep("ALABAMA", 3), rep("CALIFORNIA", 3)),
                     value = c(10, 30, 7, 61, 94, 27),
                     type = rep(c("state", "local", "fed"), 2),
                     cumSum = c(rep(117, 3), rep(182, 3)))
fill <- c("#40b8d0", "#b2d183", "#F9756D")

p1 <- ggplot(data = state1) +
  geom_bar(aes(x = reorder(state, value), y = value, fill = type), stat = "identity") +
  theme_bw() +
  scale_fill_manual(values = fill) +
  labs(x = "", y = "Total budget in 1M$") +
  theme(legend.position = "none",
        legend.direction = "horizontal",
        legend.title = element_blank(),
        axis.line = element_line(size = 1, colour = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank()) +
  coord_flip()

p2 <- ggplot(data = state2) +
  geom_bar(aes(x = reorder(state, value), y = value, fill = type), stat = "identity") +
  theme_bw() +
  scale_fill_manual(values = fill) +
  labs(x = "", y = "Total budget in 1M$") +
  theme(legend.position = "none",
        legend.direction = "horizontal",
        legend.title = element_blank(),
        axis.line = element_line(size = 1, colour = "black"),
        panel.grid.major = element_blank(),
        panel.grid.minor = element_blank(),
        panel.border = element_blank(),
        panel.background = element_blank()) +
  scale_x_discrete(position = "top") +
  scale_y_reverse() +
  coord_flip()

p3 <- ggarrange(p1, p2, common.legend = TRUE, legend = "bottom")
But I can't come up with a solution for the lines. For example, this is how I tried adding lines on the left side:
p3 + geom_segment(aes(x = rep(1:2, each=3), xend = rep(1:10, each=3),
                      y = cumSum[order(cumSum)], yend=cumSum[order(cumSum)]+10), size = 1.2)
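Basically, I want to connect the 'California' bar on the left with the 'California' bar on the right.
To do that, I think I have to somehow get access to the superordinate level of the graph. I looked into viewports and was able to overlay the two bar charts with a chart made from geom_segment, but then I couldn't figure out the right layout for the lines: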
library(grid)  # provides viewport() and unit()

subplot <- ggplot(data = state1) +
  geom_segment(aes(x = rep(1:2, each = 3), xend = rep(1:2, each = 3),
                   y = cumSum[order(cumSum)], yend = cumSum[order(cumSum)] + 10),
               size = 1.2)
# just takes a character vector: horizontal, then vertical justification
vp <- viewport(width = 1, height = 1, x = 1, y = unit(0.7, "lines"),
               just = c("right", "bottom"))
print(p3)
print(subplot, vp = vp)
Help or pointers are greatly appreciated.
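This is a really interesting problem. I approximated it using the patchwork library, which lets you add ggplots together and gives you an easy way to control their layout. I much prefer it to doing anything based on grid.arrange, and for some things it works better than cowplot.
I expanded the dataset just to get some more values in the two data frames.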
library(tidyverse)
library(patchwork)

set.seed(1017)
state1 <- tibble(
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)
state2 <- tibble(
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)
Then I made a data frame that assigns a rank to each state, based on the state totals within its original data frame (either state1 or state2).
ranks <- bind_rows(
  state1 %>% mutate(position = 1),
  state2 %>% mutate(position = 2)
) %>%
  group_by(position, state) %>%
  summarise(state_total = sum(value)) %>%
  mutate(rank = dense_rank(state_total)) %>%
  ungroup()
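To see what the grouped dense_rank() step does, here is a minimal sketch on made-up totals. The `toy` data frame and its numbers are hypothetical and not part of the data above. Ranks are computed separately within each position, and tied totals share a rank without leaving gaps.

```r
library(dplyr)

# Hypothetical state totals for two positions (1 = left chart, 2 = right chart)
toy <- tibble(
  position    = c(1, 1, 1, 2, 2, 2),
  state       = rep(c("A", "B", "C"), times = 2),
  state_total = c(50, 20, 50, 10, 30, 20)
)

toy_ranks <- toy %>%
  group_by(position) %>%
  # rank 1 = smallest total; ties share a rank and no rank is skipped
  mutate(rank = dense_rank(state_total)) %>%
  ungroup()

toy_ranks$rank
# [1] 2 1 2 1 3 2
```

Because ties share a rank, two states with identical totals would sit on the same line in the bump chart; with random values like the ones above this is unlikely but possible.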
I made a quick theme to keep things very minimal and drop the axis marks:
theme_min <- function(...) theme_minimal(...) +
  theme(panel.grid = element_blank(), legend.position = "none", axis.title = element_blank())
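The bump chart (the one in the middle) is based on the ranks data frame and has no labels. Using factors instead of numeric variables for position and rank gave me a little more control over the spacing, and lets the ranks line up with discrete 1 to 5 values in a way that matches the state names in the bar charts.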
p_ranks <- ggplot(ranks, aes(x = as.factor(position), y = as.factor(rank), group = state)) +
  geom_path() +
  # expansion() is the current name for expand_scale() (ggplot2 >= 3.3.0)
  scale_x_discrete(breaks = NULL, expand = expansion(add = 0.1)) +
  scale_y_discrete(breaks = NULL) +
  theme_min()
p_ranks

For the left bar chart, I sort the states by value, turn the values negative so the bars point to the left, then give it the same minimal theme:
p_left <- state1 %>%
  mutate(state = as.factor(state) %>% fct_reorder(value, sum)) %>%
  arrange(state) %>%
  mutate(value = value * -1) %>%
  ggplot(aes(x = state, y = value, fill = type)) +
  geom_col(position = "stack") +
  coord_flip() +
  scale_y_continuous(breaks = NULL) +
  theme_min() +
  scale_fill_brewer()
p_left
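The sorting step relies on fct_reorder() with sum as the summary function, so that the factor levels follow each state's total rather than its individual rows. A minimal sketch with hypothetical values (the names `state`, `value` and `ordered_state` here are standalone vectors, not the columns above):

```r
library(forcats)

# Hypothetical values: three rows per state
state <- rep(c("X", "Y", "Z"), each = 3)
value <- c(5, 5, 5,   # X sums to 15
           1, 2, 3,   # Y sums to 6
           9, 9, 9)   # Z sums to 27

# Reorder the factor levels by the per-state sum of value (ascending)
ordered_state <- fct_reorder(as.factor(state), value, sum)
levels(ordered_state)
# [1] "Y" "X" "Z"
```

With coord_flip(), the first level is drawn at the bottom, so the state with the largest total ends up at the top of the chart, which matches the highest rank in the bump chart.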

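The right bar chart is pretty much the same, except that the values stay positive and I moved the x-axis to the top (which becomes the right side once the coordinates are flipped):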
p_right <- state2 %>%
  mutate(state = as.factor(state) %>% fct_reorder(value, sum)) %>%
  arrange(state) %>%
  ggplot(aes(x = state, y = value, fill = type)) +
  geom_col(position = "stack") +
  coord_flip() +
  scale_x_discrete(position = "top") +
  scale_y_continuous(breaks = NULL) +
  theme_min() +
  scale_fill_brewer()
Then, because I've loaded patchwork, I can add the plots together and specify the layout.
p_left + p_ranks + p_right +
  plot_layout(nrow = 1)

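You'll probably want to adjust the spacing and margins some more, for example through the expand_scale call in the bump chart (named expansion in ggplot2 3.3.0 and later). I haven't tried this with axis marks along the y-axes (i.e. the bottoms after flipping), but I have a feeling things might get thrown out of whack if you don't add a dummy axis to the ranks plot. There's plenty still to mess around with, but it's a cool visualization project you posed!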
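One knob for the spacing is the widths argument of plot_layout(), which sets the relative width of each panel. The sketch below uses three hypothetical placeholder plots (`p_a`, `p_b`, `p_c`) built from mtcars instead of the plots above, and the 3:1:3 ratio is an arbitrary illustration rather than a recommended value.

```r
library(ggplot2)
library(patchwork)

# Placeholder plots standing in for the left bars, the bump chart and the right bars
p_a <- ggplot(mtcars, aes(mpg, wt)) + geom_point()
p_b <- ggplot(mtcars, aes(mpg, hp)) + geom_point()
p_c <- ggplot(mtcars, aes(mpg, qsec)) + geom_point()

# Give the middle panel a third of the width of each outer panel
combined <- p_a + p_b + p_c +
  plot_layout(nrow = 1, widths = c(3, 1, 3))
combined
```

The same plot_layout() call should apply to p_left + p_ranks + p_right, though the ratio would need tuning by eye.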
Here's a pure ggplot2 solution, which combines the underlying data frames into one and plots everything in a single plot:
Data manipulation:
library(dplyr)

bar.width <- 0.9

# combine the two data sources
df <- rbind(state1 %>% mutate(source = "state1"),
            state2 %>% mutate(source = "state2")) %>%

  # calculate each state's rank within each data source
  group_by(source, state) %>%
  mutate(state.sum = sum(value)) %>%
  ungroup() %>%
  group_by(source) %>%
  mutate(source.rank = as.integer(factor(state.sum))) %>%
  ungroup() %>%

  # calculate the dimensions for each bar
  group_by(source, state) %>%
  arrange(type) %>%
  mutate(xmin = lag(cumsum(value), default = 0),
         xmax = cumsum(value),
         ymin = source.rank - bar.width / 2,
         ymax = source.rank + bar.width / 2) %>%
  ungroup() %>%

  # shift each data source's coordinates away from point of origin,
  # in order to create space for plotting lines
  mutate(x = ifelse(source == "state1", -max(xmax) / 2, max(xmax) / 2)) %>%
  mutate(xmin = ifelse(source == "state1", x - xmin, x + xmin),
         xmax = ifelse(source == "state1", x - xmax, x + xmax)) %>%

  # calculate label position for each data source
  group_by(source) %>%
  mutate(label.x = max(abs(xmax))) %>%
  ungroup() %>%
  mutate(label.x = ifelse(source == "state1", -label.x, label.x),
         hjust = ifelse(source == "state1", 1.1, -0.1))
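The bar dimensions come from a running total: each segment starts where the previous one ended. A minimal sketch for a single hypothetical state (the `seg` data frame and its values are made up, and the offset is taken from this one state only, whereas the code above uses the maximum over all rows):

```r
library(dplyr)

# Hypothetical values for one state's three segments, already sorted by type
seg <- tibble(type  = c("fed", "local", "state"),
              value = c(10, 30, 7))

seg_bounds <- seg %>%
  mutate(xmin = lag(cumsum(value), default = 0),  # where each segment starts
         xmax = cumsum(value))                    # where each segment ends

seg_bounds$xmin
# [1]  0 10 40
seg_bounds$xmax
# [1] 10 40 47

# Mirror the segments to the left of a central gap, as done for "state1"
offset    <- max(seg_bounds$xmax) / 2     # 23.5
left_xmin <- -offset - seg_bounds$xmin    # -23.5 -33.5 -63.5
left_xmax <- -offset - seg_bounds$xmax    # -33.5 -63.5 -70.5
```

The gap between -offset and +offset is the space in which geom_line() draws the connecting lines.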
Plot:
ggplot(df,
       aes(x = x, y = source.rank,
           xmin = xmin, xmax = xmax,
           ymin = ymin, ymax = ymax,
           fill = type)) +
  geom_rect() +
  geom_line(aes(group = state)) +
  geom_text(aes(x = label.x, label = state, hjust = hjust),
            check_overlap = TRUE) +

  # allow some space for the labels; this may be changed
  # depending on plot dimensions
  scale_x_continuous(expand = c(0.2, 0)) +
  scale_fill_manual(values = fill) +

  theme_void() +
  theme(legend.position = "top")
Data source (same as @camille's):
set.seed(1017)
state1 <- tibble(
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)
state2 <- tibble(
  state = rep(state.name[1:5], each = 3),
  value = floor(runif(15, 1, 100)),
  type = rep(c("state", "local", "fed"), times = 5)
)