Using ggplot, I want to show the 4 pay quartiles as 400 people (100 per quartile), with each person drawn as a separate point and broken down by gender.
library(tidyverse)

dat_url <- 'https://gender-pay-gap.service.gov.uk/viewing/download-data/2019'
dat <- read_csv(dat_url)

a <- dat %>%
  filter(str_detect(EmployerName, 'ZELLIS')) %>%                     # pick a company
  select(matches("\\bMale\\w+le", perl = TRUE)) %>%                 # grab male quartiles
  pivot_longer(everything()) %>%                                     # one row per quartile column
  extract(name, c('gender', 'quartile'), '(\\bMale)(\\w+\\b)') %>%  # split "Male" from the quartile name
  mutate(men = round(value), women = 100 - men) %>%                  # women are the remainder of 100
  select(-c(gender, value)) %>%
  pivot_longer(c('men', 'women'), names_to = 'gender', values_to = 'value') %>%
  mutate(quartile = str_replace(quartile, '(^\\w+?)(Middle)', '\\2\\1'))  # move "Middle" to the front
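To see what the regular expressions in that pipeline do without downloading the file, here is a small sketch that applies the same patterns to a hand-typed vector of column names. The names in `cols` are an assumption (meant to mirror the 2019 download's headers, only partly listed), and `cols`, `male_cols`, `quartiles` and `swapped` are illustrative names. `matches()` ignores case by default, so the sketch does the same. If the real headers match these assumed names, the last step would rename `LowerMiddleQuartile` to `MiddleLowerQuartile`, which differs from the labels in the printed sample, so it may be worth checking against the actual data.

```r
library(tidyverse)

# Hypothetical column names, assumed to mirror the 2019 download's headers.
cols <- c("EmployerName", "MaleBonusPercent", "FemaleBonusPercent",
          "MaleLowerQuartile", "FemaleLowerQuartile",
          "MaleLowerMiddleQuartile", "FemaleLowerMiddleQuartile",
          "MaleUpperMiddleQuartile", "FemaleUpperMiddleQuartile",
          "MaleTopQuartile", "FemaleTopQuartile")

# Same pattern as select(matches(...)), case-insensitive like matches().
# \b rules out "Female" (no word boundary between "Fe" and "male"), and
# "\\w+le" needs an "le" after "Male", which MaleBonusPercent lacks.
male_cols <- cols[str_detect(cols, regex("\\bMale\\w+le", ignore_case = TRUE))]

# Same pattern as extract(): capture group 2 is the quartile name.
quartiles <- str_match(male_cols, "(\\bMale)(\\w+\\b)")[, 3]

# Same substitution as the final mutate(): "Middle" moves in front of
# the shortest leading word that precedes it.
swapped <- str_replace(quartiles, "(^\\w+?)(Middle)", "\\2\\1")
```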
Sample data:
a
# A tibble: 8 x 3
  quartile            gender value
  <chr>               <chr>  <dbl>
1 LowerQuartile       men       39
2 LowerQuartile       women     61
3 LowerMiddleQuartile men       39
4 LowerMiddleQuartile women     61
5 UpperMiddleQuartile men       57
6 UpperMiddleQuartile women     43
7 TopQuartile         men       64
8 TopQuartile         women     36
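Here is one possibility. You can "uncount" the data so that there is one row per point, and then draw the points as squares. It could look something like this: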
a %>%
  uncount(value) %>%                               # one row per person
  group_by(quartile) %>%
  mutate(row = (row_number() - 1) %/% 10 + 1,      # lay each quartile's 100 people
         col = (row_number() - 1) %% 10 + 1) %>%   # out on a 10 x 10 grid
  ggplot() +
  aes(col, row, color = gender) +
  geom_point(shape = 15) +                         # shape 15 is a filled square
  facet_grid(~quartile) +
  coord_equal() +
  theme(axis.ticks.x = element_blank(), axis.ticks.y = element_blank(),
        axis.text.x = element_blank(), axis.text.y = element_blank(),
        axis.title.x = element_blank(), axis.title.y = element_blank())
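A self-contained sketch of the same approach, assuming `a` is typed in by hand from the sample data above so nothing has to be downloaded. It also makes `quartile` a factor so the facets run from the lowest to the highest pay quartile (a character column is faceted in alphabetical order), and uses `theme_void()` instead of the individual `element_blank()` settings, which additionally drops the grey panel background and gridlines. The names `quartile_order`, `grid` and `p` are illustrative.

```r
library(tidyverse)

# Hypothetical input: the eight rows of the printed sample, typed in by hand.
a <- tribble(
  ~quartile,             ~gender, ~value,
  "LowerQuartile",       "men",      39,
  "LowerQuartile",       "women",    61,
  "LowerMiddleQuartile", "men",      39,
  "LowerMiddleQuartile", "women",    61,
  "UpperMiddleQuartile", "men",      57,
  "UpperMiddleQuartile", "women",    43,
  "TopQuartile",         "men",      64,
  "TopQuartile",         "women",    36
)

# Facet order from lowest to highest pay quartile.
quartile_order <- c("LowerQuartile", "LowerMiddleQuartile",
                    "UpperMiddleQuartile", "TopQuartile")

grid <- a %>%
  mutate(quartile = factor(quartile, levels = quartile_order)) %>%
  uncount(value) %>%                  # one row per person (100 per quartile)
  group_by(quartile) %>%
  mutate(i   = row_number() - 1,      # 0-based position within the quartile
         row = i %/% 10 + 1,          # positions 0-9 -> row 1, 10-19 -> row 2, ...
         col = i %% 10 + 1) %>%       # column cycles 1..10 within each row
  ungroup()

p <- ggplot(grid, aes(col, row, colour = gender)) +
  geom_point(shape = 15) +            # filled squares, as in the answer above
  facet_grid(~quartile) +
  coord_equal() +
  theme_void()
```

Printing `p` draws four 10 x 10 panels. Because the men's rows come first in `a`, each panel fills with men from the bottom row upwards and women above them.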