Bra*_*roy (3) · Tags: r, data-visualization, ggplot2
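First of all, I'm still a beginner. I'm trying to make and interpret a stacked bar chart in R. I have looked at some answers, but some don't match my case and others I don't understand:
I have a data set dvl with five columns: Variant, Region, Time, Person and PrecededByPrep. I want to make a multivariate comparison of Variant with the other four predictors. Each column can take one of two possible values: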
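- Variant: elk or ieder
- Region: VL or NL
- Time: time or no time
- Person: person or no person
- PrecededByPrep: 1 or 0

This is for a logistic regression.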
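From the answers I have gathered, ggplot2 seems to be the best plotting library. I have read its documentation, but for the life of me I can't figure out how to plot this: how can I compare Variant with the other four factors?
It took me a while, but I made something similar to what I want in Photoshop (fictional values!):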

- dark grey / light grey: Variant
- y-axis: frequency
- x-axis: each column, subdivided into its possible values
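I know how to make individual bar charts, both stacked and grouped, but I basically don't know how to make a bar chart that is stacked and grouped at the same time. ggplot2 may be used, but I'd prefer a solution without it if possible.
I think this can serve as a sample data set, but I'm not entirely sure; I'm a beginner in R and have only read about how to create sample sets.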
t <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                Time = sample(c("time", "no time"), size = 50, replace = TRUE))
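I'd also like the plot to look good. My ideas:
- col = c("paleturquoise3", "palegreen3")
- font.lab = 2, but not for the value labels (e.g. Region in bold, but VL and NL not in bold)
- #404040 as the colour of the font, axes and lines
- axis labels: x: factors, y: frequency

Here is one possibility: start from the "untabulated" data frame, melt it, and plot it with geom_bar in ggplot2 (which does the counting per group), with a separate panel per variable via facet_wrap.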
Create toy data:
set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE))
Reshape the data:
library(reshape2)
df2 <- melt(df, id.vars = "Variant")
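reshape2 is retired, and its authors recommend tidyr instead. A sketch of the same reshaping step with tidyr::pivot_longer(), assuming tidyr 1.0.0 or later is installed. The new columns are named like melt() names them, so the plotting code can stay unchanged; pivot_longer() returns "variable" as character, so it is turned into a factor to keep the facets in the original column order.

```r
library(tidyr)

set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE),
                 stringsAsFactors = FALSE)

# one row per observation and predictor: 50 observations x 4 predictors = 200 rows,
# with the same column names that melt() produces ("variable" and "value")
df2 <- pivot_longer(df, cols = -Variant,
                    names_to = "variable", values_to = "value")

# keep the facets in column order (Region, PrecededByPrep, Person, Time)
# instead of alphabetical order
df2$variable <- factor(df2$variable, levels = names(df)[-1])

head(df2)
```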
Plot:
library(ggplot2)
ggplot(data = df2, aes(factor(value), fill = Variant)) +
  geom_bar() +
  facet_wrap(~variable, nrow = 1, scales = "free_x") +
  scale_fill_grey(start = 0.5) +
  theme_bw()
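geom_bar() does the counting itself, through stat_count(). To see the numbers it computed, for instance to compare them with a manual count, layer_data() returns the data of a built layer. A sketch that re-creates the toy data so it runs on its own:

```r
library(reshape2)
library(ggplot2)

set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE),
                 stringsAsFactors = FALSE)
df2 <- melt(df, id.vars = "Variant")

p <- ggplot(data = df2, aes(factor(value), fill = Variant)) +
  geom_bar() +
  facet_wrap(~variable, nrow = 1, scales = "free_x") +
  scale_fill_grey(start = 0.5) +
  theme_bw()

# data of layer 1 after building the plot: one row per stacked segment,
# with the "count" column calculated by stat_count()
counts <- layer_data(p, 1)
counts[, c("PANEL", "x", "fill", "count")]

# the same numbers counted directly from the long data
# (each predictor value occurs in only one column, so value identifies the bar)
xtabs(~ value + Variant, data = df2)
```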

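There are many ways to customise the plot further, e.g. setting the order of the factor levels, rotating the axis labels, wrapping facet labels over two lines (e.g. for the longer variable name "PrecededByPrep"), or changing the spacing between facets.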
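A sketch of two of these customisations, assuming the toy data from above. The bar order inside each facet and the stacking/legend order follow the factor levels, and panel.spacing in theme() sets the gap between facets (unit() is re-exported by ggplot2 from grid). The level orders chosen here are only examples.

```r
library(reshape2)
library(ggplot2)

set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE),
                 stringsAsFactors = FALSE)
df2 <- melt(df, id.vars = "Variant")

# order of the bars: with scales = "free_x" each facet shows only its own
# levels, in the order given here (example order)
df2$value <- factor(df2$value,
                    levels = c("VL", "NL", "1", "0",
                               "person", "no person", "time", "no time"))
# order of the legend entries and of the stacked segments (example order)
df2$Variant <- factor(df2$Variant, levels = c("iedere", "elke"))

p <- ggplot(df2, aes(value, fill = Variant)) +
  geom_bar() +
  facet_wrap(~variable, nrow = 1, scales = "free_x") +
  scale_fill_grey(start = 0.5) +
  theme_bw() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),  # rotated tick labels
        panel.spacing = unit(1.5, "lines"))                  # wider gap between facets
p
```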
Customisation (following the OP's update and comments)
# labeller used in facet_grid to wrap "PrecededByPrep" on two lines
# see http://www.cookbook-r.com/Graphs/Facets_%28ggplot2%29/#modifying-facet-label-text
# as_labeller() turns a function of the label strings into a labeller; the older
# function(variable, value) labeller interface is deprecated in current ggplot2
my_lab <- as_labeller(function(value) {
  ifelse(value == "PrecededByPrep", "Preceded\nByPrep", value)
})
ggplot(data = df2, aes(factor(value), fill = Variant)) +
  geom_bar() +
  facet_grid(~variable, scales = "free_x", labeller = my_lab) +
  scale_fill_manual(values = c("paleturquoise3", "palegreen3")) +   # manual fill colours
  theme_bw() +
  # theme_bw() colours ticks and panel border explicitly, so they are set directly
  theme(axis.text = element_text(face = "bold"),                    # axis tick labels bold
        axis.text.x = element_text(angle = 45, hjust = 1),          # rotate x axis labels
        axis.ticks = element_line(colour = "gray25"),               # gray25 = #404040
        panel.border = element_rect(colour = "gray25", fill = NA),  # border in the same colour
        strip.text = element_text(face = "bold")) +                 # facet labels bold
  xlab("factors") +                                                 # set axis labels
  ylab("frequency")
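The OP's list asked for bold axis titles but plain value labels, and #404040 for text and lines. A sketch along those lines, assuming the toy data from the answer. theme_bw() gives tick labels, ticks, strips and the panel border their own colours, so setting only text or line does not reach them and they are set one by one. The facet labels here come from a named lookup vector passed to labeller(), an alternative to writing a labeller function; any facet value missing from the vector would be shown as NA.

```r
library(reshape2)
library(ggplot2)

set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE),
                 stringsAsFactors = FALSE)
df2 <- melt(df, id.vars = "Variant")

dark <- "#404040"

# facet labels from a lookup table; "\n" wraps the long name over two lines
facet_names <- c(Region = "Region", PrecededByPrep = "Preceded\nByPrep",
                 Person = "Person", Time = "Time")

p <- ggplot(df2, aes(factor(value), fill = Variant)) +
  geom_bar(colour = dark) +                                 # bar outlines
  facet_grid(~variable, scales = "free_x",
             labeller = labeller(variable = facet_names)) +
  scale_fill_manual(values = c("paleturquoise3", "palegreen3")) +
  labs(x = "factors", y = "frequency") +
  theme_bw() +
  theme(text = element_text(colour = dark),                 # titles, legend text
        axis.title = element_text(face = "bold"),           # "factors", "frequency" bold
        axis.text = element_text(colour = dark),            # VL, NL, ...: plain weight
        strip.text = element_text(colour = dark, face = "bold"),  # Region, Person, ...
        axis.ticks = element_line(colour = dark),
        panel.border = element_rect(colour = dark, fill = NA))
p
```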

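Adding the counts to each bar (edit following the OP's comment).
The basic principle for calculating the y coordinates can be found in this Q&A. Here I use dplyr to calculate the count of each bar segment (i.e. the label in geom_text) and its y coordinate, but this can of course also be done in base R, plyr or data.table.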
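# calculate counts (i.e. labels for geom_text) and their y positions
library(dplyr)
df3 <- df2 %>%
  group_by(variable, value, Variant) %>%
  summarise(n = n(), .groups = "drop_last") %>%  # still grouped per bar (variable, value)
  # since ggplot2 2.2.0, geom_bar() stacks the first Variant level on top,
  # so the cumulative sum runs over the levels in reverse order
  mutate(y = rev(cumsum(rev(n))) - 0.5 * n)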
# plot
ggplot(data = df2, aes(x = factor(value), fill = Variant)) +
  geom_bar() +
  geom_text(data = df3, aes(y = y, label = n)) +
  facet_grid(~variable, scales = "free_x", labeller = my_lab) +
  scale_fill_manual(values = c("paleturquoise3", "palegreen3")) +   # manual fill colours
  theme_bw() +
  # theme_bw() colours ticks and panel border explicitly, so they are set directly
  theme(axis.text = element_text(face = "bold"),                    # axis tick labels bold
        axis.text.x = element_text(angle = 45, hjust = 1),          # rotate x axis labels
        axis.ticks = element_line(colour = "gray25"),               # gray25 = #404040
        panel.border = element_rect(colour = "gray25", fill = NA),  # border in the same colour
        strip.text = element_text(face = "bold")) +                 # facet labels bold
  xlab("factors") +                                                 # set axis labels
  ylab("frequency")
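With ggplot2 2.2.0 or later (after_stat() needs 3.3.0), the labels can also be placed without precomputing anything: geom_text() can use stat = "count" just like geom_bar(), and position_stack(vjust = 0.5) centres each label in its segment using the same stacking order as the bars. A sketch; the dplyr route remains useful when the counts are needed as a data frame.

```r
library(reshape2)
library(ggplot2)

set.seed(123)
df <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                 Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                 PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                 Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                 Time = sample(c("time", "no time"), size = 50, replace = TRUE),
                 stringsAsFactors = FALSE)
df2 <- melt(df, id.vars = "Variant")

p <- ggplot(data = df2, aes(x = factor(value), fill = Variant)) +
  geom_bar() +
  # stat = "count" makes geom_text() count the rows per segment like geom_bar();
  # the inherited fill = Variant gives it the same groups as the bars, and
  # position_stack(vjust = 0.5) centres each label within its stacked segment
  geom_text(aes(label = after_stat(count)), stat = "count",
            position = position_stack(vjust = 0.5)) +
  facet_grid(~variable, scales = "free_x") +
  scale_fill_manual(values = c("paleturquoise3", "palegreen3")) +
  theme_bw()
p
```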

Here is my proposal for a solution with the base R barplot function:
1. Compute the counts
# one Variant x value table per predictor column ...
l_count_df <- lapply(colnames(t)[-1], function(nomcol) {
  table(t$Variant, t[, nomcol])
})
# ... bound side by side into a 2 x 8 matrix (rows: elke, iedere)
count_df <- do.call(cbind, l_count_df)
2. Draw the bar chart without axis names and save the bar coordinates
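par(las = 1, col.axis = "#404040", mar = c(5, 4.5, 4, 2), mgp = c(3.5, 1, 0))
# space: gap of 1 before each pair of bars, 0.3 within a pair; bp holds the bar midpoints
bp <- barplot(count_df, width = 1.2, space = rep(c(1, 0.3), 4),
              col = c("paleturquoise3", "palegreen3"), border = "#404040",
              axisnames = FALSE, ylab = "Frequency",
              legend.text = rownames(count_df),
              ylim = c(0, max(colSums(count_df)) * 1.2))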
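The bar coordinates saved in bp follow from width and space: according to ?barplot, space is the gap left before each bar as a fraction of the average bar width. A short check of that arithmetic for the layout in step 2, using barplot(plot = FALSE) so that nothing is drawn; the 2 x 8 matrix of ones is only a stand-in, since the heights do not affect the x positions.

```r
width <- 1.2
space <- rep(c(1, 0.3), 4)   # gap before each bar, as a fraction of the mean bar width

# by hand: each bar's right edge is the running total of (gap + width),
# and its midpoint lies half a bar width to the left of that edge
mid_manual <- cumsum(space * width + width) - width / 2
mid_manual
# -> 1.80 3.36 5.76 7.32 9.72 11.28 13.68 15.24

# barplot() returns the same midpoints without drawing anything
heights <- matrix(1, nrow = 2, ncol = 8)
mid_barplot <- barplot(heights, width = width, space = space, plot = FALSE)
all.equal(mid_manual, mid_barplot)

# centre of each pair of bars, where step 3 writes the column names
(mid_manual[seq(1, 8, by = 2)] + mid_manual[seq(2, 8, by = 2)]) / 2
# -> 2.58 6.54 10.50 14.46
```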
3. Label the bars
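# value labels (NL, VL, 0, 1, ...) under each bar
mtext(side = 1, line = 0.8, at = bp, text = colnames(count_df))
# column names in bold, centred under each pair of bars
mtext(side = 1, line = 2, at = (bp[seq(1, 8, by = 2)] + bp[seq(2, 8, by = 2)]) / 2,
      text = colnames(t)[-1], font = 2)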
4. Add the values inside the bars
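# barplot() stacks row 1 (elke) at the bottom and row 2 (iedere) on top;
# each label goes in the middle of its segment
for (i in 1:ncol(count_df)) {
  val_elke <- count_df[1, i]
  val_iedere <- count_df[2, i]
  text(bp[i], val_elke / 2, val_elke)
  text(bp[i], val_elke + val_iedere / 2, val_iedere)
}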
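The loop in step 4 can also be written without a loop. A sketch, assuming the sample data t from the question and the counting from step 1: the label heights are computed column by column with apply(), and because a matrix is stored column by column, repeating each bar midpoint once per row lines the x positions up with the labels.

```r
set.seed(123)
t <- data.frame(Variant = sample(c("iedere", "elke"), size = 50, replace = TRUE),
                Region = sample(c("VL", "NL"), size = 50, replace = TRUE),
                PrecededByPrep = sample(c("1", "0"), size = 50, replace = TRUE),
                Person = sample(c("person", "no person"), size = 50, replace = TRUE),
                Time = sample(c("time", "no time"), size = 50, replace = TRUE),
                stringsAsFactors = FALSE)

# step 1: 2 x 8 matrix of counts (rows: elke, iedere)
count_df <- do.call(cbind, lapply(colnames(t)[-1], function(nomcol) {
  table(t$Variant, t[, nomcol])
}))

# middle of every segment; barplot() stacks row 1 at the bottom,
# so the cumulative sum runs from the first row upwards
label_y <- apply(count_df, 2, function(x) cumsum(x) - x / 2)

bp <- barplot(count_df, width = 1.2, space = rep(c(1, 0.3), 4),
              col = c("paleturquoise3", "palegreen3"), border = "#404040",
              axisnames = FALSE, ylab = "Frequency")

# count_df and label_y are read column by column, so each midpoint
# is repeated once per row to match them
text(rep(bp, each = nrow(count_df)), label_y, labels = as.vector(count_df))
```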
This is what I got (with my random data):
