Easily reorder factor levels after tidying or melting

Question


I'm trying to efficiently plot a series of bivariate bar plots. Each plot should show the frequency of cases for one of a series of demographic variables, broken down by gender. The code below works nicely, but when the demographic variables are gathered, the new tidied value column has as its levels all the levels of the different demographic variables. Because it is a new factor, R orders its levels alphabetically. As you can see from the resulting plot, they end up in no meaningful order: the income categories are out of order, and so are the levels of education.
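To see the ordering problem in isolation, here is a minimal base-R sketch using toy values modelled on the question's data (assumed, not the real data set). Stacking several columns into one produces a single character vector, roughly as gather() does, and factor() with no explicit levels simply sorts the unique strings, so categories from different variables interleave alphabetically.

```r
# Toy values modelled on the question's variables (assumed, not the real data)
age       <- c("18 to 24", "25 to 45", "45+")
education <- c("Elementary", "High School", "College")

# Stacking the columns into one vector turns everything into plain character
value <- c(age, education)

# With no explicit levels, factor() sorts the unique strings alphabetically,
# so "College" ends up before "Elementary" regardless of meaning
default_levels <- levels(factor(value))
print(default_levels)
```

The education categories come out as College, Elementary, High School, which is exactly the order seen on the plot's x-axis.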

In my real data set there are quite a few more factor levels, so relevelling the tidied column by hand is possible but not really feasible. One option I thought of was not to melt the variables into a single column, but to use some version of summarise_each() instead. But I couldn't get that to work.
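The idea of summarising each column separately instead of melting can be sketched in base R by calling table() once per column. This is a swapped-in technique, not summarise_each(), and the toy data below is assumed. Because each column keeps its own factor, each table keeps that column's level order.

```r
set.seed(1)  # reproducible toy data (assumed, mirrors the question)
n <- 100
df <- data.frame(
  age       = sample(c("18 to 24", "25 to 45", "45+"), n, replace = TRUE),
  gender    = sample(c("M", "F"), n, replace = TRUE),
  education = factor(sample(c("High School", "College", "Elementary"), n, replace = TRUE),
                     levels = c("Elementary", "High School", "College"))
)

# One contingency table per demographic column, each crossed with gender.
# table() follows a factor's existing level order, so education stays
# Elementary, High School, College instead of being alphabetised.
freq_tables <- lapply(df[setdiff(names(df), "gender")],
                      function(col) table(col, gender = df$gender))

print(freq_tables$education)
```

Calling as.data.frame() on one of these tables gives a long data frame (columns col, gender, Freq) whose category column is a factor in the same order, which can then be passed to ggplot one variable at a time.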

Thanks for any assistance.

#load the packages used below
library(dplyr)
library(tidyr)
library(ggplot2)
#Age variable
age<-sample(c('18 to 24', '25 to 45', '45+'), size=100, replace=TRUE)
#gender variable
gender<-sample(c('M', 'F'), size=100, replace=TRUE)
#income variable
income<-sample(c(10,20,30,40,50,60,70,80,100,110), size=100, replace=TRUE)
#education variable
education<-sample(c('High School', 'College', 'Elementary'), size=100, replace=TRUE)
#tie together in df
df<-data.frame(age, gender, income, education)
#begin tidying
df %>% 
#tidy, not gender
gather(variable, value, -c(gender))%>%
#group by value, variable, then gender
group_by(value, variable, gender)  %>%
#summarise to obtain table cell frequencies
summarise(freq=n())%>%
#begin plotting: value (categories) on the x-axis, frequency on y, gender as grouping variable, original variable as the faceting variable
ggplot(aes(x=value, y=freq, group=gender))+geom_bar(aes(fill=gender),  stat='identity', position='dodge')+facet_wrap(~variable, scales='free_x')


Answer 1

tho*_*hal 5

Data

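# give education a meaningful level order before reshaping
df$education <- factor(df$education, c("Elementary", "High School", 
                        "College"))
# reshape to long format and count cases per value, variable and gender
ddf <- df %>% 
       gather(variable, value, -gender) %>%
       group_by(value, variable, gender)  %>%
       summarise(freq = n())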


Code

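# collect the levels of every column except gender (column 2), in order;
# unique() guards against a label shared by two variables,
# because factor() rejects duplicated levels
lvl <- unique(unlist(lapply(df[, -2], function(.) levels(as.factor(.)))))
# turn the gathered character column into a factor with that level order
ddf$value <- factor(ddf$value, lvl)
ddf %>% ggplot(aes(x = value, y = freq, group = gender)) + 
        geom_bar(aes(fill = gender), stat = 'identity', 
                 position = 'dodge') + 
        facet_wrap(~variable, scales='free_x')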


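Explanation

After the gather step, the values of education, income and age end up in a character vector. ggplot then uses the canonical ordering of these values (i.e. alphabetical). If you want them in a specific order, you should first convert the column to a factor and then assign the levels in the order you prefer (as you mentioned). I simply reused the order of the original levels (and silently converted the numeric income to a factor, which may require some adjustments to your code). But it shows that, provided the levels are in the correct order in the original data set, you don't have to hard-code any levels yourself.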
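The remark about income being silently converted can be checked in a few lines of base R (toy income values assumed). as.factor() on a numeric column sorts the numbers before turning them into labels, whereas the same values as character, which is what the gathered column holds, sort alphabetically. That is why the answer takes the levels from the original columns rather than from the gathered one.

```r
income <- c(100, 20, 10, 110)  # toy values (assumed)

# Numeric input: as.factor() sorts numerically, then converts to labels
num_levels <- levels(as.factor(income))
# levels: "10" "20" "100" "110"

# The same values as character: alphabetical sort puts "100" and "110" before "20"
chr_levels <- levels(as.factor(as.character(income)))
# levels: "10" "100" "110" "20"

print(num_levels)
print(chr_levels)
```

Since as.character(100) is "100", the numerically ordered labels match the strings in the gathered column, so factor(ddf$value, lvl) produces no NAs for income.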

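So in your real case, what you should do is:

Convert the character vector value to a factor
Assign the levels in the order in which you want them to appear in ggplot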
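The two steps above can be wrapped in a small helper. This is a minimal base-R sketch; the function name relevel_long_value() and its arguments are hypothetical, and the long table is built by hand instead of with gather() so that the example runs without tidyr. It also adds a check the answer does not have: any value missing from the collected levels would silently become NA (and appear as an NA bar in ggplot), so the helper stops instead.

```r
# Hypothetical helper (illustrative name, not from any package):
# make `value` a factor whose level order follows the original wide columns
relevel_long_value <- function(long, wide, cols, value_col = "value") {
  # Level order taken from each source column; unique() guards against
  # a label shared by two variables (factor() rejects duplicated levels)
  lvl <- unique(unlist(lapply(wide[cols], function(x) levels(as.factor(x)))))
  new_value <- factor(as.character(long[[value_col]]), levels = lvl)
  # Fail loudly if a value that was not NA is missing from lvl
  if (anyNA(new_value) && !anyNA(long[[value_col]])) {
    stop("some values are not covered by the collected levels")
  }
  long[[value_col]] <- new_value
  long
}

# Toy usage (assumed data): build the long table by hand in base R
wide <- data.frame(
  gender    = c("M", "F", "F", "M"),
  income    = c(100, 20, 20, 10),
  education = factor(c("College", "Elementary", "High School", "College"),
                     levels = c("Elementary", "High School", "College"))
)
cols <- c("income", "education")
long <- do.call(rbind, lapply(cols, function(v) {
  data.frame(gender = wide$gender, variable = v,
             value = as.character(wide[[v]]))
}))

long <- relevel_long_value(long, wide, cols)
print(levels(long$value))
```

Income comes out in numeric order and education keeps its Elementary, High School, College order, so the long table can go straight into the answer's ggplot call.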


Archived:	10 years, 4 months ago
Views:	1065
Last activity:	10 years, 4 months ago