Ridgeline plot: sort by value/rank

Question

k1n*_*ext 4 r data-visualization ggplot2 ridgeline-plot

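I have a dataset that is uploaded here as a gist in CSV format. It is an extracted form of the PDF provided in the YouGov article "How good is 'good'?". People were asked to rate words (e.g. "perfect", "bad") with a score between 0 (very negative) and 10 (very positive). The gist contains exactly that data: for each word (column: Word) it stores the number of votes (column: Total) for every rank from 0 to 10 (column: Category).

I would usually try to visualize the data with matplotlib and Python, since I lack knowledge of R, but it seems that ggridges can create much nicer plots than what I manage with Python.

Using: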

library(ggplot2)
library(ggridges)
library(readr)  # provides read_csv()

YouGov <- read_csv("https://gist.githubusercontent.com/camminady/2e3aeab04fc3f5d3023ffc17860f0ba4/raw/97161888935c52407b0a377ebc932cc0c1490069/poll.csv")

ggplot(YouGov, aes(x=Category, y=Word, height = Total, group = Word, fill=Word)) + 
  geom_density_ridges(stat = "identity", scale = 3)

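I was able to create this plot (still far from perfect):

Ignoring the fact that I still have to tweak the aesthetics, there are three things I struggle to do:

1. Order the words by their average rank.
2. Color the ridges by the average rank.
3. Or color the ridges by the category value, i.e. with a color that varies along each ridge.

I tried to adapt the suggestions from this source, but ultimately failed because my data seems to be in the wrong format: I already have the aggregated vote count for each category rather than single instances of votes.

I hope to end up with a result closer to this plot, which satisfies criterion 3 (source):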

Answer 1

OTS*_*ats 5

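It took me a little while to get there myself. For me, the key was understanding the data and how to order Word by its average Category score. So let's look at the data first: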

> YouGov
# A tibble: 440 x 17
      ID Word  Category Total  Male Female `18 to 35` `35 to 54` `55+`
   <dbl> <chr>    <dbl> <dbl> <dbl>  <dbl>      <dbl>      <dbl> <dbl>
 1     0 Incr~        0     0     0      0          0          0     0
 2     1 Incr~        1     1     1      1          1          1     0
 3     2 Incr~        2     0     0      0          0          0     0
 4     3 Incr~        3     1     1      1          1          1     1
 5     4 Incr~        4     1     1      1          1          1     1
 6     5 Incr~        5     5     6      5          6          5     5
 7     6 Incr~        6     6     7      5          5          8     5
 8     7 Incr~        7     9    10      8         10          7    10
 9     8 Incr~        8    15    16     14         13         15    16
10     9 Incr~        9    20    20     20         22         18    19
# ... with 430 more rows, and 8 more variables: Northeast <dbl>,
#   Midwest <dbl>, South <dbl>, West <dbl>, White <dbl>, Black <dbl>,
#   Hispanic <dbl>, `Other (NET)` <dbl>

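Each word has one row for every category (i.e. score, 0-10). Total gives the number of responses for that word/category combination. So although nobody gave the word "Incredible" a score of zero, there is still a row for it.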
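If you ever need the "one row per vote" layout that density-based ridgeline examples usually expect, the aggregated rows can be expanded. The following is only a minimal sketch on a made-up table (toy, votes and the numbers in it are hypothetical, not the poll data), and it assumes Total holds whole-number counts:

```r
library(tidyr)

# Hypothetical toy data in the same long layout as the poll:
# one row per Word/Category pair, Total = number of votes.
toy <- data.frame(
  Word     = rep(c("Bad", "Good"), each = 3),
  Category = rep(c(0L, 5L, 10L), times = 2),
  Total    = c(8L, 2L, 0L,   # "Bad": mostly low scores
               0L, 3L, 7L),  # "Good": mostly high scores
  stringsAsFactors = FALSE
)

# Repeat each row Total times; rows with Total == 0 disappear,
# and the Total column itself is dropped by default.
votes <- uncount(toy, Total)

nrow(votes)
```

For the plot in this answer the expansion is not required, because stat = "identity" uses Total directly as the ridge height.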

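Before we calculate the average score for each word, we compute the product of Category and Total for every word/category combination and call it total_score. From there, we can treat Word as a factor and reorder it by the mean of total_score using forcats. Note that this mean is the vote-weighted average score multiplied by the word's vote sum and divided by the number of categories, so it yields the same order as the true average only when every word has (roughly) the same sum of Total. After that, you can plot the data as before.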
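To make the ordering step easier to check, here is a small sketch on made-up data (toy, word_means and toy_ordered are hypothetical names, and the numbers are invented). It computes the vote-weighted mean score per word with weighted.mean() and reorders the factor by it, which is a slight variation on the total_score approach described above rather than the exact same calculation:

```r
library(dplyr)
library(forcats)

# Hypothetical toy data: three words, three possible scores each.
toy <- data.frame(
  Word     = rep(c("Bad", "Good", "Average"), each = 3),
  Category = rep(c(0, 5, 10), times = 3),
  Total    = c(8, 2, 0,   # "Bad": mostly low scores
               0, 3, 7,   # "Good": mostly high scores
               1, 8, 1),  # "Average": mostly middle scores
  stringsAsFactors = FALSE
)

# Vote-weighted mean score per word.
word_means <- toy %>%
  group_by(Word) %>%
  summarise(mean_score = weighted.mean(Category, Total))

# Attach the mean to every row and reorder the factor levels by it.
toy_ordered <- toy %>%
  group_by(Word) %>%
  mutate(mean_score = weighted.mean(Category, Total)) %>%
  ungroup() %>%
  mutate(Word = fct_reorder(Word, mean_score))

levels(toy_ordered$Word)
# "Bad" "Average" "Good"
```

In this toy table every word has 10 votes, so ordering by mean(Category * Total) would give the same result.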

library(tidyverse)
library(ggridges)

YouGov <- read_csv("https://gist.githubusercontent.com/camminady/2e3aeab04fc3f5d3023ffc17860f0ba4/raw/97161888935c52407b0a377ebc932cc0c1490069/poll.csv")

YouGov %>%
  # weight each score by the number of votes it received
  mutate(total_score = Category * Total) %>%
  # order the Word factor by the mean of those weighted scores
  mutate(Word = fct_reorder(.f = Word, .x = total_score, .fun = mean)) %>%
  ggplot(aes(x = Category, y = Word, height = Total, group = Word, fill = Word)) +
  geom_density_ridges(stat = "identity", scale = 3)

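By treating Word as a factor, we have reordered the words by their average category. ggplot also assigns the fill colors in that order, so we don't have to change anything ourselves unless you prefer a different palette.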
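For the two coloring variants from the question (fill by average rank, or fill that changes with the category value), something along these lines should work. Treat it as an untested sketch rather than a finished solution: toy is a made-up stand-in for the poll (three invented words with binomial-shaped vote totals), mean_score is a helper column I introduce here, and I am assuming that geom_density_ridges_gradient() accepts stat = "identity" in the same way geom_density_ridges() does.

```r
library(dplyr)
library(forcats)
library(ggplot2)
library(ggridges)

# Hypothetical stand-in for the poll: three made-up words, scores 0-10,
# Total shaped like a binomial distribution purely for illustration.
toy <- data.frame(
  Word     = rep(c("Bad", "Average", "Good"), each = 11),
  Category = rep(0:10, times = 3),
  Total    = c(dbinom(0:10, 10, 0.20),
               dbinom(0:10, 10, 0.50),
               dbinom(0:10, 10, 0.85)) * 100,
  stringsAsFactors = FALSE
)

# Vote-weighted mean score per word, also used to order the ridges.
toy <- toy %>%
  group_by(Word) %>%
  mutate(mean_score = weighted.mean(Category, Total)) %>%
  ungroup() %>%
  mutate(Word = fct_reorder(Word, mean_score))

# Criterion 2: one fill color per ridge, taken from the word's mean score.
p_mean <- ggplot(toy, aes(x = Category, y = Word, height = Total,
                          group = Word, fill = mean_score)) +
  geom_density_ridges(stat = "identity", scale = 3) +
  scale_fill_viridis_c(name = "Mean score")

# Criterion 3: fill varies along the x axis within each ridge.
p_gradient <- ggplot(toy, aes(x = Category, y = Word, height = Total,
                              group = Word, fill = Category)) +
  geom_density_ridges_gradient(stat = "identity", scale = 3) +
  scale_fill_viridis_c(name = "Score")
```

Print p_mean or p_gradient to draw the plots. To use the real data, replace toy with YouGov; the column names are the same.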
