Label the x-axis using another column in the data frame

Question

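I have a data frame derived from the output of a GWAS run. Each row is a SNP in the genome, along with its chromosome, position, and P value. From this data frame I want to generate a Manhattan plot in which the x-axis runs from the first SNP on Chr 1 to the last SNP on Chr 5 and the y-axis is -log10(P.value). To do this, I generated an index column so that the SNPs are plotted in the correct order along the x-axis. However, I want the x-axis to be labeled by the Chromosome column rather than by the index. Unfortunately, I can't simply put Chromosome on the x-axis, because then all SNPs on a given chromosome would be plotted in a single column of points.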

Here is an example data frame to work with:

library(tidyverse)

df <- tibble(Index = seq(1, 500, by = 1),
             Chromosome = rep(seq(1, 5, by = 1), each = 100),
             Position = rep(seq(1, 500, by = 5), 5),
             P.value = sample(seq(1e-5, 1e-2, by = 1e-5), 500, replace = TRUE))
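
The Index column in this example is simply 1 to 500 because the rows are already in chromosome/position order. For real GWAS output that may not arrive sorted, one way to build such an index in base R is sketched below; the `gwas` data frame and its random values are hypothetical stand-ins for real results.

```r
# Hypothetical, unsorted GWAS output with the same columns as the example
set.seed(42)
gwas <- data.frame(
  Chromosome = sample(1:5, 20, replace = TRUE),
  Position   = sample(1:1e6, 20),
  P.value    = runif(20, 1e-5, 1e-2)
)

# Sort by chromosome first, then by position within each chromosome
gwas <- gwas[order(gwas$Chromosome, gwas$Position), ]

# After sorting, the plotting index is just the row order
gwas$Index <- seq_len(nrow(gwas))
```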

The plot I have so far:

df %>%
    ggplot(aes(x = Index, y = -log10(P.value), color = as.factor(Chromosome))) +
    geom_point()

I've tried using the scale_x_discrete option, but couldn't find a solution.

Here is an example of a Manhattan plot I found online. See how the x-axis is labeled by chromosome? That is the output I want.

[Image: example Manhattan plot]

Answer 1

ped*_*sso 5

geom_jitter is your friend:

df %>%
    ggplot(aes(x = Chromosome, y = -log10(P.value), color = as.factor(Chromosome))) +
    geom_jitter()

Edit, given the OP's comment:

With base R graphics, you can do the following:

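# One random colour per chromosome, then indexed by each SNP's chromosome
cols = sample(colors(), length(unique(df$Chromosome)))[df$Chromosome]

# xaxt="n" suppresses the default x-axis so it can be redrawn with chromosome labels
plot(df$Index, -log10(df$P.value), col=cols, xaxt="n")
# Put labels 1-5 at roughly the middle of each chromosome's Index range
axis(1, at=c(50, 150, 250, 350, 450), labels=c(1:5))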

You need to specify exactly where each chromosome label goes in the axis function. Thanks to this post.
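
Rather than hard-coding the tick positions, they can be computed from the data. A minimal sketch, assuming the same Index/Chromosome/P.value columns as the question (regenerated here with runif so the snippet runs on its own): tapply finds the midpoint of each chromosome's Index range, and those midpoints are passed to axis.

```r
set.seed(1)
df <- data.frame(Index      = 1:500,
                 Chromosome = rep(1:5, each = 100),
                 P.value    = runif(500, 1e-5, 1e-2))

# Midpoint of the Index range covered by each chromosome (named by chromosome)
mids <- tapply(df$Index, df$Chromosome, function(i) (min(i) + max(i)) / 2)

# Colour points by chromosome using the default palette, and hide the default x-axis
plot(df$Index, -log10(df$P.value), col = df$Chromosome, xaxt = "n", pch = 20,
     xlab = "Chromosome", ylab = "-log10(P.value)")
axis(1, at = mids, labels = names(mids))
```

Because Index values are contiguous within each chromosome, this midpoint equals the per-group mean of Index.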

Edit #2:

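I found an answer using ggplot2. You can use the annotate function to plot the points by their coordinates, and the scale_x_discrete function (as you suggested) to place chromosome labels on the x-axis. We also need to define a pos vector that gives the positions of the axis labels. As an example I used the mean of the Index column for each chromosome, but you can define it manually if you prefer.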

pos <- df %>% 
    group_by(Chromosome) %>% 
    summarize(avg = round(mean(Index))) %>% 
    pull(avg)

ggplot(df) +
    annotate("point", x=df$Index, y=-log10(df$P.value),
          color=as.factor(df$Chromosome)) +
    scale_x_discrete(limits = pos, 
          labels = unique(df$Chromosome))
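
A commonly used alternative (not the answerer's code) is to keep x continuous and change only where the ticks go and what they say, via scale_x_continuous(breaks = ..., labels = ...). This avoids passing colours through annotate and mixing numeric x values with a discrete scale. A sketch, assuming dplyr and ggplot2 are installed; the data are regenerated so the block runs on its own.

```r
library(dplyr)
library(ggplot2)

set.seed(1)
df <- data.frame(Index      = 1:500,
                 Chromosome = rep(1:5, each = 100),
                 P.value    = runif(500, 1e-5, 1e-2))

# One break per chromosome, at the centre of its Index range
axis_df <- df %>%
  group_by(Chromosome) %>%
  summarize(center = (min(Index) + max(Index)) / 2)

# x stays continuous (Index); only the tick positions and labels change
p <- ggplot(df, aes(x = Index, y = -log10(P.value),
                    color = as.factor(Chromosome))) +
  geom_point() +
  scale_x_continuous(breaks = axis_df$center,
                     labels = as.character(axis_df$Chromosome)) +
  labs(x = "Chromosome", color = "Chromosome")
p
```

Because color is mapped inside aes(), ggplot2 builds the chromosome legend automatically.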

Archived: 6 years ago
Views: 6558
Last activity: 6 years ago