Venn diagram from a contingency table in R

Uma*_*mar 3 r contingency venn-diagram dataframe venn

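I have data shaped like a contingency table. It holds a large number of records, and I would like to draw a Venn diagram from this data frame.

The structure of my data: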

species_abundance<-data.frame(Genus = c("Parasphingorhabdus", "Loktanella", "Cytobacillus", "Paracoccus", "Paucisalibacillus", "Kytococcus", "Salinibacterium", "Acinetobacter baumanni","Marinococcus","Bacillus"),
               S3 = c(0, 0, 1, 1, 0, 0, 1,0,4,0),
               S5 = c(0, 0, 0, 1, 1, 0, 1,0,3,5),
               S7 = c(3, 1, 0, 2, 0, 1, 0,0,3,1),
               S9 = c(0, 1, 0, 3, 0, 0, 0,1,2,0))

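How can I draw a Venn diagram from this data frame so that I can find the unique and shared species across the different stations (S3, S5, S7, ...)?

If I convert the data as shown below and feed it to Venny2, I get an image like the one that follows. I would like to produce a similar image, and the same findings, using R. Please help.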

species_abundance1<-data.frame(S3 = c("", "", "Cytobacillus", "Paracoccus", "", "", "Salinibacterium","", "Marinococcus", ""),
                          S5 = c("", "", "", "Paracoccus", "Paucisalibacillus", "", "Salinibacterium","", "Marinococcus","Bacillus"),
                          S7 = c("Parasphingorhabdus", "Loktanella", "", "", "", "Kytococcus", "","", "Marinococcus","Bacillus"),
                          S9 = c("", "Loktanella", "", "", "", "", "","Acinetobacter baumanni", "Marinococcus",""))

[Image: Venn diagram produced with Venny2]

All*_*ron 6

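There are several ways to get a 4-variable Venn diagram in R, but Venn diagrams with more categories than that are extremely complex and are not a good way to visualize data. Here is an example of a 5-category Venn diagram from Wikimedia Commons: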

[Image: example of a 5-category Venn diagram, from Wikimedia Commons]

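A 7-category Venn cannot even be drawn with ellipses and requires complex flower-like shapes, as shown in the linked article.

In any case, you can see that even a Venn with 5 categories is not a very user-friendly way of representing data.

In your case, the natural way to present this kind of data is a heatmap. You first need to reshape the data into long format.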

library(tidyverse)

species_abundance %>%
  pivot_longer(-Genus, names_to = 'Site', values_to = 'Count') %>%
  mutate(Site = factor(Site, unique(Site))) %>%
  ggplot(aes(Site, Genus, fill = factor(Count))) +
  geom_tile(color = 'black') +
  geom_text(aes(label = ifelse(Count == 0, '', Count))) +
  coord_equal() +
  # One fill color per distinct Count value (0 to 5 in the data shown above)
  scale_fill_manual(guide = 'none', 
                    values = c('white', 'lightyellow', 'yellow', 'gold',
                               'orange', 'darkorange')) +
  theme_minimal(base_size = 16)

[Image: heatmap of counts, Genus by Site]
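The "unique and shared" part of the question can also be answered without any plot. The following is a minimal base R sketch, assuming the data frame exactly as given in the question; the helper names `present`, `n_sites`, `unique_by_site` and `shared_all` are my own and purely illustrative.

```r
# Data frame as given in the question
species_abundance <- data.frame(
  Genus = c("Parasphingorhabdus", "Loktanella", "Cytobacillus", "Paracoccus",
            "Paucisalibacillus", "Kytococcus", "Salinibacterium",
            "Acinetobacter baumanni", "Marinococcus", "Bacillus"),
  S3 = c(0, 0, 1, 1, 0, 0, 1, 0, 4, 0),
  S5 = c(0, 0, 0, 1, 1, 0, 1, 0, 3, 5),
  S7 = c(3, 1, 0, 2, 0, 1, 0, 0, 3, 1),
  S9 = c(0, 1, 0, 3, 0, 0, 0, 1, 2, 0),
  stringsAsFactors = FALSE
)

# Presence/absence matrix: TRUE where a genus was observed at a site
present <- as.matrix(species_abundance[-1]) > 0
rownames(present) <- species_abundance$Genus

# Number of sites at which each genus occurs
n_sites <- rowSums(present)

# Genera observed at exactly one site, listed per site
unique_by_site <- sapply(
  colnames(present),
  function(s) rownames(present)[present[, s] & n_sites == 1],
  simplify = FALSE
)

# Genera observed at every site
shared_all <- rownames(present)[n_sites == ncol(present)]

print(unique_by_site)
print(shared_all)
```

Because the result is a plain named list, it is easy to check against the heatmap: every genus listed under a site should have exactly one filled tile in its row.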


Addendum

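If you really do want a 5-category Venn diagram showing the number of species shared among 5 sites, you can do it as follows. Note that this code expects a fifth column, S10, which is not in the data frame shown in the question: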

library(VennDiagram)

grid::grid.newpage()

with(sign(species_abundance[-1]),
     draw.quintuple.venn(sum(S3), sum(S5), sum(S7), sum(S9), sum(S10),
        sum(S3 == 1 & S5 == 1),  sum(S3 == 1 & S7 == 1),
        sum(S3 == 1 & S9 == 1),  sum(S3 == 1 & S10 == 1),
        sum(S5 == 1 & S7 == 1),  sum(S5 == 1 & S9 == 1),
        sum(S5 == 1 & S10 == 1), sum(S7 == 1 & S9 == 1),
        sum(S7 == 1 & S10 == 1), sum(S9 == 1 & S10 == 1),
        sum(S3 == 1 & S5 == 1 & S7 == 1),
        sum(S3 == 1 & S5 == 1 & S9 == 1),
        sum(S3 == 1 & S5 == 1 & S10 == 1),
        sum(S3 == 1 & S7 == 1 & S9 == 1),
        sum(S3 == 1 & S7 == 1 & S10 == 1),
        sum(S3 == 1 & S9 == 1 & S10 == 1),
        sum(S5 == 1 & S7 == 1 & S9 == 1),
        sum(S5 == 1 & S7 == 1 & S10 == 1),
        sum(S5 == 1 & S9 == 1 & S10 == 1),
        sum(S7 == 1 & S9 == 1 & S10 == 1),
        sum(S3 == 1 & S5 == 1 & S7 == 1 & S9 == 1),
        sum(S3 == 1 & S5 == 1 & S7 == 1 & S10 == 1),
        sum(S3 == 1 & S5 == 1 & S9 == 1 & S10 == 1),
        sum(S3 == 1 & S7 == 1 & S9 == 1 & S10 == 1),
        sum(S5 == 1 & S7 == 1 & S9 == 1 & S10 == 1),
        sum(S3 == 1 & S5 == 1 & S7 == 1 & S9 == 1 & S10 == 1),
        category = c("S3", "S5", "S7", "S9", "S10"),
        fill = c("orange", "red", "green", "blue", "yellow"),
        cex = 2,
        cat.cex = 2,
        cat.col = 'black'
))

[Image: the resulting 5-category Venn diagram]

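Besides being much harder to read and understand, it also contains less information than the heatmap. For example, I can see from the Venn that exactly one genus is shared by S3 and S5 only, but I can see that just as clearly from the heatmap. Furthermore, with the heatmap I can tell you which genus it is (Paracoccus) and how many times it was observed at each site. You cannot do that with a Venn diagram. A Venn diagram is simply the wrong tool for presenting this data.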
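If a Venn diagram is still required for the four stations in the data frame as shown in the question, the sketch below is one possible route. It assumes the VennDiagram package is installed and passes `venn.diagram()` a named list of genus sets instead of hand-written intersection counts. The names `sets`, `pattern` and `region_counts` are illustrative. The table of membership patterns is only a cross-check on the numbers each region of the diagram should show.

```r
# Data frame as given in the question (four stations)
species_abundance <- data.frame(
  Genus = c("Parasphingorhabdus", "Loktanella", "Cytobacillus", "Paracoccus",
            "Paucisalibacillus", "Kytococcus", "Salinibacterium",
            "Acinetobacter baumanni", "Marinococcus", "Bacillus"),
  S3 = c(0, 0, 1, 1, 0, 0, 1, 0, 4, 0),
  S5 = c(0, 0, 0, 1, 1, 0, 1, 0, 3, 5),
  S7 = c(3, 1, 0, 2, 0, 1, 0, 0, 3, 1),
  S9 = c(0, 1, 0, 3, 0, 0, 0, 1, 2, 0),
  stringsAsFactors = FALSE
)

# One set of genus names per station: the genera with a non-zero count
sets <- lapply(species_abundance[-1],
               function(n) species_abundance$Genus[n > 0])

# Cross-check: count genera per exact combination of stations
present <- as.matrix(species_abundance[-1]) > 0
pattern <- apply(present, 1,
                 function(r) paste(colnames(present)[r], collapse = "&"))
region_counts <- table(pattern)
print(region_counts)

# Draw the 4-set Venn only if VennDiagram is available
if (requireNamespace("VennDiagram", quietly = TRUE)) {
  venn <- VennDiagram::venn.diagram(
    sets,
    filename = NULL,  # return grid objects instead of writing an image file
    fill = c("orange", "red", "green", "blue")
  )
  grid::grid.newpage()
  grid::grid.draw(venn)
}
```

Building the sets from the table keeps the diagram in step with the data, whereas the hand-written `sum(...)` arguments have to be rewritten whenever a station is added or removed.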