Pie chart for the co-occurrence of about 10 factors in R

Question

719*_*016 5 grouping r data-representation

I have a two-column dataset with about 30,000 clusters and 10 factors, like this:

cluster-1 Factor1
cluster-1 Factor2
...
cluster-2 Factor2
cluster-2 Factor3
...

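I want to represent how the factors co-occur within clusters: something like "Factor1 + Factor3 + Factor5 in 1234 clusters", and so on for the different combinations. I thought of something like a pie chart, but with 10 factors I suspect there are too many possible combinations (up to 2^10 - 1 = 1023 non-empty ones).

What is a good way to represent this?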

Answer 1

Joh*_*lby 2

There is a nice programming problem to solve here:

How do you count the co-occurrences of factors across the different clusters?

First, simulate some data:

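n = 1000

set.seed(12345)
n.clusters = 100
# Assign the n rows to clusters 1..100 in turn (10 rows per cluster)
clusters = rep(1:n.clusters, length.out=n)

n.factors = 10
# Draw factor labels from a rounded normal (mean 5, sd 2), clamped to 1..10
factors = round(rnorm(n, n.factors/2, n.factors/5))
factors[factors > n.factors] = n.factors
factors[factors < 1] = 1

data = data.frame(cluster=clusters, factor=factors)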

The following code tabulates how many clusters contain each combination of factors:

# Key each cluster by its sorted, distinct factors pasted together, then count clusters per key
counts = with(data, table(tapply(factor, cluster, function(x) paste(as.character(sort(unique(x))), collapse=''))))
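The same `tapply` idea can be applied directly to data laid out as in the question. The sketch below is only an illustration: it assumes a whitespace-separated two-column text, the column names `cluster` and `factor` are my own choice, the tiny input is made up, and it joins factors with `+` instead of `''` so the keys read like "Factor1+Factor2".

```r
# Hypothetical toy input in the question's two-column layout
txt <- "cluster-1 Factor1
cluster-1 Factor2
cluster-2 Factor2
cluster-2 Factor3
cluster-3 Factor1
cluster-3 Factor2
cluster-3 Factor1"
data <- read.table(text = txt, col.names = c("cluster", "factor"),
                   stringsAsFactors = FALSE)

# One key per cluster: its distinct factors, sorted and joined with "+"
keys <- tapply(data$factor, data$cluster,
               function(x) paste(sort(unique(x)), collapse = "+"))

# Number of clusters showing each combination
counts <- table(keys)
print(counts)
```

Here cluster-1 and cluster-3 both map to "Factor1+Factor2" (the repeated Factor1 in cluster-3 is dropped by `unique`), so that key is counted twice. Note that `sort` on strings is alphabetical, so "Factor10" sorts before "Factor2"; this does not affect the counts because every cluster is keyed the same way.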

This can be shown as a simple pie chart, for example:

dev.new(width=5, height=5)
# Only plot combinations that occur in more than one cluster
pie(counts[counts>1])

But simple counts like these are usually shown most effectively as a sorted table. For more on this, see Edward Tufte.

You could do `data.frame(Count=sort(counts, dec=T))` and then truncate it to show only the combinations that occur in >0 or >1 clusters. (2 upvotes)
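A minimal sketch of that sorted-table display, assuming `counts` is a `table` of clusters per combination like the one built in the answer. The toy counts, the column names `combination` and `clusters`, and the cutoff of more than one cluster are illustrative choices, not part of the original answer.

```r
# Illustrative counts: number of clusters per factor combination (made-up values)
counts <- table(c("1+3+5", "2+3", "1+3+5", "4", "2+3", "1+3+5", "6+7"))

# Sort combinations from most to least frequent
sorted <- sort(counts, decreasing = TRUE)

# Plain two-column table: combination and how many clusters show it
tab <- data.frame(combination = names(sorted),
                  clusters = as.vector(sorted),
                  stringsAsFactors = FALSE)

# Keep only combinations seen in more than one cluster
tab <- tab[tab$clusters > 1, ]
rownames(tab) <- NULL
print(tab)
```

With 10 factors most of the up to 1023 combinations will be rare, so a cutoff like this keeps the table short while still listing the common patterns in order.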

Archived:	14 years, 1 month ago
Views:	296
Last activity:	14 years, 1 month ago