The dataset looks like this:
Gene SampleName
gene1 sample1
gene1 sample2
gene1 sample3
gene2 sample2
gene2 sample3
gene2 sample4
gene3 sample1
gene3 sample5
My goal is to create a data matrix like this:
      gene1 gene2 gene3
gene1   -     2     1
gene2   -     -     0
gene3   -     -     -
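gene1 vs gene2 is 2 because they share the samples sample2 and sample3. gene1 vs gene3 is 1 because they share only one sample: sample1.
My question is how to achieve this in R or Perl. The actual dataset is much larger. I would greatly appreciate your help.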
Here is the output of dput(df) in R:
df <- structure(list(Gene = c("gene1", "gene1", "gene1", "gene2", "gene2",
"gene2", "gene3", "gene3"), SampleName = c("sample1", "sample2",
"sample3", "sample2", "sample3", "sample4", "sample1", "sample5"
)), …
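One possible approach in R, offered as a minimal sketch rather than a definitive answer: build a sample-by-gene incidence matrix with table() and multiply it by its own transpose with crossprod(), so that each off-diagonal cell counts the samples two genes share. The data frame is rebuilt here from the rows listed in the question, because the dput() output above is truncated. The names incidence, shared and upper are illustrative and not part of the original post.

```r
# Rebuild the example data from the rows listed in the question
# (the truncated dput() output is not reused).
df <- data.frame(
  Gene = c("gene1", "gene1", "gene1", "gene2", "gene2", "gene2",
           "gene3", "gene3"),
  SampleName = c("sample1", "sample2", "sample3", "sample2", "sample3",
                 "sample4", "sample1", "sample5"),
  stringsAsFactors = FALSE
)

# Drop duplicated (Gene, SampleName) rows so each pair counts only once.
pairs <- unique(df)

# Sample-by-gene incidence matrix: 1 if the gene occurs in the sample, else 0.
incidence <- unclass(table(pairs$SampleName, pairs$Gene))

# crossprod(X) is t(X) %*% X, so cell [i, j] is the number of samples
# shared by gene i and gene j (the diagonal holds each gene's sample count).
shared <- crossprod(incidence)

# Keep only the upper triangle, as in the desired output.
upper <- shared
upper[lower.tri(upper, diag = TRUE)] <- NA
print(upper)
```

The masked cells print as NA instead of "-". For a much larger dataset the dense incidence matrix may become too big to hold in memory; a sparse representation would be the likely next step, which this sketch does not cover.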