How to create a frequency matrix using all columns

Question

Suppose I have a dataset with 5 binary columns and 2 rows.

It looks like this:

    c1  c2  c3  c4  c5
r1   0   1   0   1   0
r2   1   1   1   1   0

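I want to create a matrix that, for each pair of columns, gives the number of rows in which both columns are 1, i.e. how often one column occurs given that the other one also occurs. Something like a confusion matrix.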

The output I want is:

    c1  c2  c3  c4  c5
c1   -   1   1   1   0
c2   1   -   1   2   0
c3   1   1   -   1   0
c4   1   2   1   -   0
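As a plain reference for the quantity being asked for, here is a brute-force sketch that counts, for every pair of columns, the rows where both are 1. It rebuilds the example data as `df`; the name `counts` is only illustrative, and the diagonal is left at 0 where the question shows "-".

```python
import pandas as pd

# The question's example: two rows, five 0/1 columns
df = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 1, 1, 1, 0]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

# Cell (i, j) counts the rows where columns i and j are both 1;
# the diagonal stays 0 (shown as "-" in the desired output)
counts = pd.DataFrame(0, index=df.columns, columns=df.columns)
for i in df.columns:
    for j in df.columns:
        if i != j:
            counts.loc[i, j] = int(((df[i] == 1) & (df[j] == 1)).sum())

print(counts)
```

The double loop is quadratic in the number of columns and is meant only to pin down the definition; it also produces the all-zero c5 row that the desired output omits.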

I tried pandas crosstab, but it only gives the desired output for two columns at a time. I want to use all of the columns.
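To see why crosstab falls short: `pd.crosstab` tabulates the value combinations of exactly two columns, and the (1, 1) cell of that table is the co-occurrence count for that one pair. A sketch of the workaround this implies, calling crosstab once per pair (the `pairs` dict is a hypothetical container for the results):

```python
import itertools

import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 1, 1, 1, 0]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

# crosstab compares two columns: rows are c2's values, columns are c4's values
ct = pd.crosstab(df["c2"], df["c4"])
print(ct)

# Covering all columns means one crosstab per pair of columns
pairs = {}
for a, b in itertools.combinations(df.columns, 2):
    ct_ab = pd.crosstab(df[a], df[b])
    # A column may never contain 1 (e.g. c5), so the (1, 1) cell can be missing
    has_cell = 1 in ct_ab.index and 1 in ct_ab.columns
    pairs[(a, b)] = int(ct_ab.loc[1, 1]) if has_cell else 0

print(pairs)
```

With n columns this needs n(n-1)/2 crosstab calls, which is the motivation for a single matrix operation instead.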

Answer 1

piR*_*red 6

`dot`

df.T.dot(df)
# same as
# df.T @ df

    c1  c2  c3  c4  c5
c1   1   1   1   1   0
c2   1   2   1   2   0
c3   1   1   1   1   0
c4   1   2   1   2   0
c5   0   0   0   0   0
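Why the dot product gives these counts: entry (i, j) of `df.T @ df` is the sum over rows r of `df.loc[r, i] * df.loc[r, j]`. With 0/1 data each product is 1 only when both columns are 1 in that row, so the sum is the co-occurrence count. A small check on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 1, 1, 1, 0]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

m = df.T.dot(df)

# Entry (c2, c4) computed by hand: sum of row-wise products of the two columns
manual = sum(df.loc[r, "c2"] * df.loc[r, "c4"] for r in df.index)
print(m.loc["c2", "c4"], manual)

# When i == j the products reduce to df.loc[r, i], so the diagonal
# holds each column's number of 1s (its column sum)
print(np.diag(m.to_numpy()), df.sum().to_numpy())
```

That is also why the diagonal is not zero in the output above and has to be cleared separately.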

You can use np.fill_diagonal to set the diagonal to zero:

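d = df.T.dot(df)
# fill a writable copy: to_numpy() can return a read-only view (e.g. with pandas Copy-on-Write)
a = d.to_numpy(copy=True)
np.fill_diagonal(a, 0)
d = pd.DataFrame(a, index=d.index, columns=d.columns)
d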

    c1  c2  c3  c4  c5
c1   0   1   1   1   0
c2   1   0   1   2   0
c3   1   1   0   1   0
c4   1   2   1   0   0
c5   0   0   0   0   0
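If you would rather not write into an array at all, a pandas-only alternative (swapped in here, not part of the answer above) is to mask the diagonal with a boolean identity matrix. A minimal sketch on the question's data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [[0, 1, 0, 1, 0],
     [1, 1, 1, 1, 0]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

d = df.T.dot(df)

# np.eye(..., dtype=bool) is True only on the diagonal; mask() replaces those
# cells with 0 and returns a new DataFrame instead of modifying d in place
d = d.mask(np.eye(len(d), dtype=bool), 0)
print(d)
```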

Since we're already using NumPy, we might as well go all the way...

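a = df.to_numpy()        # rows x columns array of 0/1 values
b = a.T @ a              # b[i, j] = number of rows where columns i and j are both 1
np.fill_diagonal(b, 0)   # b is a freshly allocated array, so it can be modified in place

pd.DataFrame(b, df.columns, df.columns)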

    c1  c2  c3  c4  c5
c1   0   1   1   1   0
c2   1   0   1   2   0
c3   1   1   0   1   0
c4   1   2   1   0   0
c5   0   0   0   0   0
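One caveat, assuming your flags might be stored as bool rather than 0/1 integers (for example after a comparison such as `df > 0`): NumPy evaluates a matrix product of boolean arrays with logical AND/OR, so `a.T @ a` then only says whether a pair ever co-occurs, not how often. Converting to an integer dtype first keeps the counts. A sketch, where `df_bool` is a hypothetical boolean copy of the question's data:

```python
import numpy as np
import pandas as pd

# Hypothetical boolean version of the question's data
df_bool = pd.DataFrame(
    [[False, True, False, True, False],
     [True, True, True, True, False]],
    index=["r1", "r2"],
    columns=["c1", "c2", "c3", "c4", "c5"],
)

# Boolean matrix product: the result is bool, i.e. "co-occurs at least once"
flags = df_bool.to_numpy()
print((flags.T @ flags).dtype)

# Casting to int before multiplying restores the counts
a = df_bool.to_numpy(dtype=int)
b = a.T @ a
np.fill_diagonal(b, 0)
counts = pd.DataFrame(b, df_bool.columns, df_bool.columns)
print(counts)
```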

Archived: 6 years, 7 months ago
Views: 43
Last activity: 6 years, 7 months ago