Har*_*ish 3 python dataframe python-3.x pandas
I have two columns, "ID" and "division", as shown below.
import numpy as np
import pandas as pd
df = pd.DataFrame(np.array([['111', 'AAA'],['222','AAA'],['333','BBB'],['444','CCC'],['444','AAA'],['222','BBB'],['111','BBB']]),columns=['ID','division'])
ID division
0 111 AAA
1 222 AAA
2 333 BBB
3 444 CCC
4 444 AAA
5 222 BBB
6 111 BBB
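The expected output is shown below. I need to pivot the ID column against itself, with each count determined by "division". The result should be displayed as a heatmap.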
df = pd.DataFrame(np.array([['0','2','1','1'],['2','0','1','1'],['1','1','0','0'],['1','1','0','0']]),columns=['111','222','333','444'],index=['111','222','333','444'])
111 222 333 444
111 0 2 1 1
222 2 0 1 1
333 1 1 0 0
444 1 1 0 0
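So, technically, I am computing the overlap between IDs with respect to division.
Example: the box highlighted in red shows that the overlap between IDs 111 and 222 is 2 (AAA and BBB), while the overlap between 111 and 444 is 1 (AAA, highlighted in the black box).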
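To make that definition concrete, here is a minimal plain-Python sketch that treats the overlap of two IDs as the size of the intersection of their division sets. The names `rows`, `divisions_by_id`, and `overlap` are illustrative only and are not part of the original post.

```python
from collections import defaultdict
from itertools import combinations

# Same (ID, division) rows as the example DataFrame above.
rows = [("111", "AAA"), ("222", "AAA"), ("333", "BBB"), ("444", "CCC"),
        ("444", "AAA"), ("222", "BBB"), ("111", "BBB")]

# Collect the set of divisions each ID belongs to.
divisions_by_id = defaultdict(set)
for id_, division in rows:
    divisions_by_id[id_].add(division)

# The overlap of two IDs is the number of divisions they have in common.
overlap = {
    (a, b): len(divisions_by_id[a] & divisions_by_id[b])
    for a, b in combinations(sorted(divisions_by_id), 2)
}

print(overlap[("111", "222")])  # 2 (AAA and BBB)
print(overlap[("111", "444")])  # 1 (AAA)
```

The dictionary only holds each unordered pair once; the expected matrix above is the symmetric version of it with zeros on the diagonal.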
I can do this in Excel in two steps, though I am not sure whether that helps.
Step 1: =SUM(COUNTIFS($B$2:$B$8,$B2,$A$2:$A$8,$G2),COUNTIFS($B$2:$B$8,$B2,$A$2:$A$8,H$1))-1
Step 2: =IF($G12=H$1,0,SUMIFS(H$2:H$8,$G$2:$G$8,$G12))
Is there a way to do this in Python using a DataFrame? Thanks for your help.
Case 2
If df also has a "count" column:
df = pd.DataFrame(np.array([['111', 'AAA','4'],['222','AAA','5'],['333','BBB','6'],
['444','CCC','3'],['444','AAA','2'], ['222','BBB','2'],
['111','BBB','7']]),columns=['ID','division','count'])
ID division count
0 111 AAA 4
1 222 AAA 5
2 333 BBB 6
3 444 CCC 3
4 444 AAA 2
5 222 BBB 2
6 111 BBB 7
The expected output would be:
df_result = pd.DataFrame(np.array([['0','18','13','6'],['18','0','8','7'],['13','8','0','0'],['6','7','0','0']]),columns=['111','222','333','444'],index=['111','222','333','444'])
111 222 333 444
111 0 18 13 6
222 18 0 8 7
333 13 8 0 0
444 6 7 0 0
Calculation: IDs 111 and 222 overlap on divisions AAA and BBB, so the sum is 4 + 5 + 2 + 7 = 18.
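The same arithmetic can be checked for any pair with a short plain-Python sketch. It assumes each (ID, division) pair appears only once, as in the table above; `counts` and `weighted_overlap` are hypothetical names used for illustration.

```python
# (ID, division) -> count, taken from the case 2 table above.
counts = {("111", "AAA"): 4, ("222", "AAA"): 5, ("333", "BBB"): 6,
          ("444", "CCC"): 3, ("444", "AAA"): 2, ("222", "BBB"): 2,
          ("111", "BBB"): 7}

def weighted_overlap(a, b):
    """Sum both IDs' counts over every division they share (hypothetical helper)."""
    divisions_a = {div for (id_, div) in counts if id_ == a}
    divisions_b = {div for (id_, div) in counts if id_ == b}
    shared = divisions_a & divisions_b
    return sum(counts[(a, div)] + counts[(b, div)] for div in shared)

print(weighted_overlap("111", "222"))  # 18 = (4 + 5) + (7 + 2)
print(weighted_overlap("111", "333"))  # 13 = 7 + 6
print(weighted_overlap("333", "444"))  # 0, no shared division
```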
Another way to do this is a self-join with merge, followed by pd.crosstab:
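# Self-join on division: one row for every pair of IDs that share a division
df_out = df.merge(df, on='division')
# Count the shared divisions for each (ID_x, ID_y) pair
results = pd.crosstab(df_out.ID_x, df_out.ID_y)
# Zero the diagonal so an ID is not counted as overlapping with itself
np.fill_diagonal(results.values, 0)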
Output:
ID_y 111 222 333 444
ID_x
111 0.0 2.0 1.0 1.0
222 2.0 0.0 1.0 1.0
333 1.0 1.0 0.0 0.0
444 1.0 1.0 0.0 0.0
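As a cross-check on the unweighted case, the same counts can also be sketched as a matrix product: build an ID-by-division membership table and multiply it by its transpose. This is an alternative to the merge approach, not something from the original answer, and it assumes each (ID, division) pair occurs at most once so that the table holds only 0s and 1s. The names `membership` and `overlap` are illustrative.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    [["111", "AAA"], ["222", "AAA"], ["333", "BBB"], ["444", "CCC"],
     ["444", "AAA"], ["222", "BBB"], ["111", "BBB"]],
    columns=["ID", "division"],
)

# Rows are IDs, columns are divisions, entries are 0/1 membership flags.
membership = pd.crosstab(df["ID"], df["division"])

# Entry (a, b) of membership @ membership.T counts the divisions a and b share.
overlap = membership.dot(membership.T)

# Zero the diagonal on a copy of the array rather than writing through .values.
values = overlap.to_numpy().copy()
np.fill_diagonal(values, 0)
overlap = pd.DataFrame(values, index=overlap.index, columns=overlap.columns)
print(overlap)
```

Before the diagonal is zeroed, each diagonal entry is simply the number of divisions that ID belongs to.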
# Case 2: weight each overlap by the count column
df = pd.DataFrame(np.array([['111', 'AAA','4'],['222','AAA','5'],['333','BBB','6'],
['444','CCC','3'],['444','AAA','2'], ['222','BBB','2'],
['111','BBB','7']]),columns=['ID','division','count'])
df['count'] = df['count'].astype(int)  # np.array stored the counts as strings
df_out = df.merge(df, on='division')
# For each pair sharing a division, add the two IDs' counts for that division
df_out = df_out.assign(count = df_out.count_x + df_out.count_y)
results = pd.crosstab(df_out.ID_x, df_out.ID_y, df_out['count'], aggfunc='sum').fillna(0)
np.fill_diagonal(results.values, 0)
Output:
ID_y 111 222 333 444
ID_x
111 0.0 18.0 13.0 6.0
222 18.0 0.0 8.0 7.0
333 13.0 8.0 0.0 0.0
444 6.0 7.0 0.0 0.0
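The question also asks for a heatmap, which the code above does not draw. One possible sketch uses matplotlib's `imshow`; matplotlib is an assumption here (the post only mentions pandas and NumPy), the matrix is copied from the case 2 output rather than recomputed, and the file name `overlap_heatmap.png` is made up for the example.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; remove to show a window instead
import matplotlib.pyplot as plt
import pandas as pd

ids = ["111", "222", "333", "444"]
# Case 2 result matrix, copied from the output above.
results = pd.DataFrame(
    [[0, 18, 13, 6], [18, 0, 8, 7], [13, 8, 0, 0], [6, 7, 0, 0]],
    index=ids, columns=ids,
)

fig, ax = plt.subplots()
image = ax.imshow(results.to_numpy(), cmap="Reds")
ax.set_xticks(range(len(ids)))
ax.set_xticklabels(ids)
ax.set_yticks(range(len(ids)))
ax.set_yticklabels(ids)

# Write each cell's value on top of its colored square.
for i in range(len(ids)):
    for j in range(len(ids)):
        ax.text(j, i, str(results.iat[i, j]), ha="center", va="center")

fig.colorbar(image, ax=ax)
fig.savefig("overlap_heatmap.png")  # hypothetical output file name
```

If seaborn is available, `seaborn.heatmap(results, annot=True)` is a shorter route to a similar picture.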