Aggregate one column based on another in pandas

Wbo*_*boy 2 python pandas

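Technically this should be a simple thing, but unfortunately it is not coming to me at the moment.

I am trying to find the proportion of one column based on another column. For example: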

Column 1   |  target_variable
'potato'         1
'potato'         0
'tomato'         1
'brocolli'       1
'tomato'         0

The expected output is:

column 1   | target = 1  | target = 0 | total_count
'potato'   |     1       |      1     |     2
'tomato'   |     1       |      1     |     2
'brocolli' |     1       |      0     |     1

However, I think I am using aggregation incorrectly, so I fell back on the following naive implementation:

z = {}
for i in train.index:
    fruit = train["fruit"][i]
    l = train["target"][i]
    if fruit not in z:
        if l == 1:
            z[fruit] = {1:1,0:0,'count':1}
        else:
            z[fruit] = {1:0,0:1,'count':1}
    else:
        if l == 1:
            z[fruit][1] += 1
        else:
            z[fruit][0] += 1
        z[fruit]['count'] += 1

It gives a similar output in the form of a dictionary.

Can anyone enlighten me on the proper pandas syntax for this? :)

Thank you! :)

jez*_*ael 5

You need groupby + size + unstack + add_prefix + sum:

df1 = df.groupby(['Column 1','target_variable']).size() \
        .unstack(fill_value=0) \
        .add_prefix('target = ')
df1['total_count'] = df1.sum(axis=1)
print (df1)
target_variable  target = 0  target = 1  total_count
Column 1                                            
brocolli                  0           1            1
potato                    1           1            2
tomato                    1           1            2
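
To make the chain above easier to follow, here is a minimal self-contained sketch that rebuilds the sample data from the question and names each intermediate step. The variable names (`pair_counts`, `wide`) are only illustrative, and the DataFrame is assumed to have exactly the two columns shown in the question.

```python
import pandas as pd

# Sample data copied from the question (the spelling 'brocolli' is kept as-is).
df = pd.DataFrame({
    "Column 1": ["potato", "potato", "tomato", "brocolli", "tomato"],
    "target_variable": [1, 0, 1, 1, 0],
})

# Step 1: count rows for every (Column 1, target_variable) pair.
# The result is a Series with a two-level index.
pair_counts = df.groupby(["Column 1", "target_variable"]).size()

# Step 2: move the target_variable level into the columns.
# Pairs that never occur (e.g. brocolli with target 0) become 0 instead of NaN.
wide = pair_counts.unstack(fill_value=0)

# Step 3: rename the columns 0 and 1 to 'target = 0' and 'target = 1'.
wide = wide.add_prefix("target = ")

# Step 4: the row-wise sum is the total number of rows per group.
wide["total_count"] = wide.sum(axis=1)

print(wide)
```

The total is computed before any other column is added, so `sum(axis=1)` only adds up the two count columns.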

Or use crosstab:

df1 = pd.crosstab(df['Column 1'],df['target_variable'], margins=True)
print (df1)
target_variable  0  1  All
Column 1                  
brocolli         0  1    1
potato           1  1    2
tomato           1  1    2
All              2  3    5

df1 = df1.rename(columns = {'All': 'total_count'}).iloc[:-1]
print (df1)
target_variable  0  1  total_count
Column 1                          
brocolli         0  1            1
potato           1  1            2
tomato           1  1            2
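
The question mentions proportions, while the outputs above are raw counts. If per-group fractions are what is wanted, one possible approach (a sketch, not part of the original answer) is the `normalize` parameter of `pd.crosstab`. The name `props` is only illustrative.

```python
import pandas as pd

# Sample data copied from the question (the spelling 'brocolli' is kept as-is).
df = pd.DataFrame({
    "Column 1": ["potato", "potato", "tomato", "brocolli", "tomato"],
    "target_variable": [1, 0, 1, 1, 0],
})

# normalize='index' divides every count by its row total,
# so each row holds the share of target 0 and target 1 within that group.
props = pd.crosstab(df["Column 1"], df["target_variable"], normalize="index")

print(props)
```

Each row of `props` sums to 1, and the column labels stay as the original values 0 and 1.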