Pandas DataFrame.groupby() to a dictionary with multiple column values

Mic*_*hes 7 python dictionary dataframe pandas jupyter

type(Table)
pandas.core.frame.DataFrame

Table
======= ======= =======
Column1 Column2 Column3
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3

For anyone familiar with pandas: how can I build a multi-valued dictionary with the .groupby() method?

I would like output in a format like this:

{
    0: [(23, 1)],
    1: [(5, 2), (2, 3), (19, 5)],
    # etc...
}

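where the Column1 values are the keys, and the corresponding Column2 and Column3 values are tuples packed into a list for each Column1 key.

My syntax works when only one column is selected after .groupby():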

Table.groupby('Column1')['Column2'].apply(list).to_dict()
# Result as expected
{
    0: [23], 
    1: [5, 2, 19], 
    2: [56, 22], 
    3: [2, 14], 
    4: [59], 
    5: [44, 1, 87]
}

However, specifying multiple columns in the index returns the column names as the values:

Table.groupby('Column1')[('Column2', 'Column3')].apply(list).to_dict()
# Result has column namespace as array value
{
    0: ['Column2', 'Column3'],
    1: ['Column2', 'Column3'],
    2: ['Column2', 'Column3'],
    3: ['Column2', 'Column3'],
    4: ['Column2', 'Column3'],
    5: ['Column2', 'Column3']
 }
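A likely reason for this result is that iterating over a DataFrame yields its column labels rather than its rows, so calling list on each group (assuming each group arrives as a DataFrame) returns the labels. The sketch below illustrates this on a small stand-in for the first rows of Table; the names table, sub, labels and rows are made up for the illustration.

```python
import pandas as pd

# Hypothetical stand-in for the first four rows of the question's Table.
table = pd.DataFrame({
    "Column1": [0, 1, 1, 1],
    "Column2": [23, 5, 2, 19],
    "Column3": [1, 2, 3, 5],
})

sub = table[["Column2", "Column3"]]

# Iterating a DataFrame walks its column labels, not its rows.
labels = list(sub)

# Rows have to be requested explicitly, for example as plain tuples.
rows = list(sub.itertuples(index=False, name=None))

print(labels)  # ['Column2', 'Column3']
print(rows)    # [(23, 1), (5, 2), (2, 3), (19, 5)]
```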

How can I get a list of tuples as each value instead?

Psi*_*dom 10

Customize the function you pass to apply so that it returns a list of lists for each group:

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: g.values.tolist()).to_dict()
# {0: [[23, 1]], 
#  1: [[5, 2], [2, 3], [19, 5]], 
#  2: [[56, 1], [22, 2]], 
#  3: [[2, 4], [14, 5]], 
#  4: [[59, 1]], 
#  5: [[44, 1], [1, 2], [87, 3]]}

If you need an explicit list of tuples, convert with list(map(tuple, ...)):

df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
# {0: [(23, 1)], 
#  1: [(5, 2), (2, 3), (19, 5)], 
#  2: [(56, 1), (22, 2)], 
#  3: [(2, 4), (14, 5)], 
#  4: [(59, 1)], 
#  5: [(44, 1), (1, 2), (87, 3)]}
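For anyone who wants to run this end to end, here is a minimal self-contained sketch that rebuilds the question's table and produces the tuple dictionary. It swaps list(map(tuple, ...)) for DataFrame.itertuples(index=False, name=None), which yields plain tuples directly. The helper name rows_as_tuples and the variable names are made up for the illustration.

```python
import pandas as pd

# The table from the question, rebuilt by hand.
table = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})


def rows_as_tuples(group):
    """Hypothetical helper: return the rows of one group as plain tuples."""
    return list(group.itertuples(index=False, name=None))


result = (
    table.groupby("Column1")[["Column2", "Column3"]]
    .apply(rows_as_tuples)
    .to_dict()
)

print(result[1])  # [(5, 2), (2, 3), (19, 5)]
```

Selecting the two value columns with a list (double brackets) keeps the grouping column out of each group, so the tuples contain only Column2 and Column3.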


jpp*_*jpp 8

One approach is to create a new tup column and then build the dictionary.

df['tup'] = list(zip(df['Column2'], df['Column3']))
df.groupby('Column1')['tup'].apply(list).to_dict()

# {0: [(23, 1)],
#  1: [(5, 2), (2, 3), (19, 5)],
#  2: [(56, 1), (22, 2)],
#  3: [(2, 4), (14, 5)],
#  4: [(59, 1)],
#  5: [(44, 1), (1, 2), (87, 3)]}
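If adding a column to df is undesirable, the same idea can be sketched without touching the frame: build the tuples as a separate Series that shares df's index and group it by Column1. This is a variant rather than the code above, and the variable name tuples is made up for the illustration.

```python
import pandas as pd

# The table from the question, rebuilt by hand.
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# Standalone Series of (Column2, Column3) tuples, aligned with df's index.
tuples = pd.Series(list(zip(df["Column2"], df["Column3"])), index=df.index)

# Group the tuples by the Column1 values; df itself is left unchanged.
result = tuples.groupby(df["Column1"]).apply(list).to_dict()

print(result[5])  # [(44, 1), (1, 2), (87, 3)]
```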

@Psidom's solution is more efficient, but if performance is not a concern, use whichever makes more sense to you:

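import pandas as pd

# Repeat the 12-row frame 10,000 times to get a measurable workload
df = pd.concat([df]*10000)

def jp(df):
    # Note: this adds a 'tup' column to the frame passed in
    df['tup'] = list(zip(df['Column2'], df['Column3']))
    return df.groupby('Column1')['tup'].apply(list).to_dict()

def psi(df):
    return df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()

# %timeit is an IPython/Jupyter magic, not plain Python
%timeit jp(df)   # 110ms
%timeit psi(df)  # 80ms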
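Outside IPython, where %timeit is not available, a comparison along the same lines can be sketched with the standard timeit module. The scale factor (1000 copies rather than 10000), the repeat counts and the names base, best_of, jp_time and psi_time are assumptions chosen for the illustration. Timings depend on the machine and pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

# The table from the question, rebuilt by hand.
base = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# Smaller workload than in the benchmark above, to keep the script quick.
df = pd.concat([base] * 1000)


def jp(df):
    # Adds a 'tup' column to the frame passed in, as in the answer.
    df["tup"] = list(zip(df["Column2"], df["Column3"]))
    return df.groupby("Column1")["tup"].apply(list).to_dict()


def psi(df):
    return (
        df.groupby("Column1")[["Column2", "Column3"]]
        .apply(lambda g: list(map(tuple, g.values.tolist())))
        .to_dict()
    )


def best_of(func, number=5, repeat=3):
    """Hypothetical helper: best average time per call, in seconds."""
    runs = timeit.repeat(lambda: func(df), number=number, repeat=repeat)
    return min(runs) / number


jp_time = best_of(jp)
psi_time = best_of(psi)

print(f"jp:  {jp_time * 1000:.1f} ms per call")
print(f"psi: {psi_time * 1000:.1f} ms per call")
```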