Mic*_*hes 7 python dictionary dataframe pandas jupyter
type(Table)
pandas.core.frame.DataFrame
Table
======= ======= =======
Column1 Column2 Column3
======= ======= =======
0       23      1
1       5       2
1       2       3
1       19      5
2       56      1
2       22      2
3       2       4
3       14      5
4       59      1
5       44      1
5       1       2
5       87      3
======= ======= =======
For anyone experienced with pandas: how do I build a multi-value dictionary with the .groupby() method?
I would like the output to resemble this format:
{
0: [(23, 1)],
1: [(5, 2), (2, 3), (19, 5)],
# etc...
}
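Here the Column1 values are the keys, and the corresponding Column2 and Column3 values are packed as tuples into a list for each Column1 key.
My syntax only works for collecting a single column after .groupby():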
Table.groupby('Column1')['Column2'].apply(list).to_dict()
# Result as expected
{
0: [23],
1: [5, 2, 19],
2: [56, 22],
3: [2, 14],
4: [59],
5: [44, 1, 87]
}
However, selecting multiple columns returns the column names instead of the values:
Table.groupby('Column1')[('Column2', 'Column3')].apply(list).to_dict()
# Result has column namespace as array value
{
0: ['Column2', 'Column3'],
1: ['Column2', 'Column3'],
2: ['Column2', 'Column3'],
3: ['Column2', 'Column3'],
4: ['Column2', 'Column3'],
5: ['Column2', 'Column3']
}
How can I get a list of tuples as the value for each key?
Psi*_*dom 10
Customize the function you pass to apply so that it returns a list of lists for each group:
df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: g.values.tolist()).to_dict()
# {0: [[23, 1]],
# 1: [[5, 2], [2, 3], [19, 5]],
# 2: [[56, 1], [22, 2]],
# 3: [[2, 4], [14, 5]],
# 4: [[59, 1]],
# 5: [[44, 1], [1, 2], [87, 3]]}
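To see why the attempt in the question returned column names, here is a minimal, self-contained sketch. It rebuilds the table from the question (the names `cols`, `labels` and `rows_by_key` are made up for illustration) and uses a dict comprehension over the groups rather than `apply`, which should give the same mapping. The point is that iterating a DataFrame yields its column labels, while `.values.tolist()` yields its rows.

```python
import pandas as pd

# Rebuild the question's table (values copied from the post).
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

cols = ["Column2", "Column3"]

# list() on a DataFrame iterates over column labels, not rows,
# which is what apply(list) did to each multi-column group.
labels = list(df[cols])

# .values.tolist() converts the underlying array row by row instead.
rows_by_key = {
    key: group[cols].values.tolist()
    for key, group in df.groupby("Column1")
}

print(labels)
print(rows_by_key)
```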
If you need an explicit list of tuples, convert with list(map(tuple, ...)):
df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
# {0: [(23, 1)],
# 1: [(5, 2), (2, 3), (19, 5)],
# 2: [(56, 1), (22, 2)],
# 3: [(2, 4), (14, 5)],
# 4: [(59, 1)],
# 5: [(44, 1), (1, 2), (87, 3)]}
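As a possible alternative to `list(map(tuple, ...))`, `DataFrame.itertuples(index=False, name=None)` yields plain tuples directly. This is only a sketch, not part of the original answer. It rebuilds the question's table so it runs on its own, and `tuples_by_key` is an illustrative name.

```python
import pandas as pd

# Rebuild the question's table (values copied from the post).
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# index=False drops the row index from each tuple;
# name=None returns regular tuples instead of namedtuples.
tuples_by_key = {
    key: list(group[["Column2", "Column3"]].itertuples(index=False, name=None))
    for key, group in df.groupby("Column1")
}

print(tuples_by_key)
```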
One way is to create a new tup column and then build the dictionary from it.
df['tup'] = list(zip(df['Column2'], df['Column3']))
df.groupby('Column1')['tup'].apply(list).to_dict()
# {0: [(23, 1)],
# 1: [(5, 2), (2, 3), (19, 5)],
# 2: [(56, 1), (22, 2)],
# 3: [(2, 4), (14, 5)],
# 4: [(59, 1)],
# 5: [(44, 1), (1, 2), (87, 3)]}
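Note that the snippet above adds a `tup` column to `df` itself. If you would rather leave the original frame untouched, one option (a sketch, not from the original answer) is `DataFrame.assign`, which returns a new frame with the extra column. The table is rebuilt here so the example is self-contained.

```python
import pandas as pd

# Rebuild the question's table (values copied from the post).
df = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})

# assign() returns a copy with the helper column; df keeps its three columns.
result = (
    df.assign(tup=list(zip(df["Column2"], df["Column3"])))
      .groupby("Column1")["tup"]
      .apply(list)
      .to_dict()
)

print(result)
```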
@Psidom's solution is more efficient, but if performance is not a concern, use whichever makes more sense to you:
df = pd.concat([df]*10000)
def jp(df):
    df['tup'] = list(zip(df['Column2'], df['Column3']))
    return df.groupby('Column1')['tup'].apply(list).to_dict()

def psi(df):
    return df.groupby('Column1')[['Column2', 'Column3']].apply(lambda g: list(map(tuple, g.values.tolist()))).to_dict()
%timeit jp(df) # 110ms
%timeit psi(df) # 80ms
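`%timeit` is an IPython/Jupyter magic, so the lines above will not run in a plain Python script. A rough equivalent with the standard-library `timeit` module might look like the sketch below. The names `small` and `big` are made up here, `jp` is rewritten with `assign` so that repeated runs do not modify the input, and the timings will differ from the figures quoted above depending on machine and pandas version.

```python
import timeit

import pandas as pd

# Rebuild the question's table (values copied from the post).
small = pd.DataFrame({
    "Column1": [0, 1, 1, 1, 2, 2, 3, 3, 4, 5, 5, 5],
    "Column2": [23, 5, 2, 19, 56, 22, 2, 14, 59, 44, 1, 87],
    "Column3": [1, 2, 3, 5, 1, 2, 4, 5, 1, 1, 2, 3],
})
# Same scale-up as in the answer: repeat the table 10000 times.
big = pd.concat([small] * 10000, ignore_index=True)


def jp(df):
    # assign() adds the helper column to a copy, leaving df unchanged.
    tagged = df.assign(tup=list(zip(df["Column2"], df["Column3"])))
    return tagged.groupby("Column1")["tup"].apply(list).to_dict()


def psi(df):
    return (
        df.groupby("Column1")[["Column2", "Column3"]]
          .apply(lambda g: list(map(tuple, g.values.tolist())))
          .to_dict()
    )


# Both approaches should produce the same dictionary.
assert jp(big) == psi(big)

for fn in (jp, psi):
    # Best of three single runs, reported in milliseconds.
    seconds = min(timeit.repeat(lambda: fn(big), number=1, repeat=3))
    print(f"{fn.__name__}: {seconds * 1000:.1f} ms")
```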