Okt*_*ner 5 python dictionary tuples dataframe pandas
I have a DataFrame that looks like this:
user item \
0 b80344d063b5ccb3212f76538f3d9e43d87dca9e The Cove - Jack Johnson
1 b80344d063b5ccb3212f76538f3d9e43d87dca9e Entre Dos Aguas - Paco De Lucia
2 b80344d063b5ccb3212f76538f3d9e43d87dca9e Stronger - Kanye West
3 b80344d063b5ccb3212f76538f3d9e43d87dca9e Constellations - Jack Johnson
4 b80344d063b5ccb3212f76538f3d9e43d87dca9e Learn To Fly - Foo Fighters
rating
0 1
1 2
2 1
3 1
4 1
and would like to get it into the following structure:
dict-> list of tuples
user-> (item, rating)
b80344d063b5ccb3212f76538f3d9e43d87dca9e -> list((The Cove - Jack
Johnson, 1), ... , )
I can do:
item_set = dict((user, set(items)) for user, items in \
data.groupby('user')['item'])
But this only gets me halfway. How do I also get the corresponding 'rating' values out of the groupby?
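Set user as the index, convert each row to a tuple with df.apply, group by the index with df.groupby(level=0), collect each group into a list with dfGroupBy.agg, and convert the resulting Series to a dictionary with to_dict():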
In [1417]: df
Out[1417]:
user item \
0 b80344d063b5ccb3212f76538f3d9e43d87dca9e The Cove - Jack Johnson
1 b80344d063b5ccb3212f76538f3d9e43d87dca9e Entre Dos Aguas - Paco De Lucia
2 b80344d063b5ccb3212f76538f3d9e43d87dca9e Stronger - Kanye West
3 b80344d063b5ccb3212f76538f3d9e43d87dca9e Constellations - Jack Johnson
4 b80344d063b5ccb3212f76538f3d9e43d87dca9e Learn To Fly - Foo Fighters
rating
0 1
1 2
2 2
3 2
4 2
In [1418]: df.set_index('user').apply(tuple, 1)\
.groupby(level=0).agg(lambda x: list(x.values))\
.to_dict()
Out[1418]:
{'b80344d063b5ccb3212f76538f3d9e43d87dca9e': [('The Cove - Jack Johnson', 1),
('Entre Dos Aguas - Paco De Lucia', 2),
('Stronger - Kanye West', 2),
('Constellations - Jack Johnson', 2),
('Learn To Fly - Foo Fighters', 2)]}
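For comparison, here is a minimal sketch of another way to reach the same structure: iterate over the groups and zip the two columns together. This is not the approach used above, just an alternative that may be easier to read. The sample frame is rebuilt by hand from the output shown, and the name user_items is only illustrative.

```python
import pandas as pd

# Sample data rebuilt by hand from the frame printed above.
user_id = "b80344d063b5ccb3212f76538f3d9e43d87dca9e"
df = pd.DataFrame({
    "user": [user_id] * 5,
    "item": [
        "The Cove - Jack Johnson",
        "Entre Dos Aguas - Paco De Lucia",
        "Stronger - Kanye West",
        "Constellations - Jack Johnson",
        "Learn To Fly - Foo Fighters",
    ],
    "rating": [1, 2, 2, 2, 2],
})

# Iterating over a groupby yields (key, sub-frame) pairs; zipping the
# item and rating columns of each sub-frame gives the (item, rating) tuples.
user_items = {
    user: list(zip(group["item"], group["rating"]))
    for user, group in df.groupby("user")
}

print(user_items[user_id][:2])
```

Rows keep their original order within each group, so the tuples come out in the same order as in the frame.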