Grouping items in a list of tuples by a common ID

max*_*max 2 python

I have a large dataset of synonyms (10,000+) stored as a list of tuples, like this:

data = [
    (435347,'cat'),
    (435347,'feline'),
    (435347,'lion'),
    (6765756,'dog'),
    (6765756,'hound'),
    (6765756,'puppy'),
    (435347,'kitten'),
    (987977,'frog')
]

Each synonym is identified by an arbitrary shared ID, in this example 435347, 6765756, and 987977.

I would like to write a function that turns the data into this:

processed_data = [
    (435347,'cat','feline','lion','kitten'),
    (6765756,'dog','hound','puppy'),
    (987977,'frog')
]

Any suggestions would be greatly appreciated!

小智 5

Try this:

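# Maps each ID to the list of words that share it
groups = {}

for x, y in data:
    # Fetch the list for this ID, or start a new one on first sight
    group = groups.get(x, [])
    group.append(y)
    groups[x] = group

print(groups)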

Output:

{987977: ['frog'], 435347: ['cat', 'feline', 'lion', 'kitten'], 6765756: ['dog', 'hound', 'puppy']}
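
Note that this gives a dict of lists rather than the list of tuples the question asks for. If that exact shape is needed, one possible way to get there is sketched below. The function name `group_by_id` is made up for illustration, and `collections.defaultdict` is swapped in for the `groups.get(...)` pattern; it relies on dicts preserving insertion order (Python 3.7+) so that IDs come out in the order they first appear.

```python
from collections import defaultdict

data = [
    (435347, 'cat'),
    (435347, 'feline'),
    (435347, 'lion'),
    (6765756, 'dog'),
    (6765756, 'hound'),
    (6765756, 'puppy'),
    (435347, 'kitten'),
    (987977, 'frog'),
]


def group_by_id(pairs):
    """Hypothetical helper: collect the words for each ID into one tuple."""
    groups = defaultdict(list)  # a missing ID starts out as an empty list
    for key, word in pairs:
        groups[key].append(word)
    # Flatten each (ID, [words...]) entry into a single tuple (ID, word, word, ...)
    return [(key, *words) for key, words in groups.items()]


processed_data = group_by_id(data)
print(processed_data)
```

This makes a single pass over the input, so it should scale fine to 10,000+ rows.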