Chr*_*ris 2 python group-by dataframe pandas pandas-groupby
I have a DataFrame that looks like this:
import pandas as pd

df = pd.DataFrame({'ID':[1,1,2,2,3,4],'Name':['John Doe','Jane Doe','John Smith','Jane Smith','Jack Hill','Jill Hill']})
   ID        Name
0   1    John Doe
1   1    Jane Doe
2   2  John Smith
3   2  Jane Smith
4   3   Jack Hill
5   4   Jill Hill
I then added another column by grouping on ID and taking the unique values in Name:
df['Multi Name'] = df.groupby('ID')['Name'].transform('unique')
   ID        Name                Multi Name
0   1    John Doe      [John Doe, Jane Doe]
1   1    Jane Doe      [John Doe, Jane Doe]
2   2  John Smith  [John Smith, Jane Smith]
3   2  Jane Smith  [John Smith, Jane Smith]
4   3   Jack Hill               [Jack Hill]
5   4   Jill Hill               [Jill Hill]
How do I remove the brackets from Multi Name?
I tried:
df['Multi Name'] = df['Multi Name'].str.strip('[]')
   ID        Name  Multi Name
0   1    John Doe         NaN
1   1    Jane Doe         NaN
2   2  John Smith         NaN
3   2  Jane Smith         NaN
4   3   Jack Hill         NaN
5   4   Jill Hill         NaN
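A likely reason for the NaN values: each cell in Multi Name holds an array-like object rather than a string, and the brackets are only part of how that object is printed. The sketch below is an illustration, not part of the original question. It builds the column with map instead of transform('unique') (an assumption, made so the example does not depend on how a given pandas version treats that string), then checks the cell types and what .str.strip returns.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 4],
    'Name': ['John Doe', 'Jane Doe', 'John Smith',
             'Jane Smith', 'Jack Hill', 'Jill Hill'],
})

# One array of unique names per ID, looked up for every row via map.
multi = df['ID'].map(df.groupby('ID')['Name'].unique())

# Check whether any cell is an actual Python string.
is_str = multi.map(lambda v: isinstance(v, str))

# .str.strip only works on string cells; anything else becomes missing.
stripped = multi.str.strip('[]')

print(is_str.any())
print(stripped.isna().all())
```

So there are no bracket characters to strip; the cells need to be joined into strings instead.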
Desired output:
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
It looks like unique is the wrong choice of function here. I suggest a custom lambda function that uses str.join:
df['Multi Name'] = df.groupby('ID')['Name'].transform(lambda x: ', '.join(set(x)))
df
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  Jane Smith, John Smith
3   2  Jane Smith  Jane Smith, John Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
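Because set does not preserve insertion order, the names within a group can come out in a different order from the one in which they appear in the DataFrame, as in the ID 2 rows above. A minimal sketch of an order-preserving variant, assuming first-appearance order is what is wanted; dict.fromkeys is swapped in for set here and is not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 4],
    'Name': ['John Doe', 'Jane Doe', 'John Smith',
             'Jane Smith', 'Jack Hill', 'Jill Hill'],
})

# dict.fromkeys drops duplicates while keeping first-seen order;
# joining the dict iterates over its keys.
df['Multi Name'] = df.groupby('ID')['Name'].transform(
    lambda x: ', '.join(dict.fromkeys(x))
)

print(df)
```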
transform
df.join(df.groupby('ID').Name.transform('unique').rename('Multi Name'))
   ID        Name                Multi Name
0   1    John Doe      [John Doe, Jane Doe]
1   1    Jane Doe      [John Doe, Jane Doe]
2   2  John Smith  [John Smith, Jane Smith]
3   2  Jane Smith  [John Smith, Jane Smith]
4   3   Jack Hill               [Jack Hill]
5   4   Jill Hill               [Jill Hill]
df.join(df.groupby('ID').Name.transform('unique').str.join(', ').rename('Multi Name'))
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
map
df.join(df.ID.map(df.groupby('ID').Name.unique().str.join(', ')).rename('Multi Name'))
   ID        Name              Multi Name
0   1    John Doe      John Doe, Jane Doe
1   1    Jane Doe      John Doe, Jane Doe
2   2  John Smith  John Smith, Jane Smith
3   2  Jane Smith  John Smith, Jane Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
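The map one-liner above can be easier to follow when split into its intermediate steps. A sketch, with per_id, joined and result as illustrative names that are not in the original answer:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [1, 1, 2, 2, 3, 4],
    'Name': ['John Doe', 'Jane Doe', 'John Smith',
             'Jane Smith', 'Jack Hill', 'Jill Hill'],
})

# Step 1: one array of unique names per ID (the index is ID).
per_id = df.groupby('ID')['Name'].unique()

# Step 2: join each array into a single comma-separated string.
joined = per_id.str.join(', ')

# Step 3: look up each row's ID in that Series and attach the result.
result = df.join(df['ID'].map(joined).rename('Multi Name'))

print(result)
```

Since the grouping runs once per ID and the lookup is a plain map, df itself is left unmodified; result is a new DataFrame.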
itertools.groupby
from itertools import groupby

d = {
    k: ', '.join(x[1] for x in v)
    for k, v in groupby(sorted(set(zip(df.ID, df.Name))), key=lambda x: x[0])
}
df.join(df.ID.map(d).rename('Multi Name'))
   ID        Name              Multi Name
0   1    John Doe      Jane Doe, John Doe
1   1    Jane Doe      Jane Doe, John Doe
2   2  John Smith  Jane Smith, John Smith
3   2  Jane Smith  Jane Smith, John Smith
4   3   Jack Hill               Jack Hill
5   4   Jill Hill               Jill Hill
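itertools.groupby only groups adjacent items with equal keys, which is why the pairs are sorted first. Sorting also puts the names in alphabetical order within each ID, so the output above reads "Jane Doe, John Doe". The sketch below reproduces that on plain tuples (no pandas needed) and, as an assumed alternative that is not in the original answer, builds the same mapping in first-seen order with a plain dict. The names pairs, by_sorted, first_seen and by_first_seen are illustrative.

```python
from itertools import groupby

# (ID, Name) pairs in the same order as the DataFrame rows.
pairs = [
    (1, 'John Doe'), (1, 'Jane Doe'),
    (2, 'John Smith'), (2, 'Jane Smith'),
    (3, 'Jack Hill'), (4, 'Jill Hill'),
]

# set() drops duplicate pairs; sorted() makes equal IDs adjacent
# (and orders names alphabetically within each ID).
by_sorted = {
    k: ', '.join(name for _, name in grp)
    for k, grp in groupby(sorted(set(pairs)), key=lambda p: p[0])
}

# Alternative: keep names in the order they first appear for each ID.
first_seen = {}
for id_, name in pairs:
    names = first_seen.setdefault(id_, [])
    if name not in names:
        names.append(name)
by_first_seen = {k: ', '.join(v) for k, v in first_seen.items()}

print(by_sorted)
print(by_first_seen)
```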