jea*_*elj 2 python conditional if-statement pandas
Using the following Python pandas DataFrame df:
import pandas as pd

df = pd.DataFrame({'transaction_id': ['A123','A123','B345','B345','C567','C567','D678','D678'],
                   'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
                   'product_category': ['X','X','Y','Y','X','Y','Y','X']})
transaction_id | product_id | product_category
A123             255472       X
A123             251235       X
B345             253764       Y
B345             257344       Y
C567             221577       X
C567             209809       Y
D678             223551       Y
D678             290678       X
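I need to add another column, "transaction_category", that looks at each transaction_id and at which product categories appear within that transaction_id. This is the output I'm looking for: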
transaction_id | product_id | product_category | transaction_category
A123             255472       X                  X only
A123             251235       X                  X only
B345             253764       Y                  Y only
B345             257344       Y                  Y only
C567             221577       X                  X & Y
C567             209809       Y                  X & Y
D678             223551       Y                  X & Y
D678             290678       X                  X & Y
Note that my DataFrame has other columns that I am not using here, so I think I need to start with a groupby?
df2 = df.groupby(['transaction_id','product_category']).reset_index()
IIUC, you can do this by using transform and join:
df.groupby('transaction_id').product_category.transform(lambda x: '&'.join(sorted(set(x))))
Out[468]:
0 X
1 X
2 Y
3 Y
4 X&Y
5 X&Y
6 X&Y
7 X&Y
Name: product_category, dtype: object
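The result above is a Series with one value per original row, which is why it can be assigned straight back as a new column. Here is a minimal sketch of that step, assuming the df from the question. The variable name `joined` is only illustrative, and `sorted` is an addition here so that the joined order does not depend on set iteration order.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
})

# The lambda returns one string per group; transform broadcasts that string
# to every row of the group, so the result is indexed like df.
joined = df.groupby('transaction_id')['product_category'].transform(
    lambda s: '&'.join(sorted(set(s)))
)

# Same index as df, so plain column assignment lines up row by row.
df['transaction_category'] = joined
```

Any other columns in the DataFrame are left untouched, since only product_category is selected from the groupby.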
From Scott, to match your expected output:
df['transaction_category'] = df.groupby('transaction_id')['product_category'].transform(lambda x: x + ' only' if len(set(x)) < 2 else ' & '.join(sorted(set(x))))
df
Out[479]:
product_category product_id transaction_id transaction_category
0 X 255472 A123 X only
1 X 251235 A123 X only
2 Y 253764 B345 Y only
3 Y 257344 B345 Y only
4 X 221577 C567 X & Y
5 Y 209809 C567 X & Y
6 Y 223551 D678 X & Y
7 X 290678 D678 X & Y
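As an alternative, the label can be computed once per transaction and then mapped back onto the rows. This is only a sketch under the same assumptions as the question's df. The helper `label` and the variable `per_transaction` are hypothetical names, not part of the answers above.

```python
import pandas as pd

df = pd.DataFrame({
    'transaction_id': ['A123', 'A123', 'B345', 'B345', 'C567', 'C567', 'D678', 'D678'],
    'product_id': [255472, 251235, 253764, 257344, 221577, 209809, 223551, 290678],
    'product_category': ['X', 'X', 'Y', 'Y', 'X', 'Y', 'Y', 'X'],
})


def label(categories):
    """Return '<cat> only' for a single category, else the categories joined by ' & '."""
    unique = sorted(set(categories))  # sorted for a deterministic order
    if len(unique) == 1:
        return unique[0] + ' only'
    return ' & '.join(unique)


# One label per transaction_id (a Series indexed by transaction_id).
per_transaction = df.groupby('transaction_id')['product_category'].agg(label)

# Look up each row's transaction_id in that Series.
df['transaction_category'] = df['transaction_id'].map(per_transaction)
```

Keeping `per_transaction` around can be handy if the per-transaction labels are also needed on their own, for example for counting how many transactions are mixed.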