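I'm trying to create a column that holds a single value per ID (each ID has many rows associated with it). If any row for an ID is tagged answered, then every row for that ID should be marked answered. If all rows for an ID are tagged not answered, then all of them should be marked not answered (this part already happens).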
Here is the code I wrote:
import numpy as np

# Label each row by whether its answer timestamp is present
conds = [file.data__answered_at.isna(), file.data__answered_at.notna()]
choices = ["not_answered", "answered"]
file['call_status'] = np.select(conds, choices, default=np.nan)
On my data this currently gives:
data__id call_status rank
1 answered 1
1 not_answered 2
1 answered 3
2 not_answered 1
2 answered 2
3 not_answered 1
4 answered 1
4 not_answered 2
5 not_answered 1
5 not_answered 2
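For reference, the per-row labeling step can be sketched on a tiny frame. This is a minimal sketch, assuming data__answered_at is a datetime column where NaT means the call was not answered; the sample rows are hypothetical. Because isna() and notna() cover every row, it swaps np.select for np.where, which needs no default.

```python
import numpy as np
import pandas as pd

# Hypothetical sample: data__answered_at is NaT when a call was not answered
file = pd.DataFrame({
    "data__id": [1, 1, 2],
    "data__answered_at": pd.to_datetime(["2021-01-01 10:00", None, None]),
})

# isna()/notna() are exhaustive, so a two-way np.where is equivalent
# to np.select over both conditions and never falls back to a default
file["call_status"] = np.where(
    file["data__answered_at"].notna(), "answered", "not_answered"
)
print(file["call_status"].tolist())
# ['answered', 'not_answered', 'not_answered']
```

This labels rows one at a time, which is why ID 1 still has a not_answered row: nothing yet looks at the other rows in the same ID.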
In this case, the desired result would be:
data__id call_status rank
1 answered 1
1 answered 2
1 answered 3
2 answered 1
2 answered 2
3 not_answered 1
4 answered 1
4 answered 2
5 not_answered 1
5 not_answered 2
Use GroupBy.transform with GroupBy.any to test whether each group has at least one answered row, then set the values with DataFrame.loc:
mask = df['call_status'].eq('answered').groupby(df['data__id']).transform('any')
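A self-contained version of the transform approach, rebuilding the sample data from the question, could look like this (a sketch; the DataFrame name df is assumed, as in the answer):

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": ["answered", "not_answered", "answered", "not_answered",
                    "answered", "not_answered", "answered", "not_answered",
                    "not_answered", "not_answered"],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

# Boolean per row, then "any" within each data__id; transform broadcasts
# the group result back to every row, so the mask aligns with df's index
mask = df["call_status"].eq("answered").groupby(df["data__id"]).transform("any")

df.loc[mask, "call_status"] = "answered"
print(df["call_status"].tolist())
```

IDs 1, 2 and 4 each contain at least one answered row, so all of their rows become answered, while IDs 3 and 5 keep not_answered.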
Or select every data__id that has an answered row and test membership with Series.isin:
mask = df['data__id'].isin(df.loc[df['call_status'].eq('answered'), 'data__id'].unique())
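The isin variant works in two steps: first collect the IDs that have at least one answered row, then flag every row whose ID is in that set. A minimal sketch on the same sample data:

```python
import pandas as pd

# Sample data from the question
df = pd.DataFrame({
    "data__id": [1, 1, 1, 2, 2, 3, 4, 4, 5, 5],
    "call_status": ["answered", "not_answered", "answered", "not_answered",
                    "answered", "not_answered", "answered", "not_answered",
                    "not_answered", "not_answered"],
    "rank": [1, 2, 3, 1, 2, 1, 1, 2, 1, 2],
})

# Step 1: IDs that appear on at least one answered row (order of first appearance)
ids_answered = df.loc[df["call_status"].eq("answered"), "data__id"].unique()
print(ids_answered.tolist())  # [1, 2, 4]

# Step 2: True for every row belonging to one of those IDs
mask = df["data__id"].isin(ids_answered)

df.loc[mask, "call_status"] = "answered"
```

Both masks mark the same rows; the transform version stays inside one groupby, while this one makes the intermediate list of answered IDs explicit.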
With either mask, overwrite call_status for every row of the matching IDs:
df.loc[mask, 'call_status'] = 'answered'
print(df)
data__id call_status rank
0 1 answered 1
1 1 answered 2
2 1 answered 3
3 2 answered 1
4 2 answered 2
5 3 not_answered 1
6 4 answered 1
7 4 answered 2
8 5 not_answered 1
9 5 not_answered 2
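If call_status is only an intermediate step, the group test can also run directly on the timestamp column from the question, skipping the per-row labels. This is a sketch under the same assumption as before (NaT in data__answered_at means not answered); the sample rows are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical timestamps: NaT means that row's call was not answered
file = pd.DataFrame({
    "data__id": [1, 1, 2, 3, 3],
    "data__answered_at": pd.to_datetime(
        [None, "2021-01-01 09:30", None, None, None]
    ),
})

# Per row: was this call answered? Then spread "any answered" over each ID group
any_answered = (
    file["data__answered_at"].notna()
    .groupby(file["data__id"])
    .transform("any")
)
file["call_status"] = np.where(any_answered, "answered", "not_answered")
print(file["call_status"].tolist())
# ['answered', 'answered', 'not_answered', 'not_answered', 'not_answered']
```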