Pandas DataFrame: getting value pairs from subsets of a DataFrame

Question

I have a df:

import pandas as pd

df = pd.DataFrame({'id': [1, 1, 2, 2, 2, 3, 4, 4, 4],
                   'name': ["call", "response", "call", "call", "response", "call", "call", "response", "response"]})

    id  name
0   1   call
1   1   response
2   2   call
3   2   call
4   2   response
5   3   call
6   4   call
7   4   response
8   4   response

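I am trying to extract call-response pairs, where the pattern I want is a call followed immediately by its first response. The call and response pairs sit within their own subsets, grouped by id, like below: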

    id  name
0   1   call
1   1   response
3   2   call
4   2   response
6   4   call
7   4   response

Ideally, I would keep the indexes of the dataframe so that I can use df.loc with those indexes later.

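What I have tried is iterating over the subsets of the df and applying something to each one, or using a rolling window, but I only managed to get errors.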

unique_ids = df.id.unique()

for unique_id in unique_ids:
    df.query('id == @unique_id').apply(something)  # `something` is a placeholder

I have not found anything that can be applied specifically to subsets of a dataframe.

Answer 1

jez*_*ael 5

Use DataFrameGroupBy.shift to get the next and previous values within each id, compare them with Series.eq to test for equality, and filter with boolean indexing:

m1 = df['name'].eq('call') & df.groupby('id')['name'].shift(-1).eq('response')
m2 = df['name'].eq('response') & df.groupby('id')['name'].shift().eq('call')
df2 = df[m1 | m2]

print (df2)
   id      name
0   1      call
1   1  response
3   2      call
4   2  response
6   4      call
7   4  response
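
To make the two masks easier to follow, here is a minimal sketch that names the shifted Series explicitly and then reuses the surviving index labels with df.loc, as the question asks. The variable names nxt, prev, pair_index and pairs are illustrative only and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 1, 2, 2, 2, 3, 4, 4, 4],
    "name": ["call", "response", "call", "call", "response",
             "call", "call", "response", "response"],
})

grouped = df.groupby("id")["name"]
nxt = grouped.shift(-1)  # name of the next row within the same id (NaN at the end of a group)
prev = grouped.shift()   # name of the previous row within the same id (NaN at the start of a group)

# A call that is directly followed by a response in the same id ...
m1 = df["name"].eq("call") & nxt.eq("response")
# ... and a response that is directly preceded by a call in the same id.
m2 = df["name"].eq("response") & prev.eq("call")

# Boolean indexing keeps the original index labels, so they can be reused later.
pair_index = df.index[m1 | m2]
pairs = df.loc[pair_index]

print(pair_index.tolist())  # [0, 1, 3, 4, 6, 7]
```

Because the shift happens per group, a call at the end of one id is never paired with a response at the start of the next id. Comparing NaN with Series.eq yields False, so rows at group boundaries drop out without extra handling.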
