Pandas: get the first row of each group by key

Nov*_*oll 5 python pandas

Suppose I have the following DataFrame:

| id | timestamp           | code | id2
| 10 | 2017-07-12 13:37:00 | 206  | a1
| 10 | 2017-07-12 13:40:00 | 206  | a1
| 10 | 2017-07-12 13:55:00 | 206  | a1
| 10 | 2017-07-12 19:00:00 | 206  | a2
| 11 | 2017-07-12 13:37:00 | 206  | a1
...

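I need to group by the id and id2 columns and get the first occurrence of the timestamp value in each group, e.g. for id=10, id2=a1 the result should be timestamp=2017-07-12 13:37:00.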

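I searched Google and found some possible solutions, but I couldn't figure out how to implement them correctly. It should probably look something like this: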

df.groupby(["id", "id2"])["timestamp"].apply(lambda x: ....)
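One way to fill in the apply skeleton from the question is to take the first element of each group's Series with .iloc[0]. This is a minimal sketch: it rebuilds only the five sample rows from the table (the "..." rows are unknown), and it assumes "first" means first by row order, not earliest by time.

```python
import pandas as pd

# Rebuild the five visible sample rows from the question's table
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:37:00",
        "2017-07-12 13:40:00",
        "2017-07-12 13:55:00",
        "2017-07-12 19:00:00",
        "2017-07-12 13:37:00",
    ]),
    "code": [206, 206, 206, 206, 206],
    "id2": ["a1", "a1", "a1", "a2", "a1"],
})

# Each group reaches the lambda as a Series; .iloc[0] picks its first row
first_ts = df.groupby(["id", "id2"])["timestamp"].apply(lambda x: x.iloc[0])
print(first_ts)
```

The result is a Series indexed by (id, id2). In practice GroupBy.first is the shorter, vectorized way to get the same values when there are no missing timestamps.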

jez*_*ael 7

I think you need GroupBy.first:

df.groupby(["id", "id2"])["timestamp"].first()
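One detail worth knowing: GroupBy.first returns the first non-missing value of each column in each group (it skips NaN/NaT by default), so it can differ from the literal first row when data is missing. The small frame below is hypothetical, made up only to show that difference against drop_duplicates, which keeps the whole first row.

```python
import pandas as pd

# Hypothetical data: the first row of group (10, "a1") has a missing timestamp
df = pd.DataFrame({
    "id": [10, 10, 11],
    "id2": ["a1", "a1", "a1"],
    "timestamp": pd.to_datetime([None, "2017-07-12 13:40:00", "2017-07-12 13:37:00"]),
})

# first() skips the NaT and returns the first non-missing timestamp per group
first_vals = df.groupby(["id", "id2"])["timestamp"].first()

# drop_duplicates keeps the actual first row, so the NaT survives there
first_rows = df.drop_duplicates(subset=["id", "id2"])
print(first_vals)
print(first_rows)
```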

Or drop_duplicates:

df.drop_duplicates(subset=['id','id2'])
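drop_duplicates keeps the first row of each (id, id2) pair because keep="first" is its default, and it preserves the original row labels. A minimal sketch on the sample rows (timestamps only, other columns omitted for brevity) shows the labels that survive and how ignore_index=True (available since pandas 1.0) renumbers them:

```python
import pandas as pd

# Sample rows from the question, reduced to the columns that matter here
df = pd.DataFrame({
    "id": [10, 10, 10, 10, 11],
    "id2": ["a1", "a1", "a1", "a2", "a1"],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:37:00",
        "2017-07-12 13:40:00",
        "2017-07-12 13:55:00",
        "2017-07-12 19:00:00",
        "2017-07-12 13:37:00",
    ]),
})

# keep="first" is the default: the first row of each (id, id2) pair survives
kept = df.drop_duplicates(subset=["id", "id2"], keep="first")
print(kept.index.tolist())  # [0, 3, 4]: original row labels are preserved

# ignore_index=True relabels the result as 0..n-1
renumbered = df.drop_duplicates(subset=["id", "id2"], ignore_index=True)
print(renumbered.index.tolist())  # [0, 1, 2]
```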

To get the same output from both:

df1 = df.groupby(["id", "id2"], as_index=False)["timestamp"].first()
print (df1)
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
1  10  a2  2017-07-12 19:00:00
2  11  a1  2017-07-12 13:37:00

df1 = df.drop_duplicates(subset=['id','id2'])[['id','id2','timestamp']].reset_index(drop=True)
print (df1)
   id id2            timestamp
0  10  a1  2017-07-12 13:37:00
1  10  a2  2017-07-12 19:00:00
2  11  a1  2017-07-12 13:37:00
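Both approaches above pick the first row in the frame's current order, which matches the earliest timestamp only if the data is already sorted by time. If it might not be, a sketch of two order-independent alternatives follows; the unsorted frame is hypothetical, built only to show the difference.

```python
import pandas as pd

# Hypothetical unsorted data: the later timestamp comes first for (10, "a1")
df = pd.DataFrame({
    "id": [10, 10, 11],
    "id2": ["a1", "a1", "a1"],
    "timestamp": pd.to_datetime([
        "2017-07-12 13:55:00",
        "2017-07-12 13:37:00",
        "2017-07-12 13:37:00",
    ]),
})

# Order-dependent: first() just returns the first row seen in each group
naive = df.groupby(["id", "id2"])["timestamp"].first()

# Option 1: the earliest timestamp per group, regardless of row order
earliest = df.groupby(["id", "id2"], as_index=False)["timestamp"].min()

# Option 2: sort by time, then keep the first full row of each group
earliest_rows = (
    df.sort_values("timestamp")
      .drop_duplicates(subset=["id", "id2"])
      .sort_values(["id", "id2"], ignore_index=True)
)
print(earliest)
print(earliest_rows)
```

min() is enough when only the timestamp is needed; the sort-then-deduplicate variant is useful when the other columns of that earliest row are needed too.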