Nil*_*age 110 python dataframe pandas
I have a pandas DataFrame like this:
import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4,5,6,6,6,7,7],
                   'value' : ["first","second","second","first",
                              "second","first","third","fourth",
                              "fifth","second","fifth","first",
                              "first","second","third","fourth","fifth"]})
I want to group this by ["id","value"] and get the first row of each group:
id value
0 1 first
1 1 second
2 1 second
3 2 first
4 2 second
5 3 first
6 3 third
7 3 fourth
8 3 fifth
9 4 second
10 4 fifth
11 5 first
12 6 first
13 6 second
14 6 third
15 7 fourth
16 7 fifth
Expected outcome:
id value
1 first
2 first
3 first
4 second
5 first
6 first
7 fourth
I've tried the following, but it only gives me the first row of the DataFrame. Any help with this is appreciated.
In [25]: for index, row in df.iterrows():
....: df2 = pd.DataFrame(df.groupby(['id','value']).reset_index().ix[0])
Rom*_*kar 195
>>> df.groupby('id').first()
value
id
1 first
2 first
3 first
4 second
5 first
6 first
7 fourth
If you need id as a column:
>>> df.groupby('id').first().reset_index()
id value
0 1 first
1 2 first
2 3 first
3 4 second
4 5 first
5 6 first
6 7 fourth
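A possibly tidier variant, not part of the original answer: passing as_index=False to groupby keeps id as an ordinary column, so the reset_index() call is not needed. A minimal sketch, assuming the df from the question:

```python
import pandas as pd

# Same data as in the question
df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
                   'value': ["first", "second", "second", "first",
                             "second", "first", "third", "fourth",
                             "fifth", "second", "fifth", "first",
                             "first", "second", "third", "fourth", "fifth"]})

# as_index=False keeps 'id' as a regular column instead of moving it into the index
flat = df.groupby('id', as_index=False).first()

# The two-step version from the answer, for comparison
via_reset = df.groupby('id').first().reset_index()

print(flat)
```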
To get the first n records of each group, you can use head():
>>> df.groupby('id').head(2).reset_index(drop=True)
id value
0 1 first
1 1 second
2 2 first
3 2 second
4 3 first
5 3 third
6 4 second
7 4 fifth
8 5 first
9 6 first
10 6 second
11 7 fourth
12 7 fifth
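For reference, head(n) can also be written as a boolean mask built with GroupBy.cumcount(), which numbers the rows within each group 0, 1, 2, ... in their existing order. This is an alternative sketch rather than what the answer above uses, and it assumes the df from the question:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
                   'value': ["first", "second", "second", "first",
                             "second", "first", "third", "fourth",
                             "fifth", "second", "fifth", "first",
                             "first", "second", "third", "fourth", "fifth"]})

# Position of each row inside its 'id' group: 0 for the first row, 1 for the second, ...
position = df.groupby('id').cumcount()

# Keeping positions 0 and 1 selects the first two rows of every group
first_two = df[position < 2]

# Like head(), the mask keeps the original index; drop it for display
print(first_two.reset_index(drop=True))
```

The mask form is handy when head() alone does not express what you need, for example keeping only positions 1 through 2 of each group with position.between(1, 2).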
(anonymous) 46
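This will give you the second row of each group (positions are zero-indexed, so nth(0) returns the first row, much like first(), although the two treat NaN differently):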
df.groupby('id').nth(1)
Documentation: http://pandas.pydata.org/pandas-docs/stable/groupby.html#taking-the-nth-row-of-each-group
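One caveat worth checking against your installed version: from pandas 2.0 on, nth() acts as a filter and returns the selected rows with their original index and the id column intact, whereas older versions return a frame indexed by id. The sketch below, assuming the question's df, reads only the value column so it behaves the same either way, and adds a cumcount()-based filter (an alternative, not part of the answer) that gives the same rows on any version:

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
                   'value': ["first", "second", "second", "first",
                             "second", "first", "third", "fourth",
                             "fifth", "second", "fifth", "first",
                             "first", "second", "third", "fourth", "fifth"]})

# Second row (position 1) of each group; id 5 has a single row, so it is skipped
second = df.groupby('id').nth(1)

# Version-independent equivalent: keep rows whose within-group position is 1
second_mask = df[df.groupby('id').cumcount() == 1]

print(second_mask)
```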
vit*_*dml 24
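I'd suggest using .nth(0) rather than .first() if you need to get the first row.
The difference between them is how they handle NaN: .nth(0) returns the first row of the group no matter what values that row contains, whereas .first() returns the first non-NaN value in each column.
For example, if your dataset is: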
import numpy as np
import pandas as pd

df = pd.DataFrame({'id' : [1,1,1,2,2,3,3,3,3,4,4],
                   'value' : ["first","second","third", np.nan,
                              "second","first","second","third",
                              "fourth","first","second"]})
>>> df.groupby('id').nth(0)
value
id
1 first
2 NaN
3 first
4 first
and
>>> df.groupby('id').first()
value
id
1 first
2 second
3 first
4 first
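A short sketch reproducing the difference described above with the same small dataset. It compares only the value lists, so the outcome does not depend on whether nth() returns id as the index (older pandas) or as a column (pandas 2.0+):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4],
                   'value': ["first", "second", "third", np.nan,
                             "second", "first", "second", "third",
                             "fourth", "first", "second"]})

# nth(0): the literal first row of each group, NaN included
by_nth = df.groupby('id').nth(0)['value'].tolist()

# first(): the first non-null value per column, so id 2 picks up "second"
by_first = df.groupby('id').first()['value'].tolist()

print(by_nth)
print(by_first)
```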
If you only need the first row of each group, you can use drop_duplicates; note that the function's default is keep='first'.
df.drop_duplicates('id')
Out[1027]:
id value
0 1 first
3 2 first
5 3 first
9 4 second
11 5 first
12 6 first
15 7 fourth
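Two related details, sketched under the assumption that df is the frame from the question: the rows drop_duplicates keeps are the same ones groupby('id').head(1) returns, and keep='last' gives the last row of each id instead. Both depend on the current row order, so sort the frame first if "first" should mean something other than "first as stored".

```python
import pandas as pd

df = pd.DataFrame({'id': [1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7],
                   'value': ["first", "second", "second", "first",
                             "second", "first", "third", "fourth",
                             "fifth", "second", "fifth", "first",
                             "first", "second", "third", "fourth", "fifth"]})

# keep='first' (the default): the first occurrence of each id, original index kept
first_rows = df.drop_duplicates('id')

# Same rows, index and order as taking the first row of each group with head(1)
same_as_head = first_rows.equals(df.groupby('id').head(1))

# keep='last': the last occurrence of each id instead
last_rows = df.drop_duplicates('id', keep='last')
print(last_rows)
```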
Maybe this is what you want:
import pandas as pd
idx = pd.MultiIndex.from_product([['state1','state2'], ['county1','county2','county3','county4']])
df = pd.DataFrame({'pop': [12,15,65,42,78,67,55,31]}, index=idx)
                pop
state1 county1   12
       county2   15
       county3   65
       county4   42
state2 county1   78
       county2   67
       county3   55
       county4   31
df.groupby(level=0, group_keys=False).apply(lambda x: x.sort_values('pop', ascending=False)).groupby(level=0).head(3)
Out[29]:
                pop
state1 county3   65
       county4   42
       county2   15
state2 county1   78
       county2   67
       county3   55
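The sort-then-head idea can also be done without apply(), by sorting on the outer index level and the pop column together and then taking head(3) per state. This sketch assumes the index levels are given the hypothetical names state and county (the original MultiIndex is unnamed); sort_values accepts index level names alongside column names in by:

```python
import pandas as pd

# Same data as above, with names added to the index levels (an assumption)
idx = pd.MultiIndex.from_product([['state1', 'state2'],
                                  ['county1', 'county2', 'county3', 'county4']],
                                 names=['state', 'county'])
df = pd.DataFrame({'pop': [12, 15, 65, 42, 78, 67, 55, 31]}, index=idx)

# Sort by the 'state' index level, then by 'pop' descending within each state
ranked = df.sort_values(['state', 'pop'], ascending=[True, False])

# head(3) keeps the first three rows of each state in that sorted order
top3 = ranked.groupby(level='state').head(3)
print(top3)
```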