pandas: how to keep the last `n` records of each group, sorted by another variable?

xia*_*dai 6 python dataframe pandas

I want to use pandas to keep the last n rows of each group, sorted by a variable var_to_sort.

This is how I do it now: I sort the dataframe below by date, group it by names, and then use tail(n) to get the last n elements of each group.

from datetime import date

import pandas as pd

data = [
    ['tom', date(2018,2,1), "I want this"],
    ['tom', date(2018,1,1), "Don't want"],
    ['nick', date(2019,4,1), "Don't want"],
    ['nick', date(2019,5,1), "I want this"]]

# Create the pandas DataFrame
df = pd.DataFrame(data)
df.columns = ["names", "date", "result"]

# sort it
df.sort_values("date", inplace=True)

df.groupby("names").tail(1)

Is there a more efficient way to do this? What if the dataset is already indexed by "date" or by ["date", "names"]?

jez*_*ael 6

I think your solution is fine. You can also use sort_values without inplace, so that the calls can be chained together.

For the other question:

from datetime import date

import pandas as pd

data = [
    ['tom', date(2018,2,1), "I want this"],
    ['tom', date(2018,1,1), "Don't want"],
    ['nick', date(2019,4,1), "Don't want"],
    ['nick', date(2019,5,1), "I want this"]]

# Create the pandas DataFrame
df = pd.DataFrame(data)
df.columns = ["names", "date", "result"]

# Sort by date, then keep the last row of each group
df1 = df.sort_values("date").groupby("names").tail(1)
print(df1)
  names        date       result
0   tom  2018-02-01  I want this
3  nick  2019-05-01  I want this

# Case 1: the data is indexed by date
df2 = df.set_index('date')
print(df2)
           names       result
date                         
2018-02-01   tom  I want this
2018-01-01   tom   Don't want
2019-04-01  nick   Don't want
2019-05-01  nick  I want this

# Sort the index, then keep the last row of each group
df22 = df2.sort_index().groupby("names").tail(1)
print(df22)
           names       result
date                         
2018-02-01   tom  I want this
2019-05-01  nick  I want this

# Case 2: the data is indexed by ['date', 'names']
df3 = df.set_index(['date','names'])
print(df3)
                       result
date       names             
2018-02-01 tom    I want this
2018-01-01 tom     Don't want
2019-04-01 nick    Don't want
2019-05-01 nick   I want this

# Sort the index, then group by index level 1 (names)
df33 = df3.sort_index().groupby(level=1).tail(1)
print(df33)
                       result
date       names             
2018-02-01 tom    I want this
2019-05-01 nick   I want this
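
The examples above all use tail(1), while the question asks about the last n rows. As a rough sketch of the general case, the same sort-then-tail chain can be wrapped in a small function. The helper name last_n_per_group and the extra "tom" row are assumptions made up for illustration; they are not part of pandas or of the original data.

```python
from datetime import date

import pandas as pd


def last_n_per_group(df, group_col, sort_col, n):
    """Return the last n rows of each group after sorting by sort_col.

    Hypothetical helper for illustration; not a pandas function.
    """
    # mergesort is stable, so rows with equal sort_col keep their original order
    return df.sort_values(sort_col, kind="mergesort").groupby(group_col).tail(n)


# Illustrative data: three rows for tom, two for nick
data = [
    ["tom", date(2018, 2, 1), "b"],
    ["tom", date(2018, 1, 1), "a"],
    ["tom", date(2018, 3, 1), "c"],
    ["nick", date(2019, 4, 1), "d"],
    ["nick", date(2019, 5, 1), "e"],
]
df = pd.DataFrame(data, columns=["names", "date", "result"])

# Keep the two most recent rows per name; df itself is not modified
last_two = last_n_per_group(df, "names", "date", 2)
print(last_two)
```

Because sort_values is called without inplace, the original df keeps its row order, and the result keeps the original index labels, so the selected rows can still be matched back to df.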