How to access the last element of a multi-index DataFrame

Question

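I have a DataFrame with IDs and timestamps as a MultiIndex. The index is sorted by ID and then by timestamp, and I want to select the most recent timestamp for each ID. For example: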

IDs    timestamp     value
0      2010-10-30     1
       2010-11-30     2
1      2000-01-01     300
       2007-01-01     33
       2010-01-01     400
2      2000-01-01     11


So basically the result I want is:

IDs    timestamp    value
0      2010-11-30   2
1      2010-01-01   400
2      2000-01-01   11


What is the command to do this in pandas?

Answer 1

unu*_*tbu 5

Given this setup:

import io
import pandas as pd

# The sample data is text, so use io.StringIO (io.BytesIO rejects a str on Python 3)
content = io.StringIO("""\
IDs    timestamp     value
0      2010-10-30     1
0      2010-11-30     2
1      2000-01-01     300
1      2007-01-01     33
1      2010-01-01     400
2      2000-01-01     11""")

# Raw string for the whitespace regex; parse column 1 (timestamp) as dates
df = pd.read_table(content, header=0, sep=r'\s+', parse_dates=[1])
df.set_index(['IDs', 'timestamp'], inplace=True)


Use reset_index followed by groupby:

df.reset_index(['timestamp'], inplace=True)
print(df.groupby(level=0).last())


which yields

              timestamp  value
IDs                           
0   2010-11-30 00:00:00      2
1   2010-01-01 00:00:00    400
2   2000-01-01 00:00:00     11


However, this is not the best solution. There should be a way to do this without calling reset_index...
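One possible way to avoid reset_index, offered here as a sketch rather than as part of the original answer, is to group on the IDs level and take tail(1). GroupBy.tail returns the selected rows with their original index, so the MultiIndex stays intact. The sketch assumes the index is sorted by ID and timestamp, as the question states; the sort_index call is only a safeguard.

```python
import io
import pandas as pd

content = io.StringIO("""\
IDs    timestamp     value
0      2010-10-30     1
0      2010-11-30     2
1      2000-01-01     300
1      2007-01-01     33
1      2010-01-01     400
2      2000-01-01     11""")

df = pd.read_csv(content, sep=r'\s+', parse_dates=['timestamp'])
# Sorting guarantees the last row of each group is the latest timestamp
df = df.set_index(['IDs', 'timestamp']).sort_index()

# tail(1) keeps the last row of each ID group and preserves the MultiIndex
latest = df.groupby(level='IDs').tail(1)
print(latest)
```

Because tail selects rows by position, it does not skip rows whose values are NaN.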

As you point out in the comments, last ignores NaN values. To avoid skipping NaN values, you can use groupby/agg like this:

# Run this on the original MultiIndex df; skip it if timestamp is already a column
df.reset_index(['timestamp'], inplace=True)
grouped = df.groupby(level=0)
print(grouped.agg(lambda x: x.iloc[-1]))
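
To illustrate the difference, here is a small sketch on hypothetical data (not the data from the question) in which the most recent row for ID 1 has a missing value. With last, each column independently reports its last non-null entry, so the result can combine a timestamp from one row with a value from another; the agg version returns the last row as it is.

```python
import numpy as np
import pandas as pd

# Hypothetical data: the latest row for ID 1 has a missing value
df = pd.DataFrame({
    'IDs': [0, 0, 1, 1],
    'timestamp': pd.to_datetime(
        ['2010-10-30', '2010-11-30', '2000-01-01', '2010-01-01']),
    'value': [1.0, 2.0, 300.0, np.nan],
}).set_index('IDs')

grouped = df.groupby(level=0)

# last() takes the last non-null entry of each column separately
skips_nan = grouped.last()

# iloc[-1] takes strictly the last entry of each column, NaN included
keeps_nan = grouped.agg(lambda x: x.iloc[-1])

print(skips_nan)
print(keeps_nan)
```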

