xia*_*dai 6 python dataframe pandas
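I want to use pandas to get the last row of each group, ordered by a variable var_to_sort.
Here is how I do it for now: I sort the DataFrame below by date, group it by names, and then use tail(n) to get the last n rows within each group.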
from datetime import date
import pandas as pd

data = [
['tom', date(2018,2,1), "I want this"],
['tom', date(2018,1,1), "Don't want"],
['nick', date(2019,4,1), "Don't want"],
['nick', date(2019,5,1), "I want this"]]
# Create the pandas DataFrame
df = pd.DataFrame(data)
df.columns = ["names", "date", "result"]
# Sort by date in place, then keep the last row of each group
df.sort_values("date", inplace=True)
df.groupby("names").tail(1)
Is there a more efficient way to do this? What if the dataset is already indexed by "date", or by ["date", "names"]?
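I think your solution is fine. You can also call sort_values without inplace, so that the calls can be chained together.
For the other question, about data that is already indexed, sort the index instead. Starting from the same data: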
from datetime import date
import pandas as pd

data = [
['tom', date(2018,2,1), "I want this"],
['tom', date(2018,1,1), "Don't want"],
['nick', date(2019,4,1), "Don't want"],
['nick', date(2019,5,1), "I want this"]]
# Create the pandas DataFrame
df = pd.DataFrame(data)
df.columns = ["names", "date", "result"]
# Chained version: sort_values returns a new DataFrame, so no inplace is needed
df1 = df.sort_values("date").groupby("names").tail(1)
print(df1)
   names        date       result
0    tom  2018-02-01  I want this
3   nick  2019-05-01  I want this
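If you need the last n rows of each group rather than just one, the same chain could be wrapped in a small helper. This is only a sketch: last_n_per_group is a hypothetical name, not a pandas function, and it assumes the values in the sort column can be compared with each other.

```python
from datetime import date

import pandas as pd


def last_n_per_group(df, group_col, sort_col, n=1):
    """Return the last n rows of each group after sorting by sort_col.

    Hypothetical helper, not part of pandas.
    """
    # sort_values returns a new DataFrame, so the caller's df is not modified
    return df.sort_values(sort_col).groupby(group_col).tail(n)


df = pd.DataFrame(
    [
        ["tom", date(2018, 2, 1), "I want this"],
        ["tom", date(2018, 1, 1), "Don't want"],
        ["nick", date(2019, 4, 1), "Don't want"],
        ["nick", date(2019, 5, 1), "I want this"],
    ],
    columns=["names", "date", "result"],
)

latest = last_n_per_group(df, "names", "date", n=1)
print(latest)
```

Because nothing is sorted in place, df keeps its original row order after the call.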
# Case 1: the data is indexed by "date"
df2 = df.set_index('date')
print(df2)
            names       result
date
2018-02-01    tom  I want this
2018-01-01    tom   Don't want
2019-04-01   nick   Don't want
2019-05-01   nick  I want this
# Sort by the index instead of a column, then take the last row per group
df22 = df2.sort_index().groupby("names").tail(1)
print(df22)
            names       result
date
2018-02-01    tom  I want this
2019-05-01   nick  I want this
# Case 2: the data is indexed by ["date", "names"]
df3 = df.set_index(['date','names'])
print(df3)
                       result
date       names
2018-02-01 tom    I want this
2018-01-01 tom     Don't want
2019-04-01 nick    Don't want
2019-05-01 nick   I want this
# Sort the MultiIndex, then group by its second level ("names")
df33 = df3.sort_index().groupby(level=1).tail(1)
print(df33)
                       result
date       names
2018-02-01 tom    I want this
2019-05-01 nick   I want this
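On the efficiency question, when only the single latest row per group is needed, one option not shown above is to skip the sort and look up the row label of the maximum date in each group with idxmax. This is a minimal sketch, assuming the index labels are unique; the conversion of the date column to a datetime dtype is my addition. I have not timed it against the sort-based version, so measure it on your own data.

```python
from datetime import date

import pandas as pd

df = pd.DataFrame(
    [
        ["tom", date(2018, 2, 1), "I want this"],
        ["tom", date(2018, 1, 1), "Don't want"],
        ["nick", date(2019, 4, 1), "Don't want"],
        ["nick", date(2019, 5, 1), "I want this"],
    ],
    columns=["names", "date", "result"],
)

# Assumption: work on a datetime64 column rather than Python date objects
df["date"] = pd.to_datetime(df["date"])

# For each group, find the index label of the row with the latest date
latest_idx = df.groupby("names")["date"].idxmax()

# Select those rows; they come back ordered by group key, not by date
latest = df.loc[latest_idx]
print(latest)
```

Unlike tail(n), this only covers the n = 1 case.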