Chr*_*s B 5 python dataframe pandas pandas-groupby
I've got a date-ordered dataframe that can be grouped. What I am attempting to do is groupby a variable (Person), determine the maximum (weight) for each group (person), and then drop all rows that come after (date) the maximum.
Here's an example of the data:
df = pd.DataFrame({'Person': 1,1,1,1,1,2,2,2,2,2],'Date': '1/1/2015','2/1/2015','3/1/2015','4/1/2015','5/1/2015','6/1/2011','7/1/2011','8/1/2011','9/1/2011','10/1/2011'], 'MonthNo':[1,2,3,4,5,1,2,3,4,5], 'Weight':[100,110,115,112,108,205,210,211,215,206]})
Date MonthNo Person Weight
0 1/1/2015 1 1 100
1 2/1/2015 2 1 110
2 3/1/2015 3 1 115
3 4/1/2015 4 1 112
4 5/1/2015 5 1 108
5 6/1/2011 1 2 205
6 7/1/2011 2 2 210
7 8/1/2011 3 2 211
8 9/1/2011 4 2 215
9 10/1/2011 5 2 206
Run Code Online (Sandbox Code Playgroud)
Here's what I want the result to look like:
Date MonthNo Person Weight
0 1/1/2015 1 1 100
1 2/1/2015 2 1 110
2 3/1/2015 3 1 115
5 6/1/2011 1 2 205
6 7/1/2011 2 2 210
7 8/1/2011 3 2 211
8 9/1/2011 4 2 215
Run Code Online (Sandbox Code Playgroud)
I think its worth noting, there can be disjoint start dates and the maximum may appear at different times.
My idea was to find the maximum for each group, obtain the MonthNo the maximum was in for that group, and then discard any rows with MonthNo greater Max Weight MonthNo. So far I've been able to obtain the max by group, but cannot get past doing a comparison based on that.
Please let me know if I can edit/provide more information, haven't posted many questions here! Thanks for the help, sorry if my formatting/question isn't clear.
Using idxmax with groupby
df.groupby('Person',sort=False).apply(lambda x : x.reset_index(drop=True).iloc[:x.reset_index(drop=True).Weight.idxmax()+1,:])
Out[131]:
Date MonthNo Person Weight
Person
1 0 1/1/2015 1 1 100
1 2/1/2015 2 1 110
2 3/1/2015 3 1 115
2 0 6/1/2011 1 2 205
1 7/1/2011 2 2 210
2 8/1/2011 3 2 211
3 9/1/2011 4 2 215
Run Code Online (Sandbox Code Playgroud)
您可以groupby.transform与 一起使用idxmax。前 2 个步骤可能不是必需的,具体取决于数据框的结构。
# convert Date to datetime
df['Date'] = pd.to_datetime(df['Date'])
# sort by Person and Date to make index usable for next step
df = df.sort_values(['Person', 'Date']).reset_index(drop=True)
# filter for index less than idxmax transformed by group
df = df[df.index <= df.groupby('Person')['Weight'].transform('idxmax')]
print(df)
Date MonthNo Person Weight
0 2015-01-01 1 1 100
1 2015-02-01 2 1 110
2 2015-03-01 3 1 115
5 2011-06-01 1 2 205
6 2011-07-01 2 2 210
7 2011-08-01 3 2 211
8 2011-09-01 4 2 215
Run Code Online (Sandbox Code Playgroud)