5 python where-clause dataframe pandas
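So I have three dataframes: X, Y and Events. df_X holds the X coordinates, df_Y holds the Y coordinates, and df_Event holds the list of events that occurred (the data is basketball-related). Looking at the following should show how they link together: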
df_Event:
Seconds Passed    Event Type           Player
1.0               Passed The Ball      Steve
2.0               Received Pass        Michael
3.0               Touch                Michael
4.0               Passed The Ball      Michael
5.0               Received The Ball    George
df_X:
Seconds Passed    Steve    Michael    George
1.0               11.43    12.33      15.33
2.0               11.45    12.46      13.22
3.0               10.99    10.33      14.33
4.0               11.34    10.36      11.22
5.0               12.43    12.22      11.78
df_Y:
....
(The Same As Above Just With Different Numbers)
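I want to record the pattern of events over time and then fetch the X, Y coordinates that correspond to the "Seconds Passed" column of each dataframe. For example, to know where a pass started and where it ended, I would need the following information.
I would like it collected in a new dataframe called "Passes_df":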
Passing Player    Receiving Player    X Coordinate PP    Y Coordinate PP    X Coordinate RP    Y Coordinate RP
Steve             Michael             11.43              ....               12.46              .....
I know I can use something like the following:
# Column name and pass label adjusted to match the df_Event table above.
is_pass = df_Event['Event Type'] == 'Passed The Ball'
Passes_df['Passing Player'] = df_Event['Player'].where(is_pass).dropna()
Passes_df['Receiving Player'] = df_Event['Player'].shift(-1).where(
    is_pass & (df_Event['Event Type'].shift(-1) == 'Received Pass'))
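This seems too verbose, though. Is there a function I can use to pick the information from each source more smoothly? Any help would be appreciated!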
You can use pandas.pivot(...) for this:
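# Assumes df_Event is sorted by "Seconds Passed" and has a default RangeIndex.
# Number the occurrences of each event type (0, 1, 2, ...).
df_Event["Event_order"] = df_Event.groupby("Event Type").cumcount()
# For each event, look up the coordinate in the column named after the player.
df_Event["X"] = df_Event.merge(df_X, on="Seconds Passed").apply(lambda row: row[row["Player"]], axis=1)
df_Event["Y"] = df_Event.merge(df_Y, on="Seconds Passed").apply(lambda row: row[row["Player"]], axis=1)
# One row per occurrence, one column group per event type.
df = df_Event.pivot(index="Event_order", columns="Event Type", values=["Player", "X", "Y"])
# Flatten the column MultiIndex, e.g. ("X", "Touch") -> "X_Touch".
df.columns = ["_".join(col) for col in df.columns]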
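To get the exact Passes_df layout the question asks for, one possible sketch reuses the merge-and-lookup step from the answer and then pairs each pass with the event that follows it via shift(-1), as the question already attempts. This is a variation, not the pivot approach itself. The helper name player_coordinate is hypothetical, the df_Y values are placeholders (the question elides them), and treating any event type that starts with "Received" as the end of a pass is an assumption about the data.

```python
import pandas as pd

# Sample data copied from the question.
df_Event = pd.DataFrame({
    "Seconds Passed": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Event Type": ["Passed The Ball", "Received Pass", "Touch",
                   "Passed The Ball", "Received The Ball"],
    "Player": ["Steve", "Michael", "Michael", "Michael", "George"],
})
df_X = pd.DataFrame({
    "Seconds Passed": [1.0, 2.0, 3.0, 4.0, 5.0],
    "Steve": [11.43, 11.45, 10.99, 11.34, 12.43],
    "Michael": [12.33, 12.46, 10.33, 10.36, 12.22],
    "George": [15.33, 13.22, 14.33, 11.22, 11.78],
})
# ASSUMPTION: df_Y is not shown in the question, so these are placeholder
# values (X + 100) used only to make the example runnable.
df_Y = df_X.set_index("Seconds Passed").add(100).reset_index()


def player_coordinate(events, coords):
    """Hypothetical helper: coordinate of each event's player at that second."""
    merged = events.merge(coords, on="Seconds Passed", how="left")
    # Each row picks the value from the column named after its own player.
    return merged.apply(lambda row: row[row["Player"]], axis=1).to_numpy()


events = df_Event.sort_values("Seconds Passed").reset_index(drop=True)
events["X"] = player_coordinate(events, df_X)
events["Y"] = player_coordinate(events, df_Y)

# Line each event up with the one that immediately follows it.
following = events.shift(-1)
is_pass = events["Event Type"].eq("Passed The Ball")
# ASSUMPTION: "Received Pass" and "Received The Ball" both complete a pass.
is_received = following["Event Type"].str.startswith("Received", na=False)
mask = is_pass & is_received

Passes_df = pd.DataFrame({
    "Passing Player": events.loc[mask, "Player"],
    "Receiving Player": following.loc[mask, "Player"],
    "X Coordinate PP": events.loc[mask, "X"],
    "Y Coordinate PP": events.loc[mask, "Y"],
    "X Coordinate RP": following.loc[mask, "X"],
    "Y Coordinate RP": following.loc[mask, "Y"],
}).reset_index(drop=True)

print(Passes_df)
```

The row-wise apply is easy to read but slow on large frames; if performance matters, melting df_X and df_Y into long format and merging on both "Seconds Passed" and "Player" would avoid the per-row lookup.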