I have a DataFrame ('frame') that I want to aggregate by country and date:
aggregated=pd.DataFrame(frame.groupby(['Country','Date']).CaseID.count())
aggregated["Total duration"]=frame.groupby(['Country','Date']).Hours.sum()
aggregated["Mean duration"]=frame.groupby(['Country','Date']).Hours.mean()
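I want to compute the figures above (total duration, mean duration, etc.) only for the positive 'Hours' values in 'frame'. How can I do that?
Thanks!
Sample 'frame':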
import pandas as pd
Line1 = {"Country": "USA", "Date":"01 jan", "Hours":4}
Line2 = {"Country": "USA", "Date":"01 jan", "Hours":3}
Line3 = {"Country": "USA", "Date":"01 jan", "Hours":-999}
Line4 = {"Country": "Japan", "Date":"01 jan", "Hours":3}
frame = pd.DataFrame([Line1, Line2, Line3, Line4])
How about:
frame[frame["Hours"] > 0].groupby(['Country','Date'])
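To spell the filtering idea out end to end, here is a minimal sketch that rebuilds the sample frame and computes all three figures in one pass. The sample data has no CaseID column, so the count is taken over Hours instead; that substitution, and the output column names, are assumptions made for illustration.

```python
import pandas as pd

# Sample data from the question (no CaseID column, so Hours is counted instead).
frame = pd.DataFrame([
    {"Country": "USA", "Date": "01 jan", "Hours": 4},
    {"Country": "USA", "Date": "01 jan", "Hours": 3},
    {"Country": "USA", "Date": "01 jan", "Hours": -999},
    {"Country": "Japan", "Date": "01 jan", "Hours": 3},
])

# Keep only rows with positive Hours, then aggregate once per (Country, Date).
positive = frame[frame["Hours"] > 0]
aggregated = positive.groupby(["Country", "Date"])["Hours"].agg(["count", "sum", "mean"])

# Rename to labels matching the question (names chosen here for illustration).
aggregated.columns = ["Cases", "Total duration", "Mean duration"]
print(aggregated)
```

Filtering before grouping means the -999 row never reaches any of the aggregates, and a single agg call avoids repeating the groupby three times.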
Not as elegant as the above, but it handles different corner cases. Here df stands for frame from the original question.
>>> df.groupby(['Country','Date']).agg(lambda x: x[x>0].mean())
                Hours
Country Date
Japan   01 jan    3.0
USA     01 jan    3.5
>>> df.loc[3, 'Hours'] = -1   # .ix was removed in pandas 1.0; .loc does the same here
>>> df.groupby(['Country','Date']).agg(lambda x: x[x>0].mean())
                Hours
Country Date
Japan   01 jan    NaN
USA     01 jan    3.5
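The corner case being handled is a group with no positive hours at all: filtering rows first removes that group entirely, while the lambda keeps it with NaN. As a rough sketch of another way to get the second behaviour without a lambda, the non-positive values can be masked to NaN with Series.where before grouping. The variable names masked, kept and dropped are made up for this example.

```python
import pandas as pd

# Same sample as above, but Japan's only row is now non-positive.
df = pd.DataFrame([
    {"Country": "USA", "Date": "01 jan", "Hours": 4},
    {"Country": "USA", "Date": "01 jan", "Hours": 3},
    {"Country": "USA", "Date": "01 jan", "Hours": -999},
    {"Country": "Japan", "Date": "01 jan", "Hours": -1},
])

keys = ["Country", "Date"]
stats = ["count", "sum", "mean"]

# Option 1: mask non-positive hours to NaN; every group survives.
masked = df.assign(Hours=df["Hours"].where(df["Hours"] > 0))
kept = masked.groupby(keys)["Hours"].agg(stats)

# Option 2: drop non-positive rows first; groups with no positive rows vanish.
dropped = df[df["Hours"] > 0].groupby(keys)["Hours"].agg(stats)

print(kept)
print(dropped)
```

With this data, Japan still appears in kept (count 0, mean NaN) but is missing from dropped. Which behaviour is preferable depends on whether empty groups should show up in the report.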