Find the maximum number in a column

mar*_*rin 2 python dataframe python-3.x pandas pandas-groupby

I am trying to find the month (column 'Month') with the largest total in the DepDelay column.

Data:

flightID         Month  ArrTime ActualElapsedTime  DepDelay   ArrDelay
BBYYEUVY67527        1   1514.0               58.0       NA      64.0   
MUPXAQFN40227        1     37.0              120.0       13      52.0   
LQLYUIMN79169        1    916.0              166.0       NA     -25.0   
KTAMHIFO10843        1      NaN                NaN        5       NaN   
BOOXJTEY23623        1      NaN                NaN        4       NaN  
BBYYEUVY67527        2   1514.0               58.0       NA      64.0   
MUPXAQFN40227        2     37.0              120.0       NA      52.0   
LQLYUIMN79169        2    916.0              166.0       NA     -25.0   
KTAMHIFO10843        2      NaN                NaN       15       NaN   
BOOXJTEY23623        2      NaN                NaN        4       NaN  

I tried:

data = pd.read_csv('data.csv', sep='\t')

dep_delay = all_data.groupby(["Month"].DepDelay.count().max())

print(dep_delay)

Error:

AttributeError                            Traceback (most recent call last)
<ipython-input-14-2ea6213009d6> in <module>()
----> 1 dep_delay = all_data.groupby(["Month"].DepDelay.count().max())
      2 
      3 print(dep_delay)

AttributeError: 'list' object has no attribute 'DepDelay'

Desired output:

Month      DepDelay
    1            22

jpp*_*jpp 5

You need sum, not count, to total the values within each group. Here is one way using GroupBy + sum, followed by idxmax:

res = df.groupby('Month')['DepDelay'].sum().reset_index()
res = res.loc[[res['DepDelay'].idxmax()]]

print(res)

   Month  DepDelay
0      1      22.0
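To make the difference between count and sum concrete, and to show why the attempt in the question raises AttributeError, here is a minimal sketch. It assumes the sample rows from the question, rebuilt inline with only three columns kept; the names `raw`, `counts`, `totals` and `best_month` are illustrative and not from the original post.

```python
import io
import pandas as pd

# Sample rows copied from the question (assumption: only the columns needed here).
raw = """flightID Month DepDelay
BBYYEUVY67527 1 NA
MUPXAQFN40227 1 13
LQLYUIMN79169 1 NA
KTAMHIFO10843 1 5
BOOXJTEY23623 1 4
BBYYEUVY67527 2 NA
MUPXAQFN40227 2 NA
LQLYUIMN79169 2 NA
KTAMHIFO10843 2 15
BOOXJTEY23623 2 4
"""
# read_csv treats the string "NA" as a missing value by default.
df = pd.read_csv(io.StringIO(raw), sep=r"\s+")

# In the question, groupby's closing parenthesis came too late, so .DepDelay
# was looked up on the list ["Month"]. Closing it right after the key fixes that.
counts = df.groupby("Month")["DepDelay"].count()  # number of non-missing delays
totals = df.groupby("Month")["DepDelay"].sum()    # total delay, missing values skipped

best_month = totals.idxmax()  # index label (the month) of the largest total
print(counts)
print(totals)
print(best_month, totals.max())
```

With this data, count only says how many delays were recorded per month, while sum gives the totals that the desired output is based on.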

Alternatively, you can group, sort the sums in descending order, and take the first row:

res = df.groupby('Month')['DepDelay'].sum()\
        .sort_values(ascending=False).head(1)\
        .reset_index()

print(res)

   Month  DepDelay
0      1      22.0
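A shorter variant of the sort-and-take-first approach is `Series.nlargest`, which picks the top row without sorting the whole result. This is a sketch rather than part of the original answer; the small frame below is an assumed stand-in for the question's data, and `top` is an illustrative name.

```python
import numpy as np
import pandas as pd

# Assumed stand-in for the question's data: Month and DepDelay only.
df = pd.DataFrame({
    "Month":    [1, 1, 1, 1, 1, 2, 2, 2, 2, 2],
    "DepDelay": [np.nan, 13, np.nan, 5, 4, np.nan, np.nan, np.nan, 15, 4],
})

# Sum per month, keep the single largest total, and turn the index back into a column.
top = df.groupby("Month")["DepDelay"].sum().nlargest(1).reset_index()
print(top)
```

If several months tie for the largest total, `nlargest(1)` keeps only the first of them by default; passing `keep="all"` returns every tied month.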