Aggregate data and get sum and count

Joh*_*Doe 4 python group-by aggregate pandas

I have an object in Python with a lot of rows:

Input:

    Team1     Player1     idTrip13     133
    Team2     Player333   idTrip10     18373
    Team3     Player22    idTrip12     17338899
    Team2     Player293   idTrip02     17656
    Team3     Player20    idTrip11     1883
    Team1     Player1     idTrip19     19393

I need to aggregate this data (like a pivot table).

The OUTPUT I am trying to get:

Team1   Player1 : 2 trips : sum(133+19393)
Team2   Player333 : 1 trip : 18373; Player293 : 1 trip : 17656
Team3   Player22 : 1 trip : 17338899; Player20 : 1 trip : 1883

Can someone suggest an appropriate object to use in Python so that I can produce the following output?

print(team, player, trips, time)

ily*_*nam 8

Use pandas DataFrames with the groupby() function.

  1. Put your data into a list of lists; each inner list will be one row of the DataFrame.

    In[1]:
    
    import pandas as pd
    
    mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
    ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293','idTrip02', 17656], 
    ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
    
    df = pd.DataFrame(mydata, columns = ['team', 'player', 'trips', 'time'])
    
    df
    Out[1]:
         team    player       trips      time
    0   Team1   Player1     idTrip13    133
    1   Team2   Player333   idTrip10    18373
    2   Team3   Player22    idTrip12    17338899
    3   Team2   Player293   idTrip02    17656
    4   Team3   Player20    idTrip11    1883
    5   Team1   Player1     idTrip19    19393
    
  2. Call groupby(), passing the column(s) you want to use as the grouper, and apply a function to the resulting groups.
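
To make the idea of a grouper concrete, here is a minimal sketch that iterates over the groups groupby() produces before any function is applied. It assumes the same six rows as in step 1; the name group_sizes is only illustrative.

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs,
# one pair per distinct value of the grouping column.
group_sizes = {}
for team, rows in df.groupby('team'):
    group_sizes[team] = len(rows)
    print(team, len(rows))
```

Functions such as count() or sum() are then applied to each of these sub-DataFrames and the results are combined into one object.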


Examples

Ex. 1: Find the number of trips made by each team. team is the grouper, and we apply the count() function to the ['trips'] column.

In[2]:
trip_count = df.groupby(by = ['team'])['trips'].count() 

trip_count              
Out[2]:          

 team
Team1    2
Team2    2
Team3    2
Name: trips, dtype: int64

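Ex. 2 (multiple columns): Find the total time spent by each player on a team. We use the two columns ['team', 'player'] as the grouper and apply the sum() function to the ['time'] column.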

In[3]:              
trip_time = df.groupby(by = ['team', 'player'])['time'].sum() 

trip_time        
Out[3]:

 team   player   
Team1  Player1         19526
Team2  Player293       17656
       Player333       18373
Team3  Player20         1883
       Player22     17338899
Name: time, dtype: int64
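
The result above is a Series indexed by (team, player) pairs. As a rough sketch of how values might be read back out of such a result, assuming the same data as before (the variable names team1_player1, team2_totals and flat are illustrative):

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

trip_time = df.groupby(['team', 'player'])['time'].sum()

# A full (team, player) pair selects a single total.
team1_player1 = trip_time.loc[('Team1', 'Player1')]

# Selecting only the first index level returns every player on that team.
team2_totals = trip_time.loc['Team2']

# reset_index() turns the two index levels back into ordinary columns.
flat = trip_time.reset_index()
print(flat)
```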

Ex. 3 (multiple functions): For each player on a team, find the total number of trips and the total trip time.

In[4]:
player_total = df.groupby(by = ['team', 'player']).agg({'trips' : 'count', 'time' : 'sum'})

player_total
Out[4]:
                 trips  time
team    player      
Team1   Player1     2   19526
Team2   Player293   1   17656
        Player333   1   18373
Team3   Player20    1   1883
        Player22    1   17338899
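
To get from this table to the "team, player, trips, time" lines asked for in the question, one possible approach is to flatten the result and loop over its rows. The sketch below assumes the same data and uses named aggregation, which requires pandas 0.25 or later; it is one way to do it, not the only one.

```python
import pandas as pd

mydata = [['Team1', 'Player1', 'idTrip13', 133], ['Team2', 'Player333', 'idTrip10', 18373],
          ['Team3', 'Player22', 'idTrip12', 17338899], ['Team2', 'Player293', 'idTrip02', 17656],
          ['Team3', 'Player20', 'idTrip11', 1883], ['Team1', 'Player1', 'idTrip19', 19393]]
df = pd.DataFrame(mydata, columns=['team', 'player', 'trips', 'time'])

# Named aggregation: each keyword is an output column, given as
# (input column, function). reset_index() makes team and player columns again.
player_total = (
    df.groupby(['team', 'player'])
      .agg(trips=('trips', 'count'), time=('time', 'sum'))
      .reset_index()
)

# Build one output line per (team, player) pair.
lines = []
for row in player_total.itertuples(index=False):
    lines.append(f"{row.team} {row.player} {row.trips} {row.time}")

print("\n".join(lines))
```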