Python Pandas: rolling aggregation of a column of lists

clg*_*lg4 6 python group-by list pandas pandas-groupby

I have a simple dataframe df with a column of lists, lists. I would like to generate an additional column based on lists.

df looks like:

import pandas as pd
lists={1:[[1]],2:[[1,2,3]],3:[[2,9,7,9]],4:[[2,7,3,5]]}
#create test dataframe
df=pd.DataFrame.from_dict(lists,orient='index')
df=df.rename(columns={0:'lists'})
df

          lists
1           [1]
2     [1, 2, 3]
3  [2, 9, 7, 9]
4  [2, 7, 3, 5]

I would like df to look like this:

df
Out[9]: 
          lists                 rolllists
1           [1]                       [1]
2     [1, 2, 3]              [1, 1, 2, 3]
3  [2, 9, 7, 9]     [1, 2, 3, 2, 9, 7, 9]
4  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]

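Basically, I want to "sum"/append a rolling window of 2 lists. Note that on row 1, since I only have 1 list, [1], rolllists is just that list. But on row 2, I have 2 lists to append. Then for row 3, append the lists in rows 2 and 3 (df.loc[2, 'lists'] + df.loc[3, 'lists']), and so on. I have had something similar working before, referencing this: Pandas Dataframe, Column of lists, Create column of sets of cumulative lists, and record by record differences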
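A minimal sketch of the ungrouped case, assuming the window is exactly 2 (the current row plus the one before it): Series.shift(1) lines each list up with its predecessor, and a list comprehension concatenates the pair, treating the missing predecessor of the first row as an empty list. This sidesteps rolling() altogether.

```python
import pandas as pd

# Rebuild the example frame from the question
lists = {1: [[1]], 2: [[1, 2, 3]], 3: [[2, 9, 7, 9]], 4: [[2, 7, 3, 5]]}
df = pd.DataFrame.from_dict(lists, orient='index').rename(columns={0: 'lists'})

# shift(1) moves each list down one row; the first row becomes NaN
previous = df['lists'].shift(1)

# Concatenate previous + current, treating a missing predecessor as []
df['rolllists'] = [
    (p if isinstance(p, list) else []) + c
    for p, c in zip(previous, df['lists'])
]
print(df)
```

Using + on two Python lists always builds a new list, so the original lists column is not modified.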
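Additionally, if we can get the part above working, I would then like to do this within a groupby (e.g., the example above would be 1 group, so the df might look like this in a groupby):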

  Group         lists                 rolllists
1     A           [1]                       [1]
2     A     [1, 2, 3]              [1, 1, 2, 3]
3     A  [2, 9, 7, 9]     [1, 2, 3, 2, 9, 7, 9]
4     A  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]
5     B           [1]                       [1]
6     B     [1, 2, 3]              [1, 1, 2, 3]
7     B  [2, 9, 7, 9]     [1, 2, 3, 2, 9, 7, 9]
8     B  [2, 7, 3, 5]  [2, 9, 7, 9, 2, 7, 3, 5]
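For the grouped version, one possible sketch (assuming a window of 2) uses groupby('Group')['lists'].shift(1), which fetches the previous row only within the same group, so the first row of group B does not pick up the last list of group A. The frame construction below is an assumption that mirrors the table above.

```python
import pandas as pd

# Rebuild the grouped example: the question's four lists, once for A and once for B
base = [[1], [1, 2, 3], [2, 9, 7, 9], [2, 7, 3, 5]]
df = pd.DataFrame({'Group': ['A'] * 4 + ['B'] * 4, 'lists': base * 2},
                  index=range(1, 9))

# Previous row's list within each group; the first row of every group gets NaN
previous = df.groupby('Group')['lists'].shift(1)

# Concatenate previous + current, treating a missing predecessor as []
df['rolllists'] = [
    (p if isinstance(p, list) else []) + c
    for p, c in zip(previous, df['lists'])
]
print(df)
```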

I've tried various things like df.lists.rolling(2).sum(), but I get this error:

TypeError: cannot handle this type -> object 

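That is in Pandas 0.24.1. Unfortunately, in Pandas 0.22.0 the command doesn't error, but instead returns exactly the same values as in lists. So it looks like newer versions of pandas can't sum lists? That's a secondary issue.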
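On the main error: the TypeError suggests that pandas' rolling aggregations are built for numeric data, and an object column holding Python lists does not fit that model, so rolling(2).sum() cannot concatenate them. A minimal workaround sketch for an arbitrary window size keeps the most recent lists in a collections.deque with maxlen=window and flattens them with itertools.chain. The helper rolling_concat is hypothetical (not a pandas API), and it is applied per group by iterating over groupby.

```python
from collections import deque
from itertools import chain

import pandas as pd


def rolling_concat(lists, window=2):
    """Hypothetical helper: join each list with up to window-1 preceding lists."""
    recent = deque(maxlen=window)  # automatically drops lists older than the window
    out = []
    for current in lists:
        recent.append(current)
        out.append(list(chain.from_iterable(recent)))
    return out


df = pd.DataFrame({'Group': ['A'] * 4 + ['B'] * 4,
                   'lists': [[1], [1, 2, 3], [2, 9, 7, 9], [2, 7, 3, 5]] * 2},
                  index=range(1, 9))

# Apply the helper within each group and collect results by row label
rolled = {}
for _, grp in df.groupby('Group', sort=False):
    rolled.update(zip(grp.index, rolling_concat(grp['lists'], window=2)))

# A Series built from the dict aligns back onto df by index label
df['rolllists'] = pd.Series(rolled)
print(df)
```

Changing window=2 to window=3 would join each list with the two lists before it in the same group.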

Would love any help! Have fun!

J. *_*Doe 4

You could start from this:

import pandas as pd

mylists = {1: [[1]], 2: [[1, 2, 3]], 3: [[2, 9, 7, 9]], 4: [[2, 7, 3, 5]]}
mydf = pd.DataFrame.from_dict(mylists, orient='index')
mydf = mydf.rename(columns={0: 'lists'})
# stack two copies of the frame to get groups A and B
mydf = pd.concat([mydf, mydf], axis=0, ignore_index=True)
mydf['Group'] = ['A'] * 4 + ['B'] * 4

# define the function that appends each row's list to the previous row's list
def append_row_lists(lists):
    result = []
    previous = []  # the first row of a group has no predecessor
    for current in lists:
        result.append(previous + current)
        previous = current
    return result

# loop over your groups, collecting the new lists by row label
new_values = {}
for gp in mydf['Group'].unique():
    condition = mydf['Group'] == gp
    group_lists = mydf.loc[condition, 'lists']
    new_values.update(zip(group_lists.index, append_row_lists(group_lists)))

# build the new series in one assignment (avoids setting lists cell by cell)
mydf['newseries'] = pd.Series(new_values)

Output

          lists Group                 newseries
0           [1]     A                       [1]
1     [1, 2, 3]     A              [1, 1, 2, 3]
2  [2, 9, 7, 9]     A     [1, 2, 3, 2, 9, 7, 9]
3  [2, 7, 3, 5]     A  [2, 9, 7, 9, 2, 7, 3, 5]
4           [1]     B                       [1]
5     [1, 2, 3]     B              [1, 1, 2, 3]
6  [2, 9, 7, 9]     B     [1, 2, 3, 2, 9, 7, 9]
7  [2, 7, 3, 5]     B  [2, 9, 7, 9, 2, 7, 3, 5]