Efficiently add a Pandas Series as a row to an existing DataFrame

Question

guy*_*guy 6 numpy pandas

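I have a large DataFrame of roughly 160k rows by 24 columns. I also have a Pandas Series of length 26 that I want to append to every row of the DataFrame, giving a final DataFrame of 160k rows by 50 columns, but my code is very slow.

Specifically, this is slow, but it works:

final = df.apply(lambda x: x.append(my_series), axis=1)

This produces the correct final shape:

Out[49]: (163008, 50)

where df.shape is Out[48]: (163008, 24) and my_series.shape is Out[47]: (26,).

This approach performs acceptably on smaller DataFrames in the <50k-row range, but it is clearly not ideal.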
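For readers on current pandas: Series.append was removed in pandas 2.0, so the one-liner above no longer runs there. The following is a minimal sketch of the same row-wise approach using pd.concat instead, on a small made-up frame (toy_df and toy_series are illustrative names, not from the question). It still calls a Python function once per row, which is the likely reason the original scales poorly.

```python
import pandas as pd

# Hypothetical toy data standing in for the 160k-row frame.
toy_df = pd.DataFrame({"A": [1, 2, 3], "B": ["x", "y", "z"]})
toy_series = pd.Series([10, "k"], index=["c", "d"])

# Row-wise equivalent of x.append(my_series): one Python call per row.
toy_final = toy_df.apply(lambda row: pd.concat([row, toy_series]), axis=1)
print(toy_final.shape)
```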

Update: added benchmarks for the solutions below

I ran some tests with %timeit on a test DataFrame and a test Series with the following dimensions:

test_df.shape

Out[18]: (156108, 24)

test_series.shape

Out[20]: (26,)

Both the DataFrame and the Series contain a mix of strings, floats, integers, objects, and so on.

Using the accepted NumPy solution:

%timeit test_df.join(pd.DataFrame(np.tile(test_series.values, len(test_df.index)).reshape(-1, len(attributes)), index=test_df.index, columns=test_series.index))

10 loops, best of 3: 220 ms per loop

Using assign: I keep getting ValueError: Length of values does not match length of index with my test Series, yet it works when I use a simpler Series. I'm not sure what is going on here...
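One possible cause, offered as a guess rather than a diagnosis of the asker's data: DataFrame.assign broadcasts scalar values but treats a list-like value as a full column. A Series element that is itself a list or array of the wrong length would then raise this ValueError. A small sketch with made-up data (demo_df, simple and nested are hypothetical names):

```python
import pandas as pd

demo_df = pd.DataFrame({"A": [1, 2, 3]})

# Scalars only: each value is broadcast down its new column.
simple = pd.Series([5, "text"], index=["b", "c"])
ok = demo_df.assign(**simple.to_dict())

# One element is a list: assign tries to use it as a whole column of length 2.
nested = pd.Series([5, [1, 2]], index=["b", "c"])
try:
    demo_df.assign(**nested.to_dict())
    error_message = None
except ValueError as exc:
    error_message = str(exc)
print(error_message)
```

If this is what happens with the real Series, checking each element with something like isinstance(value, (list, tuple)) should reveal the offending entries.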

Using @Divakar's custom function:

%timeit rowwise_concat_df_series(test_df, test_series)

1 loop, best of 3: 424 ms per loop

Answer 1

jez*_*ael 3

I think you need numpy.tile with numpy.ndarray.reshape to build a new DataFrame from the values of the Series, followed by join:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4],
                   'F':list('aaabbb')})

print (df)
   A  B  C  D  E  F
0  a  4  7  1  5  a
1  b  5  8  3  3  a
2  c  4  9  5  6  a
3  d  5  4  7  9  b
4  e  5  2  1  2  b
5  f  4  3  0  4  b

s = pd.Series([1,5,6,7], index=list('abcd'))
print (s)
a    1
b    5
c    6
d    7
dtype: int64

df1 = pd.DataFrame(np.tile(s.values, len(df.index)).reshape(-1,len(s)), 
                   index=df.index, 
                   columns=s.index)
print (df1)
   a  b  c  d
0  1  5  6  7
1  1  5  6  7
2  1  5  6  7
3  1  5  6  7
4  1  5  6  7
5  1  5  6  7

df = df.join(df1)
print (df)
   A  B  C  D  E  F  a  b  c  d
0  a  4  7  1  5  a  1  5  6  7
1  b  5  8  3  3  a  1  5  6  7
2  c  4  9  5  6  a  1  5  6  7
3  d  5  4  7  9  b  1  5  6  7
4  e  5  2  1  2  b  1  5  6  7
5  f  4  3  0  4  b  1  5  6  7
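With a Series of mixed types, s.values is an object array, so every column of the tiled frame comes out with object dtype. A possible alternative, sketched here on assumed toy data, builds the repeated block from a dict of scalars. pandas broadcasts each scalar along the supplied index and infers a dtype per column. The helper name add_series_as_columns is hypothetical. The sketch assumes every Series element is a scalar and that the Series labels do not clash with existing column names.

```python
import pandas as pd

def add_series_as_columns(frame, series):
    """Return frame with each element of series repeated as a new column."""
    # A dict of scalars plus an explicit index broadcasts each scalar down a column.
    repeated = pd.DataFrame(series.to_dict(), index=frame.index)
    return pd.concat([frame, repeated], axis=1)

# Made-up data with a mixed-type Series (int, float, string).
mixed_df = pd.DataFrame({"A": list("abc"), "B": [4, 5, 6]})
mixed_s = pd.Series([1, 2.5, "x"], index=["p", "q", "r"])

wide = add_series_as_columns(mixed_df, mixed_s)
print(wide.dtypes)
```

This has not been benchmarked against the numpy.tile version, so any speed difference on the 160k-row frame would need to be measured.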
