use*_*317 3 python dataframe pandas
I am wondering whether there is a more efficient way to divide multiple columns by a certain column. For example, say I have:
prev open close volume
20.77 20.87 19.87 962816
19.87 19.89 19.56 668076
19.56 19.96 20.1 578987
20.1 20.4 20.53 418597
I would like to get:
prev open close volume
20.77 1.0048 0.9567 962816
19.87 1.0010 0.9844 668076
19.56 1.0204 1.0276 578987
20.1 1.0149 1.0214 418597
Basically, the 'open' and 'close' columns are divided by the value in the 'prev' column.
I was able to do this with:
df['open'] = list(map(lambda x,y: x/y, df['open'],df['prev']))
df['close'] = list(map(lambda x,y: x/y, df['close'],df['prev']))
Is there a simpler way? In particular, what if there are 10 columns to be divided by the same column?
df2[['open','close']] = df2[['open','close']].div(df2['prev'].values,axis=0)
Output:
prev open close volume
0 20.77 1.004815 0.956668 962816
1 19.87 1.001007 0.984399 668076
2 19.56 1.020450 1.027607 578987
3 20.10 1.014925 1.021393 418597
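columns_to_divide = ['open', 'close']
# .div(..., axis=0) divides row by row; a bare "/" would align df['prev'] against the column labels
df[columns_to_divide] = df[columns_to_divide].div(df['prev'], axis=0)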
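One detail worth spelling out when dividing a DataFrame by a Series: the plain `/` operator matches the Series index against the DataFrame's column labels, while `.div(..., axis=0)` matches it against the row labels. The sketch below uses a made-up two-row frame (the name `cols` and the data subset are only for illustration) to show the difference.

```python
import pandas as pd

# Two rows taken from the question's sample data, for illustration only
df = pd.DataFrame({
    'prev':  [20.77, 19.87],
    'open':  [20.87, 19.89],
    'close': [19.87, 19.56],
})
cols = ['open', 'close']

# Series index (0, 1) is matched against the column labels ('open', 'close'):
# nothing lines up, so every cell comes out as NaN
by_columns = df[cols] / df['prev']

# Series index (0, 1) is matched against the row labels: row-wise division
by_rows = df[cols].div(df['prev'], axis=0)

print(by_columns.isna().all().all())  # True
print(by_rows)
```

The same `.div(..., axis=0)` call works unchanged if `cols` lists 10 columns instead of 2.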
For performance, I would suggest working on the underlying array data and, since the two columns to be modified are adjacent, slicing the array so that a view is used -
a = df.values
df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
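To elaborate a bit more on the array-slicing part: a[:,[1,2]] would force a copy there, and that would slow things down. On the dataframe side, a[:,[1,2]] is equivalent to df[['open','close']], and I am guessing that slows things down too. df.iloc[:,1:3] therefore improves on it.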
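The view-versus-copy distinction can be checked directly in NumPy. This is a minimal sketch on a small made-up array (not the question's data), using `np.shares_memory` to test whether the result still points at the original buffer.

```python
import numpy as np

a = np.arange(12, dtype=float).reshape(4, 3)

sliced = a[:, 1:3]     # basic slicing: a view onto a's memory
fancy = a[:, [1, 2]]   # integer-list (fancy) indexing: a freshly allocated copy

print(np.shares_memory(a, sliced))  # True
print(np.shares_memory(a, fancy))   # False

# Indexing with None keeps a trailing axis, so the first column has shape (4, 1)
# and broadcasts across the two sliced columns during the division
print(a[:, 0, None].shape)  # (4, 1)
```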
Sample run -
In [64]: df
Out[64]:
prev open close volume
0 20.77 20.87 19.87 962816
1 19.87 19.89 19.56 668076
2 19.56 19.96 20.10 578987
3 20.10 20.40 20.53 418597
In [65]: a = df.values
...: df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
...:
In [66]: df
Out[66]:
prev open close volume
0 20.77 1.004815 0.956668 962816
1 19.87 1.001007 0.984399 668076
2 19.56 1.020450 1.027607 578987
3 20.10 1.014925 1.021393 418597
Runtime test
Approaches -
def numpy_app(df): # Proposed in this post
    a = df.values
    df.iloc[:,1:3] = a[:,1:3]/a[:,0,None]
    return df

def pandas_app1(df): # @Scott Boston's soln
    df[['open','close']] = df[['open','close']].div(df['prev'].values,axis=0)
    return df
Timings -
In [44]: data = np.random.randint(15, 25, (100000,4)).astype(float)
...: df1 = pd.DataFrame(data, columns=(('prev','open','close','volume')))
...: df2 = df1.copy()
...:
In [45]: %timeit pandas_app1(df1)
...: %timeit numpy_app(df2)
...:
100 loops, best of 3: 2.68 ms per loop
1000 loops, best of 3: 885 µs per loop
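Both numpy_app and pandas_app1 modify the frame they are given, so a timing loop keeps dividing already-divided values. That should not change the cost of the operation much, but it makes the results hard to compare afterwards. A rough sanity check, assuming the same column layout as above, is to run each approach on a copy and compare; the function names `divide_with_numpy` and `divide_with_pandas` are hypothetical and only used for this sketch.

```python
import numpy as np
import pandas as pd

def divide_with_numpy(df):
    # Same idea as numpy_app, but on a copy so the input stays untouched
    out = df.copy()
    a = out.to_numpy()
    out.iloc[:, 1:3] = a[:, 1:3] / a[:, 0, None]
    return out

def divide_with_pandas(df):
    # Same idea as pandas_app1, also on a copy
    out = df.copy()
    out[['open', 'close']] = out[['open', 'close']].div(out['prev'], axis=0)
    return out

rng = np.random.default_rng(0)
data = rng.integers(15, 25, size=(1000, 4)).astype(float)
df = pd.DataFrame(data, columns=['prev', 'open', 'close', 'volume'])

res_np = divide_with_numpy(df)
res_pd = divide_with_pandas(df)

# True if both approaches produce the same values
same = np.allclose(res_np.to_numpy(), res_pd.to_numpy())
print(same)
```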