Fra (12) · tags: python, buffer, pandas
I am using Python's deque() to implement a simple circular buffer:
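from collections import deque
import numpy as np

def do_something_on(buf):
    # placeholder for the real per-step processing (not shown in the question)
    pass

# list(range(100)) * 2 repeats 0..99 twice; in Python 3, range(100) * 2 raises TypeError
test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)
mybuffer = deque(np.zeros(20).reshape((10, 2)))  # 10 rows of [0., 0.]
for i in test_sequence:
    mybuffer.popleft()      # drop the oldest row
    mybuffer.append(i)      # add the newest row at the end
    do_something_on(mybuffer)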
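A side note, not part of the original question: deque accepts a maxlen argument, and once a bounded deque is full, append() discards the item at the left end automatically, so the explicit popleft() can be dropped. A minimal sketch with the same test data (the processing call is left commented out because do_something_on is only a placeholder):

```python
from collections import deque

import numpy as np

# Same test data as in the question.
test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)

# With maxlen=10, append() on a full deque silently discards the leftmost
# (oldest) item, so no explicit popleft() is needed.
mybuffer = deque(np.zeros((10, 2)), maxlen=10)

for row in test_sequence:
    mybuffer.append(row)
    # do_something_on(mybuffer)  # the question's per-step processing would go here

print(len(mybuffer))  # 10
print(mybuffer[0], mybuffer[-1])
```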
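I wonder whether there is a simple way to get the same thing in pandas with a Series (or DataFrame). In other words, how can I efficiently append a single row at the end of a Series or DataFrame and remove a single row from its beginning?
Edit: I tried this: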
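import pandas as pd

myPandasBuffer = pd.DataFrame(columns=('A', 'B'), data=np.zeros(20).reshape((10, 2)))
newpoint = pd.DataFrame(columns=('A', 'B'), data=np.array([[1, 1]]))
for i in test_sequence:
    newpoint.iloc[0] = i  # overwrite the single row with the new point
    # drop the first row and append the new one (.ix was removed in pandas 1.0; .iloc is the positional equivalent)
    myPandasBuffer = pd.concat([myPandasBuffer.iloc[1:], newpoint], ignore_index=True)
    do_something_on(myPandasBuffer)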
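But it is much slower than the deque() approach.
As dorvak pointed out, pandas is not designed for queue-like behavior.
Below I replicated the simple insert function from deque for a pandas DataFrame, a numpy array, and also for hdf5 using the h5py module.
The timeit results show that (unsurprisingly) the collections module is much faster, followed by numpy and then pandas.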
from collections import deque
import pandas as pd
import numpy as np
import h5py

def insert_deque(test_sequence, buffer_deque):
    for item in test_sequence:
        buffer_deque.popleft()
        buffer_deque.append(item)
    return buffer_deque

def insert_df(test_sequence, buffer_df):
    for item in test_sequence:
        # shift all rows up by one, then write the new item into the last row
        buffer_df.iloc[0:-1, :] = buffer_df.iloc[1:, :].values
        buffer_df.iloc[-1] = item
    return buffer_df

def insert_arraylike(test_sequence, buffer_arr):
    # works for numpy arrays and h5py datasets alike
    for item in test_sequence:
        buffer_arr[:-1] = buffer_arr[1:]
        buffer_arr[-1] = item
    return buffer_arr

test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)

# create buffer arrays (a comprehension avoids 5 references to one shared inner list)
nested_list = [[0] * 2 for _ in range(5)]
buffer_deque = deque(nested_list)
buffer_df = pd.DataFrame(nested_list, columns=('A', 'B'))
buffer_arr = np.array(nested_list)

# calculate speed of each process in IPython (%timeit is an IPython magic)
print("deque : ")
%timeit insert_deque(test_sequence, buffer_deque)
print("pandas : ")
%timeit insert_df(test_sequence, buffer_df)
print("numpy array : ")
%timeit insert_arraylike(test_sequence, buffer_arr)
print("hdf5 with h5py : ")
with h5py.File("h5py_test.h5", "w") as f:
    f["buffer_hdf5"] = np.array(nested_list)
    %timeit insert_arraylike(test_sequence, f["buffer_hdf5"])
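%timeit is an IPython magic, so the benchmark above runs only in IPython or Jupyter. In a plain Python script, the standard-library timeit module can make a similar comparison. The following is a minimal sketch, not the answer's own setup: h5py is left out to avoid file I/O, and NUMBER and REPEAT are arbitrary choices, so absolute numbers will differ from the results reported here.

```python
import timeit
from collections import deque

import numpy as np
import pandas as pd


def insert_deque(seq, buf):
    for item in seq:
        buf.popleft()
        buf.append(item)
    return buf


def insert_arraylike(seq, buf):
    for item in seq:
        buf[:-1] = buf[1:]  # shift every row up by one (O(n) per insert)
        buf[-1] = item
    return buf


def insert_df(seq, buf):
    for item in seq:
        buf.iloc[0:-1, :] = buf.iloc[1:, :].values
        buf.iloc[-1] = item
    return buf


test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)
nested_list = [[0] * 2 for _ in range(5)]

candidates = {
    "deque": (insert_deque, deque(nested_list)),
    "numpy array": (insert_arraylike, np.array(nested_list)),
    "pandas": (insert_df, pd.DataFrame(nested_list, columns=("A", "B"))),
}

NUMBER = 5  # loops per measurement (arbitrary choice)
REPEAT = 3  # measurements; the fastest one is kept

results = {}
for name, (func, buf) in candidates.items():
    # Default arguments bind the current func/buf; timeit calls the stmt with no arguments.
    runs = timeit.repeat(lambda f=func, b=buf: f(test_sequence, b), number=NUMBER, repeat=REPEAT)
    results[name] = min(runs) / NUMBER  # seconds per loop
    print(f"{name:12s}: {results[name] * 1e6:10.1f} µs per loop")
```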
%timeit results:
deque: 34.1 µs per loop
pandas: 48 ms per loop
numpy array: 187 µs per loop
hdf5 with h5py: 31.7 ms per loop
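Relative to deque, those figures put numpy at roughly 5.5 times, h5py at roughly 930 times and pandas at roughly 1,400 times the per-loop time. The arithmetic, using only the numbers reported above:

```python
# Per-loop times from the %timeit results above, in seconds.
timings = {
    "deque": 34.1e-6,
    "numpy array": 187e-6,
    "hdf5 with h5py": 31.7e-3,
    "pandas": 48e-3,
}

baseline = timings["deque"]
ratios = {name: t / baseline for name, t in timings.items()}

for name, ratio in ratios.items():
    print(f"{name:15s}: {ratio:7.1f} x the deque time")
```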
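Notes:
My pandas slicing method is only slightly faster than the concat method listed in the question.
The hdf5 format (via h5py) did not show any advantage. As Andy suggested, I also did not see any advantage with HDFStore.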
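A different approach, not from the answer above and offered only as a sketch: if a DataFrame is needed less often than every insert, the data can live in a fixed-size NumPy array used as a true ring buffer (a write index that wraps around, so no rows are shifted on insert), with a DataFrame built only on demand. RingBuffer, push and to_frame are hypothetical names.

```python
import numpy as np
import pandas as pd


class RingBuffer:
    """Hypothetical fixed-capacity ring buffer of rows, backed by a NumPy array.

    push() overwrites the oldest slot and advances a write index modulo the
    capacity, so an insert is O(1) and no rows are shifted.
    """

    def __init__(self, capacity, columns):
        self._capacity = capacity
        self._columns = list(columns)
        # starts as zeros, like the question's initial buffer
        self._data = np.zeros((capacity, len(self._columns)))
        self._next = 0  # slot the next push() will overwrite (the oldest row)

    def push(self, row):
        self._data[self._next] = row
        self._next = (self._next + 1) % self._capacity

    def to_frame(self):
        # np.roll returns a rotated copy with the oldest row first.
        ordered = np.roll(self._data, -self._next, axis=0)
        return pd.DataFrame(ordered, columns=self._columns)


test_sequence = np.array(list(range(100)) * 2).reshape(100, 2)
buf = RingBuffer(10, ("A", "B"))
for row in test_sequence:
    buf.push(row)
df = buf.to_frame()  # 10 rows, oldest first
print(df)
```

Each to_frame() call copies the whole buffer, so this only pays off when the DataFrame is built much less often than rows are pushed; if do_something_on needs a DataFrame at every step, the copy cost comes back.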