Ker*_*ten 5 python interpolation numpy scipy pandas
我有一个 3D 数组(z, y, x)
,shape=(92, 4800, 4800)
其中每个值都axis 0
代表不同的时间点。在少数情况下,时域中的值获取失败,导致某些值变为np.NaN
。在其他情况下,没有获取任何值,所有值z
都是np.NaN
。
什么是使用线性插值来填充最有效的方式np.NaN
沿着axis 0
无视所有的值是实例np.NaN
?
这是我正在做的工作的一个工作示例,它使用pandas
包装器到scipy.interpolate.interp1d
. 在原始数据集上,每个切片大约需要 2 秒,这意味着整个数组的处理时间为 2.6 小时。减小大小的示例数据集大约需要 9.5 秒。
import numpy as np
import pandas as pd
# create example data, original is (92, 4800, 4800)
test_arr = np.random.randint(low=-10000, high=10000, size=(92, 480, 480))
test_arr[1:90:7, :, :] = -32768 # NaN fill value in original data
test_arr[:, 1:90:6, 1:90:8] = -32768
def interpolate_nan(arr, method="linear", limit=3):
"""return array interpolated along time-axis to fill missing values"""
result = np.zeros_like(arr, dtype=np.int16)
for i in range(arr.shape[1]):
# slice along y axis, interpolate with pandas wrapper to interp1d
line_stack = pd.DataFrame(data=arr[:,i,:], dtype=np.float32)
line_stack.replace(to_replace=-37268, value=np.NaN, inplace=True)
line_stack.interpolate(method=method, axis=0, inplace=True, limit=limit)
line_stack.replace(to_replace=np.NaN, value=-37268, inplace=True)
result[:, i, :] = line_stack.values.astype(np.int16)
return result
Run Code Online (Sandbox Code Playgroud)
使用示例数据集在我的机器上的性能:
%timeit interpolate_nan(test_arr)
1 loops, best of 3: 9.51 s per loop
Run Code Online (Sandbox Code Playgroud)
编辑:
我应该澄清代码正在产生我的预期结果。问题是 - 我怎样才能优化这个过程?
我最近在numba的帮助下为我的特定用例解决了这个问题,并且还写了一些关于它的文章。
from numba import jit
@jit(nopython=True)
def interpolate_numba(arr, no_data=-32768):
"""return array interpolated along time-axis to fill missing values"""
result = np.zeros_like(arr, dtype=np.int16)
for x in range(arr.shape[2]):
# slice along x axis
for y in range(arr.shape[1]):
# slice along y axis
for z in range(arr.shape[0]):
value = arr[z,y,x]
if z == 0: # don't interpolate first value
new_value = value
elif z == len(arr[:,0,0])-1: # don't interpolate last value
new_value = value
elif value == no_data: # interpolate
left = arr[z-1,y,x]
right = arr[z+1,y,x]
# look for valid neighbours
if left != no_data and right != no_data: # left and right are valid
new_value = (left + right) / 2
elif left == no_data and z == 1: # boundary condition left
new_value = value
elif right == no_data and z == len(arr[:,0,0])-2: # boundary condition right
new_value = value
elif left == no_data and right != no_data: # take second neighbour to the left
more_left = arr[z-2,y,x]
if more_left == no_data:
new_value = value
else:
new_value = (more_left + right) / 2
elif left != no_data and right == no_data: # take second neighbour to the right
more_right = arr[z+2,y,x]
if more_right == no_data:
new_value = value
else:
new_value = (more_right + left) / 2
elif left == no_data and right == no_data: # take second neighbour on both sides
more_left = arr[z-2,y,x]
more_right = arr[z+2,y,x]
if more_left != no_data and more_right != no_data:
new_value = (more_left + more_right) / 2
else:
new_value = value
else:
new_value = value
else:
new_value = value
result[z,y,x] = int(new_value)
return result
Run Code Online (Sandbox Code Playgroud)
这比我的初始代码快大约20 倍。
归档时间: |
|
查看次数: |
2007 次 |
最近记录: |