kid*_*ixo 7 python arrays numpy image
我正在尝试调整给定因子的2D numpy数组,在输出中获得更小的数组.
从图像文件中读取数组,并且一些值应该是NaN(非数字,来自numpy的np.nan):它是来自卫星的遥感测量的结果,并且仅测量了一些像素.
我找到的合适的包是scypy.misc.imresize,但是包含NaN的输出数组中的每个像素都设置为NaN,即使在原始像素中有一些有效数据被插入在一起.
我的解决方案附在此处,我所做的基本上是:
我打算在不同的输出(平均值,中位数,输入像素的标准偏差等)之间添加关键字.
它按预期工作,但在~110px图像上大约需要3秒钟.由于我缺乏python的经验,我正在寻找改进.
有没有人建议如何更好,更有效地做到这一点?
有谁知道已经实现了所有这些东西的图书馆?
谢谢.
这里有一个输出随机像素输入的示例输出,使用下面的代码生成:

import numpy as np
import pylab as plt
from scipy import misc
def resize_2d_nonan(array,factor):
"""
Resize a 2D array by different factor on two axis sipping NaN values.
If a new pixel contains only NaN, it will be set to NaN
Parameters
----------
array : 2D np array
factor : int or tuple. If int x and y factor wil be the same
Returns
-------
array : 2D np array scaled by factor
Created on Mon Jan 27 15:21:25 2014
@author: damo_ma
"""
xsize, ysize = array.shape
if isinstance(factor,int):
factor_x = factor
factor_y = factor
elif isinstance(factor,tuple):
factor_x , factor_y = factor[0], factor[1]
else:
raise NameError('Factor must be a tuple (x,y) or an integer')
if not (xsize %factor_x == 0 or ysize % factor_y == 0) :
raise NameError('Factors must be intger multiple of array shape')
new_xsize, new_ysize = xsize/factor_x, ysize/factor_y
new_array = np.empty([new_xsize, new_ysize])
new_array[:] = np.nan # this saves us an assignment in the loop below
# submatrix indexes : is the average box on the original matrix
subrow, subcol = np.indices((factor_x, factor_y))
# new matrix indexs
row, col = np.indices((new_xsize, new_ysize))
# some output for testing
#for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
# print '----------------------------------------------'
# print 'i: %i, j: %i, ind: %i ' % (i, j, ind)
# print 'subrow+i*new_ysize, subcol+j*new_xsize :'
# print i,'*',new_xsize,'=',i*factor_x
# print j,'*',new_ysize,'=',j*factor_y
# print subrow+i*factor_x,subcol+j*factor_y
# print '---'
# print 'array[subrow+i*factor_x,subcol+j*factor_y] : '
# print array[subrow+i*factor_x,subcol+j*factor_y]
for i, j, ind in zip(row.reshape(-1), col.reshape(-1),range(row.size)) :
# define the small sub_matrix as view of input matrix subset
sub_matrix = array[subrow+i*factor_x,subcol+j*factor_y]
# modified from any(a) and all(a) to a.any() and a.all()
# see https://stackoverflow.com/a/10063039/1435167
if not (np.isnan(sub_matrix)).all(): # if we haven't all NaN
if (np.isnan(sub_matrix)).any(): # if we haven no NaN at all
msub_matrix = np.ma.masked_array(sub_matrix,np.isnan(sub_matrix))
(new_array.reshape(-1))[ind] = np.mean(msub_matrix)
else: # if we haven some NaN
(new_array.reshape(-1))[ind] = np.mean(sub_matrix)
# the case assign NaN if we have all NaN is missing due
# to the standard values of new_array
return new_array
row , cols = 6, 4
a = 10*np.random.random_sample((row , cols))
a[0:3,0:2] = np.nan
a[0,2] = np.nan
factor_x = 2
factor_y = 2
a_misc = misc.imresize(a, .5, interp='nearest', mode='F')
a_2d_nonan = resize_2d_nonan(a,(factor_x,factor_y))
print a
print
print a_misc
print
print a_2d_nonan
plt.subplot(131)
plt.imshow(a,interpolation='nearest')
plt.title('original')
plt.xticks(arange(a.shape[1]))
plt.yticks(arange(a.shape[0]))
plt.subplot(132)
plt.imshow(a_misc,interpolation='nearest')
plt.title('scipy.misc')
plt.xticks(arange(a_misc.shape[1]))
plt.yticks(arange(a_misc.shape[0]))
plt.subplot(133)
plt.imshow(a_2d_nonan,interpolation='nearest')
plt.title('my.func')
plt.xticks(arange(a_2d_nonan.shape[1]))
plt.yticks(arange(a_2d_nonan.shape[0]))
Run Code Online (Sandbox Code Playgroud)
编辑
我添加了一些修改来解决ChrisProsser评论.
如果我用一些其他值替换NaN,假设非NaN像素的平均值,它将影响所有后续计算:重新采样的原始数组和使用NaN替换的重采样数组之间的差异显示2个像素更改了它们的值.
我的目标是简单地跳过所有NaN像素.
# substitute NaN with the average value
ind_nonan , ind_nan = np.where(np.isnan(a) == False), np.where(np.isnan(a) == True)
a_substitute = np.copy(a)
a_substitute[ind_nan] = np.mean(a_substitute[ind_nonan]) # substitute the NaN with average on the not-Nan
a_substitute_misc = misc.imresize(a_substitute, .5, interp='nearest', mode='F')
a_substitute_2d_nonan = resize_2d_nonan(a_substitute,(factor_x,factor_y))
print a_2d_nonan-a_substitute_2d_nonan
[[ nan -0.02296697]
[ 0.23143208 0. ]
[ 0. 0. ]]
Run Code Online (Sandbox Code Playgroud)

**第二次编辑**
为了解决Hooked的回答,我添加了一些额外的代码.这是一个迭代的想法,遗憾的是它在像素上插入新的值应该是"空"(NaN),并且对于我的小例子,产生的NaN比良好的值更多.
X , Y = np.indices((row , cols))
X_new , Y_new = np.indices((row/factor_x , cols/factor_y))
from scipy.interpolate import CloughTocher2DInterpolator as intp
C = intp((X[ind_nonan],Y[ind_nonan]),a[ind_nonan])
a_interp = C(X_new , Y_new)
print a
print
print a_interp
[[ nan, nan],
[ nan, nan],
[ nan, 6.32826577]])
Run Code Online (Sandbox Code Playgroud)

您正在阵列的小窗口上进行操作。可以通过操纵数组的步幅来有效地重构数组,而不是循环遍历数组来创建窗口。numpy 库提供了as_strided()帮助实现这一点的函数。SciPy CookBook生命游戏的跨步技巧中提供了一个示例。
下面将使用广义的滑动窗口函数,我将把它放在最后。
确定新数组的形状:
rows, cols = a.shape
new_shape = rows / 2, cols / 2
Run Code Online (Sandbox Code Playgroud)
将数组重构为您需要的窗口,并创建一个标识 NaN 的索引数组:
# 2x2 windows of the original array
windows = sliding_window(a, (2,2))
# make a windowed boolean array for indexing
notNan = sliding_window(np.logical_not(np.isnan(a)), (2,2))
Run Code Online (Sandbox Code Playgroud)
可以使用列表理解或生成器表达式来创建新数组。
# using a list comprehension
# make a list of the means of the windows, disregarding the Nan's
means = [window[index].mean() for window, index in zip(windows, notNan)]
new_array = np.array(means).reshape(new_shape)
# generator expression
# produces the means of the windows, disregarding the Nan's
means = (window[index].mean() for window, index in zip(windows, notNan))
new_array = np.fromiter(means, dtype = np.float32).reshape(new_shape)
Run Code Online (Sandbox Code Playgroud)
生成器表达式应该节省内存。itertools.izip()如果内存有问题,使用而不是“zip”也应该有所帮助。我刚刚使用列表理解来解决您的问题。
你的职能:
def resize_2d_nonan(array,factor):
"""
Resize a 2D array by different factor on two axis skipping NaN values.
If a new pixel contains only NaN, it will be set to NaN
Parameters
----------
array : 2D np array
factor : int or tuple. If int x and y factor wil be the same
Returns
-------
array : 2D np array scaled by factor
Created on Mon Jan 27 15:21:25 2014
@author: damo_ma
"""
xsize, ysize = array.shape
if isinstance(factor,int):
factor_x = factor
factor_y = factor
window_size = factor, factor
elif isinstance(factor,tuple):
factor_x , factor_y = factor
window_size = factor
else:
raise NameError('Factor must be a tuple (x,y) or an integer')
if (xsize % factor_x or ysize % factor_y) :
raise NameError('Factors must be integer multiple of array shape')
new_shape = xsize / factor_x, ysize / factor_y
# non-overlapping windows of the original array
windows = sliding_window(a, window_size)
# windowed boolean array for indexing
notNan = sliding_window(np.logical_not(np.isnan(a)), window_size)
#list of the means of the windows, disregarding the Nan's
means = [window[index].mean() for window, index in zip(windows, notNan)]
# new array
new_array = np.array(means).reshape(new_shape)
return new_array
Run Code Online (Sandbox Code Playgroud)
我没有与你的原始函数进行任何时间比较,但它应该更快。
我在这里看到的许多解决方案都对操作进行矢量化以提高速度/效率 - 我不太了解这一点,也不知道它是否可以应用于您的问题。在 SO 中搜索窗口、数组、移动平均、向量化和 numpy 应该会产生类似的问题和答案以供参考。
sliding_window()请参阅下面的归因:
import numpy as np
from numpy.lib.stride_tricks import as_strided as ast
from itertools import product
def norm_shape(shape):
'''
Normalize numpy array shapes so they're always expressed as a tuple,
even for one-dimensional shapes.
Parameters
shape - an int, or a tuple of ints
Returns
a shape tuple
'''
try:
i = int(shape)
return (i,)
except TypeError:
# shape was not a number
pass
try:
t = tuple(shape)
return t
except TypeError:
# shape was not iterable
pass
raise TypeError('shape must be an int, or a tuple of ints')
def sliding_window(a,ws,ss = None,flatten = True):
'''
Return a sliding window over a in any number of dimensions
Parameters:
a - an n-dimensional numpy array
ws - an int (a is 1D) or tuple (a is 2D or greater) representing the size
of each dimension of the window
ss - an int (a is 1D) or tuple (a is 2D or greater) representing the
amount to slide the window in each dimension. If not specified, it
defaults to ws.
flatten - if True, all slices are flattened, otherwise, there is an
extra dimension for each dimension of the input.
Returns
an array containing each n-dimensional window from a
'''
if None is ss:
# ss was not provided. the windows will not overlap in any direction.
ss = ws
ws = norm_shape(ws)
ss = norm_shape(ss)
# convert ws, ss, and a.shape to numpy arrays so that we can do math in every
# dimension at once.
ws = np.array(ws)
ss = np.array(ss)
shape = np.array(a.shape)
# ensure that ws, ss, and a.shape all have the same number of dimensions
ls = [len(shape),len(ws),len(ss)]
if 1 != len(set(ls)):
raise ValueError(\
'a.shape, ws and ss must all have the same length. They were %s' % str(ls))
# ensure that ws is smaller than a in every dimension
if np.any(ws > shape):
raise ValueError(\
'ws cannot be larger than a in any dimension.\
a.shape was %s and ws was %s' % (str(a.shape),str(ws)))
# how many slices will there be in each dimension?
newshape = norm_shape(((shape - ws) // ss) + 1)
# the shape of the strided array will be the number of slices in each dimension
# plus the shape of the window (tuple addition)
newshape += norm_shape(ws)
# the strides tuple will be the array's strides multiplied by step size, plus
# the array's strides (tuple addition)
newstrides = norm_shape(np.array(a.strides) * ss) + a.strides
strided = ast(a,shape = newshape,strides = newstrides)
if not flatten:
return strided
# Collapse strided so that it has one more dimension than the window. I.e.,
# the new array is a flat list of slices.
meat = len(ws) if ws.shape else 0
firstdim = (np.product(newshape[:-meat]),) if ws.shape else ()
dim = firstdim + (newshape[-meat:])
# remove any dimensions with size 1
dim = filter(lambda i : i != 1,dim)
return strided.reshape(dim)
Run Code Online (Sandbox Code Playgroud)
slip_window() 归属
我最初在博客页面上发现了这个,现在是一个损坏的链接:
使用 Numpy 高效重叠窗口 - http://www.johnvinyard.com/blog/?p=268
经过一番搜索,它看起来现在位于Zounds github 存储库中。谢谢约翰·维尼亚德。
请注意,这篇文章已经很老了,并且有很多关于滑动窗口、滚动窗口和图像补丁提取的 Q&A。使用 numpy 的 as_strided有很多一次性的功能,但该函数似乎仍然是唯一处理 nd 窗口的函数。scikits sklearn.feature_extraction.image 库似乎经常被引用用于提取或查看图像补丁。