Remove completely isolated cells from a Python array?

Rob*_*lor 5 python numpy python-2.6 scipy ndimage

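I am trying to reduce noise in a binary Python array by removing all completely isolated single cells, i.e. setting a cell with value 1 to 0 if it is completely surrounded by 0s. I have a working solution that loops over the blobs and removes those of size 1, but this seems very inefficient for large arrays: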

import numpy as np
import scipy.ndimage as ndimage
import matplotlib.pyplot as plt    

# Generate sample data
square = np.zeros((32, 32))
square[10:-10, 10:-10] = 1
np.random.seed(12)
x, y = (32*np.random.random((2, 20))).astype(int)  # np.int was removed from recent NumPy releases
square[x, y] = 1

# Plot original data with many isolated single cells
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

# Assign unique labels
id_regions, number_of_ids = ndimage.label(square, structure=np.ones((3,3)))

# Set blobs of size 1 to 0
for i in range(number_of_ids + 1):  # range also works on Python 2; xrange does not exist on Python 3
    if id_regions[id_regions==i].size == 1:
        square[id_regions==i] = 0

# Plot desired output, with all isolated single cells removed
plt.imshow(square, cmap=plt.cm.gray, interpolation='nearest')

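Eroding and then dilating my array will not work here, because that would also remove features with a width of 1. I suspect the solution lies somewhere in the scipy.ndimage package, but so far I have not been able to crack it. Any help would be greatly appreciated!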
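To illustrate why erosion followed by dilation is unsuitable, here is a minimal sketch on a toy array. The array and the 3x3 structuring element are assumptions chosen for the demonstration and are not part of the original data:

```python
import numpy as np
import scipy.ndimage as ndimage

# Toy array (illustrative only): one isolated cell and one line of width 1
arr = np.zeros((7, 7), dtype=int)
arr[1, 1] = 1        # isolated single cell (noise)
arr[4, 1:6] = 1      # width-1 feature that should be kept

# Opening = erosion followed by dilation, here with a 3x3 structuring element
opened = ndimage.binary_opening(arr, structure=np.ones((3, 3)))

# The noise is gone, but so is the width-1 line
print(opened.sum())
```

No 3x3 window in this array is completely filled, so the erosion step wipes out everything and the dilation has nothing left to restore.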

Rob*_*lor 6

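A belated thanks to Jaime and Kazemakase for their replies. The manual neighbour-check method did remove all the isolated patches, but it also removed patches attached to other patches by a single corner (i.e. the upper right of the square in the sample array). The summed-area table works perfectly and is very fast on the small sample array, but slows down on larger arrays.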
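The corner problem is what one would expect from a check that looks only at the four edge-sharing neighbours. That is an assumption here, since the other answers are not shown. As a rough sketch of a neighbour check that treats diagonal contact as connected, the eight surrounding cells can be counted with a convolution. The function name and the demo array below are hypothetical, and this is not the code from either reply:

```python
import numpy as np
import scipy.ndimage as ndimage

def remove_isolated_by_neighbour_count(array):
    """Zero out cells whose 8 surrounding neighbours are all 0 (hypothetical helper)."""
    binary = (np.asarray(array) != 0).astype(int)
    # 3x3 kernel with a zero centre, so only the 8 neighbours are counted
    kernel = np.ones((3, 3), dtype=int)
    kernel[1, 1] = 0
    # Cells outside the array are treated as 0
    neighbours = ndimage.convolve(binary, kernel, mode='constant', cval=0)
    result = np.copy(array)
    result[(binary == 1) & (neighbours == 0)] = 0
    return result

demo = np.zeros((6, 6), dtype=int)
demo[0, 5] = 1          # completely isolated cell
demo[2, 2] = 1          # touches the block below only at a corner
demo[3:5, 3:5] = 1      # 2x2 block
cleaned = remove_isolated_by_neighbour_count(demo)
```

In this sketch the cell at (0, 5) is removed, while the cell at (2, 2) is kept because it has one diagonal neighbour.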

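I ended up using an approach based on ndimage that seems to work efficiently for very large and sparse arrays (0.91 s for a 5000 x 5000 array, versus 1.17 s for the summed-area table approach). I first generate a labelled array of unique IDs for each discrete region, compute the size of each ID, mask the size array to keep only the blobs with size == 1, then index the original array and set the IDs with size == 1 to 0: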

def filter_isolated_cells(array, struct):
    """ Return array with completely isolated single cells removed
    :param array: Array with completely isolated single cells
    :param struct: Structure array for generating unique regions
    :return: Array with minimum region size > 1
    """

    filtered_array = np.copy(array)
    id_regions, num_ids = ndimage.label(filtered_array, structure=struct)
    id_sizes = np.array(ndimage.sum(array, id_regions, range(num_ids + 1)))
    area_mask = (id_sizes == 1)
    filtered_array[area_mask[id_regions]] = 0
    return filtered_array

# Run function on sample array
filtered_array = filter_isolated_cells(square, struct=np.ones((3,3)))

# Plot output, with all isolated single cells removed
plt.imshow(filtered_array, cmap=plt.cm.gray, interpolation='nearest')

Result: resulting array (image)
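As a quick sanity check of the labelling approach, here is a small self-contained sketch on a toy array. It swaps in np.bincount to count the cells per label instead of ndimage.sum; the function name and the test array are hypothetical. Note that ndimage.sum adds up array values, so it matches a cell count only when the input is strictly 0/1, whereas np.bincount counts labelled cells directly:

```python
import numpy as np
import scipy.ndimage as ndimage

def filter_isolated_cells_bincount(array, struct):
    """Variant of filter_isolated_cells using np.bincount (hypothetical name)."""
    filtered_array = np.copy(array)
    id_regions, num_ids = ndimage.label(filtered_array, structure=struct)
    # Number of cells per label; index 0 is the background
    id_sizes = np.bincount(id_regions.ravel(), minlength=num_ids + 1)
    area_mask = (id_sizes == 1)
    area_mask[0] = False  # never treat the background as a blob
    filtered_array[area_mask[id_regions]] = 0
    return filtered_array

small = np.zeros((5, 5))
small[0, 0] = 1       # isolated cell, should be removed
small[2, 1:4] = 1     # width-1 line, should survive
small[3, 4] = 1       # touches the line only diagonally, should survive
out = filter_isolated_cells_bincount(small, struct=np.ones((3, 3)))
```

With the 3x3 structure of ones, the diagonal cell is labelled as part of the line, so only the cell at (0, 0) is cleared. I have not timed this variant against the version above.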