Memory-efficient Gaussian blur using scipy.ndimage

Ric*_*ich 3 gis memory-management gaussian scipy

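I'm trying to Gaussian-smooth a large GIS dataset (a 10000 x 10000 array). My current approach is to load the whole array into memory, smooth it, and then write it back. It looks like this: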

big_array = band_on_disk.ReadAsArray()
scipy.ndimage.gaussian_filter(big_array, sigma, output=smoothed_array)
output_band.WriteArray(smoothed_array)

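For large rasters I get a MemoryError, so I'd like to load sub-blocks of the array instead, but I'm not sure how to handle the Gaussian smoothing of areas that affect neighbouring sub-blocks.

Any suggestions on how to fix the algorithm above so that it works with a smaller memory footprint while still smoothing the whole array correctly?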

Joh*_*ard 6

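Try using memory-mapped files.

Moderate memory usage and bearably fast

If you can afford to hold one of the arrays in memory, this is bearably fast: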

import numpy as np
from scipy.ndimage import gaussian_filter

# create some fake data, save it to disk, and free up its memory
shape = (10000,10000)
orig = np.random.random_sample(shape)
orig.tofile('orig.dat')
print('saved original')
del orig

# allocate memory for the smoothed data
smoothed = np.zeros((10000,10000))

# memory-map the original data, so it isn't read into memory all at once
orig = np.memmap('orig.dat', np.float64, 'r', shape=shape)
print('memmapped')

sigma = 10 # I have no idea what a reasonable value is here
gaussian_filter(orig, sigma, output = smoothed)
# save the smoothed data to disk
smoothed.tofile('smoothed.dat')
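To judge whether one array fits in memory, a rough size estimate helps. This is a minimal sketch that assumes float64 data, as in the example above; the variable names are only illustrative, and a raster stored in another dtype would need a different itemsize.

```python
import numpy as np

shape = (10000, 10000)
dtype = np.dtype(np.float64)  # assumption: same dtype as the example above

# bytes needed for one full array of this shape and dtype
n_bytes = shape[0] * shape[1] * dtype.itemsize
n_mib = n_bytes / 2**20

print(f"{n_bytes} bytes, about {n_mib:.0f} MiB per array")
```

In the approach above only the smoothed array is allocated in full; the original is memory-mapped, so it is paged in as needed rather than loaded at once.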

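Low memory usage, slow

If you can't afford to hold either array in memory at once, you can memory-map both the original and the smoothed array. This uses very little memory, but it is far too slow, at least on my machine.

You'll have to ignore the first part of this code, since it cheats and creates the original array all at once, then saves it to disk. You can replace it with code that loads data you have built up incrementally on disk.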

import numpy as np
from scipy.ndimage import gaussian_filter

# create some fake data, save it to disk, and free up its memory
shape = (10000,10000)
orig = np.random.random_sample(shape)
orig.tofile('orig.dat')
print('saved original')
del orig

# memory-map the original data, so it isn't read into memory all at once
orig = np.memmap('orig.dat', np.float64, 'r', shape=shape)
# create a memory mapped array for the smoothed data
smoothed = np.memmap('smoothed.dat', np.float64, 'w+', shape = shape)
print('memmapped')

sigma = 10 # I have no idea what a reasonable value is here
gaussian_filter(orig, sigma, output = smoothed)
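The question also asks how to process sub-blocks without artifacts at the block borders. One possible approach, not part of the answer above, is to filter horizontal strips that are padded with a few extra "halo" rows on each side and to keep only the interior rows of each result. The sketch below assumes the halo should equal the kernel radius that gaussian_filter uses, int(truncate * sigma + 0.5); smooth_in_strips, strip_rows and the demo array are hypothetical names chosen for illustration.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def smooth_in_strips(src, dst, sigma, strip_rows=256, truncate=4.0):
    """Gaussian-filter src into dst one horizontal strip at a time.

    Hypothetical helper: src and dst may be ndarrays or np.memmap arrays.
    """
    # kernel radius in rows; rows farther away do not influence a pixel
    halo = int(truncate * sigma + 0.5)
    n_rows = src.shape[0]
    for start in range(0, n_rows, strip_rows):
        stop = min(start + strip_rows, n_rows)
        # extend the strip by the halo, clipped to the array bounds
        lo = max(start - halo, 0)
        hi = min(stop + halo, n_rows)
        block = np.asarray(src[lo:hi], dtype=np.float64)
        filtered = gaussian_filter(block, sigma, truncate=truncate)
        # keep only the rows that belong to this strip, discard the halo
        offset = start - lo
        dst[start:stop] = filtered[offset:offset + (stop - start)]
    return dst

# small demo: compare against filtering the whole array at once
rng = np.random.default_rng(0)
data = rng.random((300, 200))
expected = gaussian_filter(data, sigma=3)
result = smooth_in_strips(data, np.empty_like(data), sigma=3, strip_rows=64)
strips_match = np.allclose(result, expected)
print("strips match full filter:", strips_match)
```

Peak memory is then roughly two blocks of (strip_rows + 2 * halo) rows plus the filter's temporaries, instead of two full arrays. With memory-mapped src and dst this would replace the single gaussian_filter call in the code above; with GDAL, the strips could presumably be read and written through the window offset arguments of ReadAsArray and WriteArray, which is not shown here.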