Python file slurp w/ endian conversion

Foo*_*ofy 5 python struct mmap numpy endianness

Someone recently asked how to do a file slurp in Python, and the accepted answer suggested something like:

with open('x.txt') as x: f = x.read()

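How would I go about doing this to read the file in and convert the endian representation of the data?

For example, I have a 1GB binary file that is just a bunch of single-precision floats packed as big endian, and I want to convert it to little endian and dump it into a numpy array. Below is the function I wrote to accomplish this, along with some real code that calls it. I use struct.unpack to do the endian conversion and tried to speed everything up by using mmap.

My question, then: am I using the slurp correctly with mmap and struct.unpack? Is there a cleaner, faster way to do this? What I have works right now, but I'd really like to learn how to do this better.

Thanks in advance!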

#!/usr/bin/python
from struct import unpack
import mmap
import numpy as np

def mmapChannel(arrayName,  fileName,  channelNo,  line_count,  sample_count):
    """
    We need to read in the asf internal file and convert it into a numpy array.
    It is stored as a single row, and is binary. The number of lines (rows), samples (columns),
    and channels all come from the .meta text file
    Also, internal format files are packed big endian, but most systems use little endian, so we need
    to make that conversion as well.
    Memory mapping seemed to improve the ingestion speed a bit
    """
    # memory-map the file, size 0 means whole file
    # length = line_count * sample_count * arrayName.itemsize
    print "\tMemory Mapping..."
    with open(fileName, "rb") as f:
        map = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        map.seek(channelNo*line_count*sample_count*arrayName.itemsize)

        for i in xrange(line_count*sample_count):
            arrayName[0, i] = unpack('>f', map.read(arrayName.itemsize) )[0]

        # Same method as above, just more verbose for the maintenance programmer.
        #        for i in xrange(line_count*sample_count): #row
        #            be_float = map.read(arrayName.itemsize) # arrayName.itemsize should be 4 for float32
        #            le_float = unpack('>f', be_float)[0] # > for big endian, < for little endian
        #            arrayName[0, i]= le_float

        map.close()
    return arrayName

print "Initializing the Amp HH HV, and Phase HH HV arrays..."
HHamp = np.ones((1,  line_count*sample_count),  dtype='float32')
HHphase = np.ones((1,  line_count*sample_count),  dtype='float32')
HVamp = np.ones((1,  line_count*sample_count),  dtype='float32')
HVphase = np.ones((1,  line_count*sample_count),  dtype='float32')



print "Ingesting HH_Amp..."
HHamp = mmapChannel(HHamp, 'ALPSRP042301700-P1.1__A.img',  0,  line_count,  sample_count)
print "Ingesting HH_phase..."
HHphase = mmapChannel(HHphase, 'ALPSRP042301700-P1.1__A.img',  1,  line_count,  sample_count)
print "Ingesting HV_AMP..."
HVamp = mmapChannel(HVamp, 'ALPSRP042301700-P1.1__A.img',  2,  line_count,  sample_count)
print "Ingesting HV_phase..."
HVphase = mmapChannel(HVphase, 'ALPSRP042301700-P1.1__A.img',  3,  line_count,  sample_count)

print "Reshaping...."
HHamp_orig = HHamp.reshape(line_count, -1)
HHphase_orig = HHphase.reshape(line_count, -1)
HVamp_orig = HVamp.reshape(line_count, -1)
HVphase_orig = HVphase.reshape(line_count, -1)

jfs*_*jfs 7

A slight modification of @Alex Martelli's answer:

arr = numpy.fromfile(filename, numpy.dtype('>f4'))
# no byteswap is needed regardless of the endianness of the machine
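
Applied to the multi-channel layout in the question, the same idea might look like the sketch below. It assumes the channels are stored back to back as big-endian float32 values. `read_channel` and the tiny demo file are hypothetical names made up for illustration, not part of the original code.

```python
import os
import struct
import tempfile

import numpy as np


def read_channel(file_name, channel_no, line_count, sample_count):
    """Read one channel of big-endian float32 samples as a 2-D array."""
    count = line_count * sample_count
    dtype = np.dtype('>f4')  # big-endian, 4-byte float
    with open(file_name, 'rb') as f:
        # Skip the channels stored before the requested one.
        f.seek(channel_no * count * dtype.itemsize)
        data = np.fromfile(f, dtype=dtype, count=count)
    # astype copies the values into the machine's native float32 layout.
    return data.astype(np.float32).reshape(line_count, sample_count)


# Hypothetical demo file: two channels of 2 lines x 3 samples each.
values = [float(v) for v in range(12)]
with tempfile.NamedTemporaryFile(suffix='.img', delete=False) as tmp:
    tmp.write(struct.pack('>12f', *values))
demo_path = tmp.name

channel1 = read_channel(demo_path, 1, 2, 3)
os.remove(demo_path)
```

The `astype` call is optional: an array with dtype `'>f4'` already yields correct values, and the copy only matters if later code expects native-order data.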


Ale*_*lli 6

with open(fileName, "rb") as f:
  arrayName = numpy.fromfile(f, numpy.float32)
arrayName.byteswap(True)

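Hard to beat for speed and conciseness ;-). For byteswap, see here (the True argument means "do it in place"); for fromfile, see here.

This works as is on little-endian machines (since the data are big-endian, the byteswap is needed). To do the byteswap only when it is required, change the last line from an unconditional call to byteswap into, for example: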

if struct.pack('=f', 2.3) == struct.pack('<f', 2.3):
  arrayName.byteswap(True)

That is, the call to byteswap is conditional on a test for little-endianness.
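
As an alternative way to write the same test, `sys.byteorder` reports the machine's byte order directly. The sketch below is only an illustration under that assumption: the three-value demo file is hypothetical and stands in for the real big-endian data.

```python
import os
import struct
import sys
import tempfile

import numpy as np

# Hypothetical demo file holding three big-endian float32 values.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(struct.pack('>3f', 1.0, 2.5, -4.0))
demo_path = tmp.name

with open(demo_path, 'rb') as f:
    arr = np.fromfile(f, np.float32)  # interpreted in native byte order
os.remove(demo_path)

# Swap in place only when the machine is little-endian, since the
# file contents are big-endian.
if sys.byteorder == 'little':
    arr.byteswap(inplace=True)
```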

  • numpy.float32 has native byte order, which may not always be big-endian. http://stackoverflow.com/questions/1632673/python-file-slurp-w-endian-conversion/1633525#1633525 (2 upvotes)