Convert a 2D numpy array to a structured array

Question

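I'm trying to convert a two-dimensional array into a structured array with named fields. I want each row of the 2D array to become a new record in the structured array. Unfortunately, nothing I've tried works the way I expect.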

I start with:

>>> myarray = numpy.array([("Hello",2.5,3),("World",3.6,2)])
>>> print myarray
[['Hello' '2.5' '3']
 ['World' '3.6' '2']]

I want to convert it into something that looks like this:

>>> newarray = numpy.array([("Hello",2.5,3),("World",3.6,2)], dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")])
>>> print newarray
[('Hello', 2.5, 3L) ('World', 3.6000000000000001, 2L)]

What I've tried:

>>> newarray = myarray.astype([("Col1","S8"),("Col2","f8"),("Col3","i8")])
>>> print newarray
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)]
 [('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]]

>>> newarray = numpy.array(myarray, dtype=[("Col1","S8"),("Col2","f8"),("Col3","i8")])
>>> print newarray
[[('Hello', 0.0, 0L) ('2.5', 0.0, 0L) ('3', 0.0, 0L)]
 [('World', 0.0, 0L) ('3.6', 0.0, 0L) ('2', 0.0, 0L)]]

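Both of these approaches try to convert each individual entry of myarray into a record with the given dtype, which is why the extra zeros get inserted. I can't figure out how to turn each row into a record.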
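The rule behind this can be sketched with a small numeric array (the field names and values below are made up so that every field can hold every element). When an unstructured array is assigned to a structured one, NumPy copies each scalar element into every field, so the shape is preserved: a (2, 2) input gives a (2, 2) array of records rather than one record per row. How non-numeric strings such as 'Hello' are handled in this step may differ between NumPy versions.

```python
import numpy as np

# Hypothetical two-field dtype and a plain 2D integer array.
dt = [("Col1", "i8"), ("Col2", "f8")]
values = np.array([[1, 2], [3, 4]])

# Assigning an unstructured array into a structured one copies each
# element into *every* field; rows are not unpacked into records.
out = np.zeros(values.shape, dtype=dt)
out[...] = values

print(out.shape)    # (2, 2): one record per element, not per row
print(out["Col1"])  # both fields hold the same element values
print(out["Col2"])
```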

Another attempt:

>>> newarray = myarray.copy()
>>> newarray.dtype = [("Col1","S8"),("Col2","f8"),("Col3","i8")]
>>> print newarray
[[('Hello', 1.7219343871178711e-317, 51L)]
 [('World', 1.7543139673493688e-317, 50L)]]

This time no actual conversion happens: the existing bytes in memory are simply reinterpreted as the new dtype.
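The difference between reinterpreting and converting is easiest to see on a single float. `.view()`, which is effectively what assigning to `.dtype` above does, keeps the bytes and changes how they are read, while `.astype()` converts the values. A minimal sketch:

```python
import numpy as np

a = np.array([1.0], dtype=np.float64)

reinterpreted = a.view(np.int64)  # same 8 bytes, read as an int64
converted = a.astype(np.int64)    # value converted: 1.0 -> 1

print(reinterpreted[0])  # 4607182418800017408, the IEEE 754 bit pattern of 1.0
print(converted[0])      # 1
```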

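The array I'm starting with is read in from a text file. The data types aren't known in advance, so I can't set the dtype when the array is created. I need a fast, elegant solution that works in the general case, because I'll be doing this kind of conversion many times across many kinds of applications.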
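Since the data comes from a text file, one option not tried in the question is to let `np.genfromtxt` infer a type per column with `dtype=None`, which returns a structured array directly. This is only a sketch: it assumes a whitespace-delimited file, and the sample text and column names below are made up.

```python
import io

import numpy as np

# Hypothetical file contents; a real file path works the same way.
text = io.StringIO("Hello 2.5 3\nWorld 3.6 2\n")

# dtype=None asks genfromtxt to guess each column's type separately.
data = np.genfromtxt(text, dtype=None, names=["Col1", "Col2", "Col3"],
                     encoding="utf-8")

print(data.dtype)  # one field per column, each with its own inferred type
```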

Thanks!

Answer 1

Cur*_*arn 29

You can use numpy.core.records.fromarrays to "create a record array from a (flat) list of arrays", like this:

>>> import numpy as np
>>> myarray = np.array([("Hello",2.5,3),("World",3.6,2)])
>>> print myarray
[['Hello' '2.5' '3']
 ['World' '3.6' '2']]


>>> newrecarray = np.core.records.fromarrays(myarray.transpose(), 
                                             names='col1, col2, col3',
                                             formats = 'S8, f8, i8')

>>> print newrecarray
[('Hello', 2.5, 3) ('World', 3.5999999046325684, 2)]

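I was trying to do something similar. I found that when numpy creates a structured array from an existing 2D array (using np.core.records.fromarrays), it treats each column (rather than each row) of the 2D array as a record, because it takes each sub-array along the first axis as one field. So you have to transpose it. This behavior doesn't seem very intuitive, but perhaps there's a good reason for it.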
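To see which way fromarrays reads its input, here is a small sketch. The column arrays are made-up examples, and it uses `np.rec`, the public name for the numpy.core.records module (recent NumPy releases deprecate reaching it through `np.core`).

```python
import numpy as np

# Hypothetical per-column data: fromarrays takes one array per *field*,
# and element i of every array is combined into record i.
words = np.array(["Hello", "World"])
floats = np.array([2.5, 3.6])
ints = np.array([3, 2])
rec = np.rec.fromarrays([words, floats, ints], names="Col1,Col2,Col3")

# A 2D array is iterated along its first axis, so each *row* would be
# taken as a field; transposing first makes each column a field instead.
grid = np.array([["Hello", "2.5", "3"], ["World", "3.6", "2"]])
rec2 = np.rec.fromarrays(grid.T, names="Col1,Col2,Col3", formats="U8,f8,i8")

print(rec2.Col2)  # the second column, now stored as float64
```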

With `fromrecords` you can avoid the `transpose()`. (7 upvotes)
This creates a record array, not a structured ndarray. (4 upvotes)

Answer 2

Rug*_*rra 10

I guess

import numpy as np

# np.rec is the public alias of numpy.core.records
new_array = np.rec.fromrecords([("Hello",2.5,3),("World",3.6,2)],
                               names='Col1,Col2,Col3',
                               formats='S8,f8,i8')

is what you want.

This gives a record array (`np.recarray`) rather than a structured array (an `np.ndarray` with a structured `dtype`). (2 upvotes)
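Starting from the 2D string array in the question rather than a literal list, a sketch of the fromrecords route: each row is turned into a tuple first, and `.view(np.ndarray)` is one way to drop the recarray subclass when a plain structured ndarray is wanted. The view shares the same data and keeps the same fields.

```python
import numpy as np

myarray = np.array([("Hello", 2.5, 3), ("World", 3.6, 2)])  # 2D string array

# fromrecords expects one tuple per record, so convert each row to a tuple.
records = [tuple(row) for row in myarray]
rec = np.rec.fromrecords(records, names="Col1,Col2,Col3", formats="S8,f8,i8")

# View the recarray as a plain ndarray; no data is copied.
plain = rec.view(np.ndarray)
print(type(rec).__name__, type(plain).__name__)  # recarray ndarray
```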

Answer 3

hpa*_*ulj 5

If the data starts as a list of tuples, create the structured array directly:

In [228]: alist = [("Hello",2.5,3),("World",3.6,2)]
In [229]: dt = [("Col1","S8"),("Col2","f8"),("Col3","i8")]
In [230]: np.array(alist, dtype=dt)
Out[230]: 
array([(b'Hello',  2.5, 3), (b'World',  3.6, 2)], 
      dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])

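Once built this way, each tuple is one record: an integer index returns a record, and a field name returns a whole column. A short usage sketch with the same `alist` and `dt`:

```python
import numpy as np

alist = [("Hello", 2.5, 3), ("World", 3.6, 2)]
dt = [("Col1", "S8"), ("Col2", "f8"), ("Col3", "i8")]
arr = np.array(alist, dtype=dt)

row = arr[1]         # one record: (b'World', 3.6, 2)
col = arr["Col2"]    # one field across all records, as a float64 array
print(row["Col3"], col.dtype)
```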
The complication here is that the list of tuples has been turned into a 2D array of strings:

In [231]: arr = np.array(alist)
In [232]: arr
Out[232]: 
array([['Hello', '2.5', '3'],
       ['World', '3.6', '2']], 
      dtype='<U5')

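The contrast comes from dtype discovery: without a dtype, NumPy picks one common type for every value (here a Unicode string type, so the numbers become text), whereas a structured dtype tells it to treat each tuple as one record. A sketch:

```python
import numpy as np

alist = [("Hello", 2.5, 3), ("World", 3.6, 2)]
dt = [("Col1", "S8"), ("Col2", "f8"), ("Col3", "i8")]

# No dtype: one common string type, one array element per value.
plain = np.array(alist)
print(plain.shape, plain.dtype.kind)  # (2, 3) U

# Structured dtype: one element per tuple.
structured = np.array(alist, dtype=dt)
print(structured.shape)  # (2,)
```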
We can use the well-known zip(*...) idiom to "transpose" this array; in fact, we need a double transpose:

In [234]: list(zip(*arr.T))
Out[234]: [('Hello', '2.5', '3'), ('World', '3.6', '2')]

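The double transpose regroups the columns of `arr.T` back into one tuple per row, so it yields the same tuples as converting each row directly; some may find the row version easier to read. A sketch checking that equivalence and building the structured array from either:

```python
import numpy as np

alist = [("Hello", 2.5, 3), ("World", 3.6, 2)]
dt = [("Col1", "S8"), ("Col2", "f8"), ("Col3", "i8")]
arr = np.array(alist)  # 2D Unicode array, as above

via_zip = list(zip(*arr.T))              # columns regrouped into rows
via_rows = [tuple(row) for row in arr]   # rows converted directly
assert via_zip == via_rows

newarr = np.array(via_rows, dtype=dt)
print(newarr)
```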
zip conveniently gives us a list of tuples. Now we can recreate the array with the desired dtype:

In [235]: np.array(_, dtype=dt)
Out[235]: 
array([(b'Hello',  2.5, 3), (b'World',  3.6, 2)], 
      dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])

The accepted answer uses fromarrays:

In [236]: np.rec.fromarrays(arr.T, dtype=dt)
Out[236]: 
rec.array([(b'Hello',  2.5, 3), (b'World',  3.6, 2)], 
          dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])

Internally, fromarrays takes an approach common in recfunctions: create the target array, then copy the values in field by field. In effect, it does:

In [237]: newarr = np.empty(arr.shape[0], dtype=dt)
In [238]: for n, v in zip(newarr.dtype.names, arr.T):
     ...:     newarr[n] = v
     ...:     
In [239]: newarr
Out[239]: 
array([(b'Hello',  2.5, 3), (b'World',  3.6, 2)], 
      dtype=[('Col1', 'S8'), ('Col2', '<f8'), ('Col3', '<i8')])

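NumPy 1.16 and later also ship `numpy.lib.recfunctions.unstructured_to_structured`, which maps the last axis of an array onto the fields of a structured dtype. The sketch below only uses a homogeneous numeric array with made-up field names; whether it parses the string columns from the question the same way has not been checked here.

```python
import numpy as np
from numpy.lib import recfunctions as rfn

grid = np.array([[1.0, 2.0], [3.0, 4.0]])        # last axis: 2 values per row
dt = np.dtype([("x", "f8"), ("n", "i8")])        # hypothetical 2-field dtype

# Each row becomes one record; the default casting is "unsafe", so the
# float values are cast to int64 for the "n" field.
out = rfn.unstructured_to_structured(grid, dtype=dt)
print(out)
```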
Archived: 15 years, 5 months ago
Views: 15,200
Last activity: 8 years, 8 months ago