Posts by mat*_*fux

Fastest way to write numpy arrays in Arrow format

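I'm looking for fast ways to store and retrieve numpy arrays using pyarrow. I'm quite happy with retrieval: extracting the columns from my .arrow file, which hold 1,000,000,000 integers of dtype = np.uint16 in total, takes less than 1 second.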

import pyarrow as pa
import numpy as np

def write(arr, name):
    arrays = [pa.array(col) for col in arr]
    names = [str(i) for i in range(len(arrays))]
    batch = pa.RecordBatch.from_arrays(arrays, names=names)
    with pa.OSFile(name, 'wb') as sink:
        with pa.RecordBatchStreamWriter(sink, batch.schema) as writer:
            writer.write_batch(batch)

def read(name):
    source = pa.memory_map(name, 'r')
    table = pa.ipc.RecordBatchStreamReader(source).read_all()
    for i in range(table.num_columns):
        yield table.column(str(i)).to_numpy()

arr = np.random.randint(65535, size=(250, 4000000), dtype=np.uint16)

%%timeit -r 1 -n 1
write(arr, 'test.arrow')
>>> 25.6 …


python numpy pyarrow

mat*_*fux

lucky-day

Score: 11, Answers: 1, Views: 3244

What is the fastest way to map group names of a numpy array to indices?

I'm working with 3D point clouds from lidar. The points are given as a numpy array like this:

points = np.array([[61651921, 416326074, 39805], [61605255, 416360555, 41124], [61664810, 416313743, 39900], [61664837, 416313749, 39910], [61674456, 416316663, 39503], [61651933, 416326074, 39802], [61679969, 416318049, 39500], [61674494, 416316677, 39508], [61651908, 416326079, 39800], [61651908, 416326087, 39802], [61664845, 416313738, 39913], [61674480, 416316668, 39503], [61679996, 416318047, 39510], [61605290, 416360572, 41118], [61605270, 416360565, 41122], [61683939, 416313004, 41052], [61683936, 416313033, 41060], [61679976, 416318044, 39509], [61605279, 416360555, 41109], [61664837, 416313739, 39915], [61674487, 416316666, 39505], [61679961, 416318035, 39503], [61683943, 416313004, 41054], [61683930, 416313042, 41059]])


I want to group my data into cubes of size …
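The cube size is cut off in the question, so this minimal sketch takes it as a parameter `size` (an assumption), in the same integer units as `points`. It computes each point's integer cube coordinates with floor division, uses `np.unique(..., axis=0, return_inverse=True)` to give every distinct cube an index, and then splits a stable argsort of those indices so that each cube maps to the row indices of its points. The name cube_groups is hypothetical.

```python
import numpy as np


def cube_groups(points: np.ndarray, size: int):
    """Return (cube keys, list of point-index arrays, one per cube)."""
    # Integer cube coordinates of every point (floor division also handles negatives).
    cubes = points // size
    # keys: unique cubes in lexicographic order; inverse: cube index of each point.
    keys, inverse = np.unique(cubes, axis=0, return_inverse=True)
    inverse = inverse.ravel()  # guard against differing inverse shapes across NumPy versions
    # Sort point indices by cube index, then cut at the cumulative group sizes.
    order = np.argsort(inverse, kind="stable")
    counts = np.bincount(inverse, minlength=len(keys))
    groups = np.split(order, np.cumsum(counts)[:-1])
    return keys, groups
```

With the array from the question you would call, for example, `keys, groups = cube_groups(points, 100)`, where `100` is an illustrative size. Then `groups[k]` holds the rows of `points` that fall into cube `keys[k]`.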

python hash grouping numpy lidar

mat*_*fux

2020 01-09

Score: 10, Answers: 2, Views: 943

What is the best way to intersect multiple arrays with a numpy array?

Suppose I have a numpy array, for example:

import numpy as np
X = np.array([2,5,0,4,3,1])


I also have a list of arrays, for example:

A = [np.array([-2,0,2]), np.array([0,1,2,3,4,5]), np.array([2,5,4,6])]


I want to keep only those items of each array that are also in X, and I'd like to do it in the most efficient and idiomatic way.

What I've tried so far:

Sort X using X.sort().
Find the position of each array's items in X using:
```
locations = [np.searchsorted(X, n) for n in A]
```

Keep only the matching items:

masks = [X[locations[i]] == A[i] for i in range(len(A))]
result = [A[i][masks[i]] for i in range(len(A))]


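But it doesn't work, because one of the locations for the third array is out of range (6 is not a valid index into a 6-element X):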

locations = [array([0, 0, 2], dtype=int64), array([0, 1, 2, 3, 4, 5], dtype=int64), array([2, 5, 4, 6], dtype=int64)]


How can I fix this?

Update

I ended up using the idx[idx==len(Xs)] = 0 solution. I also noticed that two different approaches were posted among the answers: converting X to …
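The fix from the update can be sketched as a small function. `np.searchsorted` returns `len(Xs)` for values larger than every element of the sorted array, so those indices are redirected to a valid slot before the equality check. As a plainly different technique, `np.isin` gives the same result without sorting. Both function names are hypothetical, and the searchsorted version assumes `X` is non-empty.

```python
import numpy as np


def intersect_each(X, arrays):
    """Keep, for each array, only the items that also occur in X (searchsorted)."""
    Xs = np.sort(X)
    result = []
    for a in arrays:
        idx = np.searchsorted(Xs, a)
        # Values above max(Xs) get idx == len(Xs); point them at slot 0. Xs[0] is the
        # minimum, so it can never equal such a value and the mask below stays correct.
        idx[idx == len(Xs)] = 0
        result.append(a[Xs[idx] == a])
    return result


def intersect_each_isin(X, arrays):
    """Same result using np.isin membership tests."""
    return [a[np.isin(a, X)] for a in arrays]
```

For the arrays in the question, both return `[array([0, 2]), array([0, 1, 2, 3, 4, 5]), array([2, 5, 4])]`.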

python numpy

mat*_*fux

2020 01-11

Score: 7, Answers: 1, Views: 8340

Drawing a networkx.Graph: how to change a node's position without resetting every node?

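I'm working on a project where I need to create a preview of an nx.Graph() that lets the user change node positions by dragging them with the mouse. When a particular node is clicked, my current code redraws the entire figure immediately after every mouse movement. However, this adds noticeable latency. How can I update only the artists that need it, namely the clicked node, its label text, and its adjacent edges, instead of refreshing every artist in plt.subplots()? Can I at least get references to all the artists that need to be moved?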

I started with the standard way of displaying a graph in networkx:

import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
import scipy.spatial

def refresh(G):
    plt.axis((-4, 4, -1, 3))
    nx.draw_networkx_labels(G, pos = nx.get_node_attributes(G, 'pos'),
                                bbox = dict(fc="lightgreen", ec="black", boxstyle="square", lw=3))
    nx.draw_networkx_edges(G, pos = nx.get_node_attributes(G, 'pos'), width=1.0, alpha=0.5)
    plt.show()

nodes = np.array(['A', 'B', 'C', 'D', 'E', 'F', 'G'])
edges = np.array([['A', 'B'], ['A', 'C'], ['B', 'D'], ['B', 'E'], ['C', 'F'], ['C', 'G']])
pos = np.array([[0, 0], [-2, 1], [2, 1], [-3, 2], …
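A minimal sketch of keeping references to the artists, assuming positions live in the `'pos'` node attribute as in the code above. It draws the labels with `ax.text` and all edges as a single matplotlib `LineCollection` instead of calling the networkx drawing helpers, then uses matplotlib's `pick_event`, `motion_notify_event` and `button_release_event` to move only the dragged label and rewrite the edge segments. The class `DraggableGraph` and its `move_node` method are hypothetical. Note that `draw_idle` still re-renders the canvas. This avoids recreating artists on every move, but a truly partial redraw would need blitting, which is not shown here.

```python
import networkx as nx
from matplotlib.collections import LineCollection


class DraggableGraph:
    """Keep label and edge artists so a drag updates them instead of redrawing the graph."""

    def __init__(self, ax, G, pos_attr="pos"):
        self.ax, self.G = ax, G
        self.pos = {n: tuple(p) for n, p in nx.get_node_attributes(G, pos_attr).items()}
        self.edges = list(G.edges())
        # One pickable Text artist per node, stored by node name.
        self.labels = {
            n: ax.text(x, y, str(n), ha="center", va="center", picker=True,
                       bbox=dict(fc="lightgreen", ec="black", boxstyle="square", lw=3))
            for n, (x, y) in self.pos.items()
        }
        # All edges in one LineCollection; moving a node only rewrites the segments.
        self.lines = LineCollection(self._segments(), linewidths=1.0, alpha=0.5)
        ax.add_collection(self.lines)
        self.dragged = None
        canvas = ax.figure.canvas
        canvas.mpl_connect("pick_event", self._on_pick)
        canvas.mpl_connect("motion_notify_event", self._on_motion)
        canvas.mpl_connect("button_release_event", self._on_release)

    def _segments(self):
        return [[self.pos[u], self.pos[v]] for u, v in self.edges]

    def move_node(self, node, xy):
        """Move one node: update its label position and the edge segments."""
        self.pos[node] = tuple(xy)
        self.labels[node].set_position(xy)
        self.lines.set_segments(self._segments())
        self.ax.figure.canvas.draw_idle()

    def _on_pick(self, event):
        for n, text in self.labels.items():
            if event.artist is text:
                self.dragged = n

    def _on_motion(self, event):
        if self.dragged is not None and event.inaxes is self.ax:
            self.move_node(self.dragged, (event.xdata, event.ydata))

    def _on_release(self, event):
        self.dragged = None
```

Typical use is `fig, ax = plt.subplots(); ax.axis((-4, 4, -1, 3)); dg = DraggableGraph(ax, G); plt.show()`. Keep the `dg` reference alive, because matplotlib stores bound-method callbacks as weak references and drops them once the object is garbage-collected.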


python matplotlib event-handling mouseevent networkx

mat*_*fux

2020 09-11

Score: 6, Answers: 1, Views: 820

Fast way to remove rows with specific values from a 2D numpy array

I have a 2D array like this:

a = np.array([[25, 83, 18, 71],
       [75,  7,  0, 85],
       [25, 83, 18, 71],
       [25, 83, 18, 71],
       [75, 48,  8, 43],
       [ 7, 47, 96, 94],
       [ 7, 47, 96, 94],
       [56, 75, 50,  0],
       [19, 49, 92, 57],
       [52, 93, 58,  9]])


I want to remove the rows that match specific values, for example:

b = np.array([[56, 75, 50,  0], [52, 93, 58,  9], [25, 83, 18, 71]])


What is the most efficient way to do this in numpy or pandas? Expected output:

np.array([[75,  7,  0, 85],
       [75, 48,  8, 43],
       [ 7, 47, 96, 94],
       [ …
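A minimal numpy sketch using broadcasting. This is one straightforward technique, not necessarily the fastest for very large inputs. It compares every row of `a` with every row of `b` and keeps the rows of `a` that match none of them. The intermediate boolean array has shape `(len(a), len(b), a.shape[1])`, so a large `a` may need to be processed in chunks. The name `remove_rows` is hypothetical.

```python
import numpy as np


def remove_rows(a: np.ndarray, b: np.ndarray) -> np.ndarray:
    """Return the rows of a that are not equal to any row of b (order and duplicates kept)."""
    # matches[i, j] is True when row i of a equals row j of b in every column.
    matches = (a[:, None, :] == b[None, :, :]).all(axis=2)
    # Keep rows of a that match none of the rows of b.
    return a[~matches.any(axis=1)]
```

Applied to `a` and `b` from the question, this keeps rows 1, 4, 5, 6 and 8 of `a`, in their original order.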


python arrays numpy pandas

mat*_*fux

2020 10-17

Score: 6, Answers: 2, Views: 422

How to sort a list of lists and keep only the largest second element for each first element?

Suppose I have a list:

lst = [[2,6],[1,4],[0,1],[1,1],[2,3],[0,2]]


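I want to sort lst by the first element and, among the sublists that share the same first element, keep only the one with the largest second element.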

So the result would be:

results
>>> [[0,2],[1,4],[2,6]]
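Two minimal sketches that produce that result. The first is plain Python: it tracks the largest second element per first element in a dict and sorts the keys at the end. The second is a numpy variant. It sorts with `np.lexsort` by first element, then by second element, and keeps the last row of each run of equal first elements, which is the row with the largest second element. Both function names are hypothetical.

```python
import numpy as np


def max_second_per_first(lst):
    """Plain Python: keep [first, max(second)] for each first, sorted by first."""
    best = {}
    for first, second in lst:
        if first not in best or second > best[first]:
            best[first] = second
    return [[k, best[k]] for k in sorted(best)]


def max_second_per_first_np(lst):
    """numpy: sort by (first, second) and keep the last row of each first-element group."""
    arr = np.asarray(lst)
    # lexsort uses the LAST key as the primary sort key.
    s = arr[np.lexsort((arr[:, 1], arr[:, 0]))]
    # True where the next row starts a new group, plus the final row.
    last = np.r_[s[1:, 0] != s[:-1, 0], True]
    return s[last]
```

`max_second_per_first(lst)` returns `[[0, 2], [1, 4], [2, 6]]`, and the numpy version returns the same values as a 2D array.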


Can anyone help me?

python sorting numpy list

Guy*_*Guy

2020 09-07

Score: 5, Answers: 1, Views: 736

Tag statistics

python ×6

numpy ×5

arrays ×1

event-handling ×1

grouping ×1

hash ×1

lidar ×1

list ×1

matplotlib ×1

mouseevent ×1

networkx ×1

pandas ×1

pyarrow ×1

sorting ×1
