Best way to pass a repeated parameter to a Numpy vectorized function

Question

Kar*_*tik 5 python optimization numpy geopy

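So, continuing from the discussion @TheBlackCat and I had in this answer, I would like to know the best way to pass arguments to a Numpy vectorized function. The function in question is defined as follows: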

vect_dist_funct = np.vectorize(lambda p1, p2: vincenty(p1, p2).meters)

where vincenty comes from the Geopy package.

I currently call vect_dist_funct in this manner:

def pointer(point, centroid, tree_idx):
    intersect = list(tree_idx.intersection(point))
    if len(intersect) > 0:
        points = pd.Series([point]*len(intersect)).values
        polygons = centroid.loc[intersect].values
        dist = vect_dist_funct(points, polygons)
        return pd.Series(dist, index=intersect, name='Dist').sort_values()
    else:
        return pd.Series(np.nan, index=[0], name='Dist')

points['geometry'].apply(lambda x: pointer(point=x.coords[0], centroid=line['centroid'], tree_idx=tree_idx))

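(Please refer to the question here: Labeled datatypes Python)

My question pertains to what happens inside the function pointer. The reason I convert points to a pandas.Series and then take its values (in the fourth line, just under the if statement) is to give it the same shape as polygons. If I instead define points as points = [point]*len(intersect) or as points = itertools.repeat(point, len(intersect)), Numpy complains that it "cannot broadcast arrays of size (n,2) and size (n,) together" (n being the length of intersect).

If I call vect_dist_funct like so: dist = vect_dist_funct(itertools.repeat(points, len(intersect)), polygons), vincenty complains that I have passed too many arguments. I am at a complete loss to understand the difference between the two.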
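
The difference between these cases can be traced to the shape NumPy gives each argument before broadcasting. The following is a minimal sketch using only NumPy and the sample point from this question; it assumes the arguments end up as object arrays, and the variable names as_list, as_objects and as_repeat are illustrative, not part of the original code.

```python
import itertools
import numpy as np

point = (-104.950752, 39.854744)
n = 5

# A list of n coordinate pairs is read as a 2-D array: one row per pair.
as_list = np.array([point] * n, dtype=object)
print(as_list.shape)

# An object array filled element by element keeps each pair as one item.
as_objects = np.empty(n, dtype=object)
for i in range(n):
    as_objects[i] = point
print(as_objects.shape)

# itertools.repeat is an iterator, not a sequence, so NumPy wraps the
# iterator object itself in a 0-d array instead of expanding it.
as_repeat = np.array(itertools.repeat(point, n), dtype=object)
print(as_repeat.shape)

# A 2-D (n, 2) array and a 1-D (n,) array cannot be broadcast together.
try:
    np.broadcast(as_list, as_objects)
    broadcast_ok = True
except ValueError as err:
    broadcast_ok = False
    print("broadcast error:", err)
```

Under these assumptions, the list form fails at the broadcasting step, while the itertools.repeat form broadcasts fine but hands the whole iterator to the distance function as a single "point".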

Note that these are coordinates, so they will always come in pairs. Here are examples of what point and polygons look like:

point = (-104.950752, 39.854744) # Passed directly to the function like this.
polygons = array([(-104.21750802451864, 37.84052458697633),
                  (-105.01017084789603, 39.82012158954065),
                  (-105.03965315742742, 40.669867471420886),
                  (-104.90353460825702, 39.837631505433706),
                  (-104.8650601872832, 39.870796282334744)], dtype=object)
           # As returned by statement centroid.loc[intersect].values

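What is the best way to call vect_dist_funct in this case, so that I get a vectorized call and neither Numpy nor vincenty complains that I am passing the wrong arguments? Techniques that minimize memory consumption and increase speed are also sought. The goal is to compute the distance between the point and each polygon centroid.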

Answer 1

The*_*Cat 4

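np.vectorize doesn't really help you here. As per the documentation:

The vectorize function is provided primarily for convenience, not for performance. The implementation is essentially a for loop.

In fact, vectorize actively hurts you here, since it converts the inputs into numpy arrays, doing an unnecessary and expensive type conversion and producing the errors you are seeing. You are much better off using a function with a for loop.

It is also better to use a def function rather than a lambda for a top-level function, since it lets you have a docstring.

Here is how I would implement what you are doing: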
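
To make the "essentially a for loop" point concrete, here is a small sketch that counts how often the wrapped Python function runs. The names slow_square and calls are hypothetical and exist only for this illustration; otypes is passed so that NumPy does not need a trial call to infer the output type.

```python
import numpy as np

calls = []  # records every argument the Python function receives

def slow_square(x):
    """Plain Python function; np.vectorize invokes it element by element."""
    calls.append(x)
    return x * x

vec_square = np.vectorize(slow_square, otypes=[float])
xs = np.arange(5.0)
result = vec_square(xs)

# The Python function ran at least once per input element, so the cost
# is that of a Python-level loop plus the array conversion overhead.
print(len(calls) >= len(xs))
print(result)
```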

def vect_dist_funct(p1, p2):
    """Apply `vincenty` to `p1` and each element of `p2`.

    Iterate over `p2`, returning `vincenty` with the first argument
    as `p1` and the second as the current element of `p2`.  Returns
    a list where each value is the result of the `vincenty` function
    call for the corresponding element of `p2`.
    """
    return [vincenty(p1, p2i).meters for p2i in p2]

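If you really want to use vectorize, you can use the excluded argument so that p1 is not vectorized, or better yet set up a lambda that wraps vincenty and vectorizes only the second argument: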

def vect_dist_funct(p1, p2):
    """Apply `vincenty` to `p1` and each element of `p2`.

    Iterate over `p2`, returning `vincenty` with the first argument
    as `p1` and the second as the current element of `p2`.  Returns
    a numpy array where each value is the result of the `vincenty`
    function call for the corresponding element of `p2`.
    """
    # Bind p1 so that only the second argument is vectorized.
    vinc_p = lambda x: vincenty(p1, x).meters
    return np.vectorize(vinc_p)(p2)
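
The excluded option mentioned above is not shown in code, so here is a hedged sketch of how it might be used. Because vincenty requires Geopy, this example substitutes a hypothetical stand-in, haversine_m, which treats each pair as (longitude, latitude) in degrees and assumes a spherical Earth of radius 6,371,000 m; it is not the vincenty algorithm. The helper make_object_array is likewise illustrative.

```python
import math
import numpy as np

EARTH_RADIUS_M = 6371000.0  # assumed mean radius of a spherical Earth

def haversine_m(p1, p2):
    """Stand-in for vincenty(p1, p2).meters; points are (lon, lat) in degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p1[0], p1[1], p2[0], p2[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def make_object_array(pairs):
    """Build a 1-D object array whose items are the coordinate tuples."""
    out = np.empty(len(pairs), dtype=object)
    for i, pair in enumerate(pairs):
        out[i] = pair
    return out

point = (-104.950752, 39.854744)
polygons = make_object_array([
    (-104.21750802451864, 37.84052458697633),
    (-105.01017084789603, 39.82012158954065),
    (-104.950752, 39.854744),  # same as point, so the distance is 0
])

# Positional argument 0 (p1) is excluded, so it is passed through untouched.
vect_dist = np.vectorize(haversine_m, excluded={0}, otypes=[float])
dist_vectorized = vect_dist(point, polygons)

# The plain loop gives the same numbers without any array conversion.
dist_loop = [haversine_m(point, p2) for p2 in polygons]
print(np.allclose(dist_vectorized, dist_loop))
```

With excluded, the point no longer has to be repeated to match the length of polygons, which also avoids building the temporary array of duplicated points.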
