如何提高这个微距离Python函数的性能

Question

如何提高这个微距离Python函数的性能

hou*_*oft 5 python optimization numpy scikit-learn

当使用自定义距离度量函数进行聚类算法时,我遇到了性能瓶颈sklearn.

Run Snake Run显示的结果如下:

在此输入图像描述

显然问题在于dbscan_metric功能.该功能看起来非常简单,我不太清楚加速它的最佳方法是:

def dbscan_metric(a,b):
  if a.shape[0] != NUM_FEATURES:
    return np.linalg.norm(a-b)
  else:
    return np.linalg.norm(np.multiply(FTR_WEIGHTS, (a-b)))

Run Code Online (Sandbox Code Playgroud)

任何关于导致它变慢的想法都会受到高度赞赏.

Answer 1

zal*_*rak 1

我不熟悉该函数的作用 - 但是否有可能重复计算？如果是这样，您可以记住该函数：

cache = {}
def dbscan_metric(a,b):

  diff = a - b

  if a.shape[0] != NUM_FEATURES:
    to_calc = diff
  else:
    to_calc = np.multiply(FTR_WEIGHTS, diff)

  if not cache.get(to_calc): cache[to_calc] = np.linalg.norm(to_calc)

  return cache[to_calc]

Run Code Online (Sandbox Code Playgroud)

归档时间：	11 年，7 月前
查看次数：	378 次
最近记录：	11 年，5 月前