Orthogonal projection of a vector onto a line with numpy yields wrong result

Flo*_*lin 7 python plot numpy linear-algebra orthogonal

I have 350 document scores. When I plot them, they have the following shape:

docScores = [(0, 68.62998962), (1, 60.21374512), (2, 54.72480392), 
             (3, 50.71389389), (4, 49.39723969), ...,  
             (345, 28.3756237), (346, 28.37126923), 
             (347, 28.36397934), (348, 28.35762787), (349, 28.34219933)]

I posted the full array here on pastebin (it corresponds to dataPoints in the code listing below).

Score distribution

Originally, I needed to find the elbow point of this L-shaped curve, which I managed to do thanks to this post.

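Now, in the plot below, the red vector p represents the elbow point. I want to find the point x=(?,?) (the yellow star) on the vector b that corresponds to the orthogonal projection of p onto b.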


The red point on the plot is what I get (which is obviously wrong). I did the following:

b_hat = b / np.linalg.norm(b)    #unit vector of b
proj_p_onto_b = p.dot(b_hat)*b_hat
red_point = proj_p_onto_b + s

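Now, if the projection of p onto b is defined by its start and end points, namely s and x (the yellow star), it follows that proj_p_onto_b = x - s, and therefore x = proj_p_onto_b + s.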

Did I make a mistake here?

EDIT: In response to @cxw, here is the code that computes the elbow point:

import numpy as np
from operator import itemgetter

def findElbowPoint(self, rawDocScores):
    # list() so the pairs can be indexed under Python 3 as well
    dataPoints = list(zip(range(len(rawDocScores)), rawDocScores))
    s = np.array(dataPoints[0])   # first point of the curve
    l = np.array(dataPoints[-1])  # last point of the curve
    b_vect = l - s
    b_hat = b_vect / np.linalg.norm(b_vect)
    distances = []
    for scoreVec in dataPoints[1:]:
        p = np.array(scoreVec) - s
        proj = p.dot(b_hat) * b_hat
        d = np.linalg.norm(p - proj)  # orthogonal distance between b and the L-curve
        distances.append((scoreVec[0], scoreVec[1], proj, d))

    # the elbow is the point with the largest orthogonal distance to b
    elbow_x, elbow_y, proj, max_distance = max(distances, key=itemgetter(3))

    red_point = proj + s

EDIT: Here is the code for the plot:

>>> l_curve_x_values = [x[0] for x in docScores]
>>> l_curve_y_values = [x[1] for x in docScores]
>>> b_line_x_values = [x[0] for x in docScores]
>>> b_line_y_values = np.linspace(s[1], l[1], len(docScores))
>>> p_line_x_values = l_curve_x_values[:elbow_x]
>>> p_line_y_values = np.linspace(s[1], elbow_y, elbow_x)
>>> plt.plot(l_curve_x_values, l_curve_y_values, b_line_x_values, b_line_y_values, p_line_x_values, p_line_y_values)
>>> red_point = proj + s
>>> plt.plot(red_point[0], red_point[1], 'ro')
>>> plt.show()

War*_*ser 4

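If you are using the plot to judge visually whether the solution is correct, you must plot the data with the same scale on each axis, i.e. use plt.axis('equal'). If the axes do not have equal scales, the angles between lines in the plot are distorted.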
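
To check this without relying on the eye, one option is to test orthogonality numerically and only then plot with equal axis scales. The following is a minimal sketch, not the asker's code: the helper names project_onto_line and plot_projection are hypothetical, and the values of s, l and elbow are made up to resemble the scale of the question's data.

```python
import numpy as np


def project_onto_line(point, start, end):
    """Return the orthogonal projection of `point` onto the line through `start` and `end`."""
    point, start, end = (np.asarray(v, dtype=float) for v in (point, start, end))
    b = end - start                      # direction of the line
    b_hat = b / np.linalg.norm(b)        # unit vector along the line
    p = point - start                    # vector from the line's start to the point
    return start + p.dot(b_hat) * b_hat  # x = s + proj_p_onto_b


# Hypothetical values on a scale similar to the question's data (not the real scores).
s = np.array([0.0, 68.6])       # first point of the curve
l = np.array([349.0, 28.3])     # last point of the curve
elbow = np.array([30.0, 35.0])  # made-up elbow point

x = project_onto_line(elbow, s, l)

# The residual (elbow - x) should be perpendicular to the line direction (l - s),
# so this dot product should be zero up to floating-point error.
residual_dot_direction = (elbow - x).dot(l - s)
print(x, residual_dot_direction)


def plot_projection(s, l, point, x):
    """Plot the line, the point and its projection with equal axis scales."""
    import matplotlib.pyplot as plt  # imported here so the numeric check runs without matplotlib

    plt.plot([s[0], l[0]], [s[1], l[1]], label="b")
    plt.plot([point[0], x[0]], [point[1], x[1]], label="point to projection")
    plt.plot(x[0], x[1], "ro")
    plt.axis("equal")  # same scale on both axes, so right angles look like right angles
    plt.legend()
    plt.show()
```

Calling plot_projection(s, l, elbow, x) draws the segment from the point to its projection. With plt.axis('equal') it meets b at a visible right angle, and without it the same correct projection can look skewed, because the x range (0 to 349) is much wider than the y range.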