lhk*_*lhk 4 python math regression numpy pca
I'm trying to implement a 2D PCA with numpy. The code is rather simple:
import numpy as np
n=10
d=10
x=np.linspace(0,10,n)
y=x*d
covmat = np.cov([x,y])
print(covmat)
eig_values, eig_vecs = np.linalg.eig(covmat)
largest_index = np.argmax(eig_values)
largest_eig_vec = eig_vecs[largest_index]
The covariance matrix is:
[[ 11.31687243 113.16872428]
[ 113.16872428 1131.6872428 ]]
Then I've got a simple helper function that plots a line (as a series of points) around a given center, in a given direction. It is meant to be used with pyplot, so it returns separate lists for the x and y coordinates.
def plot_line(center, direction, num_steps, step_size):
    # Sample num_steps points along the line through `center`,
    # spaced step_size apart and centered on `center`.
    line_x = []
    line_y = []
    for i in range(num_steps):
        dist_from_center = step_size * (i - num_steps / 2)
        point_on_line = center + dist_from_center * direction
        line_x.append(point_on_line[0])
        line_y.append(point_on_line[1])
    return (line_x, line_y)
And finally the plot setup:
lines = []
mean_point = np.array([np.mean(x), np.mean(y)])
lines.append(plot_line(mean_point, largest_eig_vec, 200, 0.5))
import matplotlib.pyplot as plt
fig = plt.figure()
ax = fig.add_subplot(111)
ax.scatter(x, y, c="b", marker=".", s=10)
for line in lines:
    ax.plot(line[0], line[1], c="r")
ax.scatter(mean_point[0], mean_point[1], c="y", marker="o", s=20)
# Set the aspect on the existing axes rather than on a new plt.axes()
ax.set_aspect('equal', 'datalim')
plt.show()
Unfortunately, the PCA doesn't seem to work. Here's the plot:
I'm afraid I've got no idea what went wrong.

The final plot shows that the line fitted by PCA is the correct result, except that it is mirrored about the y-axis.
In fact, if I change the sign of the x coordinate of the eigenvector, the line fits perfectly:
Apparently this is a fundamental problem. Somehow I've misunderstood how to use PCA.
Where is my mistake? Online resources seem to describe PCA exactly as I implemented it. I don't believe I have to categorically mirror my line fits about the y-axis. It's got to be something else.
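Your mistake is that you are extracting the last row of the eigenvector array. However, the eigenvectors form the columns of the eigenvector array returned by np.linalg.eig, not the rows. From the documentation:
[...] the arrays a, w, and v satisfy the equations
dot(a[:,:], v[:,i]) = w[i] * v[:,i] [for each i]
where a is the array that np.linalg.eig was applied to, w is the 1-D array of eigenvalues, and v is the 2-D array of eigenvectors. So the columns v[:, i] are the eigenvectors.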
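The documented identity can be checked numerically. The following is a minimal sketch, assuming the same data as in the question (x from 0 to 10, y = 10 * x); the name columns_ok is introduced here only for illustration.

```python
import numpy as np

# Same data as in the question: points on the line y = 10 * x.
x = np.linspace(0, 10, 10)
y = 10 * x
a = np.cov([x, y])

# w holds the eigenvalues, v holds the eigenvectors as columns.
w, v = np.linalg.eig(a)

# Check a @ v[:, i] == w[i] * v[:, i] for every column i.
columns_ok = [np.allclose(a @ v[:, i], w[i] * v[:, i]) for i in range(len(w))]
print("each column satisfies the eigenvector equation:", columns_ok)
```

Each column passes the check because that is how np.linalg.eig lays out its result.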
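In this simple two-dimensional case, the two eigenvectors are mutually orthogonal (because we started with a symmetric matrix) and of unit length (because np.linalg.eig normalizes them that way), so the eigenvector array has one of the following two forms
[[ cos(t)  sin(t)]
 [-sin(t)  cos(t)]]
or
[[ cos(t)  sin(t)]
 [ sin(t) -cos(t)]]
for some real number t. In the first case, reading the first row (for example) instead of the first column gives [cos(t), sin(t)] in place of [cos(t), -sin(t)]. This explains the apparent reflection you are seeing.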
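The row-versus-column mix-up can be illustrated with an explicit matrix of the first form. This is only a sketch: the angle t = 0.3 is an arbitrary value chosen for illustration, and the matrix is built by hand rather than taken from np.linalg.eig.

```python
import numpy as np

t = 0.3  # arbitrary angle, chosen only for illustration
c, s = np.cos(t), np.sin(t)

# Eigenvector array of the first form: the columns are the eigenvectors.
v = np.array([[c, s],
              [-s, c]])

first_column = v[:, 0]  # [cos(t), -sin(t)], an actual eigenvector
first_row = v[0]        # [cos(t),  sin(t)], what row indexing returns

# The row differs from the column only in the sign of the y component,
# so a line drawn along it is the mirror image of the correct one.
mirrored = first_row * np.array([1.0, -1.0])
print(first_column, first_row, np.allclose(mirrored, first_column))
```

In the second form the array is symmetric, so rows and columns coincide and the mistake would go unnoticed.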
Replace the line
largest_eig_vec = eig_vecs[largest_index]
with
largest_eig_vec = eig_vecs[:, largest_index]
and you should get the expected result.
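As a quick sanity check of the column indexing, the sketch below recomputes the principal axis for the data in the question and compares it with the known direction of the line y = 10 * x. The use of np.linalg.eigh is an addition that is not part of the answer above. It is NumPy's routine for symmetric matrices and returns eigenvalues in ascending order, which may make it a convenient alternative for covariance matrices. The variable names are illustrative.

```python
import numpy as np

x = np.linspace(0, 10, 10)
y = 10 * x
covmat = np.cov([x, y])

# np.linalg.eig: eigenvectors are the columns of eig_vecs.
eig_values, eig_vecs = np.linalg.eig(covmat)
principal_axis = eig_vecs[:, np.argmax(eig_values)]

# np.linalg.eigh (for symmetric matrices) sorts eigenvalues in ascending
# order, so the last column belongs to the largest eigenvalue.
sym_values, sym_vecs = np.linalg.eigh(covmat)
principal_axis_eigh = sym_vecs[:, -1]

# The data lie on y = 10 * x, so the axis should be parallel to (1, 10).
# The sign of an eigenvector is arbitrary, hence the absolute value.
expected = np.array([1.0, 10.0]) / np.sqrt(101.0)
print(abs(principal_axis @ expected), abs(principal_axis_eigh @ expected))
```

Both values should be very close to 1, meaning the extracted axis is parallel to the data.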