Following this task, I included the line:

from matplotlib.mlab import PCA

but I get the error message:

cannot import name 'PCA' from 'matplotlib.mlab'

I am using Python 3.7, and I don't know how to use the PCA function from matplotlib.mlab. Has it been deprecated in newer versions of matplotlib, or is PCA now included in another library?
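For context: matplotlib.mlab.PCA was deprecated and later removed from matplotlib, so with a current matplotlib on Python 3.7 the import fails. A minimal sketch of the usual replacement, scikit-learn's PCA (assuming scikit-learn is installed; the sample data is made up):

```python
import numpy as np
from sklearn.decomposition import PCA  # typical replacement for the removed mlab PCA

# Made-up sample data: 5 observations, 2 variables.
data = np.array([[2.5, 2.4], [0.5, 0.7], [2.2, 2.9], [1.9, 2.2], [3.1, 3.0]])

pca = PCA(n_components=2)
scores = pca.fit_transform(data)  # projected data, roughly what mlab's PCA(...).Y held
print(scores.shape)  # (5, 2)
```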
I am trying to produce a PCA scree plot of my data with the following code:

library(ade4)
data("olympic")
pca.olympic <- princomp(olympic$tab)
plot(1:10, pca.olympic$sdev^2, type="b", xlab="# PCs", ylab="Variance of PC",
     main="PCA of Covariance Matrix")

When I run this code I get the following output:

[resulting plot omitted]

when I should be seeing something like this:

[expected scree plot omitted]

Can someone explain how to fix this?
I have some data and I want to make a PCA plot. However, the first two principal components are entirely driven by 3 outlier samples (out of 32), and I would like to skip those and plot the principal components starting from the 3rd. Is this possible, or do I have to do some calculation to subtract the first two principal components from the data and then plot what remains?
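No subtraction is needed: PC scores are already mutually orthogonal projections, so plotting from the 3rd component onward is just a matter of taking the later columns of the score matrix. A minimal sketch with scikit-learn (the same idea applies to the columns of prcomp()$x in R; the data here is a random stand-in):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
X = rng.normal(size=(32, 6))           # stand-in for the 32-sample dataset

scores = PCA().fit_transform(X)        # columns are PC1, PC2, PC3, ...
pc3, pc4 = scores[:, 2], scores[:, 3]  # simply take the later columns to plot
print(pc3.shape)  # (32,)
```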
With the prcomp() function I estimated the percentage of variance explained:

pca <- prcomp(env, scale=TRUE)

The second row of summary(pca) shows these values for all the PCs:
PC1 PC2 PC3 PC4 PC5 PC6 PC7
Standard deviation 7.3712 5.8731 2.04668 1.42385 1.13276 0.79209 0.74043
Proportion of Variance 0.5488 0.3484 0.04231 0.02048 0.01296 0.00634 0.00554
Cumulative Proportion 0.5488 0.8972 0.93956 0.96004 0.97300 0.97933 0.98487
Now I want to find the eigenvalue of each PC:
pca$sdev^2
[1] 5.433409e+01 3.449329e+01 4.188887e+00 2.027337e+00 1.283144e+00
[6] 6.274083e-01 5.482343e-01
But these values seem to be just an alternative representation of the PVE itself. So what am I doing wrong?
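Nothing is wrong, and the two sets of numbers should look alike: the eigenvalues are exactly sdev^2, and each PVE is an eigenvalue divided by the sum of all eigenvalues, so they differ only by a normalizing constant. A small sketch of that relationship with scikit-learn instead of R's prcomp (random stand-in data):

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(1)
X = rng.normal(size=(40, 5)) * np.array([5, 3, 2, 1, 0.5])  # columns with different scales

pca = PCA().fit(X)
eigenvalues = pca.explained_variance_    # analogous to pca$sdev^2 in R
pve = eigenvalues / eigenvalues.sum()    # proportion of variance explained

# The two vectors differ only by the normalizing constant:
print(np.allclose(pve, pca.explained_variance_ratio_))  # True
```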
I have created a PCA from measurements of individuals collected at four locations, placed on four substrates, with three replicates. I have sex (male or female) and "karyotype" (a factor with three possible levels), and have computed the first two PC scores for each individual.

I want to make a plot in which males and females have different symbols, with the symbol colour depending on karyotype. With the code below I created a plot that gives me symbols colour-coded by the three karyotypes, with 95% confidence ellipses around males and females.

How can I change the symbol for each sex while keeping the colour dependent on karyotype? I would also like the legend to reflect this.

One last question: is it possible to add an arrow for each PC (rather than for each individual), similar to the arrows found in ordination plots?
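The question's plot is made in R, but the marker-by-one-factor, colour-by-another idea can be sketched in Python with matplotlib (all data and level names below are made up for illustration): loop over the sex/karyotype combinations and give each its own marker and colour, so the legend carries both.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(2)
n = 30
pc1, pc2 = rng.normal(size=n), rng.normal(size=n)
sex = rng.choice(["male", "female"], size=n)
karyotype = rng.choice(["K1", "K2", "K3"], size=n)

markers = {"male": "^", "female": "o"}  # symbol depends on sex
colors = {"K1": "tab:blue", "K2": "tab:orange", "K3": "tab:green"}  # colour: karyotype

fig, ax = plt.subplots()
for s, m in markers.items():
    for k, c in colors.items():
        mask = (sex == s) & (karyotype == k)
        ax.scatter(pc1[mask], pc2[mask], marker=m, color=c, label=f"{s}, {k}")
ax.legend(title="sex, karyotype")
ax.set_xlabel("PC1")
ax.set_ylabel("PC2")
print(len(ax.collections))  # 6: one scatter per sex/karyotype combination
```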

Sample data:
test <- structure(list(Location = structure(c(1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L), .Label = c("Kampinge", "Kaseberga", "Molle", "Steninge"
), class = "factor"), Substrate = structure(c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 2L, 2L), .Label = c("Kampinge", "Kaseberga", "Molle",
"Steninge"), class = "factor"), …Run Code Online (Sandbox Code Playgroud) 我正在使用PCA,我在Python中发现了sklearn中的PCA,而在Matlab中发现了pca()产生了不同的结果.这是我正在使用的测试矩阵.
import numpy as np

a = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]])
With Python's sklearn I get:

from sklearn.decomposition import PCA

p = PCA()
print(p.fit_transform(a))
[[-1.38340578 0.2935787 ]
[-2.22189802 -0.25133484]
[-3.6053038 0.04224385]
[ 1.38340578 -0.2935787 ]
[ 2.22189802 0.25133484]
[ 3.6053038 -0.04224385]]
With Matlab I get:
pca(a', 'Centered', false)
[0.2196 0.5340
0.3526 -0.4571
0.5722 0.0768
-0.2196 -0.5340
-0.3526 0.4571
-0.5722 -0.0768]
Why do the two results differ?
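One likely source of the difference, sketched below in Python only: sklearn's fit_transform returns the *scores* of the mean-centered data, whereas the Matlab call above (on the transposed matrix, with 'Centered', false) returns principal-component *coefficients*, so the two printed matrices are not the same quantity; columns can additionally flip sign. The sketch shows what the sklearn numbers are, namely the centered data projected onto the loadings:

```python
import numpy as np
from sklearn.decomposition import PCA

a = np.array([[-1, -1], [-2, -1], [-3, -2], [1, 1], [2, 1], [3, 2]], dtype=float)

pca = PCA()
scores = pca.fit_transform(a)   # what the Python snippet prints: the scores
loadings = pca.components_.T    # the directions (loadings)

# The scores are the mean-centered data projected onto the loadings:
reconstructed = (a - a.mean(axis=0)) @ loadings
print(np.allclose(scores, reconstructed))  # True
```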
Thanks to Dan for the answer; the results now look reasonable. However, if I test with a random matrix, the results Matlab and Python produce do not seem to be scalar multiples of each other. Why is that?
test matrix a:
[[ 0.36671885 0.77268624 0.94687497]
[ 0.75741855 0.63457672 0.88671836]
[ 0.20818031 0.709373 0.45114135]
[ 0.24488718 0.87400025 0.89382836]
[ 0.16554686 0.74684393 0.08551401]
[ 0.07371664 0.1632872 0.84217978]]
Python result:
p …

Why is it that, when I use pca in Matlab, I cannot obtain an orthogonal principal component matrix?
For example:
A=[3,1,-1;2,4,0;4,-2,-5;11,22,20];
A =
3 1 -1
2 4 0
4 -2 -5
11 22 20
>> W=pca(A)
W =
0.2367 0.9481 -0.2125
0.6731 -0.3177 -0.6678
0.7006 -0.0150 0.7134
>> PCA=A*W
PCA =
0.6826 2.5415 -2.0186
3.1659 0.6252 -3.0962
-3.9026 4.5028 -3.0812
31.4249 3.1383 -2.7616
Here, each column is a principal component. So:
>> PCA(:,1)'*PCA(:,2)
ans =
84.7625
But the columns of this principal component matrix are not mutually orthogonal.

I have checked some references, which say the components are not merely uncorrelated but strictly orthogonal. Yet I cannot reproduce that. Can anyone tell me where I went wrong?

Thanks!
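A likely explanation, sketched here in Python/numpy rather than Matlab: pca centers the data before computing the coefficients W, so the scores are orthogonal only when the *centered* matrix is projected; computing A*W on the raw A mixes the mean back into every score column. Using the same matrix as the question:

```python
import numpy as np

A = np.array([[3, 1, -1], [2, 4, 0], [4, -2, -5], [11, 22, 20]], dtype=float)

# PCA via SVD of the centered data (Matlab's pca centers by default,
# so its coefficient matrix W refers to the centered matrix).
Ac = A - A.mean(axis=0)
U, S, Vt = np.linalg.svd(Ac, full_matrices=False)
W = Vt.T                    # loadings, analogous to Matlab's coeff output

scores_uncentered = A @ W   # what the question computed: not orthogonal
scores_centered = Ac @ W    # projecting the centered data: orthogonal

print(abs(scores_uncentered[:, 0] @ scores_uncentered[:, 1]) > 1.0)   # True
print(abs(scores_centered[:, 0] @ scores_centered[:, 1]) < 1e-6)      # True
```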
I am trying to use grid search to select the number of principal components of my data before a linear regression. I am confused about how to build the dictionary of the numbers of principal components I want. I put my list in dictionary format in the param_grid argument, but I think I am doing it wrong. So far I have been getting warnings about arrays containing infs or NaNs.

I followed the instructions for pipelining linear regression with PCA: http://scikit-learn.org/stable/auto_examples/plot_digits_pipe.html

ValueError: array must not contain infs or NaNs

I was able to reproduce the same error in this example; my real dataset is larger:
import pandas as pd
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.model_selection import GridSearchCV
df2 = pd.DataFrame({'C': pd.Series(1, index=list(range(8)), dtype='float32'),
                    'D': np.array([3] * 8, dtype='int32'),
                    'E': pd.Categorical(["test", "train", "test", "train",
                                         "test", "train", "test", "train"])})
df3 = pd.get_dummies(df2)
lm = LinearRegression()
pipe = [('pca', PCA(whiten=True)),
        ('clf', lm)]
pipe = Pipeline(pipe)
param_grid = {'pca__n_components': np.arange(2, 4)}
X = …

I am interested in picking the top 10 PCA components from the cumulative PCA plot of my dataset. I managed to obtain PCA plots such as the scree plot and pairs plot, but they did not make much sense to me. So I wanted to pick the top 10 components from the cumulative PCA plot, which I did, but now I need to use these top 10 components to subset my original dataset. Can anyone point out how to make this attempt more accurate and reliable?
Reproducible data:
persons_df <- data.frame(person1 = sample(1:200, 20, replace = FALSE),
                         person2 = as.factor(sample(20)),
                         person3 = sample(1:250, 20, replace = FALSE),
                         person4 = sample(1:300, 20, replace = FALSE),
                         person5 = as.factor(sample(20)),
                         person6 = as.factor(sample(20)))
row.names(persons_df) <- letters[1:20]
My attempt:
my_pca <- prcomp(t(persons_df), center=TRUE, scale=FALSE)
summary(my_pca)
my_pca_proportionvariances <- cumsum(((my_pca$sdev^2) / (sum(my_pca$sdev^2)))*100)
Public dataset:

Since I ran into some problems creating the reproducible data above, I have linked a public example dataset here.

Here I need to select the top 10 PCA components for persons_df, subset the original data with them, and then run a simple linear regression on it. How can I complete my approach to achieve this goal? Could anyone give me a quick pointer? Any ideas?
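The question's attempt is in R, but the select-top-k-components-then-regress idea can be sketched with scikit-learn (all shapes and variable names here are made up): the "subset" is simply the score matrix restricted to the first 10 components, which then serves as the design matrix for the regression.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 25))  # stand-in for the original feature matrix
y = rng.normal(size=100)        # stand-in response

pca = PCA(n_components=10)      # keep only the top 10 components
scores = pca.fit_transform(X)   # samples expressed in the 10 retained PCs
model = LinearRegression().fit(scores, y)  # regress on the PC scores

print(scores.shape)  # (100, 10)
```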
Possible duplicates:
Comparing svd and princomp in R
How to perform PCA in R using two methods (princomp() and svd of the correlation matrix)
I have a dataset like:
438,498,3625,3645,5000,2918,5000,2351,2332,2643,1698,1687,1698,1717,1744,593,502,493,504,445,431,444,440,429,10
438,498,3625,3648,5000,2918,5000,2637,2332,2649,1695,1687,1695,1720,1744,592,502,493,504,449,431,444,443,429,10
438,498,3625,3629,5000,2918,5000,2637,2334,2643,1696,1687,1695,1717,1744,593,502,493,504,449,431,444,446,429,10
437,501,3625,3626,5000,2918,5000,2353,2334,2642,1730,1687,1695,1717,1744,593,502,493,504,449,431,444,444,429,10
438,498,3626,3629,5000,2918,5000,2640,2334,2639,1696,1687,1695,1717,1744,592,502,493,504,449,431,444,441,429,10
439,498,3626,3629,5000,2918,5000,2633,2334,2645,1705,1686,1694,1719,1744,589,502,493,504,446,431,444,444,430,10
440,5000,3627,3628,5000,2919,3028,2346,2330,2638,1727,1684,1692,1714,1745,588,501,492,504,451,433,446,444,432,10
444,5021,3631,3634,5000,2919,5000,2626,2327,2638,1698,1680,1688,1709,1740,595,500,491,503,453,436,448,444,436,10
451,5025,3635,3639,5000,2920,3027,2620,2323,2632,1706,1673,1681,1703,753,595,499,491,502,457,440,453,454,442,20
458,5022,3640,3644,5000,2922,5000,2346,2321,2628,1688,1666,1674,1696,744,590,496,490,498,462,444,458,461,449,20
465,525,3646,3670,5000,2923,5000,2611,2315,2631,1674,1658,1666,1688,735,593,495,488,497,467,449,462,469,457,20
473,533,3652,3676,5000,2925,5000,2607,2310,2623,1669,1651,1659,1684,729,578,496,487,498,469,454,467,476,465,20
481,544,3658,3678,5000,2926,5000,2606,2303,2619,1668,1643,1651,1275,723,581,495,486,497,477,459,472,484,472,20
484,544,3661,3665,5000,2928,5000,2321,2304,5022,1647,1639,1646,1270,757,623,493,484,495,480,461,474,485,476,20
484,532,3669,3662,2945,2926,5000,2326,2306,2620,1648,1639,1646,1270,760,533,493,483,494,507,461,473,486,476,20
482,520,3685,3664,2952,2927,5000,2981,2307,2329,1650,1640,1644,1268,757,533,492,482,492,513,459,474,485,474,20
481,522,3682,3661,2955,2927,2957,2984,1700,2622,1651,1641,1645,1272,761,530,492,482,492,513,462,486,483,473,20
480,525,3694,3664,2948,2926,2950,2995,1697,2619,1651,1642,1646,1269,762,530,493,482,492,516,462,486,483,473,20
481,515,5018,3664,2956,2927,2947,2993,1697,2622,1651,1641,1645,1269,765,592,489,482,495,531,462,499,483,473,20
479,5000,3696,3661,2953,2927,2944,2993,1702,2622,1649,1642,1645,1269,812,588,489,481,491,510,462,481,483,473,20
480,506,5019,3665,2941,2929,2945,2981,1700,2616,1652,1642,1645,1271,814,643,491,480,493,524,461,469,484,473,20
479,5000,5019,3661,2943,2930,2942,2996,1698,2312,1653,1642,1644,1274,811,617,491,479,491,575,461,465,484,473,20
479,5000,5020,3662,2945,2931,2942,2997,1700,2313,1654,1642,1644,1270,908,616,490,478,489,503,460,460,478,473,10
481,508,5021,3660,2954,2936,2946,2966,1705,2313,1654,1643,1643,1270,1689,678,493,477,483,497,467,459,476,473,10
486,510,522,3662,2958,2938,2939,2627,1707,2314,1659,1643,1639,1665,1702,696,516,476,477,547,465,457,470,474,10
479,521,520,3663,2954,2938,2941,2957,1712,2314,1660,1643,1638,1660,1758,688,534,475,475,489,461,456,465,474,10
480,554,521,3664,2954,2938,2941,2632,1715,2313,1660,1643,1637,1656,1761,687,553,475,474,558,462,453,465,476,10
481,511,5023,3665,2954,2937,2941,2627,1707,2312,1660,1641,1636,1655,1756,687,545,475,475,504,463,458,470,477,10
482,528,524,3665,2953,2937,2940,2629,1706,2312,1657,1640,1635,1654,1756,566,549,475,476,505,464,459,468,477,10
So I do:
x <- read.csv("C:\\data_25_1000.txt",header=F,row.names=NULL)
p1 <- princomp(x, cor = TRUE) ## using correlation matrix
p1
Call:
princomp(x = x, cor = TRUE)
Standard deviations:
Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9 Comp.10 Comp.11 Comp.12 Comp.13 Comp.14 Comp.15 Comp.16
1.9800328 1.8321498 1.4147367 1.3045541 1.2016116 1.1708212 1.1424120 …

I am trying to find a way to make R's 3D PCA visualisation more portable; I run PCA on a 2D matrix with prcomp().
What is the actual difference between princomp() and prcomp()? Thanks!
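In short: princomp() eigendecomposes the covariance matrix (and divides by n), while prcomp() takes the SVD of the centered data (and divides by n - 1); the components are the same up to sign and that scaling, with the SVD route being the numerically preferred one. A numpy sketch of the underlying equivalence (this illustrates the math, not the exact R internals; both sides use the n - 1 divisor here so they match exactly):

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.normal(size=(30, 4))
Xc = X - X.mean(axis=0)

# Eigendecomposition of the covariance matrix (princomp's approach)...
eig_vals = np.linalg.eigvalsh(np.cov(Xc, rowvar=False))[::-1]  # descending order

# ...and SVD of the centered data (prcomp's approach):
s = np.linalg.svd(Xc, compute_uv=False)
svd_vals = s**2 / (Xc.shape[0] - 1)

print(np.allclose(eig_vals, svd_vals))  # True: same variances either way
```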
I want to use Matlab's princomp function, but it returns the eigenvalues as a sorted array. That way I cannot find out which column corresponds to which eigenvalue. In Matlab,
m = [1,2,3;4,5,6;7,8,9];
[pc,score,latent] = princomp(m);
is the same as
m = [2,1,3;5,4,6;8,7,9];
[pc,score,latent] = princomp(m);
That is, swapping the first two columns changes nothing. The latent result (the eigenvalues) will be (27, 0, 0); the information about which eigenvalue corresponds to which original (input) column is lost. Is there a way to tell Matlab not to sort the eigenvalues?
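One way to see why sorting is not the real issue (sketched in Python/numpy rather than Matlab; the toy matrix in the question is rank-one, so a random full-rank matrix is used here): eigenvalues are attached to principal *directions*, not to input columns, so a column swap cannot be recovered from latent at all. The column identity lives instead in the rows of the coefficient matrix, which the swap permutes:

```python
import numpy as np

rng = np.random.default_rng(4)
m1 = rng.normal(size=(20, 3)) * np.array([3.0, 2.0, 1.0])  # columns with distinct variances
m2 = m1[:, [1, 0, 2]]                                      # swap the first two columns

def pca_eig(m):
    c = np.cov(m, rowvar=False)      # covariance of the columns
    vals, vecs = np.linalg.eigh(c)   # eigh returns ascending order
    return vals[::-1], vecs[:, ::-1] # descending, like princomp's latent

vals1, vecs1 = pca_eig(m1)
vals2, vecs2 = pca_eig(m2)

# Swapping input columns leaves the eigenvalues unchanged and merely
# permutes the rows of the coefficient (loading) matrix, up to sign.
print(np.allclose(vals1, vals2))  # True
print(np.allclose(np.abs(vecs1[[1, 0, 2], :]), np.abs(vecs2), atol=1e-6))  # True
```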