Tag: svd

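How do I split an extremely sparse recommender-system dataset into training and test sets?

I am using a CF algorithm (SVD) on a real-world dataset, and I have run into a data sparsity problem: the user/item rating matrix has a sparsity of about 0.01%. I split the data 80/20 into training and test sets and found that only a few of the users and items in the test set also appear in the training set, so I can only use some of the ratings in the test set to compute RMSE. Can you give me some suggestions for fixing this?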
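
The situation described above can be measured directly. The following is a minimal sketch, assuming the ratings are stored as `(user, item, rating)` tuples; the function names `random_split` and `split_known_cold` and the toy data are hypothetical and not part of the question. It performs a random 80/20 split and then separates the test ratings whose user and item both occur in the training set from the rest, which is the subset a plain SVD model can be scored on.

```python
import random

def random_split(ratings, test_fraction=0.2, seed=0):
    """Shuffle the ratings and cut off the last `test_fraction` as the test set."""
    shuffled = list(ratings)
    random.Random(seed).shuffle(shuffled)
    n_test = int(round(len(shuffled) * test_fraction))
    cut = len(shuffled) - n_test
    return shuffled[:cut], shuffled[cut:]

def split_known_cold(train, test):
    """Separate test ratings whose user AND item were seen in training from the rest."""
    users = {u for u, _, _ in train}
    items = {i for _, i, _ in train}
    known, cold = [], []
    for u, i, r in test:
        (known if u in users and i in items else cold).append((u, i, r))
    return known, cold

# Hypothetical toy data: 500 ratings spread over 200 users and 300 items
rng = random.Random(42)
ratings = [(rng.randrange(200), rng.randrange(300), rng.randint(1, 5))
           for _ in range(500)]

train, test = random_split(ratings)
known, cold = split_known_cold(train, test)
coverage = len(known) / len(test)
print(f"{len(test)} test ratings, {coverage:.1%} usable for RMSE")
```

Reporting this coverage next to the RMSE makes it clear how much of the test set the score is actually based on.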

recommendation-engine machine-learning svd collaborative-filtering

2
votes
1
answer
2212
views

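Spark: cores per executor have no effect on application run time

I am testing how different numbers of cores per executor (--executor-cores) affect the run time of SVD on Spark. With --executor-cores fixed, I vary the number of partitions of the main data RDD. However, for a given number of RDD partitions, the SVD computation time does not seem to change significantly between different --executor-cores values. This is a bit confusing.

My environment is:

  • A Spark cluster with 3 nodes (32 cores and 32 GB of memory per node). Each node runs 1 Worker.
  • spark.max.cores = 96
  • Cluster manager = Standalone
  • Deploy mode = client

I have plotted the results for --executor-cores = [4, 16], and it can be seen that, for a given partition size, there is not much difference between the computation times as the partition size increases. So my questions are:

  • What is the effect of setting the number of cores per executor?
  • The number of cores per executor does have a significant effect on run time, but only for smaller partition sizes and not for large ones. Why?
  • Does it affect parallelism in any way (I am not sure that it does)?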
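
The relationship between these settings can be worked through with simple arithmetic. The sketch below is not Spark code. It assumes standalone mode, one CPU per task (Spark's default `spark.task.cpus=1`), the 96-core cap from the environment list above (the standalone property is normally spelled `spark.cores.max`), and enough memory for every executor to start. The helper names `task_slots` and `waves` are hypothetical.

```python
import math

def task_slots(total_cores, executor_cores, task_cpus=1):
    """Return (number of executors, number of tasks that can run at the same time)."""
    executors = total_cores // executor_cores
    slots = executors * (executor_cores // task_cpus)
    return executors, slots

def waves(num_partitions, slots):
    """Number of scheduling rounds needed to run one task per partition."""
    return math.ceil(num_partitions / slots)

# Compare the two settings from the question for an example of 192 partitions
for cores in (4, 16):
    executors, slots = task_slots(96, cores)
    print(f"{cores} cores/executor: {executors} executors, "
          f"{slots} task slots, {waves(192, slots)} waves for 192 partitions")
```

Under these assumptions both settings yield the same number of task slots, so the number of tasks running at once does not change. Any difference in run time would have to come from per-executor effects (for example, memory shared inside one JVM), which this arithmetic does not model.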

parallel-processing svd apache-spark apache-spark-mllib

2
votes
1
answer
866
views

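Finding the identity of outliers in clustering

I am new to machine learning and have been experimenting with singular value decomposition (SVD) for the past few days. Based on the x and y values, I drew the following plot with matplotlib. I am trying to detect abnormal activity by web users. The plot contains a few outliers, and I want to identify who these outliers are.

To make this easier to understand, take the following dataset.

The original matrix, based on web page visits: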

from numpy import asmatrix as mat  # assumption: `mat` in the original is NumPy's matrix constructor

# web page visit matrix (10 x 10)
matrix = mat([[1,0,0,1,1,0,1,0,1,0],
              [1,0,0,0,1,0,1,0,1,1],
              [1,0,1,0,1,0,0,0,1,0],
              [0,1,1,1,0,1,0,1,0,0],
              [1,1,0,0,1,0,1,1,1,1],
              [0,0,1,0,1,1,0,1,0,0],
              [1,1,0,1,0,1,0,0,1,0],
              [1,0,0,0,1,0,1,1,1,1],
              [0,1,1,0,1,0,1,0,0,0],
              [1,1,0,1,0,1,0,1,1,0]])

The x, y coordinates after the SVD computation:

x = [-0.34095692,-0.34044722,-0.27155318,-0.21320583,-0.44657865,-0.19587836, -0.29414279, -0.3948753 ,-0.21655774 , -0.34857087]
y = [0.16305762,0.38554548, 0.10412536, -0.57981103, 0.17927523, -0.22612216, -0.34569697, 0.30463137,0.01301744,-0.42661108]

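What I want is to find out who a given data point belongs to. Likewise, how can I find the identity of the outliers in a plot of a large dataset? I hope you understand my question.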
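
A minimal sketch of how a point in the plot can be traced back to its index follows. It assumes that position `i` in `x` and `y` corresponds to row `i` of the matrix (the question does not state this), and it uses distance from the centroid as the outlier score, which is only one of several possible choices. The function name `farthest_points` is hypothetical.

```python
import numpy as np

# Coordinates copied from the question
x = np.array([-0.34095692, -0.34044722, -0.27155318, -0.21320583, -0.44657865,
              -0.19587836, -0.29414279, -0.3948753, -0.21655774, -0.34857087])
y = np.array([0.16305762, 0.38554548, 0.10412536, -0.57981103, 0.17927523,
              -0.22612216, -0.34569697, 0.30463137, 0.01301744, -0.42661108])

def farthest_points(x, y, k=2):
    """Return the indices of the k points farthest from the centroid, farthest first."""
    pts = np.column_stack([x, y])
    dist = np.linalg.norm(pts - pts.mean(axis=0), axis=1)
    order = np.argsort(dist)[::-1][:k]
    return order, dist[order]

idx, dist = farthest_points(x, y, k=2)
for i, d in zip(idx, dist):
    print(f"point {i}: distance from centroid {d:.3f}")
```

Because the returned values are positions in the original arrays, they can be used to look up the corresponding rows of the matrix, or to label the points in the plot, for example with `plt.annotate(str(i), (x[i], y[i]))`.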

python cluster-analysis machine-learning matplotlib svd

1
vote
1
answer
4192
views

Converting a MATLAB image SVD method to OpenCV

I want to write a program in C++ with OpenCV in Visual Studio. My code should follow this MATLAB code:

close all
clear all
clc

%reading and converting the image
inImage=imread('pic.jpg');
inImageD=double(inImage);

[U,S,V]=svd(inImageD);

% Using different number of singular values (diagonal of S) to compress and
% reconstruct the image
dispEr = [];
numSVals = [];
for N=5:25:300
  % store the singular values in a temporary var
  C = S;

  % discard the diagonal values not required for compression
  C(N+1:end,:)=0;
  C(:,N+1:end)=0;

  % Construct an Image using the selected singular values
   D=U*C*V';


  % display and compute error
  figure; …
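
The core step of the MATLAB loop above, zeroing every singular value past the N-th and multiplying the factors back together, can be sketched in a few lines. This is a NumPy illustration rather than the OpenCV/C++ port the question asks for (in OpenCV the decomposition itself would come from the `cv::SVD` class). It assumes a 2-D grayscale image, and a random matrix stands in for `pic.jpg`. The function name `reconstruct` is hypothetical.

```python
import numpy as np

def reconstruct(U, s, Vt, n):
    """Rebuild the matrix from its n largest singular values only."""
    return (U[:, :n] * s[:n]) @ Vt[:n, :]

rng = np.random.default_rng(0)
image = rng.random((60, 40)) * 255.0      # stand-in for double(imread('pic.jpg'))

# Decompose once, then reuse the factors for every N
U, s, Vt = np.linalg.svd(image, full_matrices=False)

errors = []
for n in (5, 20, 40):
    approx = reconstruct(U, s, Vt, n)
    err = np.linalg.norm(image - approx) / np.linalg.norm(image)
    errors.append(err)
    print(f"N = {n:2d}: relative error {err:.3e}")
```

Slicing the first `n` columns of `U` and rows of `Vt` gives the same product as zeroing the trailing diagonal entries of `S`, without building the full diagonal matrix.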

opencv svd

1
vote
1
answer
990
views

Sparse svd in Armadillo (C++)

According to http://arma.sourceforge.net/docs.html#part_c, Armadillo supports the following functions:

eig_sym
eig_gen
eigs_sym
eigs_gen
svd
svd_econ

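But there seems to be no function like "svds_econ" that operates on a "sparse" matrix and returns the singular values and vectors.

Is there a way to achieve this in Armadillo?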

c++ eigenvalue svd armadillo

1
vote
1
answer
725
views

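Least squares: normal equations vs. svd

I tried to write my own linear regression code following the normal equation beta = inv(X'X)X'Y. However, the squared error is much larger than that of the lstsq function in numpy.linalg. Can someone explain to me why the SVD method (which lstsq uses) is more accurate than the normal equations? Thanks.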
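
The effect described above can be reproduced on a small example. The sketch below uses hypothetical data (polynomial features on [0, 1], chosen because they are ill-conditioned) and compares the two routes. The relevant fact is that the 2-norm condition number of X'X is the square of that of X, so forming X'X explicitly can lose roughly twice as many digits as working with X directly.

```python
import numpy as np

# Hypothetical ill-conditioned design matrix: columns 1, t, t^2, ..., t^7
t = np.linspace(0.0, 1.0, 50)
X = np.vander(t, 8, increasing=True)

rng = np.random.default_rng(0)
true_beta = rng.standard_normal(8)
y = X @ true_beta                          # noise-free targets

# Route 1: the normal equation from the question, beta = inv(X'X) X'Y
beta_normal = np.linalg.inv(X.T @ X) @ X.T @ y

# Route 2: numpy.linalg.lstsq, which works on X itself
beta_lstsq = np.linalg.lstsq(X, y, rcond=None)[0]

cond_X = np.linalg.cond(X)
cond_XtX = np.linalg.cond(X.T @ X)
res_normal = np.linalg.norm(X @ beta_normal - y)
res_lstsq = np.linalg.norm(X @ beta_lstsq - y)

print(f"cond(X)   = {cond_X:.3e}")
print(f"cond(X'X) = {cond_XtX:.3e}")
print(f"residual, normal equations: {res_normal:.3e}")
print(f"residual, lstsq:            {res_lstsq:.3e}")
```

How large the gap is depends on the data. For a well-conditioned X, the two routes should agree closely.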

numpy linear-regression svd least-squares

1
vote
1
answer
1482
views

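Number of components for truncated SVD

Dimensionality can be reduced with truncated SVD, which performs linear dimensionality reduction by means of a truncated singular value decomposition (SVD). However, the number of components has to be chosen before the decomposition.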

from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer

n_comp = 25
tfidf_vec = TfidfVectorizer(analyzer="word", max_features=5000, ngram_range=(1,2))
svd = TruncatedSVD(n_components=n_comp, algorithm='arpack')
tfidf_df = tfidf_vec.fit_transform(values)
df = svd.fit_transform(tfidf_df)

How do I choose the number of components?
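
One common heuristic, not taken from the question, is to keep the smallest number of components whose squared singular values account for a chosen share of the total. The sketch below illustrates this with plain NumPy on a tiny dense matrix. The function name `components_for_energy` and the 0.95 threshold are assumptions. With scikit-learn, the fitted `TruncatedSVD` attribute `explained_variance_ratio_` could be accumulated in a similar way.

```python
import numpy as np

def components_for_energy(singular_values, threshold=0.95):
    """Smallest k such that the first k squared singular values reach `threshold` of the total."""
    s2 = np.asarray(singular_values, dtype=float) ** 2
    cumulative = np.cumsum(s2) / s2.sum()
    k = int(np.searchsorted(cumulative, threshold)) + 1
    return min(k, len(cumulative))        # guard against rounding at threshold = 1.0

# Hypothetical matrix with known singular values 10, 5, 1 and 0.1
A = np.diag([10.0, 5.0, 1.0, 0.1])
s = np.linalg.svd(A, compute_uv=False)

k = components_for_energy(s, threshold=0.95)
print(f"components needed for 95% of the energy: {k}")
```

The threshold is a judgment call. When the reduced features feed a downstream model, the number of components can also be tuned against that model's validation score.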

statistics machine-learning svd scikit-learn

1
vote
1
answer
2111
views

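MATLAB SVD singular value ordering

The MATLAB documentation for SVD states that the returned diagonal matrix has the singular values in descending order. Is there a way to find out what the natural order of the singular values is? I ask because the singular values correspond to dimensions associated with the rows of the input matrix.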
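
One way to probe the premise in the last sentence is to reorder the rows of a matrix and compare the singular values before and after. The sketch below does this in NumPy rather than MATLAB, on a hypothetical random matrix. Since a row permutation P gives PA = (PU)SV', only the rows of U move and the singular values themselves stay the same.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 3))           # hypothetical input matrix
perm = rng.permutation(5)                 # random reordering of the rows

s_original = np.linalg.svd(A, compute_uv=False)
s_permuted = np.linalg.svd(A[perm], compute_uv=False)

print("original rows:", np.round(s_original, 6))
print("permuted rows:", np.round(s_permuted, 6))
```

Both calls return the values in descending order, and the two lists agree to within rounding.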

matlab svd

0
votes
1
answer
4208
views

Is it possible to reverse svds

Is it possible to reverse the following in MATLAB:

[U,S,V]=svds(fulldata,columns);

matlab svd

0
votes
1
answer
657
views

Best-fit plane for 3D data

I have my 3D data X, Y, Z (matrices of size NxM).

I want to fit a best-fit plane to it. This is what I did:

X = X(isfinite(X));% deleting the NaN because svd Doesn't accept them
Y = Y(isfinite(Y));
Z = Z(isfinite(Z));

G = [X,Y,Z,ones(size(X(:)))];
[u s v] = svd(G,0);
P = v(:,4);
scalar = 2*P./P(1);
P = P./scalar; % supposed to be my plane equation but there is something wrong

Then I recompute Z from X and Y.

Z = -(P(1)*X + P(2)*Y + P(4)) / P(3);

I don't know what the problem is!
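
For comparison, here is a minimal NumPy sketch of the same homogeneous least-squares idea: the plane a*x + b*y + c*z + d = 0 is taken from the right singular vector belonging to the smallest singular value of [X, Y, Z, 1]. It differs from the MATLAB code above in two ways, both of which are assumptions about what was intended: a single shared mask removes non-finite points so that X, Y and Z stay aligned, and the coefficient vector is used as returned, without rescaling. The function name `fit_plane` and the test data are hypothetical.

```python
import numpy as np

def fit_plane(x, y, z):
    """Fit a*x + b*y + c*z + d = 0 to the finite points; returns (a, b, c, d) with unit norm."""
    x, y, z = (np.ravel(v).astype(float) for v in (x, y, z))
    # One shared mask keeps the three coordinates of each point together
    mask = np.isfinite(x) & np.isfinite(y) & np.isfinite(z)
    G = np.column_stack([x[mask], y[mask], z[mask], np.ones(mask.sum())])
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    return Vt[-1]                          # singular vector of the smallest singular value

# Hypothetical data on the plane z = 2x - 3y + 1, with one missing value
gx, gy = np.meshgrid(np.arange(5.0), np.arange(4.0))
gz = 2.0 * gx - 3.0 * gy + 1.0
gz[0, 0] = np.nan

a, b, c, d = fit_plane(gx, gy, gz)
z_fit = -(a * gx + b * gy + d) / c         # same formula as in the question, with d = P(4)
print("slope in x:", -a / c, "slope in y:", -b / c, "offset:", -d / c)
```

This minimizes the algebraic residual of the homogeneous system, which is not exactly the same as minimizing the orthogonal distance to the plane.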

matlab svd

0
votes
1
answer
9956
views

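How do I fill NaN values in a numeric array in order to apply SVD?

I merged two data frames that share some columns but also have some different ones. I want to apply singular value decomposition (SVD) to the merged data frame. However, filling in the NaN values affects the result. In my case, even filling with zeros would be wrong, because some columns legitimately contain zeros. Here is an example. Is there any way to solve this?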

>>> df1 = pd.DataFrame(np.random.rand(6, 4), columns=['A', 'B', 'C', 'D'])
>>> df1
          A         B         C         D
0  0.763144  0.752176  0.601228  0.290276
1  0.632144  0.202513  0.111766  0.317838
2  0.494587  0.318276  0.951354  0.051253
3  0.184826  0.429469  0.280297  0.014895
4  0.236955  0.560095  0.357246  0.302688
5  0.729145  0.293810  0.525223  0.744513
>>> df2 = pd.DataFrame(np.random.rand(6, 4), columns=['A', 'B', 'C', 'E'])
>>> df2
          A         B         C         E
0  0.969758  0.650887  0.821926  0.884600
1  0.657851  0.158992  0.731678  0.841507
2  0.923716  0.524547  0.783581  0.268123
3  0.935014  0.219135  0.152794 …
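
One simple baseline, which is an assumption here and not something the question proposes, is to replace each NaN with the mean of the observed values in its column before running the SVD. Unlike zero filling, this cannot be confused with genuine zeros, although it still pulls the decomposition toward the column means. The sketch uses a plain NumPy array with hypothetical values, and the function name `impute_column_means` is hypothetical.

```python
import numpy as np

def impute_column_means(a):
    """Return a copy of `a` with every NaN replaced by the mean of its column's observed values."""
    a = np.array(a, dtype=float)           # np.array copies, so the input is left untouched
    col_means = np.nanmean(a, axis=0)
    rows, cols = np.where(np.isnan(a))
    a[rows, cols] = col_means[cols]
    return a

# Hypothetical merged frame: columns A, B, C, D, E, where D and E come from different sources
data = np.array([[0.76, 0.75, 0.60, 0.29, np.nan],
                 [0.63, 0.20, 0.11, 0.32, np.nan],
                 [0.97, 0.65, 0.82, np.nan, 0.88],
                 [0.66, 0.16, 0.73, np.nan, 0.84]])

filled = impute_column_means(data)
U, s, Vt = np.linalg.svd(filled - filled.mean(axis=0), full_matrices=False)
print(np.round(filled, 3))
print("singular values:", np.round(s, 4))
```

A column that is entirely NaN has no observed mean and would need to be dropped or handled separately.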

python numpy svd python-3.x

0
votes
1
answer
3139
views

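Performing PCA in R using princomp() and using svd()

Possible duplicate:
Comparing svd and princomp in R

How do I perform PCA in R using two methods (princomp() and svd of the correlation matrix)?

I have a dataset like: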

438,498,3625,3645,5000,2918,5000,2351,2332,2643,1698,1687,1698,1717,1744,593,502,493,504,445,431,444,440,429,10
438,498,3625,3648,5000,2918,5000,2637,2332,2649,1695,1687,1695,1720,1744,592,502,493,504,449,431,444,443,429,10
438,498,3625,3629,5000,2918,5000,2637,2334,2643,1696,1687,1695,1717,1744,593,502,493,504,449,431,444,446,429,10
437,501,3625,3626,5000,2918,5000,2353,2334,2642,1730,1687,1695,1717,1744,593,502,493,504,449,431,444,444,429,10
438,498,3626,3629,5000,2918,5000,2640,2334,2639,1696,1687,1695,1717,1744,592,502,493,504,449,431,444,441,429,10
439,498,3626,3629,5000,2918,5000,2633,2334,2645,1705,1686,1694,1719,1744,589,502,493,504,446,431,444,444,430,10
440,5000,3627,3628,5000,2919,3028,2346,2330,2638,1727,1684,1692,1714,1745,588,501,492,504,451,433,446,444,432,10
444,5021,3631,3634,5000,2919,5000,2626,2327,2638,1698,1680,1688,1709,1740,595,500,491,503,453,436,448,444,436,10
451,5025,3635,3639,5000,2920,3027,2620,2323,2632,1706,1673,1681,1703,753,595,499,491,502,457,440,453,454,442,20
458,5022,3640,3644,5000,2922,5000,2346,2321,2628,1688,1666,1674,1696,744,590,496,490,498,462,444,458,461,449,20
465,525,3646,3670,5000,2923,5000,2611,2315,2631,1674,1658,1666,1688,735,593,495,488,497,467,449,462,469,457,20
473,533,3652,3676,5000,2925,5000,2607,2310,2623,1669,1651,1659,1684,729,578,496,487,498,469,454,467,476,465,20
481,544,3658,3678,5000,2926,5000,2606,2303,2619,1668,1643,1651,1275,723,581,495,486,497,477,459,472,484,472,20
484,544,3661,3665,5000,2928,5000,2321,2304,5022,1647,1639,1646,1270,757,623,493,484,495,480,461,474,485,476,20
484,532,3669,3662,2945,2926,5000,2326,2306,2620,1648,1639,1646,1270,760,533,493,483,494,507,461,473,486,476,20
482,520,3685,3664,2952,2927,5000,2981,2307,2329,1650,1640,1644,1268,757,533,492,482,492,513,459,474,485,474,20
481,522,3682,3661,2955,2927,2957,2984,1700,2622,1651,1641,1645,1272,761,530,492,482,492,513,462,486,483,473,20
480,525,3694,3664,2948,2926,2950,2995,1697,2619,1651,1642,1646,1269,762,530,493,482,492,516,462,486,483,473,20
481,515,5018,3664,2956,2927,2947,2993,1697,2622,1651,1641,1645,1269,765,592,489,482,495,531,462,499,483,473,20
479,5000,3696,3661,2953,2927,2944,2993,1702,2622,1649,1642,1645,1269,812,588,489,481,491,510,462,481,483,473,20
480,506,5019,3665,2941,2929,2945,2981,1700,2616,1652,1642,1645,1271,814,643,491,480,493,524,461,469,484,473,20
479,5000,5019,3661,2943,2930,2942,2996,1698,2312,1653,1642,1644,1274,811,617,491,479,491,575,461,465,484,473,20
479,5000,5020,3662,2945,2931,2942,2997,1700,2313,1654,1642,1644,1270,908,616,490,478,489,503,460,460,478,473,10
481,508,5021,3660,2954,2936,2946,2966,1705,2313,1654,1643,1643,1270,1689,678,493,477,483,497,467,459,476,473,10
486,510,522,3662,2958,2938,2939,2627,1707,2314,1659,1643,1639,1665,1702,696,516,476,477,547,465,457,470,474,10
479,521,520,3663,2954,2938,2941,2957,1712,2314,1660,1643,1638,1660,1758,688,534,475,475,489,461,456,465,474,10
480,554,521,3664,2954,2938,2941,2632,1715,2313,1660,1643,1637,1656,1761,687,553,475,474,558,462,453,465,476,10
481,511,5023,3665,2954,2937,2941,2627,1707,2312,1660,1641,1636,1655,1756,687,545,475,475,504,463,458,470,477,10
482,528,524,3665,2953,2937,2940,2629,1706,2312,1657,1640,1635,1654,1756,566,549,475,476,505,464,459,468,477,10

So I did this:

x <- read.csv("C:\\data_25_1000.txt",header=F,row.names=NULL)
p1 <- princomp(x, cor = TRUE)  ## using correlation matrix
p1
Call:
princomp(x = x, cor = TRUE)

    Standard deviations:
       Comp.1    Comp.2    Comp.3    Comp.4    Comp.5    Comp.6    Comp.7    Comp.8    Comp.9   Comp.10   Comp.11   Comp.12   Comp.13   Comp.14   Comp.15   Comp.16 
    1.9800328 1.8321498 1.4147367 1.3045541 1.2016116 1.1708212 1.1424120 …
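
The equivalence between the two routes can be checked numerically. The sketch below is written in NumPy rather than R and uses hypothetical random data in place of the file above. It computes the component standard deviations once from the eigenvalues of the correlation matrix, which is what the "Standard deviations" printed by `princomp(x, cor = TRUE)` are based on, and once from the SVD of the standardized data.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical stand-in data: 30 observations of 5 correlated variables
X = rng.standard_normal((30, 5)) @ rng.standard_normal((5, 5))
n = len(X)

# Route 1: eigenvalues of the correlation matrix, largest first
R = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(R)[::-1]
sdev_eig = np.sqrt(eigvals)

# Route 2: SVD of the standardized data (centered, scaled with divisor n)
Z = (X - X.mean(axis=0)) / X.std(axis=0)
s = np.linalg.svd(Z, compute_uv=False)
sdev_svd = s / np.sqrt(n)

print("from correlation matrix:", np.round(sdev_eig, 6))
print("from SVD of scaled data:", np.round(sdev_svd, 6))
```

The two agree because Z'Z / n is the correlation matrix, so its eigenvalues are the squared singular values of Z divided by n. The component directions from the two routes can still differ in sign.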

r svd pca

-1
votes
1
answer
6148
views