Posts by Doe*_*Noe

which() not working as expected

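I have a matrix with 3 columns and 10,000 elements in total. The first and second columns are indices and the third column is a score. I want to normalize the score column according to the following formula: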

Normalized_score_i_j = score_i_j / (sqrt(score_i_i) * sqrt(score_j_j))

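score_i_j = the current score itself

score_i_i = take the index in the first column of the current row, and look up the score in the dataset whose first and second columns both contain that index

score_j_j = take the index in the second column of the current row, and look up the score in the dataset whose first and second columns both contain that index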

For example, if df is as follows:

df <- read.table(text = "
First.Protein,Second.Protein,Score
1,1,25
1,2,90
1,3,82
1,4,19
2,1,90
2,2,99
2,3,76
2,4,79
3,1,82
3,2,76
3,3,91
3,4,33
4,1,28
4,2,11
4,3,99
4,4,50
", header = TRUE, sep = ",")

If we normalize this row:

First.Protein Second.Protein Score
4             3              99

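The normalized score is:

the score itself, divided by the product of the square root of the score whose First.Protein and Second.Protein indices are both 4 and the square root of the score whose First.Protein and Second.Protein indices are both 3.

Therefore: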

Normalized =  99 / (sqrt(50) * sqrt(91)) = 1.467674

I have the code below, but it behaves very strangely and gives me values that are not normalized at all:

for(i in 1:nrow(Smith_Waterman_Scores))
{
  Smith_Waterman_Scores$Score[i] <- 
    Smith_Waterman_Scores$Score[i] / 
    (sqrt(Smith_Waterman_Scores$Score[which(Smith_Waterman_Scores$First.Protein==Smith_Waterman_Scores$First.Protein[i] & Smith_Waterman_Scores$Second.Protein==Smith_Waterman_Scores$First.Protein[i])])) *
    (sqrt(Smith_Waterman_Scores$Score[which(Smith_Waterman_Scores$First.Protein==Smith_Waterman_Scores$Second.Protein[i] & Smith_Waterman_Scores$Second.Protein==Smith_Waterman_Scores$Second.Protein[i])]))
}
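
A possible explanation, offered as a sketch rather than a confirmed diagnosis: in R, `/` and `*` have equal precedence and associate left to right, so `a / (x) * (y)` evaluates as `(a / x) * y` rather than `a / (x * y)`. The loop also overwrites `Score` in place, so a diagonal entry such as (1,1) has already been changed by the time later rows look it up. The version below uses the example `df` from the question, reads the self-scores once from the original column, and writes the result to a new column. The names `diag_rows`, `idx`, `val`, `s_ii`, `s_jj` and `Normalized` are illustrative assumptions, not part of the original post.

```r
# Example data from the question
df <- read.table(text = "
First.Protein,Second.Protein,Score
1,1,25
1,2,90
1,3,82
1,4,19
2,1,90
2,2,99
2,3,76
2,4,79
3,1,82
3,2,76
3,3,91
3,4,33
4,1,28
4,2,11
4,3,99
4,4,50
", header = TRUE, sep = ",")

# Diagonal rows hold the self-scores (score_i_i)
diag_rows <- df$First.Protein == df$Second.Protein
idx <- df$First.Protein[diag_rows]   # protein index of each self-score
val <- df$Score[diag_rows]           # the self-score itself

# Look up score_i_i and score_j_j for every row from the ORIGINAL scores
s_ii <- val[match(df$First.Protein, idx)]
s_jj <- val[match(df$Second.Protein, idx)]

# Parenthesise the whole denominator and keep the raw scores untouched
df$Normalized <- df$Score / (sqrt(s_ii) * sqrt(s_jj))

# Row (4, 3) from the worked example: 99 / (sqrt(50) * sqrt(91))
df[df$First.Protein == 4 & df$Second.Protein == 3, ]
```

This assumes every index that appears in either column has a diagonal row; for an index without one, `match` returns `NA` and the normalized score is `NA` as well.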

r bioinformatics which

Doe*_*Noe

2016-05-12

3
votes

1
answer

122
views

Non-conformable arrays error when doing matrix multiplication in R

I am trying to implement kernel ridge regression in R.

The formula is:

alpha <- ((lambda.I + K)^(-1)) * y

Lambda = 0.1. I = the identity matrix of the same size as K. y is the feature vector, with the same number of rows as K.

So I tried this in R:

I <- diag(nrow(df_matrix))
lambda <- 0.1
alpha <- (lambda * I + df_matrix) ^ (-1) * df_vector

I get the following error:

Error in (0.1 * I + df_matrix)^(-1) * df_vector : non-conformable arrays

Here is some information about my dataset:

> nrow(df_matrix)
[1] 8222
> ncol(df_matrix)
[1] 8222
> nrow(df_vector)
[1] 8222
> nrow(I)
[1] 8222
> ncol(I)
[1] 8222
> class(df_matrix)
[1] "matrix"
> class(df_vector)
[1] "matrix"
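
Some context on the error, as a hedged note: in R, `^` and `*` act element by element on matrices, so `A^(-1)` is the element-wise reciprocal, and `*` requires both operands to have identical dimensions, which an 8222 x 8222 matrix and an 8222 x 1 matrix do not. Matrix inversion is done with `solve()` and the matrix product with `%*%`. A minimal sketch on a small made-up kernel follows; the data, `n`, `X`, `K`, `y` and the choice of a linear kernel are assumptions for illustration only, standing in for `df_matrix` and `df_vector`.

```r
set.seed(1)

n <- 5
X <- matrix(rnorm(n * 3), nrow = n)  # hypothetical inputs
K <- X %*% t(X)                      # hypothetical linear kernel, n x n
y <- matrix(rnorm(n), ncol = 1)      # stand-in for df_vector, n x 1

lambda <- 0.1
I <- diag(n)                         # identity matrix of the same size as K

# Solve (lambda * I + K) alpha = y directly, without forming the inverse
alpha <- solve(lambda * I + K, y)

# Literal translation of the formula: explicit inverse, then matrix product
alpha_inv <- solve(lambda * I + K) %*% y

dim(alpha)
```

Both forms give the same `alpha` up to rounding; the two-argument `solve(A, b)` avoids computing the full inverse, which tends to matter at sizes like 8222 x 8222.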

r matrix

Doe*_*Noe

lucky-day

2
votes

1
answer

8144
views