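I have some strings in a special format that represent sets. In R, I want to convert them into a similarity matrix.
For example, a string saying that 1 and 2 form one set, 3 is alone in its own set, and 4, 5, and 6 form another set looks like this: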
"1+2,3,4+5+6"
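As a minimal sketch, assuming set members are always integers, the format can be read by splitting first on "," (sets) and then on "+" (members). `parse_sets` is a hypothetical helper name, and `fixed = TRUE` avoids having to escape "+" as a regular-expression metacharacter.

```r
# Hypothetical helper: turn "1+2,3,4+5+6" into a list of integer vectors,
# one per set ("," separates sets, "+" separates members within a set).
parse_sets <- function(s) {
  sets <- strsplit(s, ",", fixed = TRUE)[[1]]  # "1+2" "3" "4+5+6"
  lapply(strsplit(sets, "+", fixed = TRUE), as.integer)
}

sets <- parse_sets("1+2,3,4+5+6")
# sets is list(c(1L, 2L), 3L, c(4L, 5L, 6L))
```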
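For the example string "1+2,3,4+5+6", I would like to produce the following 6 × 6 matrix, where entry [i, j] is 1 if elements i and j are in the same set and 0 otherwise: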
     [,1] [,2] [,3] [,4] [,5] [,6]
[1,]    1    1    0    0    0    0
[2,]    1    1    0    0    0    0
[3,]    0    0    1    0    0    0
[4,]    0    0    0    1    1    1
[5,]    0    0    0    1    1    1
[6,]    0    0    0    1    1    1
It seems like this should be a very simple task. How should I go about it?
Here is one approach:
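# Split the string into sets on ",", then each set into numeric members on "+"
out <- lapply(unlist(strsplit("1+2,3,4+5+6", ",")), function(x) {
  as.numeric(unlist(strsplit(x, "\\+")))
})
# Element-by-set incidence table: rows are elements, columns are sets
x <- table(unlist(out), rep(seq_along(out), sapply(out, length)))
# x %*% t(x) counts the sets two elements share (1 or 0 here); matrix() drops dimnames
matrix(x %*% t(x), nrow(x))
##      [,1] [,2] [,3] [,4] [,5] [,6]
## [1,]    1    1    0    0    0    0
## [2,]    1    1    0    0    0    0
## [3,]    0    0    1    0    0    0
## [4,]    0    0    0    1    1    1
## [5,]    0    0    0    1    1    1
## [6,]    0    0    0    1    1    1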
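Why `x %*% t(x)` yields the similarity matrix can be written out as follows, treating the table `x` as an element-by-set incidence matrix X (n elements, K sets) and assuming the sets in the string never overlap:

```latex
X_{ik} = \begin{cases} 1 & \text{if element } i \text{ is in set } k \\ 0 & \text{otherwise} \end{cases}
\qquad
S = X X^{\top}, \qquad S_{ij} = \sum_{k=1}^{K} X_{ik} X_{jk}
```

For "1+2,3,4+5+6", K = 3, so S_{12} = X_{11} X_{21} = 1 (both are in set 1), while S_{13} = 0 because elements 1 and 3 share no set. If sets could overlap, S_{ij} would count shared sets and could exceed 1.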
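If every element appears in exactly one set, as in the example, a sketch without `table()` is to label each element with the index of its set and compare labels pairwise with `outer()`. The function name `set_similarity` is hypothetical; rows and columns are ordered by element value, as in the `table()`-based answer.

```r
# Hypothetical helper; assumes each element belongs to exactly one set.
set_similarity <- function(s) {
  # "1+2,3,4+5+6" -> list(c(1L, 2L), 3L, c(4L, 5L, 6L))
  sets <- lapply(strsplit(strsplit(s, ",", fixed = TRUE)[[1]], "+", fixed = TRUE),
                 as.integer)
  elems <- unlist(sets)
  # Set index of each element in parse order, e.g. 1 1 2 3 3 3
  grp <- rep(seq_along(sets), lengths(sets))
  # Reorder so that position i holds the set index of the i-th smallest element
  grp <- grp[order(elems)]
  # 1 where two elements carry the same set label, 0 otherwise
  1L * outer(grp, grp, "==")
}

set_similarity("1+2,3,4+5+6")
```

Because `outer(grp, grp, "==")` only checks whether two labels match, this version cannot represent an element that belongs to several sets; the `x %*% t(x)` approach would count shared sets in that case.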