Converting a character matrix into a matrix of strings in R

Question

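I have a large character matrix that I want to convert into a matrix of strings without looping over each row individually, so I am wondering whether there is a smart way to do this quickly. I tried paste(data[, 4:((i*2)+3)], collapse=""), but the problem is that it combines all the rows into one very large string. I need the result to have the same number of rows as the original matrix, with each row holding a single column: a string made of the characters in that particular row. In other words, I want to convert the matrix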

a=
{
D  E  R  P  G  K  I
S  K  P  A  S  L  N
S  K  P  A  S  L  N
S  K  P  A  S  L  N
S  K  P  A  S  L  N
}

into

a=
{
 DERPGKI
 SKPASLN
 SKPASLN
 SKPASLN
 SKPASLN
}

Answer 1

A5C*_*2T1 5

apply is a loop, but it should still be reasonably efficient in this case. It would be used like this:

apply(x, 1, paste, collapse = "")

Alternatively, you can try:

do.call(paste0, data.frame(x))

This might actually be faster...
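To see why the do.call version yields one string per row, here is a minimal sketch on a small matrix m that is not from the question (the names m, cols, via_do_call and explicit are only for illustration). data.frame() splits the matrix into columns, do.call passes each column to paste0 as a separate argument, and paste0 then concatenates element by element, that is, row by row:

```r
# Small illustrative matrix (hypothetical, not the asker's data).
# matrix() fills column by column, so the rows are "a" "b" and "c" "d".
m <- matrix(c("a", "c", "b", "d"), nrow = 2)

# Split the matrix into a list of columns.
cols <- as.list(data.frame(m, stringsAsFactors = FALSE))

# do.call hands each column to paste0 as its own argument ...
via_do_call <- do.call(paste0, cols)

# ... which is equivalent to writing the columns out by hand.
explicit <- paste0(m[, 1], m[, 2])

print(via_do_call)
print(identical(via_do_call, explicit))
```

Because the loop over rows happens inside paste0 rather than in R code, the cost grows mainly with the number of columns (one argument each) rather than the number of rows.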

A reproducible example (not sure why I'm wasting my time here)...

x <- structure(c("D", "S", "S", "S", "S", "E", "K", "K", "K", "K", 
                 "R", "P", "P", "P", "P", "P", "A", "A", "A", "A", 
                 "G", "S", "S", "S", "S", "K", "L", "L", "L", "L", 
                 "I", "N", "N", "N", "N"), .Dim = c(5L, 7L))
x
#      [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "D"  "E"  "R"  "P"  "G"  "K"  "I" 
# [2,] "S"  "K"  "P"  "A"  "S"  "L"  "N" 
# [3,] "S"  "K"  "P"  "A"  "S"  "L"  "N" 
# [4,] "S"  "K"  "P"  "A"  "S"  "L"  "N" 
# [5,] "S"  "K"  "P"  "A"  "S"  "L"  "N"

Let's compare the options:

library(microbenchmark)

fun1 <- function(inmat) apply(inmat, 1, paste, collapse = "")
fun2 <- function(inmat) do.call(paste0, data.frame(inmat))

fun1(x)
# [1] "DERPGKI" "SKPASLN" "SKPASLN" "SKPASLN" "SKPASLN"
fun2(x)
# [1] "DERPGKI" "SKPASLN" "SKPASLN" "SKPASLN" "SKPASLN"

microbenchmark(fun1(x), fun2(x))
# Unit: microseconds
#     expr      min        lq    median        uq      max neval
#  fun1(x)   97.634  104.4805  112.0725  117.7735  268.503   100
#  fun2(x) 1258.000 1282.6275 1301.5555 1316.5015 1576.506   100

And on longer data:

X <- do.call(rbind, replicate(100000, x, simplify=FALSE))
dim(X)
# [1] 500000      7

microbenchmark(fun1(X), fun2(X), times = 10)
# Unit: milliseconds
#     expr       min        lq    median       uq      max neval
#  fun1(X) 4189.8940 4226.9354 4382.0403 4570.032 4596.983    10
#  fun2(X)  825.9816  835.4351  888.5102 1031.509 1056.832    10

I suspect that on wider data (many columns, few rows), apply would still be more efficient.
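The answer does not test the wide case. One way to check it yourself is sketched below, assuming a 5 x 1400 matrix built by binding copies of the example side by side; the name X_wide, the number of copies and the repetition count are arbitrary choices for illustration. It uses base system.time instead of microbenchmark so that it runs without extra packages, and the timings will vary by machine and R version:

```r
# Rebuild the 5 x 7 example matrix from the answer.
first <- strsplit("DERPGKI", "")[[1]]
rest  <- strsplit("SKPASLN", "")[[1]]
x <- rbind(first, rest, rest, rest, rest, deparse.level = 0)

# The two candidates, as defined in the answer.
fun1 <- function(inmat) apply(inmat, 1, paste, collapse = "")
fun2 <- function(inmat) do.call(paste0, data.frame(inmat))

# Hypothetical wide input: 200 copies side by side, giving 5 rows x 1400 columns.
X_wide <- do.call(cbind, replicate(200, x, simplify = FALSE))

# Time 20 repetitions of each function (elapsed seconds).
t_apply   <- system.time(for (i in 1:20) r1 <- fun1(X_wide))
t_do_call <- system.time(for (i in 1:20) r2 <- fun2(X_wide))

print(c(apply = t_apply[["elapsed"]], do.call = t_do_call[["elapsed"]]))
```

Both functions should return the same five strings here, so any difference in the printed timings reflects only the method.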
