How do I split a string into substrings of a given length?

Mad*_*Seb 24 string split r

I have a string such as:

"aabbccccdd"

I want to split this string into a vector of substrings of length 2:

"aa" "bb" "cc" "cc" "dd"

GSe*_*See 49

Here is one way:

substring("aabbccccdd", seq(1, 9, 2), seq(2, 10, 2))
#[1] "aa" "bb" "cc" "cc" "dd"

Or, more generally:

text <- "aabbccccdd"
substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
#[1] "aa" "bb" "cc" "cc" "dd"
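The call above hard-codes a chunk length of 2, and with an odd number of characters the two `seq()` calls stop early, so the final character is dropped. As a minimal sketch of how the same `substring()` idea could be extended to any chunk length, the helper below keeps a shorter final piece. The name `split_fixed` and its parameter `n` are hypothetical and do not come from the original answer.

```r
# Hypothetical helper (not part of the original answer): split `text`
# into pieces of `n` characters; the last piece may be shorter.
split_fixed <- function(text, n) {
  len <- nchar(text)
  # seq(1, 0, by = n) would raise an error, so handle "" explicitly
  if (len == 0) return(character(0))
  starts <- seq(1, len, by = n)
  # clamp the end positions so they never run past the end of the string
  substring(text, starts, pmin(starts + n - 1, len))
}

split_fixed("aabbccccdd", 2)
#[1] "aa" "bb" "cc" "cc" "dd"
split_fixed("abcdefgh", 3)
#[1] "abc" "def" "gh"
```

`substring()` is vectorized over its start and end arguments, so no explicit loop is needed.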

Edit: this is much faster:

sst <- strsplit(text, "")[[1]]
out <- paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)])

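It first splits the string into individual characters. Then it pastes each character at an odd position together with the character at the even position that follows it.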
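To make the indexing step concrete, here is a small sketch on a shorter string. The variable names `odd`, `even`, and `pairs` are illustrative only. The logical vectors `c(TRUE, FALSE)` and `c(FALSE, TRUE)` are recycled along the character vector, which is what selects every other element.

```r
sst <- strsplit("abcdef", "")[[1]]   # "a" "b" "c" "d" "e" "f"

# c(TRUE, FALSE) is recycled to TRUE, FALSE, TRUE, FALSE, ...
odd  <- sst[c(TRUE, FALSE)]          # positions 1, 3, 5
# c(FALSE, TRUE) is recycled to FALSE, TRUE, FALSE, TRUE, ...
even <- sst[c(FALSE, TRUE)]          # positions 2, 4, 6

pairs <- paste0(odd, even)
pairs
#[1] "ab" "cd" "ef"
```

One caveat: this relies on the string having an even number of characters. With an odd length, `odd` is one element longer than `even`, and `paste0()` silently recycles the shorter vector, so the last pair would be wrong.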

Timings

text <- paste(rep(paste0(letters, letters), 1000), collapse="")
g1 <- function(text) {
    substring(text, seq(1, nchar(text)-1, 2), seq(2, nchar(text), 2))
}
g2 <- function(text) {
    sst <- strsplit(text, "")[[1]]
    paste0(sst[c(TRUE, FALSE)], sst[c(FALSE, TRUE)])
}
identical(g1(text), g2(text))
#[1] TRUE
library(rbenchmark)
benchmark(g1=g1(text), g2=g2(text))
#  test replications elapsed relative user.self sys.self user.child sys.child
#1   g1          100  95.451 79.87531    95.438        0          0         0
#2   g2          100   1.195  1.00000     1.196        0          0         0

  • Awesome! The second version is really fast! (2 upvotes)

min*_*nda 11

string <- "aabbccccdd"
# total length of string
num.chars <- nchar(string)

# the indices where each substr will start
starts <- seq(1,num.chars, by=2)

# chop it up
sapply(starts, function(ii) {
  substr(string, ii, ii+1)
})

This gives

[1] "aa" "bb" "cc" "cc" "dd"


Sve*_*ein 11

There are two simple possibilities:

s <- "aabbccccdd"
  1. gregexpr and regmatches:

    regmatches(s, gregexpr(".{2}", s))[[1]]
    # [1] "aa" "bb" "cc" "cc" "dd"
    
  2. strsplit:

    strsplit(s, "(?<=.{2})", perl = TRUE)[[1]]
    # [1] "aa" "bb" "cc" "cc" "dd"
    
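The pattern `.{2}` only matches complete pairs, so a trailing single character would be left out of the `regmatches` result. As a hedged sketch of one possible variation, the pattern can be built with `sprintf()` for an arbitrary chunk length and written as `.{1,n}` so that a shorter final piece is kept. The variables `n`, `pattern`, and `pieces` are illustrative and not part of the original answer.

```r
s <- "aabbccccd"                     # nine characters: the last piece is shorter
n <- 2                               # desired chunk length (illustrative)

# ".{1,2}" greedily matches up to two characters, but at least one
pattern <- sprintf(".{1,%d}", n)

pieces <- regmatches(s, gregexpr(pattern, s))[[1]]
pieces
#[1] "aa" "bb" "cc" "cc" "d"
```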