How to detect the position ranges of a specific character set in a string

Question

sca*_*der 6 regex string r stringr tidyverse

I have the following sequence:

my_seq <- "----?????-----?V?D????-------???IL??A?---"


What I want to do is detect the position ranges of the non-dash characters.

----?????-----?V?D????-------???IL??A?---
|   |   |     |      |       |       |  
1   5   9    15     22      30      38


The final output would be a character vector:

out <- c("5-9", "15-22", "30-38")


How can I achieve this in R?

Answer 1

lov*_*ery 10

Please find below another possible solution using the stringr library.

Reprex

Code

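library(stringr)

my_seq <- "----?????-----?V?D????-------???IL??A?---"

# Locate every run of one or more non-dash characters (start/end columns)
s <- as.data.frame(str_locate_all(my_seq, "[^-]+")[[1]])
result <- paste(s$start, s$end, sep = "-")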


Output

result
#> [1] "5-9"   "15-22" "30-38"


Created on 2022-02-18 by the reprex package (v2.0.1)

Answer 2

All*_*ron 6

You can do it like this:

my_seq <- "----?????-----?V?D????-------???IL??A?---"

non_dash <- which(strsplit(my_seq, "")[[1]] != '-')
pos      <- non_dash[c(0, diff(non_dash)) != 1 | c(diff(non_dash), 0) != 1]

apply(matrix(pos, ncol = 2, byrow = TRUE), 1, function(x) paste(x, collapse = "-"))
#> [1] "5-9"   "15-22" "30-38"


Created on 2022-02-18 by the reprex package (v2.0.1)
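
A note on the approach above: pairing the boundary positions into a two-column matrix works when every non-dash run is at least two characters long, because a one-character run contributes only a single position. As a minimal sketch of one way around that, assuming base R's `rle()` is acceptable, the run starts and ends can be derived from run lengths instead. The helper name `non_dash_ranges()` is hypothetical and not part of the original answer.

```r
# Hypothetical helper (not from the original answer): report the start-end
# range of every run of non-dash characters, including runs of length one.
non_dash_ranges <- function(x) {
  chars  <- strsplit(x, "")[[1]]
  runs   <- rle(chars != "-")            # alternating runs of dash / non-dash
  ends   <- cumsum(runs$lengths)         # last position of each run
  starts <- ends - runs$lengths + 1L     # first position of each run
  # Keep only the runs whose value is TRUE, i.e. the non-dash runs
  paste(starts[runs$values], ends[runs$values], sep = "-")
}

non_dash_ranges("----?????-----?V?D????-------???IL??A?---")
#> [1] "5-9"   "15-22" "30-38"

non_dash_ranges("--A--BC-")
#> [1] "3-3" "6-7"
```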

Answer 3

Maë*_*aël 5

Inspired by @lovalery's great answer, a base R solution is:

# Match every run of one or more non-dash characters
g <- gregexpr(pattern = "[^-]+", my_seq)
# End position = start + match length - 1
d <- data.frame(start = unlist(g),
                end = unlist(g) + attr(g[[1]], "match.length") - 1)
paste(d$start, d$end, sep = "-")
# [1] "5-9"   "15-22" "30-38"


Answer 4

jbl*_*d94 5

A one-liner in base R using utf8ToInt:

apply(matrix(which(diff(c(FALSE, utf8ToInt(my_seq) != 45L, FALSE)) != 0) - 0:1, 2), 2, paste, collapse = "-")
#> [1] "5-9"   "15-22" "30-38"
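
For readers who find the one-liner dense, here is an unpacked sketch of what it appears to do, step by step. The intermediate variable names (`is_non_dash`, `padded`, `edges`, `bounds`, `ranges`) are introduced here for illustration only and do not come from the original answer.

```r
my_seq <- "----?????-----?V?D????-------???IL??A?---"

# TRUE wherever the character is not a dash (code point 45 is "-")
is_non_dash <- utf8ToInt(my_seq) != 45L

# Pad with FALSE on both sides so that a run touching either end of the
# string still produces both a start and an end transition
padded <- c(FALSE, is_non_dash, FALSE)

# Non-zero differences mark transitions; they alternate start, end, start, ...
edges <- which(diff(padded) != 0)

# An "end" transition sits one past the last character of its run, so
# subtract 0 from each start and 1 from each end (0:1 is recycled)
bounds <- matrix(edges - 0:1, nrow = 2)  # row 1 = starts, row 2 = ends

ranges <- paste(bounds[1, ], bounds[2, ], sep = "-")
ranges
#> [1] "5-9"   "15-22" "30-38"
```

Because every run yields exactly one start and one end transition, this form also copes with runs of a single character.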

