Split a string at the last comma in R

Jiq*_*ang 9 string split r comma

I am not new to R, but I am relatively new to regular expressions.

A similar question can be found here.

As an example, when I use

> strsplit("UK, USA, Germany", ", ")
[[1]]
[1] "UK"      "USA"     "Germany"

but I want to get

[[1]]
[1] "UK, USA"     "Germany"

Another example is

> strsplit("London, Washington, D.C., Berlin", ", ")
[[1]]
[1] "London"     "Washington" "D.C."       "Berlin"  

and I want to get

[[1]]
[1] "London, Washington, D.C."       "Berlin"  

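Washington, D.C. should definitely not be split into two parts. The string should be split only at the last comma, not at every comma.

I think one workable approach would be to replace the last comma with something else, for example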

$, #, *, ...

and then use

strsplit() 

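to split the string on the replacement character (making sure it is unique!). However, I would be happier if the problem could be handled directly with a built-in function.

So how should I do this? Many thanks.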
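
For reference, the replace-then-split idea described above could be sketched in base R roughly as follows. The marker `#` is an assumption and must not occur anywhere in the input. The sketch relies on the greedy `(.*)` so that the `", "` matched after it is the last one in the string.

```r
x <- "London, Washington, D.C., Berlin"

# Greedy (.*) consumes as much as possible, so the ", " that follows
# it in the pattern is the final comma-space in the string.
# "#" is an assumed marker that must not appear in the input.
marked <- sub("(.*), ", "\\1#", x)

# Split on the literal marker (fixed = TRUE disables regex matching).
parts <- strsplit(marked, "#", fixed = TRUE)
parts
## [[1]]
## [1] "London, Washington, D.C." "Berlin"
```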

Tyl*_*ker 12

Here is one approach:

strsplit("UK, USA, Germany", ",(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" " Germany"

You probably want this instead, which also drops the whitespace after the comma:

strsplit("UK, USA, Germany", ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"

This also matches when there is no space after the comma:

strsplit(c("UK, USA, Germany", "UK, USA,Germany"), ",\\s*(?=[^,]+$)", perl=TRUE)

## [[1]]
## [1] "UK, USA" "Germany"
## 
## [[2]]
## [1] "UK, USA" "Germany"
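
To spell out why the pattern works: the lookahead `(?=[^,]+$)` only succeeds when nothing but non-comma characters remain up to the end of the string, so only the final comma can match. Below is a minimal sketch that wraps the pattern in a helper and applies it to the second example from the question. The name `split_last_comma` is hypothetical and used only for illustration.

```r
# Hypothetical helper: split each string at its last comma only.
split_last_comma <- function(x) {
  # ","          the comma to split on
  # "\\s*"       optional whitespace after it, removed with the comma
  # "(?=[^,]+$)" lookahead: only non-comma characters follow until the
  #              end of the string, which is true only for the last comma
  strsplit(x, ",\\s*(?=[^,]+$)", perl = TRUE)
}

res <- split_last_comma(c("London, Washington, D.C., Berlin", "UK"))
res
## [[1]]
## [1] "London, Washington, D.C." "Berlin"
##
## [[2]]
## [1] "UK"
```

A string without any comma is returned unchanged as a single element.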


bar*_*nus 6

You can use the stri_split functions from the stringi package:

library(stringi)

x <- "USA,UK,Poland"
stri_split_fixed(x, ",") # standard split by comma
## [[1]]
## [1] "USA"    "UK"     "Poland"

stri_split_fixed(x, ",", n = 2) # set the max number of elements
## [[1]]
## [1] "USA"       "UK,Poland"

Unfortunately, there is no argument to change which end the split starts from (beginning or end), but we can work around that by using stri_reverse:

stri_split_fixed(stri_reverse(x), ",", n = 2) # reverse
## [[1]]
## [1] "dnaloP" "KU,ASU"

stri_reverse(stri_split_fixed(stri_reverse(x), ",", n = 2)[[1]]) # reverse back
## [1] "Poland" "USA,UK"
stri_reverse(stri_split_fixed(stri_reverse(x), ",", n = 2)[[1]])[2:1] # and swap the order :)
## [1] "USA,UK" "Poland"
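
The reverse, split, reverse-back steps above handle one string at a time (note the `[[1]]`). As a rough sketch, they could be wrapped in a small vectorized helper. The name `split_last_fixed` and its `sep` argument are hypothetical and not part of stringi. The separator is reversed as well, so that a multi-character separator such as `", "` still matches in the reversed string.

```r
library(stringi)

# Hypothetical helper: split each string at the last occurrence of `sep`.
split_last_fixed <- function(x, sep = ",") {
  # Reverse the input and the separator, then split off at most two
  # pieces from what is now the front of each string.
  parts <- stri_split_fixed(stri_reverse(x), stri_reverse(sep), n = 2)
  # Reverse each piece back and restore the original left-to-right order.
  lapply(parts, function(p) rev(stri_reverse(p)))
}

out <- split_last_fixed(c("USA,UK,Poland", "UK, USA, Germany"))
out[[1]]
## [1] "USA,UK" "Poland"

split_last_fixed("UK, USA, Germany", sep = ", ")[[1]]
## [1] "UK, USA" "Germany"
```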