How to split a string on the first number only

Jes*_*sse 7 regex r strsplit

I have a dataset of street addresses, and they are formatted very inconsistently. For example:

d <- c("street1234", "Street 423", "Long Street 12-14", "Road 18A", "Road 12 - 15", "Road 1/2")

From this I want to create two columns: 1. X: the street name, and 2. Y: the number plus everything that follows it. Like this:

X           Y
Street      1234
Street      423
Long Street 12-14
Road        18A
Road        12 - 15
Road        1/2

So far I have tried strsplit, following some similar questions here, for example: strsplit(d, split = "(?<=[a-zA-Z])(?=[0-9])", perl = T). I can't seem to find the right regular expression.

Any help is much appreciated. Thanks in advance!

Wik*_*żew 8

There may be whitespace between the letters and the digits, so add \s* (zero or more whitespace characters) between the lookarounds:

> strsplit(d, split = "(?<=[a-zA-Z])\\s*(?=[0-9])", perl = TRUE)
[[1]]
[1] "street" "1234"  

[[2]]
[1] "Street" "423"   

[[3]]
[1] "Long Street" "12-14"      

[[4]]
[1] "Road" "18A" 

[[5]]
[1] "Road"    "12 - 15"

[[6]]
[1] "Road" "1/2" 
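If you would rather stay in base R, the list returned by strsplit can be turned into the two columns directly. This is only a minimal sketch: the names parts and res are made up for illustration, and it assumes every address splits into exactly two pieces, which holds for the sample data but is not guaranteed in general.

```r
d <- c("street1234", "Street 423", "Long Street 12-14",
       "Road 18A", "Road 12 - 15", "Road 1/2")

# Split between the last letter and the first digit, swallowing any
# whitespace that sits in between
parts <- strsplit(d, split = "(?<=[a-zA-Z])\\s*(?=[0-9])", perl = TRUE)

# Assumption: every address produced exactly two pieces
stopifnot(all(lengths(parts) == 2))

# Take the first piece of each element as X and the second as Y
res <- data.frame(
  X = vapply(parts, `[`, character(1), 1),
  Y = vapply(parts, `[`, character(1), 2),
  stringsAsFactors = FALSE
)
res
```

The stopifnot() guard fails loudly if some address splits into more or fewer than two pieces, instead of silently misaligning the columns.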

If you want to create columns from it, you can use separate from the tidyr package:

> library(tidyr)
> separate(data.frame(A = d), col = "A" , into = c("X", "Y"), sep = "(?<=[a-zA-Z])\\s*(?=[0-9])")
            X       Y
1      street    1234
2      Street     423
3 Long Street   12-14
4        Road     18A
5        Road 12 - 15
6        Road     1/2
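Note that the lookaround pattern can match more than once in a string where a letter is again followed by a digit later on (a hypothetical "Road 18A 5", for instance). If the goal is strictly to cut at the first digit, one possible alternative is to locate that digit with regexpr and slice with substr. This is a sketch under stated assumptions, not part of the original answer; pos, X and Y are illustrative names, and strings without any digit are assumed to go entirely into X.

```r
d <- c("street1234", "Street 423", "Long Street 12-14",
       "Road 18A", "Road 12 - 15", "Road 1/2")

# Position of the first digit in each string (-1 when there is none);
# as.integer() drops the match attributes that regexpr() attaches
pos <- as.integer(regexpr("[0-9]", d))

# X: everything before the first digit, with surrounding whitespace trimmed
# Y: everything from the first digit to the end of the string
X <- ifelse(pos > 0, trimws(substr(d, 1, pos - 1)), d)
Y <- ifelse(pos > 0, substr(d, pos, nchar(d)), NA_character_)

data.frame(X = X, Y = Y, stringsAsFactors = FALSE)
```

Because the string is cut at a single position, each address yields exactly one X and one Y no matter how many letter-digit boundaries come after the first digit.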