Tags: split, r, street-address
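I want to parse (extract) addresses into HouseNumber and Streetname. Later I should be able to write the extracted values into new columns (shops$HouseNumber and shops$Streetname).
So let's say I have a data frame called "shops":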
> shops
                Name     city                          street
1          Something Fakecity                    New Street 3
2     SomethingOther Fakecity Some-Complicated-Casestreet 1-3
3 SomethingDifferent Fakecity                 Fake Street 14a
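To make the examples reproducible, the `shops` data frame printed above can be rebuilt in base R. This is a sketch that assumes all three columns are plain character vectors; the values are copied from the printout.

```r
# Rebuild the example data frame shown above. stringsAsFactors = FALSE
# keeps the columns as character vectors on R versions older than 4.0.
shops <- data.frame(
  Name   = c("Something", "SomethingOther", "SomethingDifferent"),
  city   = c("Fakecity", "Fakecity", "Fakecity"),
  street = c("New Street 3", "Some-Complicated-Casestreet 1-3", "Fake Street 14a"),
  stringsAsFactors = FALSE
)
shops
```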
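So is there a way to split the street column into two lists, one with the street name and one with the house number (including cases such as "1-3" and "14a"), so that the result can be assigned to the data frame and look like this: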
> shops
                Name     city                  Streetname HouseNumber
1          Something Fakecity                  New Street           3
2     SomethingOther Fakecity Some-Complicated-Casestreet         1-3
3 SomethingDifferent Fakecity                 Fake Street         14a
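Example: Easyfakestreet 5 -> Easyfakestreet, 5
It gets a bit more complicated because some of my street strings contain hyphenated addresses and have non-numeric components.
Examples:
New Street 3 -> ['New Street','3']
Some-Complicated-Casestreet 1-3 -> ['Some-Complicated-Casestreet','1-3']
Fake Street 14a -> ['Fake Street','14a']
I would appreciate any help!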
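All of these examples follow one rule: split at the last space, so everything before it is the street name and everything after it is the house number, whatever characters it contains. A minimal base R sketch of that rule, producing the same two-element pairs as listed, assumes the house number itself never contains a space:

```r
street <- c("Easyfakestreet 5", "New Street 3",
            "Some-Complicated-Casestreet 1-3", "Fake Street 14a")

# Split only at a space that is followed by a run of non-space
# characters reaching the end of the string, i.e. the last space.
# perl = TRUE is required for the (?=...) lookahead.
parts <- strsplit(street, " (?=[^ ]+$)", perl = TRUE)

parts[[2]]
# [1] "New Street" "3"

# Stack the pairs into a two-column matrix: street name, house number
split_mat <- do.call(rbind, parts)
```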
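Here is a possible tidyr solution:
library(tidyr)
# Group 1: street name (non-digits); \\s+ drops the separating space;
# group 2: house number (first digit and everything after it).
# extract() returns a new data frame; assign it (shops <- ...) to keep it.
extract(shops, "street", c("Streetname", "HouseNumber"), "(\\D+)\\s+(\\d.*)")
#                 Name     city                  Streetname HouseNumber
# 1          Something Fakecity                  New Street           3
# 2     SomethingOther Fakecity Some-Complicated-Casestreet         1-3
# 3 SomethingDifferent Fakecity                 Fake Street         14a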
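If tidyr is not available, the same capture-group idea can be applied with base R's `regexec()` and `regmatches()`. In this sketch the two groups play the role of the `into` columns, and the `\\s+` between them consumes the separating space so it does not end up at the end of Streetname. It assumes each street value contains a digit; a value without one yields no match, and the `vapply()` calls below turn that into `NA`.

```r
shops <- data.frame(
  Name   = c("Something", "SomethingOther", "SomethingDifferent"),
  city   = "Fakecity",
  street = c("New Street 3", "Some-Complicated-Casestreet 1-3", "Fake Street 14a"),
  stringsAsFactors = FALSE
)

# Group 1: non-digits (street name); \\s+ eats the separating space;
# group 2: first digit and everything after it (house number).
m <- regmatches(shops$street,
                regexec("(\\D+)\\s+(\\d.*)", shops$street, perl = TRUE))

# Each list element is c(full match, group 1, group 2);
# a non-matching row is character(0), and indexing it gives NA.
shops$Streetname  <- vapply(m, `[`, character(1), 2)
shops$HouseNumber <- vapply(m, `[`, character(1), 3)
shops$street <- NULL
shops
```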
You can try:
# Street name: everything before the last space
shops$Streetname <- gsub("(.+)\\s[^ ]+$","\\1", shops$street)
# House number: everything after the last space
shops$HouseNumber <- gsub(".+\\s([^ ]+)$","\\1", shops$street)
Data
shops$street
#[1] "New Street 3" "Some-Complicated-Casestreet 1-3" "Fake Street 14a"
Result
shops$Streetname
#[1] "New Street" "Some-Complicated-Casestreet" "Fake Street"
shops$HouseNumber
#[1] "3" "1-3" "14a"
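Two follow-ups on the last-space approach, sketched under assumptions. First, the requested layout has no `street` column, so it can be dropped once the new columns exist. Second, a value with no space does not match either pattern, so `gsub()` returns it unchanged and it lands in both new columns. The sketch adds a hypothetical fourth row (`"NoNumber"` / `"Easyfakestreet"`, not part of the question's data) to show that case and marks its house number as `NA`.

```r
shops <- data.frame(
  Name   = c("Something", "SomethingOther", "SomethingDifferent", "NoNumber"),
  city   = "Fakecity",
  street = c("New Street 3", "Some-Complicated-Casestreet 1-3",
             "Fake Street 14a", "Easyfakestreet"),  # 4th row is hypothetical
  stringsAsFactors = FALSE
)

shops$Streetname  <- gsub("(.+)\\s[^ ]+$", "\\1", shops$street)
shops$HouseNumber <- gsub(".+\\s([^ ]+)$", "\\1", shops$street)

# Rows the pattern could not split still hold the whole street value
# in HouseNumber; mark those as missing instead.
no_split <- !grepl(".+\\s[^ ]+$", shops$street)
shops$HouseNumber[no_split] <- NA

# Drop the original column to match the requested layout
shops$street <- NULL
shops
```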