I have a CSV-like file:
Market,CampaignName,Identity
Wells Fargo,Gary IN MetroChicago IL Metro,56
EMC,Los Angeles CA MetroBoston MA Metro,78
Apple,Cupertino CA Metro,68
The desired output is a CSV file with the first line as the header:
Market,City,State,Identity
Wells Fargo,Gary,IN,56
Wells Fargo,Chicago,IL,56
EMC,Los Angeles,CA,78
EMC,Boston,MA,78
Apple,Cupertino,CA,68
What I have tried so far:
res <-
  gsub('(.*) ([A-Z]{2})*Metro (.*) ([A-Z]{2}) .*', '\\1,\\2:\\3,\\4',
       xx$Market)
How can I modify the regular expression above to get this result in R? I am new to R, so any help is appreciated.
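library(stringr)
# Drop the "Metro" suffixes and name each string by its Market
xx.to.split <- with(xx, setNames(gsub("Metro", "", as.character(CampaignName)), Market))
# Capture every "<city> <state>" pair, stack the per-row matrices, drop the full-match column
do.call(rbind, str_match_all(xx.to.split, "(.+?) ([A-Z]{2}) ?"))[, -1]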
Produces:
            [,1]          [,2]
Wells Fargo "Gary"        "IN"
Wells Fargo "Chicago"     "IL"
EMC         "Los Angeles" "CA"
EMC         "Boston"      "MA"
Apple       "Cupertino"   "CA"
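This should work even if you have a different number of campaign names in each market. Unfortunately, I think this is hard to do with base R alone, because, frustratingly, there is no gregexec, though I would be curious if someone comes up with something reasonably compact.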
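For what it is worth, here is a minimal base R sketch that also carries Identity along and writes the CSV the question asks for. It is only one possible approach, not the method from the answer above: it uses gregexpr() with regmatches() to pull out each "<city> <state> Metro" chunk, then sub() to split each chunk. The data frame xx is rebuilt inline from the sample in the question, and the names pattern, hits, res and out_file (plus the file name campaigns_split.csv) are made up for illustration. It assumes every campaign ends in a two-letter upper-case state code followed by "Metro". As far as I know, newer R releases (4.1.0 onward) do ship a gregexec(), but the sketch avoids it so that it also runs on older versions.

```r
# Sample data copied from the question, built inline so the sketch is self-contained.
xx <- data.frame(
  Market = c("Wells Fargo", "EMC", "Apple"),
  CampaignName = c("Gary IN MetroChicago IL Metro",
                   "Los Angeles CA MetroBoston MA Metro",
                   "Cupertino CA Metro"),
  Identity = c(56, 78, 68),
  stringsAsFactors = FALSE
)

# Assumption: each campaign looks like "<city> <two-letter state> Metro".
# The lazy (.+?) stops at the first state code that is followed by " Metro".
pattern <- "(.+?) ([A-Z]{2}) Metro"

# One character vector of matched chunks per input row.
hits <- regmatches(xx$CampaignName,
                   gregexpr(pattern, xx$CampaignName, perl = TRUE))

n <- lengths(hits)        # number of campaigns found in each row
campaigns <- unlist(hits) # e.g. "Gary IN Metro", "Chicago IL Metro", ...

# Repeat Market and Identity once per campaign, then split each chunk.
res <- data.frame(
  Market   = rep(xx$Market, n),
  City     = sub(pattern, "\\1", campaigns, perl = TRUE),
  State    = sub(pattern, "\\2", campaigns, perl = TRUE),
  Identity = rep(xx$Identity, n),
  stringsAsFactors = FALSE
)

# Hypothetical output location; the header row is written first by default.
out_file <- file.path(tempdir(), "campaigns_split.csv")
write.csv(res, out_file, row.names = FALSE, quote = FALSE)
```

Because rep() uses the per-row match counts, rows with one, two, or more campaigns are all handled the same way. A row with no match at all would simply contribute no output rows, which may or may not be what you want.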