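I have a vector of names. From each element I want to extract the title, which is basically everything between ", " (the comma plus the space) and the ".".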
> head(combi$Name)
[1] "Braund, Mr. Owen Harris"
[2] "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"
[3] "Heikkinen, Miss. Laina"
[4] "Futrelle, Mrs. Jacques Heath (Lily May Peel)"
[5] "Allen, Mr. William Henry"
[6] "Moran, Mr. James"
I think gsub might be useful, but I'm having trouble finding the right regular expression for what I need.
Answer from G. *_*eck (10 votes):
1) sub. Using sub:
> sub(".*, ([^.]*)\\..*", "\\1", Name)
[1] "Mr" "Mrs" "Miss" "Mrs" "Mr" "Mr"
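A self-contained version of (1), assuming a plain character vector `Name` holding the six values from the question in place of `combi$Name`. The comments break the pattern into its parts.

```r
# The six names from the question (stand-in for combi$Name)
Name <- c("Braund, Mr. Owen Harris",
          "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
          "Heikkinen, Miss. Laina",
          "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
          "Allen, Mr. William Henry",
          "Moran, Mr. James")

# ".*, "     the surname and the ", " that follows it
# "([^.]*)"  capture the run of non-period characters (the title)
# "\\..*"    the period and everything after it
# "\\1"      replace the whole match with the captured title
titles <- sub(".*, ([^.]*)\\..*", "\\1", Name)
print(titles)
```

Because the pattern covers the whole string, only the captured title is left. A name with no ", " or no "." does not match, and sub returns it unchanged.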
1a) Variation on (1). This approach using gsub also works:
> gsub(".*, |\\..*", "", Name)
[1] "Mr" "Mrs" "Miss" "Mrs" "Mr" "Mr"
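The alternation pattern in (1a) matches twice in each name: once for the "Surname, " prefix and once for the ". rest of name" suffix. sub replaces only the first match, so this pattern needs gsub to strip both. Here is a small sketch contrasting the two on a single name from the question:

```r
x <- "Braund, Mr. Owen Harris"
pattern <- ".*, |\\..*"   # either the "Surname, " prefix or ". rest of name"

# sub() removes only the first match (the "Braund, " prefix)
first_only <- sub(pattern, "", x)
print(first_only)   # "Mr. Owen Harris"

# gsub() removes every match, leaving just the title
both <- gsub(pattern, "", x)
print(both)         # "Mr"
```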
2) strapplyc. Using strapplyc from the gsubfn package, this can be done with a simpler regular expression:
> library(gsubfn)
>
> strapplyc(Name, ", ([^.]*)\\.", simplify = TRUE)
[1] "Mr" "Mrs" "Miss" "Mrs" "Mr" "Mr"
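If installing gsubfn is not an option, base R can do the same capture-group extraction with regexec() and regmatches(). This is a swapped-in base-R technique, not part of the original answer. For each name, regmatches() returns the full match followed by the captured group, so the title is the second element.

```r
Name <- c("Braund, Mr. Owen Harris",
          "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
          "Heikkinen, Miss. Laina",
          "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
          "Allen, Mr. William Henry",
          "Moran, Mr. James")

# regexec() records the full match and the parenthesised group for each string
m <- regmatches(Name, regexec(", ([^.]*)\\.", Name))

# each element is c(full_match, group); keep the group
titles <- sapply(m, `[`, 2)
print(titles)
```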
2a) strapplyc variation. This one seems to have the simplest regular expression:
> library(gsubfn)
>
> sapply(strapplyc(Name, "\\w+"), "[", 2)
[1] "Mr" "Mrs" "Miss" "Mrs" "Mr" "Mr"
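The same "second word" idea can be sketched in base R with gregexpr() and regmatches() instead of gsubfn (a substitution, not the original answer's code). It relies on the surname being a single word. The hypothetical name "O'Brien, Mr. Timothy" below, which does not appear in the question's data, shows how an apostrophe shifts the title to a later position.

```r
Name <- c("Braund, Mr. Owen Harris",
          "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
          "Heikkinen, Miss. Laina",
          "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
          "Allen, Mr. William Henry",
          "Moran, Mr. James")

# All runs of word characters (letters, digits, underscore) in each name
words <- regmatches(Name, gregexpr("\\w+", Name))

# The title is the second word when the surname is a single word
titles <- sapply(words, `[`, 2)
print(titles)

# Caveat (hypothetical name): the apostrophe splits the surname in two,
# so the second word is "Brien" rather than the title
odd <- "O'Brien, Mr. Timothy"
caveat <- sapply(regmatches(odd, gregexpr("\\w+", odd)), `[`, 2)
print(caveat)
```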
3) strsplit. A third way is to use strsplit:
> sapply(strsplit(Name, ", |\\."), "[", 2)
[1] "Mr" "Mrs" "Miss" "Mrs" "Mr" "Mr"
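To see why the second piece is the title in (3), here are the intermediate pieces strsplit produces for one name from the question. The split points are ", " and the literal period.

```r
x <- "Braund, Mr. Owen Harris"

# Split at either ", " or "." (the period is escaped to match literally)
pieces <- strsplit(x, ", |\\.")[[1]]
print(pieces)   # "Braund" "Mr" " Owen Harris"

# The surname comes first, so the title is the second piece
title <- pieces[2]
print(title)
```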
Update: added other solutions, and changed gsub to sub (although gsub works too).