Extract everything between two symbols in a string

Gia*_*uca 6 regex r gsub

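I have a vector containing some names. From each element I want to extract the title, which is basically everything between ", " (including the space) and ".".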

> head(combi$Name)
[1] "Braund, Mr. Owen Harris"
[2] "Cumings, Mrs. John Bradley (Florence Briggs Thayer)"
[3] "Heikkinen, Miss. Laina"
[4] "Futrelle, Mrs. Jacques Heath (Lily May Peel)"
[5] "Allen, Mr. William Henry"
[6] "Moran, Mr. James"

I thought gsub might be useful, but I'm having trouble finding the right regular expression for what I need.
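The answers below operate on a vector called Name. As a minimal sketch, assuming combi$Name holds exactly the six strings printed above, the vector can be rebuilt directly so the examples run without the original combi data frame:

```r
# Stand-in for combi$Name (assumption: only the six values shown in the question)
Name <- c("Braund, Mr. Owen Harris",
          "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
          "Heikkinen, Miss. Laina",
          "Futrelle, Mrs. Jacques Heath (Lily May Peel)",
          "Allen, Mr. William Henry",
          "Moran, Mr. James")
```

With the real data, Name <- combi$Name plays the same role.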

G. *_*eck 10

1) sub. Using sub (here Name is the vector of names, i.e. combi$Name):

> sub(".*, ([^.]*)\\..*", "\\1", Name)
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"  
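The pattern in 1) can be read piece by piece. The sketch below annotates it on a small, partly hypothetical vector (the third string is invented to show a non-match) and illustrates one caveat: sub() returns a non-matching element unchanged rather than NA, so the sketch flags such elements with grepl().

```r
x <- c("Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina", "no title here")

# ".*, "    everything up to and including the last ", "
# "([^.]*)" capture group 1: a run of non-period characters (the title)
# "\\..*"   the period that ends the title, then the rest of the string
pattern <- ".*, ([^.]*)\\..*"

titles <- sub(pattern, "\\1", x)
print(titles)  # "Mr" "Miss" "no title here"

# sub() leaves non-matching strings untouched; mark them as missing instead
titles[!grepl(pattern, x)] <- NA
print(titles)  # "Mr" "Miss" NA
```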

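1a) gsub variation. This variation also works, provided gsub (not sub) is used so that both the part before the title and the part after it are removed:

> gsub(".*, |\\..*", "", Name)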
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"  
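Why 1a) needs gsub rather than sub: the pattern has two alternatives, and each name contains one match of each. A quick check on the first name from the question:

```r
x <- "Braund, Mr. Owen Harris"

# Alternative 1 ".*, " matches the surname prefix "Braund, ";
# alternative 2 "\\..*" matches ". Owen Harris".
first_only <- sub(".*, |\\..*", "", x)    # removes only the first match
all_matches <- gsub(".*, |\\..*", "", x)  # removes every match

print(first_only)   # "Mr. Owen Harris"
print(all_matches)  # "Mr"
```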

2) strapplyc. Using strapplyc from the gsubfn package, it can be done with a simpler regular expression:

> library(gsubfn)
>
> strapplyc(Name, ", ([^.]*)\\.", simplify = TRUE)
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"  
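If adding a dependency on gsubfn is undesirable, a base-R sketch of the same capture-group idea uses regexec() with regmatches(). This is a swapped-in technique, not part of the original answer:

```r
x <- c("Braund, Mr. Owen Harris",
       "Cumings, Mrs. John Bradley (Florence Briggs Thayer)",
       "Heikkinen, Miss. Laina")

# regexec() records the full match and each capture group;
# regmatches() turns that into a list like c(", Mr.", "Mr") per element
m <- regmatches(x, regexec(", ([^.]*)\\.", x))

# element 2 of each match vector is capture group 1 (NA if there was no match)
titles <- sapply(m, "[", 2)
print(titles)  # "Mr" "Mrs" "Miss"
```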

2a) strapplyc variation. This one seems to have the simplest regular expression: it extracts every word and keeps the second one.

> library(gsubfn)
>
> sapply(strapplyc(Name, "\\w+"), "[", 2)
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"  
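The \\w+ approach in 2a) assumes the surname is a single word. A word character is [[:alnum:]_], so a surname containing an apostrophe or hyphen is split into several words and the second word is no longer the title. A base-R sketch (gregexpr() standing in for strapplyc(); the second name is hypothetical) shows the difference from the pattern in 1):

```r
# The second name is a made-up example with an apostrophe in the surname
x <- c("Braund, Mr. Owen Harris", "O'Brien, Mr. Timothy")

words <- regmatches(x, gregexpr("\\w+", x))  # "O'Brien" becomes "O", "Brien"
second_word <- sapply(words, "[", 2)
anchored <- sub(".*, ([^.]*)\\..*", "\\1", x)  # anchors on ", " instead

print(second_word)  # "Mr" "Brien"
print(anchored)     # "Mr" "Mr"
```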

3) strsplit. A third way is to use strsplit:

> sapply(strsplit(Name, ", |\\."), "[", 2)
[1] "Mr"   "Mrs"  "Miss" "Mrs"  "Mr"   "Mr"  
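One practical note on 3): strsplit() accepts only character input, so if combi$Name was read in as a factor (the read.csv() default before R 4.0.0), it has to be converted first. A minimal sketch, using a factor built from two of the names above:

```r
f <- factor(c("Braund, Mr. Owen Harris", "Heikkinen, Miss. Laina"))

# strsplit() on a factor signals an error rather than coercing it
failed <- inherits(try(strsplit(f, ", |\\."), silent = TRUE), "try-error")

# converting to character first makes the split work; element 2 is the title
titles <- sapply(strsplit(as.character(f), ", |\\."), "[", 2)
print(failed)  # TRUE
print(titles)  # "Mr" "Miss"
```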

Update: added other solutions. Changed gsub to sub (although gsub works too).