Extracting words of a particular length in R using regular expressions

Question

I have this code (which I got here):

m<- c("Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow")

x<- gsub("\\<[a-z]\\{4,10\\}\\>","",m)
x

I have also tried other approaches, such as:

m<- c("Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow")

x<- gsub("[^(\\b.{4,10}\\b)]","",m)
x

I need to remove the words whose length is less than 4 or greater than 10. Where am I going wrong?

Answer 1

ags*_*udy 12

  gsub("\\b[a-zA-Z0-9]{4,10}\\b", "", m) 
 "! # is gr8. I  likewhatishappening ! The  of   is ! the aforementioned  is ! #Wow"

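Let's explain the terms of the regular expression:

\b matches at a position called a "word boundary". This match is zero-length.
[a-zA-Z0-9]: an alphanumeric character
{4,10}: {min,max}, that is, between 4 and 10 repetitions of the preceding item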
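
To see why the \b anchors matter, here is a minimal sketch on a made-up string (`s` is an illustration, not part of the question). It passes `perl = TRUE`, which is an assumption added here; the answer's own call uses R's default regex engine.

```r
# Sketch only: `s` is a hypothetical test string, not from the question.
s <- "likewhatishappening here"

# Without \\b, {4,10} also matches chunks inside longer words,
# so the 19-letter word is consumed in two pieces (10 + 9 characters).
no_boundary <- gsub("[a-zA-Z0-9]{4,10}", "", s, perl = TRUE)

# With \\b on both sides, only whole words of 4 to 10 characters match.
with_boundary <- gsub("\\b[a-zA-Z0-9]{4,10}\\b", "", s, perl = TRUE)

print(no_boundary)    # " "
print(with_boundary)  # "likewhatishappening "
```

Without the boundaries everything except the space is deleted; with them, only "here" is removed and the long word survives intact.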

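If you want the opposite, that is, to keep the words of 4 to 10 characters and remove the others (which is what the question asks for), match the words that are too short or too long instead:

  gsub("\\b([a-zA-Z0-9]{1,3}|[a-zA-Z0-9]{11,})\\b", "", m, perl = TRUE)
 "Hello! #London  .  really  here!  alcomb  Mount Everest  excellent!   place  amazing! #"

Note that the bounds are inclusive: a 4-letter word such as "here" matches {4,10}, so it is removed by the first expression and kept by this one.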
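
Since the title asks about extracting words, here is a minimal sketch that returns the words of 4 to 10 characters as a vector instead of deleting the others. It reuses `m` from the question and relies only on base R (`gregexpr`, `regmatches`, `strsplit`, `nchar`). The names `kept` and `by_length` and the use of `perl = TRUE` are assumptions made for this illustration.

```r
m <- "Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow"

# Approach 1: let the regex pick out whole words of 4 to 10 alphanumerics.
kept <- regmatches(m, gregexpr("\\b[a-zA-Z0-9]{4,10}\\b", m, perl = TRUE))[[1]]

# Approach 2: split on runs of non-alphanumerics, then filter by length.
words <- strsplit(m, "[^a-zA-Z0-9]+")[[1]]
by_length <- words[nchar(words) >= 4 & nchar(words) <= 10]

print(kept)
# "Hello" "London" "really" "here" "alcomb" "Mount" "Everest"
# "excellent" "place" "amazing"
print(identical(kept, by_length))  # TRUE
```

The second approach avoids word-boundary syntax entirely, which can be easier to read when the length limits change; `paste(kept, collapse = " ")` would turn either result back into a single string.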
