Extracting words of a specific length in R using regex

jac*_*ger 10 regex string r

I have some code (which I got here):

m<- c("Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow")

x<- gsub("\\<[a-z]\\{4,10\\}\\>","",m)
x

I have also tried other approaches, such as:

m<- c("Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow")

x<- gsub("[^(\\b.{4,10}\\b)]","",m)
x

I need to remove words that are shorter than 4 or longer than 10 characters. Where am I going wrong?

ags*_*udy 12

  gsub("\\b[a-zA-Z0-9]{4,10}\\b", "", m) 
 "! # is gr8. I  likewhatishappening ! The  of   is ! the aforementioned  is ! #Wow"
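For comparison, here is a rough sketch of why the two attempts in the question probably do not behave as intended. It assumes R's default (extended) regex syntax; the names `attempt1` and `lower_only` are only illustrative.

```r
m <- "Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow"

# Attempt 1: in the default extended syntax, \\{ and \\} are literal braces,
# so the pattern looks for a lowercase letter followed by the text "{4,10}".
attempt1 <- gsub("\\<[a-z]\\{4,10\\}\\>", "", m)
identical(attempt1, m)  # nothing matched, so nothing was removed

# With unescaped braces the quantifier works, but [a-z] alone still
# skips capitalised words such as "Hello".
lower_only <- gsub("\\<[a-z]{4,10}\\>", "", m)

# Attempt 2: [^...] is a negated character class, not a negated pattern.
# A smaller illustration: every character not listed in the class is dropped.
gsub("[^ab]", "", "abcab")
```

This is why the pattern above leaves the braces unescaped and lists both letter cases inside the brackets.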

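Let's explain the regular expression terms:

  1. \b matches at a position called a "word boundary". This match is zero-length.
  2. [a-zA-Z0-9]: an alphanumeric character
  3. {4,10}: {min,max} repetitions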
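The three pieces above can be seen working together in a small sketch. The sample string `s` and the variable names are made up for illustration, and `perl = TRUE` is an assumption chosen here for predictable `\b` handling in `gregexpr`.

```r
s <- "is gr8 here excellent aforementioned"

# With \\b on both sides, only whole words of 4 to 10 characters match.
with_b <- regmatches(s, gregexpr("\\b[a-zA-Z0-9]{4,10}\\b", s, perl = TRUE))[[1]]
with_b

# Without \\b, the quantifier is free to match pieces of longer words,
# so "aforementioned" is split into two partial matches.
without_b <- regmatches(s, gregexpr("[a-zA-Z0-9]{4,10}", s, perl = TRUE))[[1]]
without_b
```

The word boundaries are what stop a 14-letter word from being matched in chunks.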

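If you want the negation of this, wrapping the pattern in () and substituting \\1 is not enough: the backreference puts every match back, so the string comes out unchanged.

gsub("(\\b[a-zA-Z0-9]{4,10}\\b)", "\\1", m) 

"Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow"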

Interestingly, 4-letter words are matched by both regular expressions.
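To actually keep only the words of 4 to 10 characters, which is what the question asks for, one possible sketch is to extract the matches rather than substitute them. The names `pat`, `kept`, and `cleaned` are illustrative, and `perl = TRUE` is an assumption rather than something the answer relies on.

```r
m <- "Hello! #London is gr8. I really likewhatishappening here! The alcomb of Mount Everest is excellent! the aforementioned place is amazing! #Wow"

pat <- "\\b[a-zA-Z0-9]{4,10}\\b"

# Option 1: pull out the words that match, discarding everything else
# (punctuation included).
kept <- regmatches(m, gregexpr(pat, m, perl = TRUE))[[1]]
paste(kept, collapse = " ")

# Option 2: delete the words that are too short (1-3) or too long (11+),
# leaving punctuation and spacing in place.
cleaned <- gsub("\\b(?:[a-zA-Z0-9]{1,3}|[a-zA-Z0-9]{11,})\\b", "", m, perl = TRUE)
cleaned
```

Option 1 returns a clean word list, while option 2 preserves the original punctuation at the cost of leftover spaces, which could be collapsed afterwards with something like `gsub(" +", " ", cleaned)`.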