I have a string that may consist of a repeating character pattern, for example:
'xyzzyxxyzzyxxyzzyx'
I need to write a regular expression that replaces such a string with its smallest repeating pattern:
'xyzzyxxyzzyxxyzzyx' becomes 'xyzzyx',
'abcbaccbaabcbaccbaabcbaccba' becomes 'abcbaccba'
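One way this might be approached, as a minimal sketch: a lazy capture group followed by a backreference, anchored at both ends of the string. It assumes the whole string is a single unit repeated two or more times; `collapse_repeats` is a hypothetical name, not something from the original post.

```r
# Hypothetical helper: reduce a string made of one repeated unit to that unit.
# "^(.+?)" lazily captures the shortest candidate prefix, and "\\1+$" requires
# the rest of the string to consist only of further copies of that prefix.
collapse_repeats <- function(s) {
  sub("^(.+?)\\1+$", "\\1", s, perl = TRUE)
}

collapse_repeats(c("xyzzyxxyzzyxxyzzyx", "abcbaccbaabcbaccbaabcbaccba"))
```

Because `sub()` leaves non-matching input untouched, a string that is not a pure repetition comes back unchanged.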
This may be related to this question.
Unfortunately, the solution given there does not work for my data.
I have the following example vector:
example <- c("ChildrenChildren",
             "Clothing and shoesClothing and shoes",
             "Education, health and beautyEducation, health and beauty",
             "Leisure activities, travelingLeisure activities, traveling",
             "LoansLoans",
             "Loans and financial servicesLoans and financial services",
             "Personal transfersPersonal transfers",
             "Savings and investmentsSavings and investments",
             "TransportationTransportation",
             "Utility servicesUtility services")
Of course, I would like to get the same strings without the duplication, i.e.:
> result
[1] "Children" "Clothing and shoes" "Education, health and beauty"
Is that possible?
Suppose the following vector:
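A minimal sketch of one possible answer, assuming every element is exactly one label written twice back to back: capture the first half and require the second half to be an identical copy. The vector `doubled` below is a shortened, illustrative copy of the example data.

```r
# Shortened copy of the example data, for illustration only.
doubled <- c("ChildrenChildren",
             "Clothing and shoesClothing and shoes",
             "Education, health and beautyEducation, health and beauty",
             "LoansLoans")

# "^(.+)\\1$" matches only when the string is some text immediately followed
# by an identical copy of itself; the replacement keeps a single copy.
result <- sub("^(.+)\\1$", "\\1", doubled, perl = TRUE)
result
```

Elements that are not an exact doubling do not match and are returned as they are.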
x <- c("/default/img/irs/irs/irs/irs/irs/irs/irs/irs/irs/irs/irs/irs/IRS.html/", "something/repeat/repeat_this")
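I want to check whether a word enclosed by / is repeated (note that the / at the start and at the end of the string may be missing). I found the following brilliant regex here, but (after removing the special characters) I can't seem to modify it to fit my case: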
grepl("\\b(\\S+?)\\1\\S*\\b", x, perl = TRUE)
# [1] TRUE TRUE
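I could always run str_split(x, "/"), iterate the duplicated() function over the resulting list, and use an if() statement, but that would be very inefficient.
The desired result should be a vector of TRUE or FALSE (or 1 and 0).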
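Two possible sketches follow, both under assumptions the question does not state outright: a "word" is a complete segment between slashes, the comparison is case-sensitive (so `IRS.html` does not count as `irs`), and `repeat` versus `repeat_this` is not a repetition. The names `adjacent_repeat` and `any_repeat` are hypothetical.

```r
x <- c("/default/img/irs/irs/irs/irs/irs/irs/irs/irs/irs/irs/irs/irs/IRS.html/",
       "something/repeat/repeat_this")

# Regex variant: a whole segment immediately followed by an identical whole
# segment. "(?:^|/)" and "(?:/|$)" tolerate a missing leading or trailing "/".
adjacent_repeat <- grepl("(?:^|/)([^/]+)/\\1(?:/|$)", x, perl = TRUE)

# Split-based variant: TRUE if any non-empty segment occurs more than once,
# whether or not the copies are adjacent.
any_repeat <- vapply(
  strsplit(x, "/", fixed = TRUE),
  function(parts) anyDuplicated(parts[nzchar(parts)]) > 0L,
  logical(1)
)

adjacent_repeat
any_repeat
```

The regex variant only sees neighbouring segments, while the split-based one also catches repeats that are separated by other segments; which of the two is wanted depends on the data.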