Remove duplicated characters in a string

Hen*_*rro 4 regex r

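This question may be related to this question.

Unfortunately, the solution given there does not work with my data.

I have the following example vector: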

example<-c("ChildrenChildren", "Clothing and shoesClothing and shoes","Education, health and beautyEducation, health and beauty", "Leisure activities, travelingLeisure activities, traveling","LoansLoans","Loans and financial servicesLoans and financial services" ,"Personal transfersPersonal transfers" ,"Savings and investmentsSavings and investments","TransportationTransportation","Utility servicesUtility services")

Naturally, I would like the same strings without the duplication, i.e.:

  > result
 [1]   "Children" "Clothing and shoes" "Education, health and beauty"

Is that possible?

Cat*_*ath 10

You can use `sub` for this, capturing the part you want directly in the `pattern` argument:

sub("(.+)\\1", "\\1", example)
 #[1] "Children"                      "Clothing and shoes"            "Education, health and beauty"  "Leisure activities, traveling" "Loans"                        
 #[6] "Loans and financial services"  "Personal transfers"            "Savings and investments"       "Transportation"                "Utility services"

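`(.+)` captures a run of one or more characters, and `\\1` refers back to whatever was just captured. The pattern therefore looks for "anything" occurring twice in a row, and the replacement puts back that same "anything", but only once.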
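One caveat, shown in the sketch below: the pattern as written is not anchored, so it also matches a repeat in the middle of a string that is not duplicated as a whole, such as the double "o" in "Books". Adding `^` and `$` is an assumption on my part, not part of the answer above; it restricts the match to strings that consist entirely of one part repeated twice. The vector `x` is made up for illustration.

```r
# Hypothetical test vector: two fully duplicated strings and two that are not.
x <- c("LoansLoans", "Loans", "BooksBooks", "Books")

# Pattern from the answer: matches the first place where some text
# is immediately followed by a copy of itself.
unanchored <- sub("(.+)\\1", "\\1", x)
unanchored
# [1] "Loans" "Loans" "Books" "Boks"

# Anchored variant (my addition): the whole string must be one part
# repeated twice, otherwise the string is left untouched.
anchored <- sub("^(.+)\\1$", "\\1", x)
anchored
# [1] "Loans" "Loans" "Books" "Books"
```

For the example vector in the question both patterns give the same result, since none of those strings contains an adjacent repeat of its own.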

  • Didn't realize you could use `\\1` inside the pattern itself! Thanks! (2 upvotes)

Spa*_*man 5

If all the strings are duplicated, then they are twice as long as they need to be, so take the first half of each string:

> substr(example, 1, nchar(example)/2)
 [1] "Children"                      "Clothing and shoes"           
 [3] "Education, health and beauty"  "Leisure activities, traveling"
 [5] "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"      
 [9] "Transportation"                "Utility services"             

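  • It really does depend on the strings being duplicated, though. @Cath's solution has the property that if a string is not duplicated you get the whole string back, whereas my code would return only half of it. (2 upvotes)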
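If some strings might not be duplicated, the halving approach can be guarded with a check that the two halves are actually identical. The following is only a sketch; `dedupe_half` is a hypothetical helper name, not something from the answers above, and it does not handle `NA` values.

```r
# Hypothetical helper: return the first half of a string only when the
# string is exactly that half repeated twice; otherwise return it unchanged.
dedupe_half <- function(x) {
  n      <- nchar(x)
  half   <- n %/% 2                  # integer length of the first half
  first  <- substr(x, 1, half)
  second <- substr(x, half + 1, n)
  # An odd-length or empty string can never be a doubled string.
  is_doubled <- n > 0 & n %% 2 == 0 & first == second
  ifelse(is_doubled, first, x)
}

res <- dedupe_half(c("LoansLoans", "Loans", "Book", "Utility servicesUtility services"))
res
# [1] "Loans"            "Loans"            "Book"             "Utility services"
```

This avoids regular expressions entirely while still leaving non-duplicated strings intact.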