Remove repeated characters in a string

Question

This question may be related to this one.

Unfortunately, the solutions given there do not work for my data.

I have the following example vector:

example <- c("ChildrenChildren",
             "Clothing and shoesClothing and shoes",
             "Education, health and beautyEducation, health and beauty",
             "Leisure activities, travelingLeisure activities, traveling",
             "LoansLoans",
             "Loans and financial servicesLoans and financial services",
             "Personal transfersPersonal transfers",
             "Savings and investmentsSavings and investments",
             "TransportationTransportation",
             "Utility servicesUtility services")

What I want, of course, is the same strings without the duplication, i.e.:

  > result
 [1]   "Children" "Clothing and shoes" "Education, health and beauty"

Is that possible?

Answer 1

Cat*_*ath 10

You can use sub for this, capturing the part you want directly in the pattern:

sub("(.+)\\1", "\\1", example)
 #[1] "Children"                      "Clothing and shoes"            "Education, health and beauty"  "Leisure activities, traveling" "Loans"                        
 #[6] "Loans and financial services"  "Personal transfers"            "Savings and investments"       "Transportation"                "Utility services"

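(.+) captures some pattern and \\1 refers to whatever was just captured, so you are looking for "anything, twice" and replacing it with that same "anything", only once.

Didn't realize you could use `\\1` inside the pattern itself! Thanks! (2 upvotes)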
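
As a hedged aside, not part of the original answer: the pattern above is unanchored, so it collapses the first doubled run it finds anywhere in a string, even when the string as a whole is not a repeat. The sketch below contrasts it with an anchored variant, `^(.+)\\1$`, that only rewrites strings consisting of one chunk repeated exactly twice. The input vector `x` is hypothetical, and `perl = TRUE` is a cautious choice made here, not something the answer used.

```r
# Hypothetical inputs (not from the question): one doubled string,
# one clean string, and one word that merely contains doubled letters.
x <- c("LoansLoans", "Loans", "Bookkeeping")

# The answer's pattern, unanchored: the first "something, twice" found
# anywhere in the string is collapsed to a single copy.
unanchored <- sub("(.+)\\1", "\\1", x, perl = TRUE)

# Anchored variant (an assumption of this sketch): only rewrite when the
# entire string is one chunk repeated exactly twice.
anchored <- sub("^(.+)\\1$", "\\1", x, perl = TRUE)

print(unanchored)  # "Loans" "Loans" "Bokkeeping"
print(anchored)    # "Loans" "Loans" "Bookkeeping"
```

For the data in the question both patterns give the same result; the anchored form only matters if some entries might contain an internal repeat such as a doubled letter.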

Answer 2

Spa*_*man 5

If all of the strings are doubled, then each is twice as long as it needs to be, so take the first half of each string:

> substr(example, 1, nchar(example)/2)
 [1] "Children"                      "Clothing and shoes"           
 [3] "Education, health and beauty"  "Leisure activities, traveling"
 [5] "Loans"                         "Loans and financial services" 
 [7] "Personal transfers"            "Savings and investments"      
 [9] "Transportation"                "Utility services"

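It does, however, depend entirely on every string being doubled. @Cath's solution has the property that if a string is not doubled you get the whole string back, whereas my code would return only the first half of it. (2 upvotes)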
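
To make that caveat concrete, here is a minimal sketch, assuming a hypothetical input `x` that mixes a doubled and a non-doubled string. It shows plain halving truncating the clean entry, followed by a guarded variant that halves only when both halves are identical. The names `halved`, `is_doubled`, and `guarded` are illustrative and do not come from the answer.

```r
# Hypothetical input: the second element is NOT doubled.
x <- c("LoansLoans", "Loans")

# Plain halving, as in the answer. For "Loans", nchar is 5, and substr
# truncates the stop position 2.5 to 2, so only "Lo" survives.
halved <- substr(x, 1, nchar(x) / 2)

# Guarded halving: keep the first half only if the length is even and
# the two halves match; otherwise return the string unchanged.
n <- nchar(x)
half <- n %/% 2
first <- substr(x, 1, half)
second <- substr(x, half + 1, n)
is_doubled <- (n %% 2 == 0) & (first == second)
guarded <- ifelse(is_doubled, first, x)

print(halved)   # "Loans" "Lo"
print(guarded)  # "Loans" "Loans"
```

The guarded version behaves like the anchored regex approach on these inputs, at the cost of a few more lines.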
