This question may be related to this one.
Unfortunately, the solution given there does not work for my data.
I have the following example vector:
example <- c("ChildrenChildren", "Clothing and shoesClothing and shoes",
             "Education, health and beautyEducation, health and beauty",
             "Leisure activities, travelingLeisure activities, traveling",
             "LoansLoans", "Loans and financial servicesLoans and financial services",
             "Personal transfersPersonal transfers",
             "Savings and investmentsSavings and investments",
             "TransportationTransportation", "Utility servicesUtility services")
What I want is each string with the repetition removed, i.e.:
> result
[1] "Children" "Clothing and shoes" "Education, health and beauty"
Is that possible?
Cat*_*ath 10
You can use sub for this, capturing the part you want directly in the pattern:
sub("(.+)\\1", "\\1", example)
#[1] "Children" "Clothing and shoes" "Education, health and beauty" "Leisure activities, traveling" "Loans"
#[6] "Loans and financial services" "Personal transfers" "Savings and investments" "Transportation" "Utility services"
(.+) captures some pattern and \\1 refers back to what was just captured, so you are looking for "anything" occurring twice in a row, and you replace it with that same "anything", but only once.
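The pattern above is not anchored, so it also collapses a repetition that appears anywhere inside a string, not only a string that is entirely doubled. A minimal sketch, assuming you only want to collapse strings that are exactly two copies of the same text, adds the ^ and $ anchors (this anchored variant is an addition, not part of the original answer):

```r
# Anchored variant (an assumption, not from the original answer):
# ^ and $ force the whole string to be two copies of the captured group,
# so strings that merely contain a repeated substring are left unchanged.
x <- c("ChildrenChildren", "LoansLoans", "Coffee")

collapsed <- sub("^(.+)\\1$", "\\1", x)
# collapsed is c("Children", "Loans", "Coffee")

# Without anchors, the first internal repetition ("ff") is collapsed instead.
unanchored <- sub("(.+)\\1", "\\1", "Coffee")
# unanchored is "Cofee"
```

For the example vector in the question, where every element is fully doubled, both patterns give the same result.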
If every string is duplicated, it is exactly twice as long as it needs to be, so take the first half of each string:
> substr(example, 1, nchar(example)/2)
[1] "Children" "Clothing and shoes"
[3] "Education, health and beauty" "Leisure activities, traveling"
[5] "Loans" "Loans and financial services"
[7] "Personal transfers" "Savings and investments"
[9] "Transportation" "Utility services"
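The substr approach relies on every element really being duplicated. A minimal sketch, assuming some elements might not be, halves a string only when its two halves are identical and otherwise returns it unchanged; the helper name dedup_half is hypothetical:

```r
# Hypothetical helper: return the first half of each string only when the
# string is exactly two copies of that half; otherwise keep the original.
dedup_half <- function(x) {
  half <- nchar(x) %/% 2
  first <- substr(x, 1, half)
  second <- substr(x, half + 1, nchar(x))
  # Odd-length strings or strings with differing halves are left untouched
  ifelse(nchar(x) %% 2 == 0 & first == second, first, x)
}

result <- dedup_half(c("LoansLoans", "Coffee", "abc"))
# result is c("Loans", "Coffee", "abc")
```

Comparing the halves costs one extra substr call per element, but avoids silently truncating strings that were never duplicated.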