I have a vector of names:
> dput(vec_dup)
c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
"Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")
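Some names are repeated in this vector. I want to append a suffix such as _1, _2, _3 to each string. The number appended depends on how many times that name has already appeared in the vector up to that position.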
Desired output:
vec_output <- c("Mark_1", "Simon_1", "Marcus_1", "Greg_1", "Simon_2", "Greg_2", "Marta_1",
"Marta_2", "Tim_1", "Tim_2", "Greg_3", "Tom_1", "Tom_2", "Greg_4")
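As you can see, this is not only about duplicated strings: Marcus appears only once in the vector and should still get _1. How can I do this efficiently for thousands of strings?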
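Based on your requirements, you can use ave to group identical names and paste on a suffix according to each element's position within its group, i.e.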
ave(vec_dup, vec_dup, FUN = function(i) paste0(i, '_', seq_along(i)))
#[1] "Mark_1" "Simon_1" "Marcus_1" "Greg_1" "Simon_2" "Greg_2" "Marta_1" "Marta_2" "Tim_1" "Tim_2" "Greg_3" "Tom_1" "Tom_2"
#[14] "Greg_4"
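Since the question asks about thousands of strings, here is a possible variant of the same idea, offered as a sketch rather than a benchmarked recommendation. It uses ave only to compute the within-group counter and then calls paste0 once on the whole vector instead of once per group. The names occurrence and vec_numbered are illustrative and not from the original post.

```r
vec_dup <- c("Mark", "Simon", "Marcus", "Greg", "Simon", "Greg", "Marta",
             "Marta", "Tim", "Tim", "Greg", "Tom", "Tom", "Greg")

# Running counter within each group of identical names (1, 2, 3, ...),
# returned in the original order of the vector.
# 'occurrence' is a hypothetical helper name.
occurrence <- ave(seq_along(vec_dup), vec_dup, FUN = seq_along)

# One vectorised paste0() call over the whole vector.
# 'vec_numbered' is a hypothetical name for the result.
vec_numbered <- paste0(vec_dup, "_", occurrence)
vec_numbered
```

Whether this is noticeably faster than pasting inside FUN depends on how many distinct names there are, so it is worth timing on the real data.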
If you don't need a suffix on every element and only want to distinguish the duplicates, then make.unique is enough, i.e.
make.unique(vec_dup, sep = '_')
#[1] "Mark" "Simon" "Marcus" "Greg" "Simon_1" "Greg_1" "Marta" "Marta_1" "Tim" "Tim_1" "Greg_2" "Tom" "Tom_1" "Greg_3"
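To make the difference between the two approaches concrete, here is a small side-by-side sketch on a toy vector (x, res_unique and res_ave are hypothetical names used only for illustration). make.unique leaves the first occurrence untouched and starts numbering at the second one, so its counter is one behind the ave version.

```r
# Toy input, not from the original post.
x <- c("a", "b", "a", "a")

# make.unique: first occurrence is kept as-is, later ones get _1, _2, ...
res_unique <- make.unique(x, sep = "_")
# "a" "b" "a_1" "a_2"

# ave: every element gets a suffix, starting at _1 for the first occurrence
res_ave <- ave(x, x, FUN = function(i) paste0(i, "_", seq_along(i)))
# "a_1" "b_1" "a_2" "a_3"
```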