Suppose I have 4 vectors:
a <- c("Mark","Kate","Greg", "Mathew")
b <- c("Mark","Tobias","Mary", "Mathew", "Greg")
c <- c("Mary","Chuck","Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate","Mark","Igor", "Greg", "Robin", "Mathew")
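I want to select the names that overlap across these vectors, with the requirement that a name must appear in at least 3 of the 4 vectors. Ideally, I would also like to be able to set easily the percentage of vectors in which a name must be present.
Can I modify intersect somehow to do this?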
I think this will work. We use the table function to do most of the heavy lifting.
find_perc <- function(..., perc = .75){
  list_len <- length(list(...))              # how many vectors
  tab_it <- table(c(...))                    # tabulate all the names
  tab_it_perc <- tab_it / list_len           # calculate the frequencies
  names(tab_it_perc[tab_it_perc >= perc])    # return those with freq >= perc
}
> find_perc(a, b, c, d)
[1] "Greg" "Mark" "Mathew"
> find_perc(a, b, c, d, perc = .5)
[1] "Greg" "Igor" "Kate" "Mark" "Mary" "Mathew" "Robin" "Tobias"
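One caveat with tabulating the names directly: a name repeated inside a single vector is counted more than once, which can inflate its share. Below is a minimal sketch of a variant that counts each name at most once per vector and also accepts an absolute count such as "at least 3 vectors". The names find_common, min_prop and min_count are hypothetical and not part of the original answer.

```r
# Hypothetical helper, a sketch only (not from the original answer).
find_common <- function(..., min_prop = 0.75, min_count = NULL) {
  vecs <- list(...)  # the input vectors
  # Count each name once per vector, so within-vector duplicates are ignored
  counts <- table(unlist(lapply(vecs, unique)))
  # Use the absolute threshold if given, otherwise a proportion of the vectors
  threshold <- if (is.null(min_count)) min_prop * length(vecs) else min_count
  names(counts)[counts >= threshold]
}

a <- c("Mark", "Kate", "Greg", "Mathew")
b <- c("Mark", "Tobias", "Mary", "Mathew", "Greg")
c <- c("Mary", "Chuck", "Igor", "Mathew", "Robin", "Tobias")
d <- c("Kate", "Mark", "Igor", "Greg", "Robin", "Mathew")

find_common(a, b, c, d)                 # names in at least 75% of the vectors
find_common(a, b, c, d, min_count = 3)  # names in at least 3 vectors
```

On the four vectors from the question, both calls should select the same names as find_perc(a, b, c, d), since none of the vectors contains a repeated name.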