Identifying unique observations that meet two conditions, then removing them, in R

use*_*642 0 r operators grepl data.table

My df looks like this:

data
   names  fruit
7   john  apple
13  john orange
14  john  apple
2   mary orange
5   mary  apple
8   mary orange
10  mary  apple
12  mary  apple
1    tom  apple
6    tom  apple

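I want to do two things. First, count the number of unique individuals who have both an apple and an orange (i.e. 2: mary and john).

After that, I want to remove them from my data frame, so that I am left only with the unique individuals who have nothing but apples.

This is what I have tried: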

toremove<-unique(data[data$fruit=='apple' & data$fruit=='orange',"names"])  ##this part doesn't work, if it had I would have used the below code to remove the names identified
data2<-data[!data$names %in% toremove,]

Really, I would like to use grepl, because my real data is more complicated than fruit. Here is what I tried (after first converting to a data.table):

data1<-data.table(data1)
z<-data1[,ind := grepl('app.*? & orang.*?', fruit), by='names']  ## this works fine when i just use 'app.*?' but collapses when I try to add the & sign, so I'm making an error with the operator. In addition the by='names' doesn't work out for me, which is important. My plan here was to create an indicator (if an individual has an apple and an orange, then they get an indicator==1 and I would then filter them out on the basis of this indicator). 

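So, overall, my problem is identifying the individuals who have both apples and oranges. This seems simple, so feel free to point me to a resource that could teach me this!

Desired output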

names fruit
1   tom apple
6   tom apple

Dav*_*urg 6

If you are only looking for the names that have nothing but apples, here is a simple data.table approach:

setDT(data)[ , if(all(fruit == "apple")) .SD, by = names]
#    names fruit
# 1:   tom apple
# 2:   tom apple

To count the unique individuals who have both "apple" and "orange", you could do something like:

data[, any(fruit == "apple") & any(fruit == "orange"), by = names][, sum(V1)]
## [1] 2 
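The same count and removal can also be sketched in base R, staying close to the `toremove` / `%in%` approach from the question. The attempt there returns nothing because `data$fruit=='apple' & data$fruit=='orange'` is evaluated row by row, and no single row can hold both values. The sketch below rebuilds the example data (without the original row names) and tests each fruit separately; the names `has_apple` and `has_orange` are illustrative only.

```r
# Rebuild the example data from the question (row names are not reproduced)
data <- data.frame(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple"),
  stringsAsFactors = FALSE
)

# Collect the names seen with each fruit, then intersect the two sets
has_apple  <- unique(data$names[data$fruit == "apple"])
has_orange <- unique(data$names[data$fruit == "orange"])
toremove   <- intersect(has_apple, has_orange)

# Number of individuals who have both fruits
n_both <- length(toremove)

# Drop every row belonging to those individuals
data2 <- data[!data$names %in% toremove, ]
```

Note that this removes only the individuals who have both fruits; anyone with oranges only would still remain in `data2`.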

Finally, if all you are looking for is users with a single unique fruit, you could try conditioning on uniqueN from the devel version on GH (or on length(unique())):

data[, if(uniqueN(fruit) < 2L) .SD, by = names]
#    names fruit
# 1:   tom apple
# 2:   tom apple
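For reference, the `length(unique())` variant mentioned above might look like the following. This is a minimal sketch that rebuilds the example data under the hypothetical name `dt`, so that it runs on its own.

```r
library(data.table)

# Example data from the question, rebuilt as a data.table
dt <- data.table(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple")
)

# Keep only the groups whose fruit column holds a single distinct value
res <- dt[, if (length(unique(fruit)) < 2L) .SD, by = names]
```

Be aware that this condition keeps any individual with exactly one kind of fruit, so someone with oranges only would also pass; on the example data only tom remains.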

  • So you could adapt the first line to something like `setDT(data)[, if(sum(grepl("apple", fruit)) == .N) .SD, by = names]` in order to get your desired output (2 upvotes)
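The indicator column described in the question can be built along the same lines. In a regular expression `&` is a literal character, not a logical operator, so the pattern `'app.*? & orang.*?'` looks for both words inside one string. A minimal sketch, assuming the patterns `"^app"` and `"^orang"` stand in for the more complex real patterns, combines two `grepl` calls with `any()` per group instead:

```r
library(data.table)

# Example data from the question, rebuilt as a data.table (dt is an illustrative name)
dt <- data.table(
  names = c("john", "john", "john", "mary", "mary", "mary", "mary", "mary", "tom", "tom"),
  fruit = c("apple", "orange", "apple", "orange", "apple", "orange", "apple", "apple", "apple", "apple")
)

# Per individual: TRUE if at least one row matches each pattern.
# The single TRUE/FALSE per group is recycled to every row of that group.
dt[, ind := any(grepl("^app", fruit)) & any(grepl("^orang", fruit)), by = names]

# Count the individuals flagged as having both fruits
n_both <- length(unique(dt[ind == TRUE, names]))

# Filter the flagged individuals out and drop the helper column
apples_only <- dt[ind == FALSE, .(names, fruit)]
```

Because the indicator is computed with `by = names`, every row of john and mary is flagged, which makes the later filter a plain row subset.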