Pull a subset of data frame rows based on conditions in other columns

roa*_*dom 5 datatable r subset dataframe

I have a data frame as follows:

library(data.table)

# Other is drawn at random, so its values differ from run to run
x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
                Type=c("put","call","put","call","call","put","call","put","call","put","call"),
                Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
                Other=sample(20,11))

    Tickers Type Strike Other
 1:       A  put   35.0     6
 2:       A call   37.5     5
 3:       A  put   37.5    13
 4:       B call   10.0    15
 5:       B call   11.0    12
 6:       B  put   11.0     4
 7:       B call   12.0    20
 8:       D  put   40.0     7
 9:       D call   40.0    11
10:       D  put   42.0    10
11:       D call   42.0     1

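I am trying to analyze a subset of the data. The subset I want is the rows where the ticker and the strike are the same. However, I also only want to pull those rows if both a put and a call exist under Type. Using the data above as an example, I would like to return the following result: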

x[c(2,3,5,6,8:11),]

   Tickers Type Strike Other
1:       A call   37.5     5
2:       A  put   37.5    13
3:       B call   11.0    12
4:       B  put   11.0     4
5:       D  put   40.0     7
6:       D call   40.0    11
7:       D  put   42.0    10
8:       D call   42.0     1

I'm not sure what the best way to do this is. My thought process was that I should create another column vector:

x$id <- paste(x$Tickers,x$Strike,sep="_")

Then use this vector to pull out only the rows whose id occurs more than once.

x[x$id %in% x$id[duplicated(x$id)],]

   Tickers Type Strike Other     id
1:       A call   37.5     5 A_37.5
2:       A  put   37.5    13 A_37.5
3:       B call   11.0    12   B_11
4:       B  put   11.0     4   B_11
5:       D  put   40.0     7   D_40
6:       D call   40.0    11   D_40
7:       D  put   42.0    10   D_42
8:       D call   42.0     1   D_42

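I'm not sure how efficient this is, since my actual data contains many more rows. Also, this solution does not check that both a put and a call exist under Type.

I also apologize, the title could probably be worded much better.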

EDIT: I checked this post: Finding ALL duplicate rows, including "elements with smaller subscripts"

I could also use this solution:

x$id <- paste(x$Tickers,x$Strike,sep="_")
x[duplicated(x$id) | duplicated(x$id,fromLast=TRUE),]
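The same duplicate check can probably be written without the pasted id column. The following is only a sketch: it assumes the `by` argument of data.table's `duplicated()` method, and it fixes the `Other` values to those in the printed table above so the result is reproducible. The names `keys`, `dup` and `res_dup` are made up for illustration. Like the versions above, it does not check that both a put and a call are present.

```r
library(data.table)

x <- data.table(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","call"),
  Strike  = c(35,37.5,37.5,10,11,11,12,40,40,42,42),
  Other   = c(6,5,13,15,12,4,20,7,11,10,1)  # copied from the printed table, not random
)

# Columns that define a "duplicate" (hypothetical helper name)
keys <- c("Tickers", "Strike")

# Flag every row whose (Tickers, Strike) pair occurs more than once,
# scanning from both ends so the first occurrence is flagged too
dup <- duplicated(x, by = keys) | duplicated(x, by = keys, fromLast = TRUE)

res_dup <- x[dup]
print(res_dup)
```

This leaves `x` without an extra `id` column and avoids converting `Strike` to text.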

Mik*_* H. 3

You could try something like this:

x[, select := (.N >= 2 & all(c("put", "call") %in% unique(Type))), by = .(Tickers, Strike)][which(select)]

#   Tickers Type Strike Other select
#1:       A call   37.5    17   TRUE
#2:       A  put   37.5    16   TRUE
#3:       B call   11.0    11   TRUE
#4:       B  put   11.0    20   TRUE
#5:       D  put   40.0     1   TRUE
#6:       D call   40.0    12   TRUE
#7:       D  put   42.0     6   TRUE
#8:       D call   42.0     2   TRUE
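If the extra `select` column is not wanted, one possible variant of the grouped filter above returns `.SD` only for groups that contain both types. This is a sketch under assumptions, not part of the original answer: the `Other` values are fixed to those printed in the question, and `res` is an illustrative name. Note that the grouping columns come first in the result, so the column order differs from `x`.

```r
library(data.table)

x <- data.table(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","call"),
  Strike  = c(35,37.5,37.5,10,11,11,12,40,40,42,42),
  Other   = c(6,5,13,15,12,4,20,7,11,10,1)  # copied from the question's printed table
)

# For each (Tickers, Strike) group, keep its rows only when both a "put"
# and a "call" are present; groups that fail the test return NULL and are dropped
res <- x[, if (all(c("put", "call") %in% Type)) .SD, by = .(Tickers, Strike)]
print(res)
```

Because no `:=` is used, `x` itself is left unchanged.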

Another idea might be a merge:

x[x, on = .(Tickers, Strike), select := (length(Type) >= 2 & all(c("put", "call") %in% Type)),by = .EACHI][which(select)]

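I'm not entirely sure how to get around the grouping operation, since you want to make sure each group has both a "call" and a "put". I was thinking about using keys, but couldn't work in the "call"/"put" requirement.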
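One way the grouping might be avoided is to build the set of (Tickers, Strike) pairs that have a put, the set that have a call, intersect the two, and join back. The sketch below uses data.table's `fintersect()` for the intersection. It is an illustration only: `puts`, `calls`, `both` and `res_join` are made-up names, the `Other` values are fixed to those printed in the question, and whether this is faster than the grouped version on large data has not been measured here.

```r
library(data.table)

x <- data.table(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","call"),
  Strike  = c(35,37.5,37.5,10,11,11,12,40,40,42,42),
  Other   = c(6,5,13,15,12,4,20,7,11,10,1)  # copied from the question's printed table
)

# Distinct (Tickers, Strike) pairs that have at least one put / one call
puts  <- unique(x[Type == "put",  .(Tickers, Strike)])
calls <- unique(x[Type == "call", .(Tickers, Strike)])

# Pairs present in both sets
both <- fintersect(puts, calls)

# Join back to pull every row of x that belongs to one of those pairs
res_join <- x[both, on = c("Tickers", "Strike")]
print(res_join)
```

Rows with any other `Type` value in a matching pair would also be returned by the join, which matches the behaviour of the grouped versions above.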