roa*_*dom 5 datatable r subset dataframe
I have a data.table like the following:
library(data.table)

x <- data.table(Tickers=c("A","A","A","B","B","B","B","D","D","D","D"),
Type=c("put","call","put","call","call","put","call","put","call","put","call"),
Strike=c(35,37.5,37.5,10,11,11,12,40,40,42,42),
Other=sample(20,11))
    Tickers Type Strike Other
 1:       A  put   35.0     6
 2:       A call   37.5     5
 3:       A  put   37.5    13
 4:       B call   10.0    15
 5:       B call   11.0    12
 6:       B  put   11.0     4
 7:       B call   12.0    20
 8:       D  put   40.0     7
 9:       D call   40.0    11
10:       D  put   42.0    10
11:       D call   42.0     1
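I am trying to analyze a subset of the data. The subset I want consists of the rows that share the same Tickers and Strike. In addition, I only want those rows when both a put and a call exist in Type for that Tickers/Strike pair. Using the data above as an example, I would like to return the following result: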
x[c(2,3,5,6,8:11),]
   Tickers Type Strike Other
1:       A call   37.5     5
2:       A  put   37.5    13
3:       B call   11.0    12
4:       B  put   11.0     4
5:       D  put   40.0     7
6:       D call   40.0    11
7:       D  put   42.0    10
8:       D call   42.0     1
I am not sure what the best way to do this is. My thought process was to create another column vector:
x$id <- paste(x$Tickers,x$Strike,sep="_")
and then use this vector to pull out only the rows whose id occurs more than once:
x[x$id %in% x$id[duplicated(x$id)],]
   Tickers Type Strike Other     id
1:       A call   37.5     5 A_37.5
2:       A  put   37.5    13 A_37.5
3:       B call   11.0    12   B_11
4:       B  put   11.0     4   B_11
5:       D  put   40.0     7   D_40
6:       D call   40.0    11   D_40
7:       D  put   42.0    10   D_42
8:       D call   42.0     1   D_42
I am not sure how efficient this is, since my actual data contains many more rows. Also, this solution does not check that Type contains both a put and a call.
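To make that limitation concrete, here is a small sketch on made-up data (the table `y` and tickers "E" and "F" are hypothetical and not part of the data above). Ticker "E" has two calls at the same strike and no put, yet the duplicate-id approach still keeps it:

```r
library(data.table)

# Hypothetical data: "E" has two calls and no put at strike 50;
# "F" has one put and one call at strike 60.
y <- data.table(
  Tickers = c("E", "E", "F", "F"),
  Type    = c("call", "call", "put", "call"),
  Strike  = c(50, 50, 60, 60)
)
y[, id := paste(Tickers, Strike, sep = "_")]

# Keeps every row whose id occurs more than once, regardless of Type
dup_rows <- y[duplicated(id) | duplicated(id, fromLast = TRUE)]
print(dup_rows)
```

Both "E" rows survive the filter even though there is no put for that pair, so an explicit check on Type is still needed.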
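I also apologize; the title could probably be worded much better.
EDIT: I checked the post "Finding ALL duplicate rows, including 'elements with smaller subscripts'".
I could also use this solution: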
x$id <- paste(x$Tickers,x$Strike,sep="_")
x[duplicated(x$id) | duplicated(x$id, fromLast=TRUE),]
You could try something like this:
x[, select := (.N >= 2 & all(c("put", "call") %in% unique(Type))), by = .(Tickers, Strike)][which(select)]
#    Tickers Type Strike Other select
# 1:       A call   37.5    17   TRUE
# 2:       A  put   37.5    16   TRUE
# 3:       B call   11.0    11   TRUE
# 4:       B  put   11.0    20   TRUE
# 5:       D  put   40.0     1   TRUE
# 6:       D call   40.0    12   TRUE
# 7:       D  put   42.0     6   TRUE
# 8:       D call   42.0     2   TRUE
Another idea might be a merge:
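If adding a helper column to x is undesirable, a possible variant (a sketch, not necessarily faster) collects the row numbers of qualifying groups with `.I` and subsets once. The `Other` values here are placeholders (`1:11`) so the result is reproducible; the original uses `sample(20, 11)`.

```r
library(data.table)

x <- data.table(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","call"),
  Strike  = c(35,37.5,37.5,10,11,11,12,40,40,42,42),
  Other   = 1:11  # placeholder values, assumed for reproducibility
)

# For each (Tickers, Strike) group, return all of its row numbers
# if the group contains both a put and a call, otherwise none.
idx <- x[, .I[all(c("put", "call") %in% Type)], by = .(Tickers, Strike)]$V1

res <- x[idx]
print(res)
```

A group that holds both types necessarily has at least two rows, so the separate `.N >= 2` check is not needed in this form.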
x[x, on = .(Tickers, Strike), select := (length(Type) >= 2 & all(c("put", "call") %in% Type)),by = .EACHI][which(select)]
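I'm not entirely sure how to get around the grouping operation, since you want to make sure each group has both a "call" and a "put". I was thinking about using keys, but couldn't work in the "call"/"put" aspect.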
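One way the "call"/"put" aspect might be expressed without a per-group expression is to intersect the Tickers/Strike pairs that have a put with those that have a call, then join back. This is only a sketch under the assumption that `fintersect()` from data.table is acceptable here; `puts`, `calls`, and `both` are illustrative names, the `Other` values are placeholders, and whether it beats the grouped version on large data is untested.

```r
library(data.table)

x <- data.table(
  Tickers = c("A","A","A","B","B","B","B","D","D","D","D"),
  Type    = c("put","call","put","call","call","put","call","put","call","put","call"),
  Strike  = c(35,37.5,37.5,10,11,11,12,40,40,42,42),
  Other   = 1:11  # placeholder values, assumed for reproducibility
)

# Distinct (Tickers, Strike) pairs that have a put, and those that have a call
puts  <- unique(x[Type == "put",  .(Tickers, Strike)])
calls <- unique(x[Type == "call", .(Tickers, Strike)])

# Pairs present in both sets
both <- fintersect(puts, calls)

# Join back to x to recover every row belonging to those pairs
res <- x[both, on = .(Tickers, Strike)]
print(res)
```

Note that the join matches on a numeric Strike column, so strikes that differ only by floating-point noise would be treated as different pairs.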