我有一个数据帧如下:
chr leftPos ZScore1 ZScore2 ZScore3 ZScore4
1 24352 34 43 19 43
1 53534 2 1 -1 -9
2 34 -15 7 -9 -18
3 3443 -100 -4 4 -9
3 3445 -100 -1 6 -1
3 3667 5 -5 9 5
3 7882 -8 -9 1 3
Run Code Online (Sandbox Code Playgroud)
我想只保留那些具有相同chr且具有ZScore的相邻列的行在同一方向上的行.换句话说,如果该chr之前或之后的行具有相同的符号(正或负),则应该保留一行.我希望在列名中运行ZS的所有列,以便输出最终只是满足每行标准的行数.
对于一列,代码应该导致:
chr leftPos ZScore
1 24352 34
1 53534 2
3 3443 -100
3 3445 -100
Run Code Online (Sandbox Code Playgroud)
但最终的输出应该是这样的
ZScore1 ZScore2 ZScore3 ZScore4
nrow 4 6 4 4
I have tried bits of code but Im not even really sure how to approach this.
Run Code Online (Sandbox Code Playgroud)
我想我会按照chr进行分组,然后查看上面的行是否与当前行的正面或负面相同,然后查看下面的行是否与当前行的方向相同.然后移动到该chr的下一行.
试试这个包 dplyr
library(dplyr)
Run Code Online (Sandbox Code Playgroud)
数据
df <- data.frame(chr=c(1, 1, 2, 3, 3, 3, 3),
leftPos=c(24352, 53534, 34, 3443, 3445, 3667, 7882),
ZScore=c(34, 2, -15, -100, -100, 5, -8))
Run Code Online (Sandbox Code Playgroud)
码
df %>% group_by(chr) %>%
filter(sign(ZScore)==sign(lag(ZScore)) | sign(ZScore)==sign(lead(ZScore))) %>%
ungroup
Run Code Online (Sandbox Code Playgroud)
使用开发版本的选项data.table(类似于@dimitris_ps 帖子中的方法)。安装开发版本的说明是here
library(data.table)#v1.9.5
na.omit(setDT(df)[, {tmp= sign(ZScore)
.SD[tmp==shift(tmp) | tmp==shift(tmp, type='lead')] },
by=chr])
# chr leftPos ZScore
#1: 1 24352 34
#2: 1 53534 2
#3: 3 3443 -100
#4: 3 3445 -100
Run Code Online (Sandbox Code Playgroud)
我们可以创建一个函数
f1 <- function(dat, ZCol){
na.omit(as.data.table(dat)[, {tmp = sign(eval(as.name(ZCol)))
.SD[tmp==shift(tmp) | tmp==shift(tmp, type='lead')]},
by=chr])[, list(.N)]}
nm1 <- paste0('ZScore', 1:4)
setnames(do.call(cbind,lapply(nm1, function(x) f1(df1, x))), nm1)[]
# ZScore1 ZScore2 ZScore3 ZScore4
#1: 4 6 4 4
Run Code Online (Sandbox Code Playgroud)
或者我们可以使用set
res <- as.data.table(matrix(0, ncol=4, nrow=1,
dimnames=list(NULL, nm1)))
for(j in seq_along(nm1)){
set(res, i=NULL, j=j, value=f1(df1,nm1[j]))
}
res
# ZScore1 ZScore2 ZScore3 ZScore4
#1: 4 6 4 4
Run Code Online (Sandbox Code Playgroud)
df <- structure(list(chr = c(1L, 1L, 2L, 3L, 3L, 3L, 3L),
leftPos = c(24352L,
53534L, 34L, 3443L, 3445L, 3667L, 7882L), ZScore = c(34L, 2L,
-15L, -100L, -100L, 5L, -8L)), .Names = c("chr", "leftPos", "ZScore"
), class = "data.frame", row.names = c(NA, -7L))
df1 <- structure(list(chr = c(1L, 1L, 2L, 3L, 3L, 3L, 3L),
leftPos = c(24352L,
53534L, 34L, 3443L, 3445L, 3667L, 7882L), ZScore1 = c(34L, 2L,
-15L, -100L, -100L, 5L, -8L), ZScore2 = c(43L, 1L, 7L, -4L, -1L,
-5L, -9L), ZScore3 = c(19L, -1L, -9L, 4L, 6L, 9L, 1L),
ZScore4 = c(43L,
-9L, -18L, -9L, -1L, 5L, 3L)), .Names = c("chr", "leftPos",
"ZScore1", "ZScore2", "ZScore3", "ZScore4"), class = "data.frame",
row.names = c(NA, -7L))
Run Code Online (Sandbox Code Playgroud)