我有一个类似于这个的大型数据框:
df <- data.frame(dive=factor(sample(c("dive1","dive2"),10,replace=TRUE)),speed=runif(10))
> df
dive speed
1 dive1 0.80668490
2 dive1 0.53349584
3 dive2 0.07571784
4 dive2 0.39518628
5 dive1 0.84557955
6 dive1 0.69121443
7 dive1 0.38124950
8 dive2 0.22536126
9 dive1 0.04704750
10 dive2 0.93561651
Run Code Online (Sandbox Code Playgroud)
我的目标是在另一列等于某个值时平均一列的值,并对所有值重复此值.即在上面的示例中,我想为列speed的每个唯一值返回列的平均值dive.所以当时dive==dive1,平均值speed是这个,依此类推dive.
我们来看以下数据:
dt <- data.table(TICKER=c(rep("ABC",10),"DEF"),
PERIOD=c(rep(as.Date("2010-12-31"),10),as.Date("2011-12-31")),
DATE=as.Date(c("2010-01-05","2010-01-07","2010-01-08","2010-01-09","2010-01-10","2010-01-11","2010-01-13","2010-04-01","2010-04-02","2010-08-03","2011-02-05")),
ID=c(1,2,1,3,1,2,1,1,2,2,1),VALUE=c(1.5,1.3,1.4,1.6,1.4,1.2,1.5,1.7,1.8,1.7,2.3))
setkey(dt,TICKER,PERIOD,ID,DATE)
Run Code Online (Sandbox Code Playgroud)
现在,对于每个股票代码/期间组合,我需要在新列中添加以下内容:
PRIORAVG:每个ID的最新VALUE的平均值,不包括当前ID,只要不超过180天.PREV:来自相同ID的先前值.结果应如下所示:
TICKER PERIOD DATE ID VALUE PRIORAVG PREV
[1,] ABC 2010-12-31 2010-01-05 1 1.5 NA NA
[2,] ABC 2010-12-31 2010-01-08 1 1.4 1.30 1.5
[3,] ABC 2010-12-31 2010-01-10 1 1.4 1.45 1.4
[4,] ABC 2010-12-31 2010-01-13 1 1.5 1.40 1.4
[5,] ABC 2010-12-31 2010-04-01 1 1.7 1.40 1.5
[6,] ABC 2010-12-31 2010-01-07 2 1.3 1.50 NA
[7,] ABC 2010-12-31 2010-01-11 2 1.2 1.50 1.3
[8,] ABC 2010-12-31 …Run Code Online (Sandbox Code Playgroud) 我有一个data.frame,其中每个基因名称都重复,并包含2个条件的值:
df <- data.frame(gene=c("A","A","B","B","C","C"),
condition=c("control","treatment","control","treatment","control","treatment"),
count=c(10, 2, 5, 8, 5, 1),
sd=c(1, 0.2, 0.1, 2, 0.8, 0.1))
gene condition count sd
1 A control 10 1.0
2 A treatment 2 0.2
3 B control 5 0.1
4 B treatment 8 2.0
5 C control 5 0.8
6 C treatment 1 0.1
Run Code Online (Sandbox Code Playgroud)
我想计算治疗后"计数"是否增加或减少,并将它们标记为和/或将它们分组.那是(伪代码):
for each unique(gene) do
if df[geneRow1,3]-df[geneRow2,3] > 0 then gene is "up"
else gene is "down"
Run Code Online (Sandbox Code Playgroud)
这应该是最终的样子(最后一列是可选的):
up-regulated
gene condition count sd regulation
B control 5 0.1 up …Run Code Online (Sandbox Code Playgroud)