Related questions

Calculate the mean by group

I have a large data frame that looks similar to this:

df <- data.frame(dive=factor(sample(c("dive1","dive2"),10,replace=TRUE)),speed=runif(10))
> df
    dive      speed
1  dive1 0.80668490
2  dive1 0.53349584
3  dive2 0.07571784
4  dive2 0.39518628
5  dive1 0.84557955
6  dive1 0.69121443
7  dive1 0.38124950
8  dive2 0.22536126
9  dive1 0.04704750
10 dive2 0.93561651

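My aim is to average the values of one column wherever another column equals a given value, and to repeat this for every such value. In the example above, I would like to return the mean of the column speed for each unique value of the column dive. So when dive==dive1, the mean of speed is some value, and so on for each value of dive.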
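As a minimal sketch of what is being asked, base R's `aggregate` and `tapply` both compute a per-group mean. The data frame `df_small` and its fixed values are assumptions made for illustration (the original `df` is random), and this is one common approach rather than the accepted answer to the question.

```r
# Hypothetical fixed data in the same shape as df above
# (illustrative values, not the random ones printed in the question).
df_small <- data.frame(
  dive  = factor(c("dive1", "dive1", "dive2", "dive2")),
  speed = c(0.2, 0.4, 0.1, 0.5)
)

# One row per level of dive, holding the mean speed of that group.
group_means <- aggregate(speed ~ dive, data = df_small, FUN = mean)

# tapply computes the same means, indexed by the levels of dive.
group_means_vec <- tapply(df_small$speed, df_small$dive, mean)

print(group_means)
```

`aggregate` returns a data frame, which is convenient for merging back into the original data, while `tapply` is handy when only the numbers are needed.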

r dataframe r-faq

52 votes, 3 answers, 80k views

R: Using the data.table := operator to calculate new columns

Let's take the following data:

library(data.table)

dt <- data.table(TICKER=c(rep("ABC",10),"DEF"),
        PERIOD=c(rep(as.Date("2010-12-31"),10),as.Date("2011-12-31")),
        DATE=as.Date(c("2010-01-05","2010-01-07","2010-01-08","2010-01-09","2010-01-10","2010-01-11","2010-01-13","2010-04-01","2010-04-02","2010-08-03","2011-02-05")),
        ID=c(1,2,1,3,1,2,1,1,2,2,1),VALUE=c(1.5,1.3,1.4,1.6,1.4,1.2,1.5,1.7,1.8,1.7,2.3))
setkey(dt,TICKER,PERIOD,ID,DATE)

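Now, for each ticker/period combination, I need to add the following as new columns:

  • PRIORAVG: the mean of the latest VALUE of each ID, excluding the current ID, provided that value is no more than 180 days old.
  • PREV: the previous value from the same ID.

The result should look like this: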

      TICKER     PERIOD       DATE ID VALUE PRIORAVG PREV
 [1,]    ABC 2010-12-31 2010-01-05  1   1.5       NA   NA
 [2,]    ABC 2010-12-31 2010-01-08  1   1.4     1.30  1.5
 [3,]    ABC 2010-12-31 2010-01-10  1   1.4     1.45  1.4
 [4,]    ABC 2010-12-31 2010-01-13  1   1.5     1.40  1.4
 [5,]    ABC 2010-12-31 2010-04-01  1   1.7     1.40  1.5
 [6,]    ABC 2010-12-31 2010-01-07  2   1.3     1.50   NA
 [7,]    ABC 2010-12-31 2010-01-11  2   1.2     1.50  1.3
 [8,]    ABC 2010-12-31 …
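The PREV column alone can be sketched with `shift()` inside a grouped `:=` assignment. This assumes a data.table version that provides `shift()`; the table `dt_small` is a hypothetical reduced copy of the first rows for IDs 1 and 2, and the sketch does not attempt PRIORAVG, which needs the 180-day window logic.

```r
library(data.table)

# Hypothetical reduced copy of the question's data (IDs 1 and 2 only).
dt_small <- data.table(
  TICKER = "ABC",
  PERIOD = as.Date("2010-12-31"),
  DATE   = as.Date(c("2010-01-05", "2010-01-07", "2010-01-08",
                     "2010-01-10", "2010-01-11")),
  ID     = c(1, 2, 1, 1, 2),
  VALUE  = c(1.5, 1.3, 1.4, 1.4, 1.2)
)

# Sort by the key so rows within each ID are in date order.
setkey(dt_small, TICKER, PERIOD, ID, DATE)

# Lag VALUE by one row within each TICKER/PERIOD/ID group;
# the first row of every group has no predecessor and gets NA.
dt_small[, PREV := shift(VALUE), by = .(TICKER, PERIOD, ID)]

print(dt_small)
```

Because `:=` modifies the table by reference, no reassignment is needed, and the lag relies on the date ordering established by `setkey`.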

r data.table

16 votes, 1 answer, 20k views

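Calculate the difference between consecutive pairs of rows in a data frame - R

I have a data.frame in which each gene name is repeated and holds values for 2 conditions: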

df <- data.frame(gene=c("A","A","B","B","C","C"),
                 condition=c("control","treatment","control","treatment","control","treatment"),
                 count=c(10, 2, 5, 8, 5, 1),
                 sd=c(1, 0.2, 0.1, 2, 0.8, 0.1))

  gene condition count  sd
1    A   control    10 1.0
2    A treatment     2 0.2
3    B   control     5 0.1
4    B treatment     8 2.0
5    C   control     5 0.8
6    C treatment     1 0.1

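I want to work out whether "count" increased or decreased after treatment, and then label the genes accordingly and/or subset them. That is (pseudocode):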

for each unique(gene) do 
   if df[geneRow2,3]-df[geneRow1,3] > 0 then gene is "up"
       else gene is "down"

This is how it should look in the end (the last column is optional):

up-regulated
 gene condition count  sd  regulation
 B    control     5    0.1    up …
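One way to sketch this in base R is with `ave`, taking treatment minus control per gene, which is consistent with the expected output where gene B (5 to 8) is "up". The sketch assumes the control row always comes directly before the treatment row within each gene, and it labels a zero difference "down", following the else branch of the pseudocode. The names `change` and `up_regulated` are introduced here for illustration.

```r
df <- data.frame(
  gene      = c("A", "A", "B", "B", "C", "C"),
  condition = c("control", "treatment", "control", "treatment",
                "control", "treatment"),
  count     = c(10, 2, 5, 8, 5, 1),
  sd        = c(1, 0.2, 0.1, 2, 0.8, 0.1)
)

# Per gene: treatment count minus control count (row 2 minus row 1),
# recycled onto both rows of that gene.
change <- ave(df$count, df$gene, FUN = function(x) x[2] - x[1])

# Label each row by the sign of the change.
df$regulation <- ifelse(change > 0, "up", "down")

# Keep only the rows of up-regulated genes.
up_regulated <- df[df$regulation == "up", ]

print(up_regulated)
```

If the row order is not guaranteed, sorting by gene and condition first (or matching on the condition labels) would be a safer variant.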

r

5 votes, 1 answer, 1956 views
