Modifying multiple columns in a data.table

Question

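I am looking for a way to manipulate multiple columns of a data.table in R. Because I have to address the columns dynamically and also need a second input (the row of the index date), I could not find an existing answer.

The idea is to index two or more series to a certain date by dividing all values of each series by its value on that date, e.g.: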

library(data.table)

set.seed(132)
# simulate some data
dt <- data.table(date = seq(from = as.Date("2000-01-01"), by = "days", length.out = 10),
                 X1 = cumsum(rnorm(10)),
                 X2 = cumsum(rnorm(10)))

# set a date for the index
indexDate <- as.Date("2000-01-05")

# get the column names to be able to select the columns dynamically
cols <- colnames(dt)
cols <- cols[substr(cols, 1, 1) == "X"]

Part 1: the easy data.frame/apply approach

df <- as.data.frame(dt)
# get the right rownumber for the indexDate
rownum <- max((1:nrow(df))*(df$date==indexDate))

# use apply to iterate over all columns
df[, cols] <- apply(df[, cols], 
                    2, 
                    function(x, i){x / x[i]}, i = rownum)
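The row lookup above multiplies the row numbers by a logical vector and takes the maximum, which silently yields 0 when the date is missing. A minimal alternative sketch, assuming the index date should match exactly one row, uses `which()` and checks the result (the small `df` here is made-up example data):

```r
# Made-up example data: ten consecutive days and one series
df <- data.frame(date = seq(from = as.Date("2000-01-01"), by = "days", length.out = 10),
                 X1 = 1:10)
indexDate <- as.Date("2000-01-05")

# which() returns the positions where the comparison is TRUE
rownum <- which(df$date == indexDate)

# fail early if the date is absent or duplicated (an assumption of this sketch)
stopifnot(length(rownum) == 1L)
```

With this check in place, a missing index date raises an error instead of quietly dividing by `x[0]`.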

Part 2: the (fast) data.table approach

So far, my data.table approach looks like this:

for(nam in cols) {
  div <- as.numeric(dt[rownum, nam, with = FALSE])
  dt[ , 
     nam := dt[,nam, with = FALSE] / div,
     with=FALSE]
}

In particular, all the with = FALSE arguments do not look very data.table-like.

Do you know a faster or more elegant way to do this?

Any ideas are greatly appreciated!

Answer 1

akr*_*run 9

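One option is to use set, since this involves multiple columns. The advantage of set is that it avoids the overhead of [.data.table, which makes it faster.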

library(data.table)
for(j in cols){
  set(dt, i=NULL, j=j, value= dt[[j]]/dt[[j]][rownum])
}
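For readers who want to run the loop on its own, here is a self-contained sketch that rebuilds the question's data and checks the result. The `grep()` column selection and the `which()` row lookup are substitutions made for this sketch, not part of the answer above.

```r
library(data.table)

set.seed(132)
dt <- data.table(date = seq(from = as.Date("2000-01-01"), by = "days", length.out = 10),
                 X1 = cumsum(rnorm(10)),
                 X2 = cumsum(rnorm(10)))
indexDate <- as.Date("2000-01-05")

# columns whose names start with "X" (substituted for the substr() test)
cols <- grep("^X", names(dt), value = TRUE)
# row of the index date (substituted for the max(...) lookup)
rownum <- which(dt$date == indexDate)

# set() updates each column by reference, without calling [.data.table
for (j in cols) {
  set(dt, i = NULL, j = j, value = dt[[j]] / dt[[j]][rownum])
}

# each indexed series should now equal 1 on the index date
stopifnot(all(sapply(cols, function(j) dt[[j]][rownum]) == 1))
```

Note that `dt` is modified in place; use `copy(dt)` first if the original values are still needed.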

Or, as a slightly slower option:

dt[, (cols) := lapply(.SD, function(x) x / x[rownum]), .SDcols = cols]
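The `.SDcols` form can also be wrapped in a small function. The sketch below is one possible way to do that: `index_to_date` is a hypothetical helper name (not a data.table function), the example data is made up, and the function assumes the table has a `date` column with exactly one row matching the index date.

```r
library(data.table)

# Hypothetical helper (not part of data.table): rebase the columns in `cols`
# so that each series equals 1 on `indexDate`. It works on a copy, so the
# input table is left unchanged.
index_to_date <- function(dt, cols, indexDate) {
  idx <- which(dt$date == indexDate)
  if (length(idx) != 1L) stop("indexDate must match exactly one row")
  out <- copy(dt)
  # (cols) on the left-hand side assigns to the columns named in `cols`
  out[, (cols) := lapply(.SD, function(x) x / x[idx]), .SDcols = cols]
  out[]
}

# Made-up example data
dt <- data.table(date = as.Date("2000-01-01") + 0:3,
                 X1 = c(2, 4, 8, 16),
                 X2 = c(10, 5, 20, 40))
res <- index_to_date(dt, c("X1", "X2"), as.Date("2000-01-02"))
```

Working on a copy trades some memory for safety; dropping the `copy()` call would update the input by reference, as the one-liner above does.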
