并行化R中的滚动窗口回归

Zac*_*ach 11 finance r quantmod

我正在运行一个非常类似于以下代码的滚动回归:

library(PerformanceAnalytics)
library(quantmod)
data(managers)

FL <- as.formula(Next(HAM1)~HAM1+HAM2+HAM3+HAM4)
MyRegression <- function(df,FL) {
  df <- as.data.frame(df)
  model <- lm(FL,data=df[1:30,])
  predict(model,newdata=df[31,])
}

system.time(Result <- rollapply(managers, 31, FUN="MyRegression",FL,
    by.column = FALSE, align = "right", na.pad = TRUE))
Run Code Online (Sandbox Code Playgroud)

我有一些额外的处理器,所以我试图找到一种方法来并行化滚动窗口.如果这是一个非滚动回归,我可以使用apply系列函数轻松地并行化它...

Rei*_*son 9

显而易见的是使用lm.fit()而不是lm()因为你不会在处理公式等时产生所有开销.

更新:所以,当我明确说明我想说的是明显的,但看似难以实现!

经过一番摆弄后,我想出了这个

library(PerformanceAnalytics)
library(quantmod)
data(managers)
Run Code Online (Sandbox Code Playgroud)

第一阶段是要意识到模型矩阵可以预先构建,所以我们这样做并将其转换回Zoo对象以用于rollapply():

mmat2 <- model.frame(Next(HAM1) ~ HAM1 + HAM2 + HAM3 + HAM4, data = managers, 
                     na.action = na.pass)
mmat2 <- cbind.data.frame(mmat2[,1], Intercept = 1, mmat2[,-1])
mmatZ <- as.zoo(mmat2)
Run Code Online (Sandbox Code Playgroud)

现在我们需要一个函数lm.fit()来完成繁重的工作,而不必在每次迭代时创建设计矩阵:

MyRegression2 <- function(Z) {
    ## store value we want to predict for
    pred <- Z[31, -1, drop = FALSE]
    ## get rid of any rows with NA in training data
    Z <- Z[1:30, ][!rowSums(is.na(Z[1:30,])) > 0, ]
    ## Next() would lag and leave NA in row 30 for response
    ## but we precomputed model matrix, so drop last row still in Z
    Z <- Z[-nrow(Z),]
    ## fit the model
    fit <- lm.fit(Z[, -1, drop = FALSE], Z[,1])
    ## get things we need to predict, in case pivoting turned on in lm.fit
    p <- fit$rank
    p1 <- seq_len(p)
    piv <- fit$qr$pivot[p1]
    ## model coefficients
    beta <- fit$coefficients
    ## this gives the predicted value for row 31 of data passed in
    drop(pred[, piv, drop = FALSE] %*% beta[piv])
}
Run Code Online (Sandbox Code Playgroud)

时间比较:

> system.time(Result <- rollapply(managers, 31, FUN="MyRegression",FL,
+                                 by.column = FALSE, align = "right", 
+                                 na.pad = TRUE))
   user  system elapsed 
  0.925   0.002   1.020 
> 
> system.time(Result2 <- rollapply(mmatZ, 31, FUN = MyRegression2,
+                                  by.column = FALSE,  align = "right",
+                                  na.pad = TRUE))
   user  system elapsed 
  0.048   0.000   0.05
Run Code Online (Sandbox Code Playgroud)

这比原版提供了相当合理的改进.现在检查生成的对象是否相同:

> all.equal(Result, Result2)
[1] TRUE
Run Code Online (Sandbox Code Playgroud)

请享用!