在dplyr中模拟时间序列而不是使用for循环

jeb*_*nes 6 r dplyr tidyverse

所以,虽然laglead在dplyr是伟大的,我想模仿的像人口增长的时间序列.我的旧学校代码看起来像:

tdf <- data.frame(time=1:5, pop=50)
for(i in 2:5){
  tdf$pop[i] = 1.1*tdf$pop[i-1]
}
Run Code Online (Sandbox Code Playgroud)

哪个产生

  time    pop
1    1 50.000
2    2 55.000
3    3 60.500
4    4 66.550
5    5 73.205
Run Code Online (Sandbox Code Playgroud)

我觉得必须有一个dplyrtidyverse办法做到这一点(像我爱我的for循环).

但是,像

tdf <- data.frame(time=1:5, pop=50) %>%
  mutate(pop = 1.1*lag(pop))
Run Code Online (Sandbox Code Playgroud)

这本来是我的第一个猜测就是产生

  time pop
1    1  NA
2    2  55
3    3  55
4    4  55
5    5  55
Run Code Online (Sandbox Code Playgroud)

我觉得我错过了一些明显的东西......它是什么?

注意 - 这是一个简单的例子 - 我的实例使用多个参数,其中许多是时变的(我在不同的GCM场景下模拟预测),因此,tidyverse被证明是将我的模拟结合在一起的强大工具.

ali*_*ire 8

Reduce(或者它的purrr变体,如果你愿意的话)是你想要的还没有cum*编写版本的累积函数:

data.frame(time = 1:5, pop = 50) %>%
    mutate(pop = Reduce(function(x, y){x * 1.1}, pop, accumulate = TRUE))

##   time    pop
## 1    1 50.000
## 2    2 55.000
## 3    3 60.500
## 4    4 66.550
## 5    5 73.205
Run Code Online (Sandbox Code Playgroud)

或者用purrr,

data.frame(time = 1:5, pop = 50) %>%
    mutate(pop = accumulate(pop, ~.x * 1.1))

##   time    pop
## 1    1 50.000
## 2    2 55.000
## 3    3 60.500
## 4    4 66.550
## 5    5 73.205
Run Code Online (Sandbox Code Playgroud)

  • 假设您正在单独计算每个,使用其他形式的`mutate`,即`mutate_all`,`mutate_if`或`mutate_at`,将函数包装在`funs`中并用`.`替换列名,例如`mutate_all(funs(accumulate(.,〜.x*1.1)))` (2认同)

eip*_*i10 5

如果起始值pop是50,那么pop = 50 * 1.1^(0:4)将给出接下来的四个值.使用您的代码,您可以:

data.frame(time=1:5, pop=50) %>%
  mutate(pop = pop * 1.1^(1:n() - 1))
Run Code Online (Sandbox Code Playgroud)

要么,

base = 50

data.frame(time=1:5) %>%
  mutate(pop = base * 1.1^(1:n()-1))
Run Code Online (Sandbox Code Playgroud)


小智 5

Purrr 的累积函数可以处理时变索引,如果您将它们作为包含所有参数的列表传递给模拟函数。然而,要使其正常工作需要一些争论。这里的技巧是,accumulate() 可以作用于列表以及向量列。您可以使用tidyr函数 Nest() 将列分组到包含当前总体状态和参数的列表向量中,然后对结果列表列使用accumulate()。解释起来有点复杂,所以我提供了一个演示,以恒定增长率或时变随机增长率模拟物流增长。我还提供了一个示例,说明如何使用 dpylr+purrr+tidyr 来模拟给定模型的多个重复。

library(dplyr)
library(purrr)
library(ggplot2)
library(tidyr)

# Declare the population growth function. Note: the first two arguments
# have to be .x (the prior vector of populations and parameters) and .y,
# the current parameter value and population vector. 
# This example function is a Ricker population growth model. 
logistic_growth = function(.x, .y, growth, comp) {
  pop = .x$pop[1]
  growth = .y$growth[1]
  comp  = .y$comp[1]
  # Note: this uses the state from .x, and the parameter values from .y.
  # The first observation will use the first entry in the vector for .x and .y
  new_pop = pop*exp(growth - pop*comp)
  .y$pop[1] = new_pop
  return(.y)
}

# Starting parameters the number of time steps to simulate, initial population size,
# and ecological parameters (growth rate and intraspecific competition rate)
n_steps  = 100
pop_init = 1
growth = 0.5
comp = 0.05

#First test: fixed growth rates
test1 = data_frame(time = 1:n_steps,pop = pop_init, 
                   growth=growth,comp =comp)


# here, the combination of nest() and group_by() split the data into individual 
# time points and then groups all parameters into a new vector called state.
# ungroup() removes the grouping structure, then accumulate runs the function
#on the vector of states. Finally unnest transforms it all back to a
#data frame
out1 = test1 %>%
  group_by(time)%>%
  nest(pop, growth, comp,.key = state)%>%
  ungroup()%>%
  mutate(
    state = accumulate(state,logistic_growth))%>%
  unnest()

# This is the same example, except I drew the growth rates from a normal distribution
# with a mean equal to the mean growth rate and a std. dev. of 0.1
test2 = data_frame(time = 1:n_steps,pop = pop_init, 
                   growth=rnorm(n_steps, growth,0.1),comp=comp)

out2 = test2 %>%
  group_by(time)%>%
  nest(pop, growth, comp,.key = state)%>%
  ungroup()%>%
  mutate(
    state = accumulate(state,logistic_growth))%>%
  unnest()

# This demostrates how to use this approach to simulate replicates using dplyr
# Note the crossing function creates all combinations of its input values
test3 = crossing(rep = 1:10, time = 1:n_steps,pop = pop_init, comp=comp) %>%
  mutate(growth=rnorm(n_steps*10, growth,0.1))

out3 = test3 %>%
  group_by(rep)%>%
  group_by(rep,time)%>%
  nest(pop, growth, comp,.key = state)%>%
  group_by(rep)%>%
  mutate(
    state = accumulate(state,logistic_growth))%>%
  unnest()

print(qplot(time, pop, data=out1)+
        geom_line() +
        geom_point(data= out2, col="red")+
        geom_line(data=out2, col="red")+
        geom_point(data=out3, col="red", alpha=0.1)+
        geom_line(data=out3, col="red", alpha=0.1,aes(group=rep)))
Run Code Online (Sandbox Code Playgroud)