所以,虽然lag和lead在dplyr是伟大的,我想模仿的像人口增长的时间序列.我的旧学校代码看起来像:
tdf <- data.frame(time=1:5, pop=50)
for(i in 2:5){
tdf$pop[i] = 1.1*tdf$pop[i-1]
}
Run Code Online (Sandbox Code Playgroud)
哪个产生
time pop
1 1 50.000
2 2 55.000
3 3 60.500
4 4 66.550
5 5 73.205
Run Code Online (Sandbox Code Playgroud)
我觉得必须有一个dplyr或tidyverse办法做到这一点(像我爱我的for循环).
但是,像
tdf <- data.frame(time=1:5, pop=50) %>%
mutate(pop = 1.1*lag(pop))
Run Code Online (Sandbox Code Playgroud)
这本来是我的第一个猜测就是产生
time pop
1 1 NA
2 2 55
3 3 55
4 4 55
5 5 55
Run Code Online (Sandbox Code Playgroud)
我觉得我错过了一些明显的东西......它是什么?
注意 - 这是一个简单的例子 - 我的实例使用多个参数,其中许多是时变的(我在不同的GCM场景下模拟预测),因此,tidyverse被证明是将我的模拟结合在一起的强大工具.
Reduce(或者它的purrr变体,如果你愿意的话)是你想要的还没有cum*编写版本的累积函数:
data.frame(time = 1:5, pop = 50) %>%
mutate(pop = Reduce(function(x, y){x * 1.1}, pop, accumulate = TRUE))
## time pop
## 1 1 50.000
## 2 2 55.000
## 3 3 60.500
## 4 4 66.550
## 5 5 73.205
Run Code Online (Sandbox Code Playgroud)
或者用purrr,
data.frame(time = 1:5, pop = 50) %>%
mutate(pop = accumulate(pop, ~.x * 1.1))
## time pop
## 1 1 50.000
## 2 2 55.000
## 3 3 60.500
## 4 4 66.550
## 5 5 73.205
Run Code Online (Sandbox Code Playgroud)
如果起始值pop是50,那么pop = 50 * 1.1^(0:4)将给出接下来的四个值.使用您的代码,您可以:
data.frame(time=1:5, pop=50) %>%
mutate(pop = pop * 1.1^(1:n() - 1))
Run Code Online (Sandbox Code Playgroud)
要么,
base = 50
data.frame(time=1:5) %>%
mutate(pop = base * 1.1^(1:n()-1))
Run Code Online (Sandbox Code Playgroud)
小智 5
Purrr 的累积函数可以处理时变索引,如果您将它们作为包含所有参数的列表传递给模拟函数。然而,要使其正常工作需要一些争论。这里的技巧是,accumulate() 可以作用于列表以及向量列。您可以使用tidyr函数 Nest() 将列分组到包含当前总体状态和参数的列表向量中,然后对结果列表列使用accumulate()。解释起来有点复杂,所以我提供了一个演示,以恒定增长率或时变随机增长率模拟物流增长。我还提供了一个示例,说明如何使用 dpylr+purrr+tidyr 来模拟给定模型的多个重复。
library(dplyr)
library(purrr)
library(ggplot2)
library(tidyr)
# Declare the population growth function. Note: the first two arguments
# have to be .x (the prior vector of populations and parameters) and .y,
# the current parameter value and population vector.
# This example function is a Ricker population growth model.
logistic_growth = function(.x, .y, growth, comp) {
pop = .x$pop[1]
growth = .y$growth[1]
comp = .y$comp[1]
# Note: this uses the state from .x, and the parameter values from .y.
# The first observation will use the first entry in the vector for .x and .y
new_pop = pop*exp(growth - pop*comp)
.y$pop[1] = new_pop
return(.y)
}
# Starting parameters the number of time steps to simulate, initial population size,
# and ecological parameters (growth rate and intraspecific competition rate)
n_steps = 100
pop_init = 1
growth = 0.5
comp = 0.05
#First test: fixed growth rates
test1 = data_frame(time = 1:n_steps,pop = pop_init,
growth=growth,comp =comp)
# here, the combination of nest() and group_by() split the data into individual
# time points and then groups all parameters into a new vector called state.
# ungroup() removes the grouping structure, then accumulate runs the function
#on the vector of states. Finally unnest transforms it all back to a
#data frame
out1 = test1 %>%
group_by(time)%>%
nest(pop, growth, comp,.key = state)%>%
ungroup()%>%
mutate(
state = accumulate(state,logistic_growth))%>%
unnest()
# This is the same example, except I drew the growth rates from a normal distribution
# with a mean equal to the mean growth rate and a std. dev. of 0.1
test2 = data_frame(time = 1:n_steps,pop = pop_init,
growth=rnorm(n_steps, growth,0.1),comp=comp)
out2 = test2 %>%
group_by(time)%>%
nest(pop, growth, comp,.key = state)%>%
ungroup()%>%
mutate(
state = accumulate(state,logistic_growth))%>%
unnest()
# This demostrates how to use this approach to simulate replicates using dplyr
# Note the crossing function creates all combinations of its input values
test3 = crossing(rep = 1:10, time = 1:n_steps,pop = pop_init, comp=comp) %>%
mutate(growth=rnorm(n_steps*10, growth,0.1))
out3 = test3 %>%
group_by(rep)%>%
group_by(rep,time)%>%
nest(pop, growth, comp,.key = state)%>%
group_by(rep)%>%
mutate(
state = accumulate(state,logistic_growth))%>%
unnest()
print(qplot(time, pop, data=out1)+
geom_line() +
geom_point(data= out2, col="red")+
geom_line(data=out2, col="red")+
geom_point(data=out3, col="red", alpha=0.1)+
geom_line(data=out3, col="red", alpha=0.1,aes(group=rep)))
Run Code Online (Sandbox Code Playgroud)