Dag*_*ann 4 r time-series decomposition
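The function bfast() in the bfast package is supposed to detect breakpoints both in the long-term trend and in the seasonal component. The figure below is an example (source):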

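In this figure, subplot no. 2 shows a detected change in seasonality, while subplot no. 3 shows a breakpoint in the trend.
However, I don't understand how to tell bfast() to look for changes/breakpoints in seasonality. All I get are breakpoints in the long-term trend. Here is a reproducible example that simulates a 50-year time series with weekly measurements of a seasonal variable y (i.e., 52 measurements per year):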
n_years <- 50
freq <- 52
y_pattern <- sin(seq(0, 2*pi, length = freq))
y <- rep(y_pattern, n_years) + rnorm(freq*n_years, sd = 0.1)
mydata <- data.frame(Year = rep(1:n_years, each = freq), Week = rep(1:freq, n_years), y)
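These data show a constant seasonal pattern, with an annual peak around week 13. Now let us introduce a change in seasonality after year 25 by shifting the seasonal cycle of years 26-50 eight weeks later: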
move_data <- function(data, year, weeks_to_move){
  x <- data[data$Year == year, "y"]
  c(x[seq(52 - weeks_to_move + 1, 52)], x[seq(1, 52 - weeks_to_move)])
}
mydata$y_shifted <- mydata$y
for (year in 26:50){
  mydata$y_shifted[mydata$Year == year] <- move_data(mydata, year, weeks_to_move = 8)
}
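As a quick sanity check on the shift (not part of the original post), the phase of the first harmonic can be estimated separately for years 1-25 and years 26-50, and the difference converted into weeks. This is a minimal, self-contained sketch in base R; the seed, the helper names `shift_year()` and `phase_of()`, and the use of `lm()` are assumptions made for illustration only.

```r
# Self-contained check that the seasonal cycle really moves by about 8 weeks.
set.seed(1)  # assumption: seed added only to make the sketch reproducible
n_years <- 50
freq <- 52
y_pattern <- sin(seq(0, 2*pi, length.out = freq))
y <- rep(y_pattern, n_years) + rnorm(freq * n_years, sd = 0.1)
year <- rep(1:n_years, each = freq)
week <- rep(1:freq, n_years)

# Circular shift by k weeks; equivalent to move_data() in the question
shift_year <- function(x, k) c(tail(x, k), head(x, -k))
y_shifted <- y
for (yr in 26:50) y_shifted[year == yr] <- shift_year(y[year == yr], 8)

# Fit v ~ a*sin(theta) + b*cos(theta) = A*sin(theta + phi); return phi
phase_of <- function(v, w) {
  b <- unname(coef(lm(v ~ sin(2*pi*w/52) + cos(2*pi*w/52))))
  atan2(b[3], b[2])
}

early <- year <= 25
phi_early <- phase_of(y_shifted[early], week[early])
phi_late  <- phase_of(y_shifted[!early], week[!early])

# A delay of k weeks lowers the phase by 2*pi*k/52
shift_weeks <- ((phi_early - phi_late) * 52 / (2*pi)) %% 52
print(shift_weeks)  # should be close to 8
```

Estimating the phase from all observations in each period is more stable than comparing the weeks of the raw maxima, because the sine curve is nearly flat around its peak and the noise can move the maximum by a few weeks.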
Now the variable y_shifted has its annual peak around week 13 in years 1-25 and around week 21 in years 26-50. Let us compare it with the "unshifted" variable y:
library(ggplot2)   # ggplot(), aes(), geom_line(), facet_grid()
library(magrittr)  # provides the %>% pipe
mydata$Phase <- ifelse(mydata$Year <= 25, "Year 1-25", "Year 26-50")
mydata %>%
  tidyr::gather("y_variable", "value", y, y_shifted) %>%
  ggplot(aes(Week, value, group = Year, color = Phase)) + geom_line() +
  facet_grid(.~y_variable)
![Annual cycles of y and y_shifted](https://i.stack.imgur.com/vX9zy.png)
This abrupt shift in seasonality should be easy to detect. However, when I run bfast(), it does not detect any change:
library(bfast)
# Weekly series: 52 observations per year, starting at year 1, week 1
y_ts <- ts(mydata$y_shifted, start = c(1,1), frequency = freq)
fit <- bfast(y_ts, h=.15, season="harmonic", max.iter=20, breaks=3)
plot(fit)
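As you can see, no change in seasonality is detected (subplot no. 2 above). The residuals (subplot no. 4) pick up the change in seasonality instead, which becomes clear if we plot the residuals against week of the year: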
mydata$Residuals <- fit$output[[1]]$Nt
ggplot(mydata, aes(Week, Residuals, group = Year, color = Phase)) + geom_point()
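I have a feeling that I need to change some parameter or option to make bfast() look for changes in seasonality, but which one? I have not been able to dig this information out of the documentation.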
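I ran into the same problem with bfast when testing it on my consumer portfolio data, and I failed to find any real solution. I went on to dig into the bfast literature from the remote-sensing community, where bfast was originally developed and is widely used. My reading is that there is little you can do to make bfast consistently fit a useful seasonal component.
A few days ago, I came across a Quora discussion about the best software for time series analysis and learned of a new R package, Rbeast, for breakpoint detection and time series decomposition. There is also a nice tweet showing a quick comparison between bfast and Rbeast.
After some experimentation, I found that Rbeast is able to pinpoint the seasonal breakpoints in both my data and yours. To be honest, I still don't know how Rbeast works. The BEAST algorithm in Rbeast looks rather complicated and produces a lot of output; it is not well documented and is not as easy to use as bfast. Let me show the results I got, first with your data and then with a second artificial time series.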
# The original code to generate your data
n_years <- 50
freq <- 52
y_pattern <- sin(seq(0, 2*pi, length = freq))
y <- rep(y_pattern, n_years) + rnorm(freq*n_years, sd = 0.1)
mydata <- data.frame(Year = rep(1:n_years, each = freq), Week = rep(1:freq, n_years), y)
move_data <- function(data, year, weeks_to_move){
  x <- data[data$Year == year, "y"]
  c(x[seq(52 - weeks_to_move + 1, 52)], x[seq(1, 52 - weeks_to_move)])
}
mydata$y_shifted <- mydata$y
for (year in 26:50){
  mydata$y_shifted[mydata$Year == year] <- move_data(mydata, year, weeks_to_move = 8)
}
# Your data analyzed by the BEAST algorithm in Rbeast
library(Rbeast)
fit <- beast(mydata$y_shifted, freq=52)
print(fit)
plot(fit)
#####################################################################
# Seasonal Changepoints #
#####################################################################
.-------------------------------------------------------------------.
| Ascii plot of probability distribution for number of chgpts (ncp) |
.-------------------------------------------------------------------.
|Pr(ncp = 0 )=0.000|* |
|Pr(ncp = 1 )=0.999|*********************************************** |
|Pr(ncp = 2 )=0.001|* |
|Pr(ncp = 3 )=0.000|* |
|Pr(ncp = 4 )=0.000|* |
|Pr(ncp = 5 )=0.000|* |
|Pr(ncp = 6 )=0.000|* |
|Pr(ncp = 7 )=0.000|* |
|Pr(ncp = 8 )=0.000|* |
|Pr(ncp = 9 )=0.000|* |
|Pr(ncp = 10)=0.000|* |
.-------------------------------------------------------------------.
| Summary for number of Seasonal ChangePoints (scp) |
.-------------------------------------------------------------------.
|ncp_max = 10 | MaxSeasonKnotNum: A parameter you set |
|ncp_mode = 1 | Pr(ncp= 1)=1.00: There is a 99.9% probability |
| | that the seasonal component has 1 chgnpt(s). |
|ncp_mean = 1.00 | Sum{ncp*Pr(ncp)} for ncp = 0,...,10 |
|ncp_pct10 = 1.00 | 10% percentile for number of changepoints |
|ncp_median = 1.00 | 50% percentile: Median number of changepoints |
|ncp_pct90 = 1.00 | 90% percentile for number of changepoints |
.-------------------------------------------------------------------.
| List of probable seasonal changepoints ranked by probability of |
| occurrence: Please combine the ncp reported above to determine |
| which changepoints below are practically meaningful |
'-------------------------------------------------------------------'
|scp# |time (cp) |prob(cpPr) |
|------------------|---------------------------|--------------------|
|1 |1301.000000 |1.00000 |
.-------------------------------------------------------------------.
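The abrupt seasonal shift is detected precisely. Rbeast also gives the probability of detecting breakpoints in the seasonal and trend components (the red and green curves in the Pr(scp) and Pr(tcp) panels of the figure above). The probability of the detected seasonal change is very high, close to 1.0. The trend in your data is a flat line, essentially a constant of zero, and the probability of finding breakpoints in the trend (called changepoints in Rbeast) is likewise close to zero throughout.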
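To relate the reported changepoint to the simulated design, the observation index can be converted back into a year and a week. The sketch below uses only the value 1301 printed in the output above. The helper name `index_to_year_week()` is hypothetical, and the note about `fit$season$cp` is an assumption about the structure of the fitted object that should be checked with `str(fit)`.

```r
# Convert a 1-based observation index into (year, week) for a weekly series.
# index_to_year_week() is a hypothetical helper, not part of Rbeast.
index_to_year_week <- function(index, freq = 52) {
  data.frame(
    index = index,
    year  = (index - 1) %/% freq + 1,
    week  = (index - 1) %% freq + 1
  )
}

# Seasonal changepoint copied from the print(fit) output above.
# Assumption: with Rbeast installed, the same value should be available
# in fit$season$cp.
cp <- index_to_year_week(1301)
print(cp)
```

Index 1301 is the first observation after 25 full years (25 * 52 = 1300), i.e., week 1 of year 26, which is exactly where the shifted cycle begins in the simulation.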
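A cool feature of Rbeast is that it estimates the sin and cos orders of the harmonic seasonal model. Below, I generate a time series that has three seasonal segments (i.e., two breaks) plus a sloped trend with no breaks. The sin orders of the three seasonal segments differ: 1, 2, and 3, respectively.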
# Generate a sample time series with three seasonal segments
# the sin/cos orders for the three segs are different.
seg1 <- 1:1000
seg2 <- 1001:2000
seg3 <- 2001:3000
new_data <- c( sin(seg1*2*pi/52), 0.6*sin( seg2*2*pi/52*2), 0.3*sin( seg3*2*pi/52*3)) + (1:3000)*0.0002+ rnorm(3000, sd = 0.1)
# Test bfast using new_data
y_ts <- ts(new_data, start = c(1,1), frequency = 52)
fit <- bfast(y_ts, h=.15, season="harmonic", max.iter=20, breaks=3)
plot(fit)

Surprisingly, bfast did not detect any seasonal breaks, even though the three segments are easy to spot in the plotted data Yt.
# Analyze the new_data time series using `Rbeast`
fit <- beast(new_data, freq=52)
print(fit)
plot(fit)
#####################################################################
# Seasonal Changepoints #
#####################################################################
.-------------------------------------------------------------------.
| Ascii plot of probability distribution for number of chgpts (ncp) |
.-------------------------------------------------------------------.
|Pr(ncp = 0 )=0.000|* |
|Pr(ncp = 1 )=0.000|* |
|Pr(ncp = 2 )=0.969|*********************************************** |
|Pr(ncp = 3 )=0.031|** |
|Pr(ncp = 4 )=0.000|* |
|Pr(ncp = 5 )=0.000|* |
|Pr(ncp = 6 )=0.000|* |
|Pr(ncp = 7 )=0.000|* |
|Pr(ncp = 8 )=0.000|* |
|Pr(ncp = 9 )=0.000|* |
|Pr(ncp = 10)=0.000|* |
.-------------------------------------------------------------------.
| Summary for number of Seasonal ChangePoints (scp) |
.-------------------------------------------------------------------.
|ncp_max = 10 | MaxSeasonKnotNum: A parameter you set |
|ncp_mode = 2 | Pr(ncp= 2)=0.97: There is a 96.9% probability |
| | that the seasonal component has 2 chgnpt(s). |
|ncp_mean = 2.03 | Sum{ncp*Pr(ncp)} for ncp = 0,...,10 |
|ncp_pct10 = 2.00 | 10% percentile for number of changepoints |
|ncp_median = 2.00 | 50% percentile: Median number of changepoints |
|ncp_pct90 = 2.00 | 90% percentile for number of changepoints |
.-------------------------------------------------------------------.
| List of probable seasonal changepoints ranked by probability of |
| occurrence: Please combine the ncp reported above to determine |
| which changepoints below are practically meaningful |
'-------------------------------------------------------------------'
|scp# |time (cp) |prob(cpPr) |
|------------------|---------------------------|--------------------|
|1 |2001.000000 |1.00000 |
|2 |1001.000000 |1.00000 |
|3 |1027.000000 |0.02942 |
.-------------------------------------------------------------------.
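Above is the result from Rbeast. The two breaks and the three seasonal segments are all recovered, and no break is found in the trend. Rbeast also estimates the harmonic orders of the seasonal component: in the Order_s panel above, the three sin and cos orders are correctly recovered. The Order_s curve also shows the locations of the two seasonal breaks.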
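To make "harmonic order" concrete, here is a minimal sketch that regresses each known segment on harmonics of order 1 to 3 and reports which order has the largest amplitude. This is ordinary least squares on segments whose boundaries are already known, not the BEAST algorithm. The seed and the helper name `harmonic_amplitudes()` are assumptions added for illustration.

```r
# Regenerate the three-segment series (same construction as above).
set.seed(42)  # assumption: seed added only to make the sketch reproducible
seg1 <- 1:1000
seg2 <- 1001:2000
seg3 <- 2001:3000
new_data <- c(sin(seg1*2*pi/52), 0.6*sin(seg2*2*pi/52*2), 0.3*sin(seg3*2*pi/52*3)) +
  (1:3000)*0.0002 + rnorm(3000, sd = 0.1)

# Hypothetical helper: amplitude of each harmonic order within one segment
harmonic_amplitudes <- function(v, t, period = 52, max_order = 3) {
  # One sin column and one cos column per harmonic order k
  X <- do.call(cbind, lapply(seq_len(max_order), function(k)
    cbind(sin(2*pi*k*t/period), cos(2*pi*k*t/period))))
  b <- unname(coef(lm(v ~ t + X)))[-(1:2)]  # drop intercept and linear trend
  sin_coef <- b[seq(1, 2*max_order, by = 2)]
  cos_coef <- b[seq(2, 2*max_order, by = 2)]
  sqrt(sin_coef^2 + cos_coef^2)
}

# Rows are harmonic orders 1..3, columns are segments 1..3
amps <- sapply(list(seg1, seg2, seg3),
               function(idx) harmonic_amplitudes(new_data[idx], idx))
dominant <- apply(amps, 2, which.max)
print(round(amps, 2))
print(dominant)
```

By construction, the dominant order should be 1, 2, and 3 in the three segments, with amplitudes near 1, 0.6, and 0.3. That is the structure the Order_s panel is described as recovering, with the difference that BEAST does not need to be told where the segments start and end.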