Simple algorithm for online outlier detection of a generic time series

Gia*_*uca 11 math statistics real-time time-series

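I am working with a large number of time series. These time series are basically network measurements taken every 10 minutes. Some of them are periodic (e.g. bandwidth), while others are not (e.g. routing traffic).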

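I would like a simple algorithm for online "outlier detection". Basically, I want to keep the entire history of each time series in memory (or on disk), and I want to detect any outlier in a real-time scenario (each time a new sample is captured). What is the best way to achieve this?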

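I am currently using a moving average to remove some of the noise, but what comes next? Simple things like standard deviation, MAD, ... computed over the whole dataset don't work well (I can't assume the time series are stationary), and I would like something more "accurate", ideally a black box like: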

double outlier_detection(double* vector, double value);

where vector is an array of doubles containing the historical data, and the return value is the outlier score of the new sample "value".
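For reference, here is a minimal sketch of such a black box, assuming a rolling robust z-score is acceptable. It looks only at the last `window` samples of the history, so slow drifts in a non-stationary series are tolerated, and it scores the new value by its distance from their median in units of the MAD (scaled by 1.4826 so it matches the standard deviation for Gaussian data). The name `outlier_score` and the extra `n` and `window` parameters are hypothetical additions; a bare `double*` does not carry its own length in C.

```c
#include <math.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for doubles */
static int cmp_double(const void *a, const void *b) {
    double x = *(const double *)a, y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Median of buf[0..n-1]; sorts buf in place. n must be > 0. */
static double median_inplace(double *buf, size_t n) {
    qsort(buf, n, sizeof *buf, cmp_double);
    return (n % 2) ? buf[n / 2] : 0.5 * (buf[n / 2 - 1] + buf[n / 2]);
}

/* Hypothetical robust score: |value - median| / (1.4826 * MAD), computed
 * over the most recent `window` samples of `history` (length n).
 * Returns 0 when there is no history, NAN on allocation failure, and
 * INFINITY when the MAD is 0 but the value differs from the median. */
double outlier_score(const double *history, size_t n, size_t window, double value) {
    if (n == 0 || window == 0) return 0.0;
    size_t w = n < window ? n : window;
    const double *recent = history + (n - w);  /* only the most recent samples */

    double *buf = malloc(w * sizeof *buf);
    if (!buf) return NAN;
    memcpy(buf, recent, w * sizeof *buf);
    double med = median_inplace(buf, w);

    /* Absolute deviations from the median, then their median (the MAD). */
    for (size_t i = 0; i < w; i++) buf[i] = fabs(recent[i] - med);
    double mad = median_inplace(buf, w);
    free(buf);

    double dev = fabs(value - med);
    if (mad == 0.0) return dev == 0.0 ? 0.0 : INFINITY;
    return dev / (1.4826 * mad);
}
```

With 10-minute samples, a call such as `outlier_score(history, n, 144, latest)` would compare the new sample against roughly the last day of data. A threshold somewhere around 3 to 5 is a common rule of thumb, but it would need tuning per series.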

Pau*_*l R 9

This is a big and complex subject, and the answer will depend on (a) how much effort you want to invest in this and (b) how effective you want your outlier detection to be. One possible approach is adaptive filtering, which is typically used for applications like noise-cancelling headphones. You have a filter which constantly adapts to the input signal, effectively matching its filter coefficients to a hypothetical short-term model of the signal source, thereby minimising the mean square error of its output. The output (the residual error) then stays at a low level except when an outlier arrives, which produces a spike that is easy to detect with a simple threshold. Read up on adaptive filtering, LMS filters, etc., if you're serious about this technique.