Jea*_*aul 195 language-agnostic algorithm signal-processing time-series data-analysis
更新:迄今为止 表现最佳的算法就是这个算法.
该问题探讨了用于检测实时时间序列数据中的突然峰值的稳健算法.
请考虑以下数据集:
p = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1 1 1 1.1 0.9 1 1.1 1 1 0.9 1, ...
1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1 1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1 1 3, ...
2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];
Run Code Online (Sandbox Code Playgroud)
(Matlab格式,但它不是关于语言,而是关于算法)
你可以清楚地看到有三个大峰和一些小峰.此数据集是问题所涉及的时间序列数据集类的特定示例.这类数据集有两个一般特征:
我们还假设如下:
对于这种情况,需要构造触发信号的边界值.但是,边界值不能是静态的,必须基于算法实时确定.
我的问题:什么是实时计算这些阈值的好算法?这种情况是否有特定的算法?什么是最知名的算法?
强大的算法或有用的见解都受到高度赞赏.(可以用任何语言回答:它是关于算法的)
Jea*_*aul 273
我构建了一种适用于这些类型数据集的算法.它基于分散原理:如果新的数据点是距离某些移动平均值的给定x个标准偏差,则该算法发信号(也称为z得分).该算法非常稳健,因为它构造了单独的移动平均值和偏差,使得信号不会破坏阈值.因此,无论先前信号的量如何,未来信号都以大致相同的精度被识别.该算法需要3个输入:lag = the lag of the moving window
,threshold = the z-score at which the algorithm signals
和influence = the influence (between 0 and 1) of new signals on the mean and standard deviation
.例如,a lag
的5将使用最后5个观察值来平滑数据.threshold
如果数据点与移动平均值相差3.5个标准差,则3.5的A 将发出信号.influence
0.5的一个信号给出正常数据点具有一半影响的信号.同样,influence
0为完全忽略信号以重新计算新阈值.因此,0的影响是最稳健的选择(但假设平稳性); 将影响选项置于1是最不稳健的.因此,对于非平稳数据,影响选项应该放在介于0和1之间的位置.
它的工作原理如下:
伪代码
# Let y be a vector of timeseries data of at least length lag+2
# Let mean() be a function that calculates the mean
# Let std() be a function that calculates the standard deviaton
# Let absolute() be the absolute value function
# Settings (the ones below are examples: choose what is best for your data)
set lag to 5; # lag 5 for the smoothing functions
set threshold to 3.5; # 3.5 standard deviations for signal
set influence to 0.5; # between 0 and 1, where 1 is normal influence, 0.5 is half
# Initialise variables
set signals to vector 0,...,0 of length of y; # Initialize signal results
set filteredY to y(1),...,y(lag) # Initialize filtered series
set avgFilter to null; # Initialize average filter
set stdFilter to null; # Initialize std. filter
set avgFilter(lag) to mean(y(1),...,y(lag)); # Initialize first value
set stdFilter(lag) to std(y(1),...,y(lag)); # Initialize first value
for i=lag+1,...,t do
if absolute(y(i) - avgFilter(i-1)) > threshold*stdFilter(i-1) then
if y(i) > avgFilter(i-1) then
set signals(i) to +1; # Positive signal
else
set signals(i) to -1; # Negative signal
end
# Make influence lower
set filteredY(i) to influence*y(i) + (1-influence)*filteredY(i-1);
else
set signals(i) to 0; # No signal
set filteredY(i) to y(i);
end
# Adjust the filters
set avgFilter(i) to mean(filteredY(i-lag),...,filteredY(i));
set stdFilter(i) to std(filteredY(i-lag),...,filteredY(i));
end
Run Code Online (Sandbox Code Playgroud)
可以在下面找到为数据选择好参数的经验法则.
可以在此处找到此演示的Matlab代码.要使用该演示,只需运行它并通过单击上方图表自行创建时间序列.在绘制lag
观察次数后,算法开始工作.
对于原始问题,此算法在使用以下设置时将提供以下输出lag = 30, threshold = 5, influence = 0
:
lag
:滞后参数决定了数据的平滑程度以及算法对数据长期平均值变化的适应性.数据越稳定,您应该包含的滞后越多(这应该会提高算法的稳健性).如果您的数据包含随时间变化的趋势,您应该考虑您希望算法适应这些趋势的速度.也就是说,如果你把lag
它放在10,那么在将算法的阈值调整为长期平均值的任何系统变化之前需要10个"周期".因此,请lag
根据数据的趋势行为以及您希望算法的自适应性选择参数.
influence
:此参数确定信号对算法检测阈值的影响.如果置为0,则信号对阈值没有影响,使得基于阈值来检测未来信号,该阈值利用不受过去信号影响的平均值和标准偏差来计算.考虑这个问题的另一种方法是,如果你将影响力置于0,你就会隐含地假设平稳性(即无论有多少信号,时间序列总是在长期内恢复到相同的平均值).如果不是这种情况,则应将影响参数置于介于0和1之间的某个位置,具体取决于信号可以系统地影响数据的时变趋势的程度.例如,如果信号导致时间序列的长期平均值的结构性中断,则影响参数应该被置高(接近1),因此阈值可以快速调整到这些变化.
threshold
:阈值参数是移动平均值的标准偏差数,高于该值时算法将新数据点分类为信号.例如,如果新数据点高于移动平均值4.0标准偏差且阈值参数设置为3.5,则算法将数据点识别为信号.应根据预期的信号数设置此参数.例如,如果您的数据是正态分布的,则阈值(或:z分数)为3.5对应于0.00047的信令概率(来自此表),这意味着您希望每2128个数据点(1/0.00047)发出一次信号.因此,阈值直接影响算法的灵敏度,从而也影响算法发出信号的频率.检查您自己的数据并确定一个合理的阈值,使算法在您需要时发出信号(此处可能需要一些试错,以达到您的目的的良好阈值).
警告:上面的代码每次运行时都会循环遍历所有数据点.实现此代码时,请确保将信号的计算拆分为单独的函数(没有循环).然后当一个新的数据点到达时,更新filteredY
,avgFilter
然后stdFilter
一次.每当有新的数据点(如上例所示)时,不要重新计算所有数据的信号,这将是非常低效和缓慢的!
修改算法的其他方法(可能的改进)包括:
influence
为mean和std 创建一个单独的参数(在此Swift转换中完成)Perkins,P.,Heber,S.(2018).利用基于Z分数的峰值检测算法识别核糖体停留位点.IEEE第8届生物和医学科学计算进展国际会议(ICCABS),ISBN:978-1-5386-8520-4.
Moore,J.,Goffin,P.,Meyer,M.,Lundrigan,P.,Patwari,N.,Sward,K.,&Wiese,J.(2018).通过传感,注释和可视化空气质量数据来管理家庭环境.关于交互式,移动,可穿戴和普适技术的ACM会议记录,2(3),128.
Lo,O.,Buchanan,WJ,Griffiths,P.和Macfarlane,R.(2018),Distance Measurement Methods for Improved Insider Threat Detection,Security and Communication Networks,Vol.2018年,文章ID 5906368.
Scirea,M.(2017).情感音乐的产生及其对运动员体验的影响.博士论文,哥本哈根IT大学,数字设计.
Scirea,M.,Eklund,P.,Togelius,J.,&Risi,S.(2017).Primal-improv:走向共同进化的音乐即兴创作.计算机科学与电子工程(CEEC),2017年(第172-177页).IEEE.
Willems,P.(2017).情绪控制了老年人的情感氛围,硕士论文,特温特大学.
Catalbas,MC,Cegovnik,T.,Sodnik,J.和Gulten,A.(2017).基于扫视眼运动的驾驶员疲劳检测,第10届国际电气和电子工程会议(ELECO),第913-917页.
Ciocirdel,GD和Varga,M.(2016).基于维基百科网页浏览的选举预测.项目论文,Vrije Universiteit Amsterdam.
如果你在某个地方使用这个功能,请相信我或这个答案.如果您对此算法有任何疑问,请在下面的评论中发布或在LinkedIn上与我联系.
R K*_*lev 37
这是平滑的z-score算法的Python
/ numpy
实现(参见上面的答案).你可以在这里找到要点.
#!/usr/bin/env python
# Implementation of algorithm from https://stackoverflow.com/a/22640362/6029703
import numpy as np
import pylab
def thresholding_algo(y, lag, threshold, influence):
signals = np.zeros(len(y))
filteredY = np.array(y)
avgFilter = [0]*len(y)
stdFilter = [0]*len(y)
avgFilter[lag - 1] = np.mean(y[0:lag])
stdFilter[lag - 1] = np.std(y[0:lag])
for i in range(lag, len(y)):
if abs(y[i] - avgFilter[i-1]) > threshold * stdFilter [i-1]:
if y[i] > avgFilter[i-1]:
signals[i] = 1
else:
signals[i] = -1
filteredY[i] = influence * y[i] + (1 - influence) * filteredY[i-1]
avgFilter[i] = np.mean(filteredY[(i-lag+1):i+1])
stdFilter[i] = np.std(filteredY[(i-lag+1):i+1])
else:
signals[i] = 0
filteredY[i] = y[i]
avgFilter[i] = np.mean(filteredY[(i-lag+1):i+1])
stdFilter[i] = np.std(filteredY[(i-lag+1):i+1])
return dict(signals = np.asarray(signals),
avgFilter = np.asarray(avgFilter),
stdFilter = np.asarray(stdFilter))
Run Code Online (Sandbox Code Playgroud)
下面是对同一数据集的测试,该数据集产生的原始答案与R
/Matlab
# Data
y = np.array([1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9,
1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,
2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1])
# Settings: lag = 30, threshold = 5, influence = 0
lag = 30
threshold = 5
influence = 0
# Run algo with settings from above
result = thresholding_algo(y, lag=lag, threshold=threshold, influence=influence)
# Plot result
pylab.subplot(211)
pylab.plot(np.arange(1, len(y)+1), y)
pylab.plot(np.arange(1, len(y)+1),
result["avgFilter"], color="cyan", lw=2)
pylab.plot(np.arange(1, len(y)+1),
result["avgFilter"] + threshold * result["stdFilter"], color="green", lw=2)
pylab.plot(np.arange(1, len(y)+1),
result["avgFilter"] - threshold * result["stdFilter"], color="green", lw=2)
pylab.subplot(212)
pylab.step(np.arange(1, len(y)+1), result["signals"], color="red", lw=2)
pylab.ylim(-1.5, 1.5)
pylab.show()
Run Code Online (Sandbox Code Playgroud)
aha*_*aha 23
一种方法是基于以下观察来检测峰值:
它通过等到上升趋势结束来避免误报.它并不完全是"实时",因为它将错过一个dt的峰值.可以通过要求比较余量来控制灵敏度.在噪声检测和检测时间延迟之间存在折衷.您可以通过添加更多参数来丰富模型:
其中dt和m是控制灵敏度与时间延迟的参数
这就是你提到的算法:
这是在python中重现绘图的代码:
import numpy as np
import matplotlib.pyplot as plt
input = np.array([ 1. , 1. , 1. , 1. , 1. , 1. , 1. , 1.1, 1. , 0.8, 0.9,
1. , 1.2, 0.9, 1. , 1. , 1.1, 1.2, 1. , 1.5, 1. , 3. ,
2. , 5. , 3. , 2. , 1. , 1. , 1. , 0.9, 1. , 1. , 3. ,
2.6, 4. , 3. , 3.2, 2. , 1. , 1. , 1. , 1. , 1. ])
signal = (input > np.roll(input,1)) & (input > np.roll(input,-1))
plt.plot(input)
plt.plot(signal.nonzero()[0], input[signal], 'ro')
plt.show()
Run Code Online (Sandbox Code Playgroud)
通过设置m = 0.5
,您可以获得更清晰的信号,只有一个误报:
ckl*_*lin 17
在信号处理中,峰值检测通常通过小波变换完成.您基本上对时间序列数据进行离散小波变换.返回的细节系数中的过零点将对应于时间序列信号中的峰值.您可以在不同的细节系数级别检测到不同的峰值振幅,从而为您提供多级分辨率.
del*_*ica 13
适用于实时流的 Python 版本(不会在每个新数据点到达时重新计算所有数据点)。您可能想要调整类函数返回的内容 - 出于我的目的,我只需要信号。
import numpy as np
class real_time_peak_detection():
def __init__(self, array, lag, threshold, influence):
self.y = list(array)
self.length = len(self.y)
self.lag = lag
self.threshold = threshold
self.influence = influence
self.signals = [0] * len(self.y)
self.filteredY = np.array(self.y).tolist()
self.avgFilter = [0] * len(self.y)
self.stdFilter = [0] * len(self.y)
self.avgFilter[self.lag - 1] = np.mean(self.y[0:self.lag]).tolist()
self.stdFilter[self.lag - 1] = np.std(self.y[0:self.lag]).tolist()
def thresholding_algo(self, new_value):
self.y.append(new_value)
i = len(self.y) - 1
self.length = len(self.y)
if i < self.lag:
return 0
elif i == self.lag:
self.signals = [0] * len(self.y)
self.filteredY = np.array(self.y).tolist()
self.avgFilter = [0] * len(self.y)
self.stdFilter = [0] * len(self.y)
self.avgFilter[self.lag] = np.mean(self.y[0:self.lag]).tolist()
self.stdFilter[self.lag] = np.std(self.y[0:self.lag]).tolist()
return 0
self.signals += [0]
self.filteredY += [0]
self.avgFilter += [0]
self.stdFilter += [0]
if abs(self.y[i] - self.avgFilter[i - 1]) > self.threshold * self.stdFilter[i - 1]:
if self.y[i] > self.avgFilter[i - 1]:
self.signals[i] = 1
else:
self.signals[i] = -1
self.filteredY[i] = self.influence * self.y[i] + (1 - self.influence) * self.filteredY[i - 1]
self.avgFilter[i] = np.mean(self.filteredY[(i - self.lag):i])
self.stdFilter[i] = np.std(self.filteredY[(i - self.lag):i])
else:
self.signals[i] = 0
self.filteredY[i] = self.y[i]
self.avgFilter[i] = np.mean(self.filteredY[(i - self.lag):i])
self.stdFilter[i] = np.std(self.filteredY[(i - self.lag):i])
return self.signals[i]
Run Code Online (Sandbox Code Playgroud)
jbm*_*jbm 11
我们试图在我们的数据集上使用平滑的z-score算法,这会导致过敏或过度敏感(取决于参数的调整方式),几乎没有中间地面.在我们网站的交通信号中,我们观察到一个代表日常周期的低频基线,即使有最好的参数(如下所示),它仍然落后于第4天,因为大多数数据点被认为是异常.
在原始z-score算法的基础上,我们提出了一种通过反向过滤来解决这个问题的方法.修改后的算法的详细信息及其在电视商业交通归因中的应用发布在我们的团队博客上.
S. *_*ber 10
在计算拓扑中,持久同源性的概念导致了一种有效 - 快速的分类数 - 解决方案.它不仅可以检测峰,还可以自然地量化峰的"重要性",从而可以选择对您来说重要的峰.
算法摘要. 在一维设置(时间序列,实值信号)中,可以通过下图轻松描述算法:
将功能图(或其子级别集)视为景观,并考虑从水平无穷大开始降低水位(或在此图中为1.8).当水平降低时,在当地最大岛屿上弹出.在当地的最低点,这些岛屿融合在一起.这个想法的一个细节是,晚些时候出现的岛屿被合并到更古老的岛屿.岛屿的"持久性"是它的出生时间减去它的死亡时间.蓝色条的长度描绘了持久性,这是上面提到的峰的"重要性".
效率. 在函数值排序之后,找到一个在线性时间内运行的实现并不是很难 - 事实上它是一个单一的简单循环.因此,这种实施应该在实践中快速实施,并且也很容易实施.
引用. 可以在此处找到整个故事的写作以及来自持久同源性(计算代数拓扑中的字段)的动机的参考:https: //www.sthu.org/blog/13-perstopology-peakdetection/index.html
发现GH Palshikar 在时间序列中用于峰值检测的简单算法中的另一种算法.
算法如下:
algorithm peak1 // one peak detection algorithms that uses peak function S1
input T = x1, x2, …, xN, N // input time-series of N points
input k // window size around the peak
input h // typically 1 <= h <= 3
output O // set of peaks detected in T
begin
O = empty set // initially empty
for (i = 1; i < n; i++) do
// compute peak function value for each of the N points in T
a[i] = S1(k,i,xi,T);
end for
Compute the mean m' and standard deviation s' of all positive values in array a;
for (i = 1; i < n; i++) do // remove local peaks which are “small” in global context
if (a[i] > 0 && (a[i] – m') >( h * s')) then O = O + {xi};
end if
end for
Order peaks in O in terms of increasing index in T
// retain only one peak out of any set of peaks within distance k of each other
for every adjacent pair of peaks xi and xj in O do
if |j – i| <= k then remove the smaller value of {xi, xj} from O
end if
end for
end
Run Code Online (Sandbox Code Playgroud)
k
和h
事先例:
Matlab
和R
翻译MATLAB 代码
function [signals,avgFilter,stdFilter] = ThresholdingAlgo(y,lag,threshold,influence)
% Initialise signal results
signals = zeros(length(y),1);
% Initialise filtered series
filteredY = y(1:lag+1);
% Initialise filters
avgFilter(lag+1,1) = mean(y(1:lag+1));
stdFilter(lag+1,1) = std(y(1:lag+1));
% Loop over all datapoints y(lag+2),...,y(t)
for i=lag+2:length(y)
% If new value is a specified number of deviations away
if abs(y(i)-avgFilter(i-1)) > threshold*stdFilter(i-1)
if y(i) > avgFilter(i-1)
% Positive signal
signals(i) = 1;
else
% Negative signal
signals(i) = -1;
end
% Make influence lower
filteredY(i) = influence*y(i)+(1-influence)*filteredY(i-1);
else
% No signal
signals(i) = 0;
filteredY(i) = y(i);
end
% Adjust the filters
avgFilter(i) = mean(filteredY(i-lag:i));
stdFilter(i) = std(filteredY(i-lag:i));
end
% Done, now return results
end
Run Code Online (Sandbox Code Playgroud)
例子:
% Data
y = [1 1 1.1 1 0.9 1 1 1.1 1 0.9 1 1.1 1 1 0.9 1 1 1.1 1 1,...
1 1 1.1 0.9 1 1.1 1 1 0.9 1 1.1 1 1 1.1 1 0.8 0.9 1 1.2 0.9 1,...
1 1.1 1.2 1 1.5 1 3 2 5 3 2 1 1 1 0.9 1,...
1 3 2.6 4 3 3.2 2 1 1 0.8 4 4 2 2.5 1 1 1];
% Settings
lag = 30;
threshold = 5;
influence = 0;
% Get results
[signals,avg,dev] = ThresholdingAlgo(y,lag,threshold,influence);
figure; subplot(2,1,1); hold on;
x = 1:length(y); ix = lag+1:length(y);
area(x(ix),avg(ix)+threshold*dev(ix),'FaceColor',[0.9 0.9 0.9],'EdgeColor','none');
area(x(ix),avg(ix)-threshold*dev(ix),'FaceColor',[1 1 1],'EdgeColor','none');
plot(x(ix),avg(ix),'LineWidth',1,'Color','cyan','LineWidth',1.5);
plot(x(ix),avg(ix)+threshold*dev(ix),'LineWidth',1,'Color','green','LineWidth',1.5);
plot(x(ix),avg(ix)-threshold*dev(ix),'LineWidth',1,'Color','green','LineWidth',1.5);
plot(1:length(y),y,'b');
subplot(2,1,2);
stairs(signals,'r','LineWidth',1.5); ylim([-1.5 1.5]);
Run Code Online (Sandbox Code Playgroud)
R代码
ThresholdingAlgo <- function(y,lag,threshold,influence) {
signals <- rep(0,length(y))
filteredY <- y[0:lag]
avgFilter <- NULL
stdFilter <- NULL
avgFilter[lag] <- mean(y[0:lag], na.rm=TRUE)
stdFilter[lag] <- sd(y[0:lag], na.rm=TRUE)
for (i in (lag+1):length(y)){
if (abs(y[i]-avgFilter[i-1]) > threshold*stdFilter[i-1]) {
if (y[i] > avgFilter[i-1]) {
signals[i] <- 1;
} else {
signals[i] <- -1;
}
filteredY[i] <- influence*y[i]+(1-influence)*filteredY[i-1]
} else {
signals[i] <- 0
filteredY[i] <- y[i]
}
avgFilter[i] <- mean(filteredY[(i-lag):i], na.rm=TRUE)
stdFilter[i] <- sd(filteredY[(i-lag):i], na.rm=TRUE)
}
return(list("signals"=signals,"avgFilter"=avgFilter,"stdFilter"=stdFilter))
}
Run Code Online (Sandbox Code Playgroud)
例子:
# Data
y <- c(1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9,
1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,
2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1)
lag <- 30
threshold <- 5
influence <- 0
# Run algo with lag = 30, threshold = 5, influence = 0
result <- ThresholdingAlgo(y,lag,threshold,influence)
# Plot result
par(mfrow = c(2,1),oma = c(2,2,0,0) + 0.1,mar = c(0,0,2,1) + 0.2)
plot(1:length(y),y,type="l",ylab="",xlab="")
lines(1:length(y),result$avgFilter,type="l",col="cyan",lwd=2)
lines(1:length(y),result$avgFilter+threshold*result$stdFilter,type="l",col="green",lwd=2)
lines(1:length(y),result$avgFilter-threshold*result$stdFilter,type="l",col="green",lwd=2)
plot(result$signals,type="S",col="red",ylab="",xlab="",ylim=c(-1.5,1.5),lwd=2)
Run Code Online (Sandbox Code Playgroud)
此代码(两种语言)将为原始问题的数据产生以下结果:
Matlab
演示代码(点击创建数据)
function [] = RobustThresholdingDemo()
%% SPECIFICATIONS
lag = 5; % lag for the smoothing
threshold = 3.5; % number of st.dev. away from the mean to signal
influence = 0.3; % when signal: how much influence for new data? (between 0 and 1)
% 1 is normal influence, 0.5 is half
%% START DEMO
DemoScreen(30,lag,threshold,influence);
end
function [signals,avgFilter,stdFilter] = ThresholdingAlgo(y,lag,threshold,influence)
signals = zeros(length(y),1);
filteredY = y(1:lag+1);
avgFilter(lag+1,1) = mean(y(1:lag+1));
stdFilter(lag+1,1) = std(y(1:lag+1));
for i=lag+2:length(y)
if abs(y(i)-avgFilter(i-1)) > threshold*stdFilter(i-1)
if y(i) > avgFilter(i-1)
signals(i) = 1;
else
signals(i) = -1;
end
filteredY(i) = influence*y(i)+(1-influence)*filteredY(i-1);
else
signals(i) = 0;
filteredY(i) = y(i);
end
avgFilter(i) = mean(filteredY(i-lag:i));
stdFilter(i) = std(filteredY(i-lag:i));
end
end
% Demo screen function
function [] = DemoScreen(n,lag,threshold,influence)
figure('Position',[200 100,1000,500]);
subplot(2,1,1);
title(sprintf(['Draw data points (%.0f max) [settings: lag = %.0f, '...
'threshold = %.2f, influence = %.2f]'],n,lag,threshold,influence));
ylim([0 5]); xlim([0 50]);
H = gca; subplot(2,1,1);
set(H, 'YLimMode', 'manual'); set(H, 'XLimMode', 'manual');
set(H, 'YLim', get(H,'YLim')); set(H, 'XLim', get(H,'XLim'));
xg = []; yg = [];
for i=1:n
try
[xi,yi] = ginput(1);
catch
return;
end
xg = [xg xi]; yg = [yg yi];
if i == 1
subplot(2,1,1); hold on;
plot(H, xg(i),yg(i),'r.');
text(xg(i),yg(i),num2str(i),'FontSize',7);
end
if length(xg) > lag
[signals,avg,dev] = ...
ThresholdingAlgo(yg,lag,threshold,influence);
area(xg(lag+1:end),avg(lag+1:end)+threshold*dev(lag+1:end),...
'FaceColor',[0.9 0.9 0.9],'EdgeColor','none');
area(xg(lag+1:end),avg(lag+1:end)-threshold*dev(lag+1:end),...
'FaceColor',[1 1 1],'EdgeColor','none');
plot(xg(lag+1:end),avg(lag+1:end),'LineWidth',1,'Color','cyan');
plot(xg(lag+1:end),avg(lag+1:end)+threshold*dev(lag+1:end),...
'LineWidth',1,'Color','green');
plot(xg(lag+1:end),avg(lag+1:end)-threshold*dev(lag+1:end),...
'LineWidth',1,'Color','green');
subplot(2,1,2); hold on; title('Signal output');
stairs(xg(lag+1:end),signals(lag+1:end),'LineWidth',2,'Color','blue');
ylim([-2 2]); xlim([0 50]); hold off;
end
subplot(2,1,1); hold on;
for j=2:i
plot(xg([j-1:j]),yg([j-1:j]),'r'); plot(H,xg(j),yg(j),'r.');
text(xg(j),yg(j),num2str(j),'FontSize',7);
end
end
end
Run Code Online (Sandbox Code Playgroud)
以下是Golang中Smoothed z-score算法(上图)的实现.它假设一片[]int16
(PCM 16位样本).你可以在这里找到一个要点.
/*
Settings (the ones below are examples: choose what is best for your data)
set lag to 5; # lag 5 for the smoothing functions
set threshold to 3.5; # 3.5 standard deviations for signal
set influence to 0.5; # between 0 and 1, where 1 is normal influence, 0.5 is half
*/
// ZScore on 16bit WAV samples
func ZScore(samples []int16, lag int, threshold float64, influence float64) (signals []int16) {
//lag := 20
//threshold := 3.5
//influence := 0.5
signals = make([]int16, len(samples))
filteredY := make([]int16, len(samples))
for i, sample := range samples[0:lag] {
filteredY[i] = sample
}
avgFilter := make([]int16, len(samples))
stdFilter := make([]int16, len(samples))
avgFilter[lag] = Average(samples[0:lag])
stdFilter[lag] = Std(samples[0:lag])
for i := lag + 1; i < len(samples); i++ {
f := float64(samples[i])
if float64(Abs(samples[i]-avgFilter[i-1])) > threshold*float64(stdFilter[i-1]) {
if samples[i] > avgFilter[i-1] {
signals[i] = 1
} else {
signals[i] = -1
}
filteredY[i] = int16(influence*f + (1-influence)*float64(filteredY[i-1]))
avgFilter[i] = Average(filteredY[(i - lag):i])
stdFilter[i] = Std(filteredY[(i - lag):i])
} else {
signals[i] = 0
filteredY[i] = samples[i]
avgFilter[i] = Average(filteredY[(i - lag):i])
stdFilter[i] = Std(filteredY[(i - lag):i])
}
}
return
}
// Average a chunk of values
func Average(chunk []int16) (avg int16) {
var sum int64
for _, sample := range chunk {
if sample < 0 {
sample *= -1
}
sum += int64(sample)
}
return int16(sum / int64(len(chunk)))
}
Run Code Online (Sandbox Code Playgroud)
这是基于之前发布的Groovy 答案的实际 Java 实现。(我知道已经发布了 Groovy 和 Kotlin 实现,但是对于像我这样只学过 Java 的人来说,弄清楚如何在其他语言和 Java 之间进行转换是一个真正的麻烦)。
(结果与其他人的图表相符)
算法实现
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import org.apache.commons.math3.stat.descriptive.SummaryStatistics;
public class SignalDetector {
public HashMap<String, List> analyzeDataForSignals(List<Double> data, int lag, Double threshold, Double influence) {
// init stats instance
SummaryStatistics stats = new SummaryStatistics();
// the results (peaks, 1 or -1) of our algorithm
List<Integer> signals = new ArrayList<Integer>(Collections.nCopies(data.size(), 0));
// filter out the signals (peaks) from our original list (using influence arg)
List<Double> filteredData = new ArrayList<Double>(data);
// the current average of the rolling window
List<Double> avgFilter = new ArrayList<Double>(Collections.nCopies(data.size(), 0.0d));
// the current standard deviation of the rolling window
List<Double> stdFilter = new ArrayList<Double>(Collections.nCopies(data.size(), 0.0d));
// init avgFilter and stdFilter
for (int i = 0; i < lag; i++) {
stats.addValue(data.get(i));
}
avgFilter.set(lag - 1, stats.getMean());
stdFilter.set(lag - 1, Math.sqrt(stats.getPopulationVariance())); // getStandardDeviation() uses sample variance
stats.clear();
// loop input starting at end of rolling window
for (int i = lag; i < data.size(); i++) {
// if the distance between the current value and average is enough standard deviations (threshold) away
if (Math.abs((data.get(i) - avgFilter.get(i - 1))) > threshold * stdFilter.get(i - 1)) {
// this is a signal (i.e. peak), determine if it is a positive or negative signal
if (data.get(i) > avgFilter.get(i - 1)) {
signals.set(i, 1);
} else {
signals.set(i, -1);
}
// filter this signal out using influence
filteredData.set(i, (influence * data.get(i)) + ((1 - influence) * filteredData.get(i - 1)));
} else {
// ensure this signal remains a zero
signals.set(i, 0);
// ensure this value is not filtered
filteredData.set(i, data.get(i));
}
// update rolling average and deviation
for (int j = i - lag; j < i; j++) {
stats.addValue(filteredData.get(j));
}
avgFilter.set(i, stats.getMean());
stdFilter.set(i, Math.sqrt(stats.getPopulationVariance()));
stats.clear();
}
HashMap<String, List> returnMap = new HashMap<String, List>();
returnMap.put("signals", signals);
returnMap.put("filteredData", filteredData);
returnMap.put("avgFilter", avgFilter);
returnMap.put("stdFilter", stdFilter);
return returnMap;
} // end
}
Run Code Online (Sandbox Code Playgroud)
主要方法
import java.text.DecimalFormat;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
public class Main {
public static void main(String[] args) throws Exception {
DecimalFormat df = new DecimalFormat("#0.000");
ArrayList<Double> data = new ArrayList<Double>(Arrays.asList(1d, 1d, 1.1d, 1d, 0.9d, 1d, 1d, 1.1d, 1d, 0.9d, 1d,
1.1d, 1d, 1d, 0.9d, 1d, 1d, 1.1d, 1d, 1d, 1d, 1d, 1.1d, 0.9d, 1d, 1.1d, 1d, 1d, 0.9d, 1d, 1.1d, 1d, 1d,
1.1d, 1d, 0.8d, 0.9d, 1d, 1.2d, 0.9d, 1d, 1d, 1.1d, 1.2d, 1d, 1.5d, 1d, 3d, 2d, 5d, 3d, 2d, 1d, 1d, 1d,
0.9d, 1d, 1d, 3d, 2.6d, 4d, 3d, 3.2d, 2d, 1d, 1d, 0.8d, 4d, 4d, 2d, 2.5d, 1d, 1d, 1d));
SignalDetector signalDetector = new SignalDetector();
int lag = 30;
double threshold = 5;
double influence = 0;
HashMap<String, List> resultsMap = signalDetector.analyzeDataForSignals(data, lag, threshold, influence);
// print algorithm params
System.out.println("lag: " + lag + "\t\tthreshold: " + threshold + "\t\tinfluence: " + influence);
System.out.println("Data size: " + data.size());
System.out.println("Signals size: " + resultsMap.get("signals").size());
// print data
System.out.print("Data:\t\t");
for (double d : data) {
System.out.print(df.format(d) + "\t");
}
System.out.println();
// print signals
System.out.print("Signals:\t");
List<Integer> signalsList = resultsMap.get("signals");
for (int i : signalsList) {
System.out.print(df.format(i) + "\t");
}
System.out.println();
// print filtered data
System.out.print("Filtered Data:\t");
List<Double> filteredDataList = resultsMap.get("filteredData");
for (double d : filteredDataList) {
System.out.print(df.format(d) + "\t");
}
System.out.println();
// print running average
System.out.print("Avg Filter:\t");
List<Double> avgFilterList = resultsMap.get("avgFilter");
for (double d : avgFilterList) {
System.out.print(df.format(d) + "\t");
}
System.out.println();
// print running std
System.out.print("Std filter:\t");
List<Double> stdFilterList = resultsMap.get("stdFilter");
for (double d : stdFilterList) {
System.out.print(df.format(d) + "\t");
}
System.out.println();
System.out.println();
for (int i = 0; i < signalsList.size(); i++) {
if (signalsList.get(i) != 0) {
System.out.println("Point " + i + " gave signal " + signalsList.get(i));
}
}
}
}
Run Code Online (Sandbox Code Playgroud)
结果
lag: 30 threshold: 5.0 influence: 0.0
Data size: 74
Signals size: 74
Data: 1.000 1.000 1.100 1.000 0.900 1.000 1.000 1.100 1.000 0.900 1.000 1.100 1.000 1.000 0.900 1.000 1.000 1.100 1.000 1.000 1.000 1.000 1.100 0.900 1.000 1.100 1.000 1.000 0.900 1.000 1.100 1.000 1.000 1.100 1.000 0.800 0.900 1.000 1.200 0.900 1.000 1.000 1.100 1.200 1.000 1.500 1.000 3.000 2.000 5.000 3.000 2.000 1.000 1.000 1.000 0.900 1.000 1.000 3.000 2.600 4.000 3.000 3.200 2.000 1.000 1.000 0.800 4.000 4.000 2.000 2.500 1.000 1.000 1.000
Signals: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 0.000 1.000 1.000 1.000 1.000 1.000 0.000 0.000 0.000 0.000 0.000 0.000 1.000 1.000 1.000 1.000 1.000 1.000 0.000 0.000 0.000 1.000 1.000 1.000 1.000 0.000 0.000 0.000
Filtered Data: 1.000 1.000 1.100 1.000 0.900 1.000 1.000 1.100 1.000 0.900 1.000 1.100 1.000 1.000 0.900 1.000 1.000 1.100 1.000 1.000 1.000 1.000 1.100 0.900 1.000 1.100 1.000 1.000 0.900 1.000 1.100 1.000 1.000 1.100 1.000 0.800 0.900 1.000 1.200 0.900 1.000 1.000 1.100 1.200 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.900 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 0.800 0.800 0.800 0.800 0.800 1.000 1.000 1.000
Avg Filter: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 1.003 1.003 1.007 1.007 1.003 1.007 1.010 1.003 1.000 0.997 1.003 1.003 1.003 1.000 1.003 1.010 1.013 1.013 1.013 1.010 1.010 1.010 1.010 1.010 1.007 1.010 1.010 1.003 1.003 1.003 1.007 1.007 1.003 1.003 1.003 1.000 1.000 1.007 1.003 0.997 0.983 0.980 0.973 0.973 0.970
Std filter: 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.000 0.060 0.060 0.063 0.063 0.060 0.063 0.060 0.071 0.073 0.071 0.080 0.080 0.080 0.077 0.080 0.087 0.085 0.085 0.085 0.083 0.083 0.083 0.083 0.083 0.081 0.079 0.079 0.080 0.080 0.080 0.077 0.077 0.075 0.075 0.075 0.073 0.073 0.063 0.071 0.080 0.078 0.083 0.089 0.089 0.086
Point 45 gave signal 1
Point 47 gave signal 1
Point 48 gave signal 1
Point 49 gave signal 1
Point 50 gave signal 1
Point 51 gave signal 1
Point 58 gave signal 1
Point 59 gave signal 1
Point 60 gave signal 1
Point 61 gave signal 1
Point 62 gave signal 1
Point 63 gave signal 1
Point 67 gave signal 1
Point 68 gave signal 1
Point 69 gave signal 1
Point 70 gave signal 1
Run Code Online (Sandbox Code Playgroud)
这个问题看起来类似于我在混合/嵌入式系统课程中遇到的问题,但这与在传感器的输入有噪声时检测故障有关.我们使用卡尔曼滤波器来估计/预测系统的隐藏状态,然后使用统计分析来确定故障发生的可能性.我们使用线性系统,但存在非线性变体.我记得这种方法具有令人惊讶的自适应性,但它需要一个系统动力学模型.
以下是此答案中平滑的z-score算法的C++实现
std::vector<int> smoothedZScore(std::vector<float> input)
{
//lag 5 for the smoothing functions
int lag = 5;
//3.5 standard deviations for signal
float threshold = 3.5;
//between 0 and 1, where 1 is normal influence, 0.5 is half
float influence = .5;
if (input.size() <= lag + 2)
{
std::vector<int> emptyVec;
return emptyVec;
}
//Initialise variables
std::vector<int> signals(input.size(), 0.0);
std::vector<float> filteredY(input.size(), 0.0);
std::vector<float> avgFilter(input.size(), 0.0);
std::vector<float> stdFilter(input.size(), 0.0);
std::vector<float> subVecStart(input.begin(), input.begin() + lag);
avgFilter[lag] = mean(subVecStart);
stdFilter[lag] = stdDev(subVecStart);
for (size_t i = lag + 1; i < input.size(); i++)
{
if (std::abs(input[i] - avgFilter[i - 1]) > threshold * stdFilter[i - 1])
{
if (input[i] > avgFilter[i - 1])
{
signals[i] = 1; //# Positive signal
}
else
{
signals[i] = -1; //# Negative signal
}
//Make influence lower
filteredY[i] = influence* input[i] + (1 - influence) * filteredY[i - 1];
}
else
{
signals[i] = 0; //# No signal
filteredY[i] = input[i];
}
//Adjust the filters
std::vector<float> subVec(filteredY.begin() + i - lag, filteredY.begin() + i);
avgFilter[i] = mean(subVec);
stdFilter[i] = stdDev(subVec);
}
return signals;
}
Run Code Online (Sandbox Code Playgroud)
#include <iostream>
#include <vector>
#include <algorithm>
#include <unordered_map>
#include <cmath>
#include <iterator>
#include <numeric>
using namespace std;
typedef long double ld;
typedef unsigned int uint;
typedef std::vector<ld>::iterator vec_iter_ld;
/**
* Overriding the ostream operator for pretty printing vectors.
*/
template<typename T>
std::ostream &operator<<(std::ostream &os, std::vector<T> vec) {
os << "[";
if (vec.size() != 0) {
std::copy(vec.begin(), vec.end() - 1, std::ostream_iterator<T>(os, " "));
os << vec.back();
}
os << "]";
return os;
}
/**
* This class calculates mean and standard deviation of a subvector.
* This is basically stats computation of a subvector of a window size qual to "lag".
*/
class VectorStats {
public:
/**
* Constructor for VectorStats class.
*
* @param start - This is the iterator position of the start of the window,
* @param end - This is the iterator position of the end of the window,
*/
VectorStats(vec_iter_ld start, vec_iter_ld end) {
this->start = start;
this->end = end;
this->compute();
}
/**
* This method calculates the mean and standard deviation using STL function.
* This is the Two-Pass implementation of the Mean & Variance calculation.
*/
void compute() {
ld sum = std::accumulate(start, end, 0.0);
uint slice_size = std::distance(start, end);
ld mean = sum / slice_size;
std::vector<ld> diff(slice_size);
std::transform(start, end, diff.begin(), [mean](ld x) { return x - mean; });
ld sq_sum = std::inner_product(diff.begin(), diff.end(), diff.begin(), 0.0);
ld std_dev = std::sqrt(sq_sum / slice_size);
this->m1 = mean;
this->m2 = std_dev;
}
ld mean() {
return m1;
}
ld standard_deviation() {
return m2;
}
private:
vec_iter_ld start;
vec_iter_ld end;
ld m1;
ld m2;
};
/**
* This is the implementation of the Smoothed Z-Score Algorithm.
* This is direction translation of /sf/answers/1584825371/.
*
* @param input - input signal
* @param lag - the lag of the moving window
* @param threshold - the z-score at which the algorithm signals
* @param influence - the influence (between 0 and 1) of new signals on the mean and standard deviation
* @return a hashmap containing the filtered signal and corresponding mean and standard deviation.
*/
unordered_map<string, vector<ld>> z_score_thresholding(vector<ld> input, int lag, ld threshold, ld influence) {
unordered_map<string, vector<ld>> output;
uint n = (uint) input.size();
vector<ld> signals(input.size());
vector<ld> filtered_input(input.begin(), input.end());
vector<ld> filtered_mean(input.size());
vector<ld> filtered_stddev(input.size());
VectorStats lag_subvector_stats(input.begin(), input.begin() + lag);
filtered_mean[lag - 1] = lag_subvector_stats.mean();
filtered_stddev[lag - 1] = lag_subvector_stats.standard_deviation();
for (int i = lag; i < n; i++) {
if (abs(input[i] - filtered_mean[i - 1]) > threshold * filtered_stddev[i - 1]) {
signals[i] = (input[i] > filtered_mean[i - 1]) ? 1.0 : -1.0;
filtered_input[i] = influence * input[i] + (1 - influence) * filtered_input[i - 1];
} else {
signals[i] = 0.0;
filtered_input[i] = input[i];
}
VectorStats lag_subvector_stats(filtered_input.begin() + (i - lag), filtered_input.begin() + i);
filtered_mean[i] = lag_subvector_stats.mean();
filtered_stddev[i] = lag_subvector_stats.standard_deviation();
}
output["signals"] = signals;
output["filtered_mean"] = filtered_mean;
output["filtered_stddev"] = filtered_stddev;
return output;
};
int main() {
vector<ld> input = {1.0, 1.0, 1.1, 1.0, 0.9, 1.0, 1.0, 1.1, 1.0, 0.9, 1.0, 1.1, 1.0, 1.0, 0.9, 1.0, 1.0, 1.1, 1.0,
1.0, 1.0, 1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 1.0, 0.9, 1.0, 1.1, 1.0, 1.0, 1.1, 1.0, 0.8, 0.9, 1.0,
1.2, 0.9, 1.0, 1.0, 1.1, 1.2, 1.0, 1.5, 1.0, 3.0, 2.0, 5.0, 3.0, 2.0, 1.0, 1.0, 1.0, 0.9, 1.0,
1.0, 3.0, 2.6, 4.0, 3.0, 3.2, 2.0, 1.0, 1.0, 0.8, 4.0, 4.0, 2.0, 2.5, 1.0, 1.0, 1.0};
int lag = 30;
ld threshold = 5.0;
ld influence = 0.0;
unordered_map<string, vector<ld>> output = z_score_thresholding(input, lag, threshold, influence);
cout << output["signals"] << endl;
}
Run Code Online (Sandbox Code Playgroud)
我想我会为其他人提供我的算法的 Julia 实现。要点可以在这里找到
using Statistics
using Plots
function SmoothedZscoreAlgo(y, lag, threshold, influence)
# Julia implimentation of http://stackoverflow.com/a/22640362/6029703
n = length(y)
signals = zeros(n) # init signal results
filteredY = copy(y) # init filtered series
avgFilter = zeros(n) # init average filter
stdFilter = zeros(n) # init std filter
avgFilter[lag - 1] = mean(y[1:lag]) # init first value
stdFilter[lag - 1] = std(y[1:lag]) # init first value
for i in range(lag, stop=n-1)
if abs(y[i] - avgFilter[i-1]) > threshold*stdFilter[i-1]
if y[i] > avgFilter[i-1]
signals[i] += 1 # postive signal
else
signals[i] += -1 # negative signal
end
# Make influence lower
filteredY[i] = influence*y[i] + (1-influence)*filteredY[i-1]
else
signals[i] = 0
filteredY[i] = y[i]
end
avgFilter[i] = mean(filteredY[i-lag+1:i])
stdFilter[i] = std(filteredY[i-lag+1:i])
end
return (signals = signals, avgFilter = avgFilter, stdFilter = stdFilter)
end
# Data
y = [1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9,
1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,
2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1]
# Settings: lag = 30, threshold = 5, influence = 0
lag = 30
threshold = 5
influence = 0
results = SmoothedZscoreAlgo(y, lag, threshold, influence)
upper_bound = results[:avgFilter] + threshold * results[:stdFilter]
lower_bound = results[:avgFilter] - threshold * results[:stdFilter]
x = 1:length(y)
yplot = plot(x,y,color="blue", label="Y",legend=:topleft)
yplot = plot!(x,upper_bound, color="green", label="Upper Bound",legend=:topleft)
yplot = plot!(x,results[:avgFilter], color="cyan", label="Average Filter",legend=:topleft)
yplot = plot!(x,lower_bound, color="green", label="Lower Bound",legend=:topleft)
signalplot = plot(x,results[:signals],color="red",label="Signals",legend=:topleft)
plot(yplot,signalplot,layout=(2,1),legend=:topleft)
Run Code Online (Sandbox Code Playgroud)
在@ Jean-Paul提出的解决方案之后,我在C#中实现了他的算法
public class ZScoreOutput
{
public List<double> input;
public List<int> signals;
public List<double> avgFilter;
public List<double> filtered_stddev;
}
public static class ZScore
{
public static ZScoreOutput StartAlgo(List<double> input, int lag, double threshold, double influence)
{
// init variables!
int[] signals = new int[input.Count];
double[] filteredY = new List<double>(input).ToArray();
double[] avgFilter = new double[input.Count];
double[] stdFilter = new double[input.Count];
var initialWindow = new List<double>(filteredY).Skip(0).Take(lag).ToList();
avgFilter[lag - 1] = Mean(initialWindow);
stdFilter[lag - 1] = StdDev(initialWindow);
for (int i = lag; i < input.Count; i++)
{
if (Math.Abs(input[i] - avgFilter[i - 1]) > threshold * stdFilter[i - 1])
{
signals[i] = (input[i] > avgFilter[i - 1]) ? 1 : -1;
filteredY[i] = influence * input[i] + (1 - influence) * filteredY[i - 1];
}
else
{
signals[i] = 0;
filteredY[i] = input[i];
}
// Update rolling average and deviation
var slidingWindow = new List<double>(filteredY).Skip(i - lag).Take(lag+1).ToList();
var tmpMean = Mean(slidingWindow);
var tmpStdDev = StdDev(slidingWindow);
avgFilter[i] = Mean(slidingWindow);
stdFilter[i] = StdDev(slidingWindow);
}
// Copy to convenience class
var result = new ZScoreOutput();
result.input = input;
result.avgFilter = new List<double>(avgFilter);
result.signals = new List<int>(signals);
result.filtered_stddev = new List<double>(stdFilter);
return result;
}
private static double Mean(List<double> list)
{
// Simple helper function!
return list.Average();
}
private static double StdDev(List<double> values)
{
double ret = 0;
if (values.Count() > 0)
{
double avg = values.Average();
double sum = values.Sum(d => Math.Pow(d - avg, 2));
ret = Math.Sqrt((sum) / (values.Count() - 1));
}
return ret;
}
}
Run Code Online (Sandbox Code Playgroud)
用法示例:
var input = new List<double> {1.0, 1.0, 1.1, 1.0, 0.9, 1.0, 1.0, 1.1, 1.0, 0.9, 1.0,
1.1, 1.0, 1.0, 0.9, 1.0, 1.0, 1.1, 1.0, 1.0, 1.0, 1.0, 1.1, 0.9, 1.0, 1.1, 1.0, 1.0, 0.9,
1.0, 1.1, 1.0, 1.0, 1.1, 1.0, 0.8, 0.9, 1.0, 1.2, 0.9, 1.0, 1.0, 1.1, 1.2, 1.0, 1.5, 1.0,
3.0, 2.0, 5.0, 3.0, 2.0, 1.0, 1.0, 1.0, 0.9, 1.0, 1.0, 3.0, 2.6, 4.0, 3.0, 3.2, 2.0, 1.0,
1.0, 0.8, 4.0, 4.0, 2.0, 2.5, 1.0, 1.0, 1.0};
int lag = 30;
double threshold = 5.0;
double influence = 0.0;
var output = ZScore.StartAlgo(input, lag, threshold, influence);
Run Code Online (Sandbox Code Playgroud)
这是@ Jean-Paul的 Smoothed Z分数的C实现,用于Arduino微控制器,用于获取加速度计读数并确定撞击的方向是来自左侧还是右侧。由于此设备返回了反弹信号,因此效果非常好。这是设备对峰值检测算法的输入-显示了来自右侧的影响,之后是左侧的影响。您可以看到初始峰值,然后是传感器的振荡。
#include <stdio.h>
#include <math.h>
#include <string.h>
#define SAMPLE_LENGTH 1000
float stddev(float data[], int len);
float mean(float data[], int len);
void thresholding(float y[], int signals[], int lag, float threshold, float influence);
void thresholding(float y[], int signals[], int lag, float threshold, float influence) {
memset(signals, 0, sizeof(float) * SAMPLE_LENGTH);
float filteredY[SAMPLE_LENGTH];
memcpy(filteredY, y, sizeof(float) * SAMPLE_LENGTH);
float avgFilter[SAMPLE_LENGTH];
float stdFilter[SAMPLE_LENGTH];
avgFilter[lag - 1] = mean(y, lag);
stdFilter[lag - 1] = stddev(y, lag);
for (int i = lag; i < SAMPLE_LENGTH; i++) {
if (fabsf(y[i] - avgFilter[i-1]) > threshold * stdFilter[i-1]) {
if (y[i] > avgFilter[i-1]) {
signals[i] = 1;
} else {
signals[i] = -1;
}
filteredY[i] = influence * y[i] + (1 - influence) * filteredY[i-1];
} else {
signals[i] = 0;
}
avgFilter[i] = mean(filteredY + i-lag, lag);
stdFilter[i] = stddev(filteredY + i-lag, lag);
}
}
float mean(float data[], int len) {
float sum = 0.0, mean = 0.0;
int i;
for(i=0; i<len; ++i) {
sum += data[i];
}
mean = sum/len;
return mean;
}
float stddev(float data[], int len) {
float the_mean = mean(data, len);
float standardDeviation = 0.0;
int i;
for(i=0; i<len; ++i) {
standardDeviation += pow(data[i] - the_mean, 2);
}
return sqrt(standardDeviation/len);
}
int main() {
printf("Hello, World!\n");
int lag = 100;
float threshold = 5;
float influence = 0;
float y[]= {1,1,1.1,1,0.9,1,1,1.1,1,0.9,1,1.1,1,1,0.9,1,1,1.1,1,1,1,1,1.1,0.9,1,1.1,1,1,0.9,
....
1,1.1,1,1,1.1,1,0.8,0.9,1,1.2,0.9,1,1,1.1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3, 2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1,1.2,1,1.5,1,3,2,5,3,2,1,1,1,0.9,1,1,3,
2.6,4,3,3.2,2,1,1,0.8,4,4,2,2.5,1,1,1}
int signal[SAMPLE_LENGTH];
thresholding(y, signal, lag, threshold, influence);
return 0;
}
Run Code Online (Sandbox Code Playgroud)
她的影响力= 0的结果
不太好,但是影响= 1
很好
这是我尝试根据已接受的答案为“平滑 z 得分算法”创建 Ruby 解决方案:
module ThresholdingAlgoMixin
def mean(array)
array.reduce(&:+) / array.size.to_f
end
def stddev(array)
array_mean = mean(array)
Math.sqrt(array.reduce(0.0) { |a, b| a.to_f + ((b.to_f - array_mean) ** 2) } / array.size.to_f)
end
def thresholding_algo(lag: 5, threshold: 3.5, influence: 0.5)
return nil if size < lag * 2
Array.new(size, 0).tap do |signals|
filtered = Array.new(self)
initial_slice = take(lag)
avg_filter = Array.new(lag - 1, 0.0) + [mean(initial_slice)]
std_filter = Array.new(lag - 1, 0.0) + [stddev(initial_slice)]
(lag..size-1).each do |idx|
prev = idx - 1
if (fetch(idx) - avg_filter[prev]).abs > threshold * std_filter[prev]
signals[idx] = fetch(idx) > avg_filter[prev] ? 1 : -1
filtered[idx] = (influence * fetch(idx)) + ((1-influence) * filtered[prev])
end
filtered_slice = filtered[idx-lag..prev]
avg_filter[idx] = mean(filtered_slice)
std_filter[idx] = stddev(filtered_slice)
end
end
end
end
Run Code Online (Sandbox Code Playgroud)
以及示例用法:
test_data = [
1, 1, 1.1, 1, 0.9, 1, 1, 1.1, 1, 0.9, 1, 1.1, 1, 1, 0.9, 1,
1, 1.1, 1, 1, 1, 1, 1.1, 0.9, 1, 1.1, 1, 1, 0.9, 1, 1.1, 1,
1, 1.1, 1, 0.8, 0.9, 1, 1.2, 0.9, 1, 1, 1.1, 1.2, 1, 1.5,
1, 3, 2, 5, 3, 2, 1, 1, 1, 0.9, 1, 1, 3, 2.6, 4, 3, 3.2, 2,
1, 1, 0.8, 4, 4, 2, 2.5, 1, 1, 1
].extend(ThresholdingAlgoMixin)
puts test_data.thresholding_algo.inspect
# Output: [
# 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, -1, 0, 0, 0,
# 0, 0, 0, 0, 0, 0, 1, 0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1,
# 1, 1, 0, 0, 0, -1, -1, 0, 0, 0, 0, 0, 0, 0, 0
# ]
Run Code Online (Sandbox Code Playgroud)
归档时间: |
|
查看次数: |
136007 次 |
最近记录: |