比较两个时间序列（模拟结果）

Question

比较两个时间序列（模拟结果）

mat*_*tth 7 python numpy time-series matplotlib timeserieschart

我想对模拟模型进行单元测试，为此，我运行一次模拟并将结果（时间序列）作为参考存储在 csv 文件中（请参阅此处的示例）。现在，当我更改模型时，我再次运行模拟，将新结果也存储为 csv 文件，然后比较结果。

结果通常不是 100% 相同，示例图如下所示：
参考结果以黑色绘制，新结果以绿色绘制。
两者的差异在第二个图中以蓝色绘制。
可以看出，在一个步骤中，差异可以变得任意高，而其他地方的差异几乎为零。

因此，我更愿意使用不同的算法进行比较，而不仅仅是将两者相减，但我只能以图形方式描述我的想法：绘制参考线两次时，首先使用具有高线宽的浅色，然后再次使用深色颜色和小线宽，那么它看起来就像在中心线周围有一个粉红色的管子。

请注意，在一个步骤中，管不仅在纵坐标方向上，而且在横坐标方向上。在进行比较时，我想知道绿线是否留在粉红色管内。

现在我的问题是：我不想使用图形比较两个时间序列，而是使用 python 脚本。一定已经有这样的东西了，但我找不到它，因为我缺少正确的词汇，我相信。有任何想法吗？在 numpy、scipy 或类似的东西中是否有类似的东西？还是我必须自己写比较？

附加问题：当脚本说这两个系列不够相似时，我想按上述方式绘制它（使用 matplotlib），但线宽必须以其他单位定义，而不是我通常用来定义线宽的单位.

Answer 1

nor*_*ok2 3

我在这里假设您的问题可以通过假设您的函数必须接近具有相同支撑点的另一个函数（例如管的中心）来简化，然后允许一定数量的不连续性。然后，与用于L^2范数的典型函数相比，我将实现一种不同的函数离散化（例如，参见此处的一些参考）。

基本上，在连续情况下，L^2范数放宽了两个函数处处接近的约束，并允许它在有限数量的点上不同，称为奇点。这是有效的，因为有无数个点可以计算积分，并且有限数量的点不会产生影响。

然而，由于这里没有连续函数，而只有它们的离散化，因此朴素的方法将不起作用，因为任何奇点都可能对最终积分值产生重大影响。

因此，您可以做的是逐点检查两个函数是否接近（在一定的容差范围内）并允许最多num_exceptions点关闭。

import numpy as np

def is_close_except(arr1, arr2, num_exceptions=0.01, **kwargs):
    # if float, calculate as percentage of number of points
    if isinstance(num_exceptions, float):
        num_exceptions = int(len(arr1) * num_exceptions)
    num = len(arr1) - np.sum(np.isclose(arr1, arr2, **kwargs))
    return num <= num_exceptions

Run Code Online (Sandbox Code Playgroud)

相比之下，标准L^2范数离散化会导致类似这样的集成（和标准化）度量：

import numpy as np

def is_close_l2(arr1, arr2, **kwargs):
    norm1 = np.sum(arr1 ** 2)
    norm2 = np.sum(arr2 ** 2)
    norm = np.sum((arr1 - arr2) ** 2)
    return np.isclose(2 * norm / (norm1 + norm2), 0.0, **kwargs)

Run Code Online (Sandbox Code Playgroud)

然而，对于任意大的峰值，这将失败，除非您设置的容差比基本上任何结果都“接近”的大。

请注意，kwargs如果您想为其选项指定附加公差约束np.isclose()或其他选项，则使用。

作为测试，您可以运行：

import numpy as np
import numpy.random

np.random.seed(0)

num = 1000
snr = 100
n_peaks = 5
x = np.linspace(-10, 10, num)
# generate ground truth
y = np.sin(x)
# distributed noise
y2 = y + np.random.random(num) / snr
# distributed noise + peaks
y3 = y + np.random.random(num) / snr
peak_positions = [np.random.randint(num) for _ in range(n_peaks)]
for i in peak_positions:
    y3[i] += np.random.random() * snr

# for distributed noise, both work with a 1/snr tolerance
is_close_l2(y, y2, atol=1/snr)
# output: True
is_close_except(y, y2, atol=1/snr)
# output: True

# for peak noise, since n_peaks < num_exceptions, this works
is_close_except(y, y3, atol=1/snr)
# output: True
# and if you allow 0 exceptions, than it fails, as expected
is_close_except(y, y3, num_exceptions=0, atol=1/snr)
# output: False

# for peak noise, this fails because the contribution from the peaks
# in the integral is much larger than the contribution from the rest
is_close_l2(y, y3, atol=1/snr)
# output: False

Run Code Online (Sandbox Code Playgroud)

对于这个问题还有其他涉及高等数学的方法（例如傅立叶或小波变换），但我会坚持使用最简单的方法。

编辑（更新）：

但是，如果工作假设不成立或者您不喜欢，例如因为两个函数具有不同的采样或者它们是通过非内射关系描述的。在这种情况下，您可以使用 (x, y) 数据跟踪管的中心，并计算距目标（管中心）的欧几里得距离，并检查该距离是否逐点小于允许的最大值（管尺寸）：

import numpy as np

# assume it is something with shape (N, 2) meaning (x, y)
target = ...

# assume it is something with shape (M, 2) meaning again (x, y)
trajectory = ...

# calculate the distance minimum distance between each point
# of the trajectory and the target
def is_close_trajectory(trajectory, target, max_dist):
    dist = np.zeros(trajectory.shape[0])
    for i in range(len(dist)):
        dist[i] = np.min(np.sqrt(
            (target[:, 0] - trajectory[i, 0]) ** 2 +
            (target[:, 1] - trajectory[i, 1]) ** 2))
    return np.all(dist < max_dist)

# same as above but faster and more memory-hungry
def is_close_trajectory2(trajectory, target, max_dist):
    dist = np.min(np.sqrt(
        (target[:, np.newaxis, 0] - trajectory[np.newaxis, :, 0]) ** 2 +
        (target[:, np.newaxis, 1] - trajectory[np.newaxis, :, 1]) ** 2),
        axis=1)
    return np.all(dist < max_dist)

Run Code Online (Sandbox Code Playgroud)

这种灵活性的代价是，这将是一个显着变慢或占用内存的函数。

归档时间：	8 年，1 月前
查看次数：	4714 次
最近记录：	4 年，3 月前