Detrending with a mask in Python

Question

I have a file that I read as follows:

import numpy as np

data1 = np.loadtxt('lc1.out')
x = data1[:, 0]
y = data1[:, 1]

I want to detrend it, and I found a very useful link here.

model = np.polyfit(x, y, 2)
predicted = np.polyval(model, x)

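However, I want to mask part of the data so that only the points outside the mask are used for the fit. For example, I want to fit a second-order polynomial using only the data below 639.5 and above 641.5.

My idea was to use ma.masked_outside(x, 639.5, 641.5), because it makes it easy to keep only the elements outside the mask in the array... but I don't understand how to use it with polyfit.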

Answer 1

blu*_*lub 5

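Unless you need them for performance reasons or want to reuse the mask later, masked arrays are probably not a hard requirement for your use case. So I will show how to do it both with and without masked arrays.

But let's first detrend without any masking to get a reference:

2nd order polynomial detrending without masking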

import numpy as np
import matplotlib.pyplot as plt

data1 = np.loadtxt('lc1.out')
x, y = data1.T

fig = plt.figure()

plt.subplot(2, 1, 1)
plt.title('polyfit, original data set')
plt.plot(x, y, 'c.')

coeff = np.polyfit(x, y, 2)

# no need to use the original x values here just for visualizing the polynomial
x_poly = np.linspace(x.min(), x.max())
y_poly = np.polyval(coeff, x_poly)
plt.plot(x_poly, y_poly, 'r-', linewidth=3)

mid = len(x_poly) // 2
plt.annotate('y = {:.7g} x\xB2 + {:.7g} x + {:.7g}'.format(*coeff),
             (x_poly[mid], y_poly[mid]), (0, 48), textcoords='offset points',
             arrowprops={'arrowstyle': '->'}, horizontalalignment='center')

plt.subplot(2, 1, 2)
plt.title('detrended')

# we need the original x values here, so we can remove the trend from all points
trend = np.polyval(coeff, x)
# note that simply subtracting the trend might not be enough for other data sets
plt.plot(x, y - trend, 'b.')
plt.show()  # blocks until the window is closed, also when run as a script

Take note of the coefficients of the polynomial.

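2nd order polynomial detrending, selecting significant points

We can simply create new x and y arrays that contain only the desired points. There is less that can go wrong with this approach.

This takes 3 steps. First, we apply comparison operators to the array of interest, which yields a boolean array with 'True' values at the indices where the comparison holds and 'False' values everywhere else.

Then we pass the boolean array to np.where(), which returns the index numbers at which the boolean array is 'True' (for a one-dimensional array, a tuple holding a single index array), i.e. it answers the question: "Where is my array True?"

Finally, we make use of NumPy's advanced indexing and apply the resulting index array as an index to the x and y arrays, which filters out all unwanted points.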
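
The three steps can be tried out on a small made-up array. This is only a sketch: the values and the limits 2.5 and 4.5 are assumptions for illustration and are not taken from lc1.out. It also shows that the boolean array can be used as an index directly, so the np.where() step is optional.

```python
import numpy as np

# Made-up sample values, for illustration only
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0])
y = 10.0 * x

# Step 1: comparison operators yield a boolean array
keep = (x < 2.5) | (x > 4.5)

# Step 2: np.where() with a single argument returns a tuple of index arrays,
# one per dimension; for a 1-D array the tuple has one element
select = np.where(keep)

# Step 3: advanced indexing with the index arrays
x_selection = x[select]
y_selection = y[select]

# A boolean array is a valid index as well, so step 2 can be skipped
assert np.array_equal(x[keep], x_selection)

print(keep)
print(select[0])
print(x_selection, y_selection)
```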

import numpy as np
import matplotlib.pyplot as plt

data1 = np.loadtxt('lc1.out')
x, y = data1.T
select = np.where((x < 640.75) | (x > 641.25))
x_selection = x[select]  # numpy advanced indexing
y_selection = y[select]  # numpy advanced indexing

fig = plt.figure()

plt.subplot(2, 1, 1)
plt.title('polyfit, selecting significant points')
plt.plot(x_selection, y_selection, 'c.')

coeff = np.polyfit(x_selection, y_selection, 2)

# no need to use the original x values here just for visualizing the polynomial
x_poly = np.linspace(x_selection.min(), x_selection.max())
y_poly = np.polyval(coeff, x_poly)
plt.plot(x_poly, y_poly, 'r-', linewidth=3)

mid = len(x_poly) // 2
plt.annotate('y = {:.7g} x\xB2 + {:.7g} x + {:.7g}'.format(*coeff),
             (x_poly[mid], y_poly[mid]), (0, 48), textcoords='offset points',
             arrowprops={'arrowstyle': '->'}, horizontalalignment='center')

plt.subplot(2, 1, 2)
plt.title('detrended')

# we need the original x values here, so we can remove the trend from all points
trend = np.polyval(coeff, x)
# note that simply subtracting the trend might not be enough for other data sets
plt.plot(x, y - trend, 'b.')
plt.show()  # blocks until the window is closed, also when run as a script

As expected, the coefficients are different now.
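
Without the plotting, the selection approach fits in a few lines. The helper below is a hypothetical sketch: the name detrend_excluding and its parameters are made up for illustration and are not part of NumPy. It is checked against synthetic data rather than lc1.out.

```python
import numpy as np

def detrend_excluding(x, y, lower, upper, deg=2):
    """Fit a polynomial to the points outside [lower, upper]
    and subtract it from all points.

    Hypothetical helper for illustration; returns (detrended_y, coefficients).
    """
    keep = (x < lower) | (x > upper)
    coeff = np.polyfit(x[keep], y[keep], deg)
    return y - np.polyval(coeff, x), coeff

# Synthetic data: a known parabola with a dip inside the excluded interval
x = np.linspace(0.0, 10.0, 101)
y = 0.5 * x**2 - 2.0 * x + 1.0
inside = (x >= 4.0) & (x <= 6.0)
y[inside] -= 3.0

residual, coeff = detrend_excluding(x, y, 4.0, 6.0)
print(coeff)                             # close to 0.5, -2.0, 1.0
print(np.abs(residual[~inside]).max())   # close to zero outside the interval
```

Because the dip is excluded from the fit, it survives in the detrended data instead of being partly absorbed by the polynomial.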

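2nd order polynomial detrending, masking unwanted points

Of course we can also use masked arrays. Note the inverted logic: the masked points are the points we do not want. In the example data we do not want the points inside an interval, so we use ma.masked_inside().

If we want to avoid creating a full copy of the original array for performance reasons, we can use the keyword copy=False. Making the original array read-only protects us from accidentally changing the values in the masked array by modifying the original array.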
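
The effect of copy=False and of the read-only flag can be checked on a tiny array. This is a sketch with made-up values; np.shares_memory() is used here only to show whether the masked array reuses the original buffer.

```python
import numpy as np
import numpy.ma as ma

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])  # made-up values
x.flags.writeable = False                # protect the shared buffer

x_view = ma.masked_inside(x, 2.0, 4.0, copy=False)  # reuses the memory of x
x_copy = ma.masked_inside(x, 2.0, 4.0)              # default copy=True

print(np.shares_memory(x, x_view.data))
print(np.shares_memory(x, x_copy.data))

# Writing to the read-only original is rejected
try:
    x[0] = 99.0
except ValueError as err:
    print('write rejected:', err)
```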

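With masked arrays we need to use the version of the polyfit() function from the numpy.ma submodule. It correctly ignores the unwanted values in the masked x as well as the corresponding values in the unmasked companion array y. If we do not, we get wrong results. Note that this is an easy mistake to make, so we may want to stick with the other approach if we can help it.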

import numpy as np
import numpy.ma as ma
import matplotlib.pyplot as plt

data1 = np.loadtxt('lc1.out')
x, y = data1.T
x.flags.writeable = False  # safety measure, as we don't copy
x_masked = ma.masked_inside(x, 640.75, 641.25, copy=False)

fig = plt.figure()

plt.subplot(2, 1, 1)
plt.title('polyfit, masking unwanted points')
plt.plot(x_masked, y, 'c.')

coeff = ma.polyfit(x_masked, y, 2)

# no need to use the original x values here just for visualizing the polynomial
x_poly = np.linspace(x_masked.min(), x_masked.max())
y_poly = np.polyval(coeff, x_poly)
plt.plot(x_poly, y_poly, 'r-', linewidth=3)

mid = len(x_poly) // 2
plt.annotate('y = {:.7g} x\xB2 + {:.7g} x + {:.7g}'.format(*coeff),
             (x_poly[mid], y_poly[mid]), (0, 48), textcoords='offset points',
             arrowprops={'arrowstyle': '->'}, horizontalalignment='center')

plt.subplot(2, 1, 2)
plt.title('detrended')

# we need the original x values here, so we can remove the trend from all points
trend = np.polyval(coeff, x)
# note that simply subtracting the trend might not be enough for other data sets
plt.plot(x, y - trend, 'b.')
plt.show()  # blocks until the window is closed, also when run as a script

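The coefficients are the same as with the selection approach, which is good. If we had mistakenly used np.polyfit(), we would have ended up with the same coefficients as in the unmasked reference.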
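
Both statements can be reproduced on synthetic data. The sketch below uses a made-up parabola with a bump inside the masked interval (assumed values, not lc1.out) and compares ma.polyfit() with np.polyfit() on the same masked array.

```python
import numpy as np
import numpy.ma as ma

# Synthetic data: a known parabola plus a bump in the interval to be masked
x = np.linspace(0.0, 10.0, 101)
y = 0.5 * x**2 - 2.0 * x + 1.0
x_masked = ma.masked_inside(x, 4.0, 6.0)
bump = ma.getmaskarray(x_masked)
y[bump] += 5.0

coeff_reference = np.polyfit(x, y, 2)     # no masking at all
coeff_ma = ma.polyfit(x_masked, y, 2)     # the mask is honoured
coeff_wrong = np.polyfit(x_masked, y, 2)  # the mask is silently dropped

print(coeff_ma)
print(coeff_wrong)
print(np.allclose(coeff_wrong, coeff_reference))
```

np.polyfit() raises no error or warning here, which is what makes the mistake easy to miss.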
