Is there a term for finding the minimal set of N points that approximates a curve?

Question

Mor*_*ton 7 python math matlab terminology numpy

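I spent some time answering "How do I discretize a continuous function avoiding noise generation (see picture)", and the whole time I felt I was reinventing the wheel.

Essentially, the problem is:

You are given a curve function: for any x, you can obtain y.
You want to approximate the curve with a piecewise-linear function that has exactly N points, based on some error metric, e.g. the distance to the curve, or minimizing the absolute difference of the area under the curve (thanks to @QuangHoang for pointing out that these are different).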
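
One possible way to write down the two error metrics mentioned above, assuming the curve is a function f on an interval [a, b] and p_N is the piecewise-linear interpolant through the N chosen points. The notation is an assumption made here for illustration, and "distance" is read as vertical distance:

```latex
E_{\max}(p_N) = \max_{x \in [a,b]} \lvert f(x) - p_N(x) \rvert,
\qquad
E_{\text{area}}(p_N) = \int_a^b \lvert f(x) - p_N(x) \rvert \, dx
```

The two can rank candidate point sets differently: E_max is driven by the single worst deviation, while E_area tolerates a tall but narrow deviation as long as the area it encloses is small.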

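Here is an example of a curve that I approximated using 20 points:

Question: I have coded this up using repeated bisection. Is there a library I could have used? Is there a good term for this problem type that I failed to find by googling? Does this generalize to a broader problem set?

Edit: as requested, this is how I did it: Google Colab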

Data:

import numpy as np
from scipy.signal.windows import gaussian  # scipy.signal.gaussian was removed in newer SciPy releases

N_MOCK = 2000

# A nice-ish mock distribution
xs = np.linspace(-10.0, 10.0, num=N_MOCK)
sigmoid = 1 / (1 + np.exp(-xs))
gauss = gaussian(N_MOCK, std=N_MOCK / 10)
ys = gauss - sigmoid + 1
xs += 10
xs /= 20

Plotting:

import matplotlib.pyplot as plt


def plot_graph(cont_time, cont_array, disc_time, disc_array, plot_name):
    """A simplified version of the provided plotting function"""
    
    # Setting Axis properties and titles
    fig, ax = plt.subplots(figsize=(20, 4))
    ax.set_title(plot_name)

    # Plotting stuff
    ax.plot(cont_time, cont_array, label="Continuous", color='#0000ff')
    ax.plot(disc_time, disc_array, label="Discrete",   color='#00ff00')

    fig.legend(loc="upper left", bbox_to_anchor=(0,1), bbox_transform=ax.transAxes)

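Here is how I solved it, though I hope there is a more standard way:

import warnings

try:
    from numpy.exceptions import RankWarning  # NumPy >= 1.25
except ImportError:
    from numpy import RankWarning  # older NumPy releases

# Silence the poorly-conditioned-fit warning raised by the degree-50 polyfit below
warnings.simplefilter('ignore', RankWarning)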


def line_error(x0, y0, x1, y1, ideal_line, integral_points=100):
    """Assume a straight line between (x0,y0)->(x1,y1). Then sample the perfect line multiple times and compute the distance."""
    straight_line = np.poly1d(np.polyfit([x0, x1], [y0, y1], 1))
    xs = np.linspace(x0, x1, num=integral_points)
    ys = straight_line(xs)

    perfect_ys = ideal_line(xs)
    
    err = np.abs(ys - perfect_ys).sum() / integral_points * (x1 - x0)  # Remove (x1 - x0) to only look at avg errors
    return err


def discretize_bisect(xs, ys, bin_count):
    """Returns xs and ys of discrete points"""
    # For a large number of datapoints, without loss of generality you can treat xs and ys as bin edges
    # If it gives bad results, you can compute the edges in other ways, e.g. with np.histogram_bin_edges
    ideal_line = np.poly1d(np.polyfit(xs, ys, 50))
    
    new_xs = [xs[0], xs[-1]]
    new_ys = [ys[0], ys[-1]]
    
    while len(new_xs) < bin_count:
        
        errors = []
        for i in range(len(new_xs)-1):
            err = line_error(new_xs[i], new_ys[i], new_xs[i+1], new_ys[i+1], ideal_line)
            errors.append(err)

        max_segment_id = np.argmax(errors)
        new_x = (new_xs[max_segment_id] + new_xs[max_segment_id+1]) / 2
        new_y = ideal_line(new_x)
        new_xs.insert(max_segment_id+1, new_x)
        new_ys.insert(max_segment_id+1, new_y)

    return new_xs, new_ys

Run:

BIN_COUNT = 25

new_xs, new_ys = discretize_bisect(xs, ys, BIN_COUNT)

plot_graph(xs, ys, new_xs, new_ys, f"Discretized and Continuous comparison, N(cont) = {N_MOCK}, N(disc) = {BIN_COUNT}")
print("Bin count:", len(new_xs))

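
To compare point sets produced by different methods, it can help to measure both error metrics on a dense grid. The following is a minimal sketch, not part of the original post: the helper name `approximation_errors`, the grid size, and the sine test curve are all illustrative assumptions, and "distance" is taken to mean vertical distance.

```python
import numpy as np


def approximation_errors(f, knots_x, knots_y, num=10_001):
    """Return (max abs error, integrated abs error) of a piecewise-linear fit.

    f        : callable giving the reference curve
    knots_x  : increasing x positions of the chosen points
    knots_y  : y values at those positions
    num      : size of the dense evaluation grid (an arbitrary choice)
    """
    knots_x = np.asarray(knots_x, dtype=float)
    knots_y = np.asarray(knots_y, dtype=float)
    grid = np.linspace(knots_x[0], knots_x[-1], num=num)

    # np.interp evaluates the piecewise-linear function through the knots
    gap = np.abs(f(grid) - np.interp(grid, knots_x, knots_y))

    # Trapezoidal rule written out by hand to avoid NumPy version differences
    area = float(np.sum(0.5 * (gap[1:] + gap[:-1]) * np.diff(grid)))
    return float(gap.max()), area


# Hypothetical usage: 5 evenly spaced points on sin(x) over [0, pi]
kx = np.linspace(0.0, np.pi, 5)
max_err, area_err = approximation_errors(np.sin, kx, np.sin(kx))
print(f"max error: {max_err:.4f}, area error: {area_err:.4f}")
```

With the arrays from this question, the same helper could be called as `approximation_errors(lambda x: np.interp(x, xs, ys), new_xs, new_ys)`, which treats the 2000 mock samples as the reference curve.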

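Note: although I prefer numpy, the answer can be a library in any language, or the name of a mathematical term. Please don't write lots of code, as I have already done that myself :)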

Answer 1

sar*_*ema 1

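"Is there a good term for this problem type that I failed to find by googling? Does this generalize to a broader problem set?"

I know this problem as Expected Improvement (EI), or Bayesian optimization (permalink on archive.org). Given an expensive black-box function whose global maximum you want to find, the algorithm yields the next location at which to check for that maximum.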
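
For reference, when the surrogate model's prediction at a candidate point is Gaussian with mean mu and standard deviation sigma, EI for maximization has the standard closed form (mu - best) * Phi(z) + sigma * phi(z) with z = (mu - best) / sigma. The sketch below only evaluates that formula. The function name `expected_improvement` and the sample numbers are made up for illustration, and fitting the surrogate model itself is left out.

```python
import numpy as np
from scipy.stats import norm


def expected_improvement(mu, sigma, best_so_far):
    """Closed-form EI (maximization) for Gaussian predictions.

    mu, sigma   : predicted mean and standard deviation at each candidate
    best_so_far : largest function value observed so far
    """
    mu, sigma = np.broadcast_arrays(
        np.atleast_1d(np.asarray(mu, dtype=float)),
        np.atleast_1d(np.asarray(sigma, dtype=float)),
    )
    improvement = mu - best_so_far

    # With no uncertainty, EI reduces to the plain improvement, floored at 0
    ei = np.maximum(improvement, 0.0)

    positive = sigma > 0
    z = improvement[positive] / sigma[positive]
    ei[positive] = improvement[positive] * norm.cdf(z) + sigma[positive] * norm.pdf(z)
    return ei


# Hypothetical predictions at three candidate locations
mu = np.array([0.2, 0.5, 0.4])
sigma = np.array([0.3, 0.0, 0.1])
ei = expected_improvement(mu, sigma, best_so_far=0.5)
next_index = int(np.argmax(ei))  # candidate to evaluate next
```

Here the first candidate wins even though its predicted mean is the lowest, because its large uncertainty leaves the most room for an improvement over the current best.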

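At first glance, this is different from your problem. You are looking for a way to approximate a curve with a small number of samples, whereas EI gives the locations where the function's maximum is most likely to be. But the two problems are equivalent insofar as you minimize an error function (which changes whenever you add another sample to the approximation) with as few points as possible.

I believe this is the original research paper.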

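Jones, Donald & Schonlau, Matthias & Welch, William. (1998). Efficient Global Optimization of Expensive Black-Box Functions. Journal of Global Optimization. 13. 455-492. 10.1023/A:1008306431147.

From Section 1: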

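[...] the technique often requires the fewest function evaluations of all competing methods. This is possible because, with typical engineering functions, one can often interpolate and extrapolate quite accurately over large distances in the design space. Intuitively, the method is able to ‘see’ obvious trends or patterns in the data and ‘jump to conclusions’ instead of having to move step-by-step along some trajectory.

As for why it works: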

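[...] the response surface methodology provides a credible stopping rule based on the expected improvement from further searching. Such a stopping rule is possible because the statistical model provides confidence intervals on the function’s value at unsampled points – and the ‘reasonableness’ of these confidence intervals can be checked by model validation techniques.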
