What is the correct way to fit a Gaussian mixture model to single-feature data?

Question

Mic*_*alm 0 python machine-learning reshape scikit-learn mixture-model

data is a one-dimensional array of values.

data = [0.0, 7000.0, 0.0, 7000.0, -400.0, 0.0, 7000.0, -400.0, -7400.0, 7000.0, -400.0, -7000.0, -7000.0, 0.0, 0.0, 0.0, -7000.0, 7000.0, 7000.0, 7000.0, 0.0, -7000.0, 6600.0, -7400.0, -400.0, 6600.0, -400.0, -400.0, 6600.0, 6600.0, 6600.0, 7000.0, 6600.0, -7000.0, 0.0, 0.0, -7000.0, -7400.0, 6600.0, -400.0, 7000.0, -7000.0, -7000.0, 0.0, 0.0, -400.0, -7000.0, -7000.0, 7000.0, 7000.0, 0.0, -7000.0, 0.0, 0.0, 6600.0, 6600.0, 6600.0, -7400.0, -400.0, -2000.0, -7000.0, -400.0, -7400.0, 7000.0, 0.0, -7000.0, -7000.0, 0.0, -400.0, -7400.0, -7400.0, 0.0, 0.0, 0.0, -400.0, -400.0, -400.0, -400.0, 6600.0, 0.0, -400.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, -400.0, -400.0, 0.0, 0.0, -400.0, -400.0, 0.0, -400.0, 0.0, -400.0]

I want to fit some Gaussians to this data and plot them.

If I run

import numpy as np
from sklearn import mixture

x = np.array(data)
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(x)

I get the error

ValueError: Expected n_samples >= n_components but got n_components = 2, n_samples = 1

and

DeprecationWarning: Passing 1d arrays as data is deprecated in 0.17 and will raise ValueError in 0.19. Reshape your data either using X.reshape(-1, 1) if your data has a single feature or X.reshape(1, -1) if it contains a single sample.

OK... I can live with that. The warning tells me what to do. However, if I run

x = np.array(data).reshape(-1,1)
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(x)

I get the error

ValueError: Expected the input data X have 1 features, but got 32000 features

What am I doing wrong? What is the correct approach?

Edit:

I just realized that I misread the error message. It is not fit() that raises the error, but score_samples().

Afterwards I tried to plot the Gaussians.

x = np.linspace(-8000,8000,32000)
y = clf.score_samples(x)

plt.plot(x, y)
plt.show()

So x seems to be the problem. However, neither x.reshape(-1,1) nor x.reshape(1,-1) helps.

Answer 1

Mic*_*alm 5

I found the error myself. As I stated in my edit, it is not fit() that raises the error, but score_samples().

Both functions expect a two-dimensional array of shape (n_samples, n_features).
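To make the shape requirement concrete, here is a minimal sketch. The synthetic `values` array is a hypothetical stand-in for the question's data, and `random_state=0` is only there to make the run repeatable. Note that reshape returns a new array, so the result has to be assigned to a name before it is passed on.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in for the one-dimensional data in the question
rng = np.random.default_rng(0)
values = np.concatenate([rng.normal(-7000, 200, 50), rng.normal(0, 200, 50)])

column = values.reshape(-1, 1)  # 100 samples with 1 feature each
row = values.reshape(1, -1)     # 1 sample with 100 features

# reshape does not modify `values`; it is still one-dimensional here
print(values.shape, column.shape, row.shape)

# fit() and score_samples() both take the (n_samples, n_features) layout
gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gmm.fit(column)

grid = np.linspace(-8000, 8000, 5).reshape(-1, 1)
log_density = gmm.score_samples(grid)  # one log-likelihood value per grid row
print(log_density.shape)
```

The array passed to score_samples() must have the same number of features as the array used in fit(), which is why an unreshaped grid of 32000 points is rejected as "32000 features".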

Working code:

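import numpy as np
import matplotlib.pyplot as plt
from sklearn import mixture

# Reshape to (n_samples, 1): one feature per sample
data = np.array(data).reshape(-1,1)
clf = mixture.GaussianMixture(n_components=1, covariance_type='full')
clf.fit(data)

# The evaluation grid needs the same two-dimensional layout
x = np.linspace(-8000,8000,32000).reshape(-1,1)
# score_samples returns the log-likelihood of each sample
y = clf.score_samples(x)

plt.plot(x, y)
plt.show()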
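Since score_samples() returns log-likelihoods, the curve plotted above is the log of the density rather than the bell-shaped Gaussians themselves. A possible way to recover the density and the individual weighted components is sketched below. The synthetic `samples`, the two-component setting, and the grid limits are assumptions for illustration, not part of the original answer.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Hypothetical stand-in data: two well-separated clusters, one feature
rng = np.random.default_rng(0)
samples = np.concatenate(
    [rng.normal(-7000, 300, 200), rng.normal(0, 300, 300)]
).reshape(-1, 1)

gmm = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
gmm.fit(samples)

grid = np.linspace(-10000, 3000, 2000).reshape(-1, 1)

# Exponentiate the log-likelihood to get the mixture density p(x)
density = np.exp(gmm.score_samples(grid))

# With one feature, 'full' covariances have shape (n_components, 1, 1)
means = gmm.means_.ravel()
variances = gmm.covariances_.ravel()
weights = gmm.weights_

# Weighted Gaussian of each component, one column per component
components = (
    weights
    * np.exp(-0.5 * (grid - means) ** 2 / variances)
    / np.sqrt(2 * np.pi * variances)
)

# The weighted components add up to the mixture density
print(np.allclose(components.sum(axis=1), density))
```

Passing `grid.ravel()` together with `density`, or with each column of `components`, to plt.plot draws the fitted curves.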

