I have a multi-class classification task. When I run my script, which is based on the scikit-learn example, as follows:
classifier = OneVsRestClassifier(GradientBoostingClassifier(n_estimators=70, max_depth=3, learning_rate=.02))
y_pred = classifier.fit(X_train, y_train).predict(X_test)
cnf_matrix = confusion_matrix(y_test, y_pred)
I get this error:
File "C:\ProgramData\Anaconda2\lib\site-packages\sklearn\metrics\classification.py", line 242, in confusion_matrix
raise ValueError("%s is not supported" % y_type)
ValueError: multilabel-indicator is not supported
I tried passing labels=classifier.classes_ to confusion_matrix(), but it did not help.
y_test and y_pred look like this:
y_test =
array([[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0],
[0, 1, 0, 0, 0, 0],
...,
[0, 0, 0, 0, 0, 1],
[0, 0, 0, 1, 0, 0],
[0, 0, 0, 0, 1, 0]]) …
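The error message says that confusion_matrix() does not accept multilabel-indicator (one-hot) arrays like the y_test shown above. A minimal sketch of one possible workaround, assuming every row encodes exactly one class: collapse each row to a class index with argmax before calling confusion_matrix(). The small arrays below are made-up stand-ins, not the asker's data.

```python
import numpy as np
from sklearn.metrics import confusion_matrix

# Hypothetical one-hot (indicator) arrays, shaped like the y_test above
# but with only 3 classes to keep the example short.
y_test_onehot = np.array([[0, 0, 1],
                          [1, 0, 0],
                          [0, 1, 0],
                          [0, 0, 1]])
y_pred_onehot = np.array([[0, 0, 1],
                          [0, 1, 0],
                          [0, 1, 0],
                          [0, 0, 1]])

# argmax along axis 1 turns each indicator row into a single class index.
y_test_labels = y_test_onehot.argmax(axis=1)
y_pred_labels = y_pred_onehot.argmax(axis=1)

# Rows are true classes, columns are predicted classes.
cnf_matrix = confusion_matrix(y_test_labels, y_pred_labels, labels=[0, 1, 2])
print(cnf_matrix)
```

One caveat with this approach: if a predicted row contains no 1 at all, argmax maps it to class 0, so it may be worth checking the row sums of y_pred first.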
I want to compute the AIC for linear models in order to compare their complexity. I did it as follows:
regr = linear_model.LinearRegression()
regr.fit(X, y)
aic_intercept_slope = aic(y, regr.coef_[0] * X.as_matrix() + regr.intercept_, k=1)
def aic(y, y_pred, k):
    resid = y - y_pred.ravel()
    sse = sum(resid ** 2)
    AIC = 2*k - 2*np.log(sse)
    return AIC
But I get a "divide by zero encountered in log" error.
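NumPy emits that warning whenever the argument of np.log is exactly 0, which happens here if sse is 0. Separately, the usual least-squares form of the AIC (assuming Gaussian errors, and dropping an additive constant) is n*ln(SSE/n) + 2k rather than 2k - 2*ln(SSE). The following is a sketch under those assumptions; the function name aic_least_squares and the sample numbers are illustrative only.

```python
import numpy as np

def aic_least_squares(y, y_pred, k):
    """AIC of a least-squares fit with Gaussian errors, up to an additive constant.

    k is the number of estimated parameters (e.g. 2 for slope + intercept).
    """
    y = np.asarray(y, dtype=float).ravel()
    y_pred = np.asarray(y_pred, dtype=float).ravel()
    n = y.size
    sse = np.sum((y - y_pred) ** 2)
    if sse == 0:
        # A perfect fit makes log(SSE / n) undefined, so fail loudly instead.
        raise ValueError("SSE is zero (perfect fit); this AIC form is undefined")
    return n * np.log(sse / n) + 2 * k

# Made-up observations and predictions, for illustration only.
y = np.array([1.0, 2.0, 3.0, 5.0])
y_pred = np.array([1.0, 2.0, 4.0, 4.0])
aic_value = aic_least_squares(y, y_pred, k=2)
print(aic_value)
```

Flattening both arrays before subtracting also avoids accidental broadcasting when y has shape (n, 1) and the prediction has shape (n,).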
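I have seen the Scikit-Learn examples of Gaussian mixtures for clustering. In that example (and in other examples of this model), the data always seem to have two dimensions: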
plt.scatter(X[:, 0], X[:, 1], s=10, color=colors[y_pred])
I use the following script (X actually has more than 2 columns):
clf = mixture.GaussianMixture(n_components=2, covariance_type='full')
clf.fit(X)
y_pred = clf.predict(X)
colors = np.array(list(islice(cycle(['#377eb8', '#ff7f00', '#4daf4a',
'#f781bf', '#a65628', '#984ea3',
'#999999', '#e41a1c', '#dede00']),
int(max(y_pred) + 1))))
plt.scatter(X[:, 0], X[:, 1], s=10, color=colors[y_pred])
plt.show()
How can I visualize the clusters and the data points?
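One common option, sketched here as an assumption rather than as the scikit-learn example's own method, is to fit the mixture on all columns and then project the points (and the component means) onto the first two principal components purely for display. The synthetic X and the helper plot_clusters are hypothetical.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

# Synthetic stand-in for X: two blobs in 5 dimensions (hypothetical data).
rng = np.random.RandomState(0)
X = np.vstack([rng.normal(0.0, 1.0, size=(100, 5)),
               rng.normal(6.0, 1.0, size=(100, 5))])

# Cluster in the full feature space.
clf = GaussianMixture(n_components=2, covariance_type='full', random_state=0)
y_pred = clf.fit(X).predict(X)

# Project to 2 principal components for plotting only.
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X)
centers_2d = pca.transform(clf.means_)

def plot_clusters(points_2d, labels, centers):
    """Scatter the projected points colored by cluster, with means marked by x."""
    import matplotlib.pyplot as plt
    plt.scatter(points_2d[:, 0], points_2d[:, 1], s=10, c=labels)
    plt.scatter(centers[:, 0], centers[:, 1], marker='x', s=100, color='black')
    plt.xlabel('PC 1')
    plt.ylabel('PC 2')
    plt.show()
```

Calling plot_clusters(X_2d, y_pred, centers_2d) draws the figure. The projection discards variance outside the first two components, so clusters that are separated only along other directions can overlap in the plot.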
Based on this question, I am using statsmodels to implement ANOVA in Python. My data are in a Pandas DataFrame, and country is a categorical variable.
def anova(data):
    mod = ols('C(country) ~ playerRank+playerGames', data=data).fit()
    aov_table = sm.stats.anova_lm(mod, typ=2)
    print aov_table
When I call the function above, it shows:
File "<ipython-input-32-e77ae8a55692>", line 1, in <module>
aov_table = sm.stats.anova_lm(mod, typ=2)
File "C:\ProgramData\Anaconda2\lib\site-packages\statsmodels\stats\anova.py", line 326, in anova_lm
return anova_single(model, **kwargs)
File "C:\ProgramData\Anaconda2\lib\site-packages\statsmodels\stats\anova.py", line 83, in anova_single
robust)
File "C:\ProgramData\Anaconda2\lib\site-packages\statsmodels\stats\anova.py", line 178, in anova2_lm_single
cov = _get_covariance(model, None)
File "C:\ProgramData\Anaconda2\lib\site-packages\statsmodels\stats\anova.py", line 15, in _get_covariance
return model.cov_params()
File "C:\ProgramData\Anaconda2\lib\site-packages\statsmodels\base\wrapper.py", line 95, in wrapper
obj = data.wrap_output(func(results, …
In a previous question I asked how to compute the AIC of a linear model. If I replace LinearRegression() with linear_model.OLS in order to get the AIC, how do I obtain the slope and intercept of the OLS linear model?
import statsmodels.formula.api as smf
regr = smf.OLS(y, X, hasconst=True).fit()
min_samples_split must be at least 2 or in (0, 1], got 1
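I defined a binary classifier as shown below. I call it with the "gbc" method (gradient boosting classifier) and get the error on the last line. featuresClasses is a DataFrame and featureLabels is a list of features.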
def Binary_classifier(method, featureLabels, featuresClasses):
    membershipIds = list(set(featuresClasses['membershipId']))
    n_membershipIds = len(membershipIds)
    index_rand = np.random.permutation(n_membershipIds)
    test_size = int(0.3 * n_membershipIds)
    membershipIds_test = list(itemgetter(*index_rand[:test_size])(membershipIds))
    membershipIds_train = list(itemgetter(*index_rand[test_size+1:])(membershipIds))
    data_test = featuresClasses[featuresClasses['membershipId'].isin(membershipIds_test)]
    data_train = featuresClasses[featuresClasses['membershipId'].isin(membershipIds_train)]
    data_test = data_test[data_test['standing'].isin([0, 1])]
    data_train = data_train[data_train['standing'].isin([0, 1])]
    X_test = data_test[featureLabels].as_matrix()
    y_test = data_test['standing'].values.astype(int)
    X_train = data_train[featureLabels].as_matrix()
    y_train = data_train['standing'].values.astype(int)
    # -------------------------- Run classifier
    print 'Binary classification by', method
    if method == 'svm':
        classifier = svm.SVC(kernel='linear', …
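The code above is cut off before the gradient boosting branch, so how min_samples_split was set there is unknown. Going only by the error message, the parameter accepts either an integer of at least 2 or a float fraction in (0, 1]. A sketch of both forms on made-up data:

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Made-up binary problem: the label depends only on the sign of the first feature.
rng = np.random.RandomState(0)
X_train = rng.normal(size=(60, 4))
y_train = (X_train[:, 0] > 0).astype(int)

# Integer form: a node needs at least this many samples to be split (minimum 2).
clf_int = GradientBoostingClassifier(n_estimators=10, min_samples_split=2,
                                     random_state=0)
clf_int.fit(X_train, y_train)

# Float form: interpreted as a fraction of the training samples.
clf_frac = GradientBoostingClassifier(n_estimators=10, min_samples_split=0.1,
                                      random_state=0)
clf_frac.fit(X_train, y_train)

print(clf_int.score(X_train, y_train), clf_frac.score(X_train, y_train))
```

If the value comes from a variable or a parameter grid, printing it just before constructing the classifier is a quick way to confirm whether an integer 1 is being passed.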