Tag: scikit-learn

Fitting two Gaussians to weakly expressed bimodal data

I am trying to fit two Gaussians to bimodally distributed data, but most optimizers keep returning wrong results depending on the starting guess, as shown below.

I also tried GMM from scikit-learn, which did not help much. I would like to know what I might be doing wrong and what a better approach would be, so that bimodal data can be tested and fitted. One example of the code, using curve_fit on the data, is below:

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def gauss(x, mu, sigma, A):
    # single Gaussian component
    return A*np.exp(-(x-mu)**2/2/sigma**2)

def bimodal(x, mu1, sigma1, A1, mu2, sigma2, A2):
    # sum of two Gaussian components
    return gauss(x, mu1, sigma1, A1) + gauss(x, mu2, sigma2, A2)

def rmse(p0):
    # root-mean-square error of the bimodal model for a given parameter vector
    mu1, sigma1, A1, mu2, sigma2, A2 = p0
    y_sim = bimodal(x, mu1, sigma1, A1, mu2, sigma2, A2)
    return np.sqrt(np.sum((y - y_sim)**2) / len(y))

data = pd.read_csv('data.csv')
x, y = data.index, data['24hr'].values

# starting guess: (mu1, sigma1, A1, mu2, sigma2, A2)
expected = (400, 720, 500, 700, 774, 150)

params, cov = curve_fit(bimodal, x, y, expected, maxfev=100000)
sigma = np.sqrt(np.diag(cov))
plt.plot(x, bimodal(x, *params), color='red', lw=3, label='model')
plt.plot(x, y, label='data')
plt.legend()
print(params, '\n', sigma)
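
curve_fit is sensitive to p0 when the two components overlap. One thing that often helps is to derive the starting guess from the data and pass bounds so the means stay inside the data range and the widths and amplitudes stay positive. A minimal sketch, assuming x and y are loaded as above; the ±100 offsets and the width of 50 are only illustrative:

peak = x[np.argmax(y)]                        # position of the tallest peak
p0 = (peak - 100, 50, y.max(),                # rough guess for the first component
      peak + 100, 50, y.max() / 2)            # rough guess for the second component

span = x.max() - x.min()
lower = (x.min(), 1e-3, 0, x.min(), 1e-3, 0)              # means in range, sigma > 0, A >= 0
upper = (x.max(), span, np.inf, x.max(), span, np.inf)

params, cov = curve_fit(bimodal, x, y, p0=p0, bounds=(lower, upper), maxfev=100000)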

python curve-fitting scipy python-3.x scikit-learn

Score: 0 · Answers: 1 · Views: 1087

How to visualize a regression tree in Python

I would like to visualize a regression tree built with any of the ensemble methods in scikit-learn (GradientBoostingRegressor, RandomForestRegressor, BaggingRegressor). I have already looked at this question and this question, which deal with classification trees. But those approaches rely on a "tree" method, which is not available for the regression models in SKLearn.

It did not seem to produce a result, though. I ran into problems because the regression versions of these trees have no .tree method (that method is only available in the classification versions). I would like an output similar to that one, but based on a tree constructed by scikit-learn.

I have explored the methods associated with the object, but could not produce an answer.
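
For reference, a fitted regression ensemble exposes its member trees through the estimators_ attribute, and each member is an ordinary DecisionTreeRegressor that sklearn.tree.export_graphviz can render. A minimal sketch; the diabetes dataset and the file name tree.dot are only placeholders:

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_graphviz

X, y = load_diabetes(return_X_y=True)
forest = RandomForestRegressor(n_estimators=10).fit(X, y)

# each fitted ensemble member is a plain DecisionTreeRegressor
single_tree = forest.estimators_[0]
export_graphviz(single_tree, out_file='tree.dot',
                feature_names=load_diabetes().feature_names)
# render with Graphviz, e.g.:  dot -Tpng tree.dot -o tree.png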

python machine-learning decision-tree random-forest scikit-learn

Score: 0 · Answers: 1 · Views: 4911

Scikit-learn OneHotEncoder fit and transform error: ValueError: X has different shape than during fitting

Below is my code.

I know why the error occurs during transform: the feature lists do not match between fit and transform. How do I fix this, and how can I set all the remaining features to 0?

After that, I want to use it for partial fitting of an SGD classifier.

Jupyter QtConsole 4.3.1
Python 3.6.2 |Anaconda custom (64-bit)| (default, Sep 21 2017, 18:29:43) 
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

input_df = pd.DataFrame(dict(fruit=['Apple', 'Orange', 'Pine'], 
                             color=['Red', 'Orange','Green'],
                             is_sweet = [0,0,1],
                             country=['USA','India','Asia']))
input_df
Out[1]: 
    color country   fruit  is_sweet
0     Red     USA   Apple         0
1  Orange   India  Orange         0
2   Green    Asia    Pine         1



filtered_df = input_df.apply(pd.to_numeric, errors='ignore')
filtered_df.info() …
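
One way to avoid the shape mismatch, sketched here under the assumption of scikit-learn 0.20+ (where OneHotEncoder accepts string columns directly), is to fit the encoder once on the categorical columns and reuse that fitted encoder for every later transform; with handle_unknown='ignore', categories not seen during fit simply encode to all zeros:

import pandas as pd
from sklearn.preprocessing import OneHotEncoder

categorical = ['fruit', 'color', 'country']

# fit once on the reference frame and keep the fitted encoder around
encoder = OneHotEncoder(handle_unknown='ignore', sparse=False)
encoder.fit(input_df[categorical])

# a later batch with an unseen country ('Mexico') still transforms cleanly
new_df = pd.DataFrame(dict(fruit=['Apple'], color=['Red'], country=['Mexico']))
encoded = encoder.transform(new_df[categorical])   # unseen category -> all-zero block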

python machine-learning pandas scikit-learn one-hot-encoding

Score: 0 · Answers: 1 · Views: 2581

The norm parameter in sklearn.preprocessing.normalize

In the sklearn documentation, "norm" can be

norm : ‘l1’, ‘l2’, or ‘max’, optional (‘l2’ by default)

The norm to use to normalize each non zero sample (or each non-zero feature if axis is 0).

I have also read the user guide section on normalization carefully, but I am still not quite clear about what 'l1', 'l2', or 'max' mean.

Can anyone clarify this?
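
For reference, each sample (row) is divided by its own norm: the sum of absolute values for 'l1', the Euclidean length for 'l2', and the largest absolute value for 'max'. A small illustration:

import numpy as np
from sklearn.preprocessing import normalize

x = np.array([[1.0, 2.0, 2.0]])

print(normalize(x, norm='l1'))    # [[0.2  0.4  0.4 ]]  divided by |1|+|2|+|2| = 5
print(normalize(x, norm='l2'))    # [[0.33 0.67 0.67]]  divided by sqrt(1+4+4) = 3
print(normalize(x, norm='max'))   # [[0.5  1.   1.  ]]  divided by max|x_i| = 2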

python machine-learning normalization scikit-learn

Score: 0 · Answers: 1 · Views: 3656

Single prediction with linear regression

Linear regression implemented as follows:

from sklearn.linear_model import LinearRegression

x = [1,2,3,4,5,6,7]
y = [1,2,1,3,2.5,2,5]

# Create linear regression object
regr = LinearRegression()

# Train the model using the training sets
regr.fit([x], [y])

# print(x)
regr.predict([[1, 2000, 3, 4, 5, 26, 7]])

Produces:

array([[1. , 2. , 1. , 3. , 2.5, 2. , 5. ]])

When using the predict function, why can't a single x value be used to make a prediction?

regr.predict([[2000]])

Returns:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-3-3a8b477f5103> in <module>()
     11 
     12 # print(x)
---> 13 regr.predict([[2000]])

/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in predict(self, X)
    254             Returns predicted …
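
The wrapping regr.fit([x], [y]) turns the seven values into a single sample with seven features, so predict then expects seven features per row as well. A sketch of the shape the estimator actually expects (seven samples with one feature each), which makes a single-value prediction valid:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5, 6, 7]).reshape(-1, 1)   # 7 samples, 1 feature each
y = np.array([1, 2, 1, 3, 2.5, 2, 5])

regr = LinearRegression()
regr.fit(x, y)

print(regr.predict([[2000]]))   # a single value is now a valid 1-feature sample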

python regression linear-regression scikit-learn

Score: 0 · Answers: 1 · Views: 2940

keras: getting model.summary() back vs. the scikit-learn wrapper

While working with keras, I have learned that using the wrapper has drawbacks for the keras and scikit-learn API calls that remain available. I would be interested in having both at the same time.

Variant 1: scikit wrapper

from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

def model():
    model = Sequential()
    model.add(Dense(10, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

estimator = KerasClassifier(build_fn=model, epochs=100, batch_size=5)
estimator.fit(X, y)

-> This lets me use the scikit calls such as precision_score() or classification_report(). However, model.summary() does not work:

AttributeError: 'KerasClassifier' object has no attribute 'summary'

Variant 2: no wrapper

model = Sequential()
model.add(Dense(10, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=100, batch_size=5)

-> This lets me print model.summary(), but not the scikit calls:

ValueError: Mix type of y not allowed, got types {'multiclass', 'multilabel-indicator'}

Is there a way to use both?
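
In the Keras versions I have used, the wrapper stores the built Keras model in its model attribute after fitting, so both interfaces can be reached from the same object. A minimal sketch, treating the .model attribute as an assumption for your version and assuming X and y are defined:

estimator = KerasClassifier(build_fn=model, epochs=100, batch_size=5)
estimator.fit(X, y)

# scikit-learn style calls go through the wrapper ...
y_pred = estimator.predict(X)

# ... and the underlying Keras model is reachable for summary()
estimator.model.summary()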

python summary wrapper scikit-learn keras

Score: 0 · Answers: 1 · Views: 1157

I keep getting an AttributeError in RandomizedSearchCV

import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

x_tu = data_cls_tu.iloc[:,1:].values
y_tu = data_cls_tu.iloc[:,0].values

classifier = DecisionTreeClassifier()
parameters = [{"max_depth": [3,None],
               "min_samples_leaf": np.random.randint(1,9),
               "criterion": ["gini","entropy"]}]
randomcv = RandomizedSearchCV(estimator=classifier, param_distributions=parameters,
                              scoring='accuracy', cv=10, n_jobs=-1,
                              random_state=0)
randomcv.fit(x_tu, y_tu)



---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-17-fa8376cb54b8> in <module>()
     11                               scoring='accuracy', cv=10, n_jobs=-1,
     12                               random_state=0)
---> 13 randomcv.fit(x_tu, y_tu)

~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
    616         n_splits = cv.get_n_splits(X, y, groups)
    617         # Regenerate parameter iterable for each fit
--> 618         candidate_params = list(self._get_param_iterator())
    619         n_candidates = len(candidate_params)
    620         if self.verbose …
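
The traceback is cut off before the message, but two things in parameters are likely candidates: param_distributions is expected to be a single dict rather than a list of dicts in the scikit-learn version shown, and each value should be a list of candidates or a distribution object with an rvs method, whereas np.random.randint(1, 9) evaluates to one fixed integer at definition time. A sketch of a corrected call, not the accepted answer:

from scipy.stats import randint
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

parameters = {"max_depth": [3, None],
              "min_samples_leaf": randint(1, 9),    # a distribution, resampled each iteration
              "criterion": ["gini", "entropy"]}

randomcv = RandomizedSearchCV(estimator=DecisionTreeClassifier(),
                              param_distributions=parameters,
                              scoring='accuracy', cv=10, n_jobs=-1,
                              random_state=0)
randomcv.fit(x_tu, y_tu)   # x_tu, y_tu as prepared above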

machine-learning scikit-learn data-science sklearn-pandas jupyter-notebook

Score: 0 · Answers: 1 · Views: 1014

How do I fix NameError: name 'X_train' is not defined?

I am running the [code] for multi-label classification [1]. How do I fix the NameError saying 'X_train' is not defined? The Python code is given below.

import scipy
from scipy.io import arff
data, meta = scipy.io.arff.loadarff('./yeast/yeast-train.arff')
from sklearn.datasets import make_multilabel_classification

# this will generate a random multi-label dataset
X, y = make_multilabel_classification(sparse = True, n_labels = 20,
return_indicator = 'sparse', allow_unlabeled = False)

# using binary relevance
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.naive_bayes import GaussianNB

# initialize binary relevance multi-label classifier
# with a gaussian naive bayes base classifier
classifier = BinaryRelevance(GaussianNB())

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)

from …
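
X_train, y_train and X_test are never created in the snippet, which is what triggers the NameError. A sketch that splits the generated X, y before fitting; the 0.3 test fraction is arbitrary:

from sklearn.model_selection import train_test_split

# create the train/test split the rest of the snippet expects
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

classifier = BinaryRelevance(GaussianNB())
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)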

python machine-learning scikit-learn multilabel-classification scikit-multilearn

Score: 0 · Answers: 1 · Views: 3552

Classification with one file used entirely for training and another entirely for testing

I am trying to do classification where one file is used entirely for training and another file entirely for testing. Is this possible? I tried:

import pandas
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn import cross_validation
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.feature_extraction.text import TfidfVectorizer, HashingVectorizer, CountVectorizer, TfidfTransformer
from sklearn.metrics import precision_score, recall_score, confusion_matrix, classification_report, accuracy_score, f1_score

#csv file from train
df = pd.read_csv('data_train.csv', sep = ',')

#csv file from test
df_test = pd.read_csv('data_test.csv', sep = …
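
It is possible; the key is to fit the vectorizer and the classifier on the training frame only and merely transform/predict on the test frame. A sketch assuming both CSVs contain a 'text' and a 'label' column; those column names are placeholders:

from sklearn.pipeline import Pipeline
from sklearn.linear_model import LogisticRegression
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import classification_report

pipeline = Pipeline([('tfidf', TfidfVectorizer()),
                     ('clf', LogisticRegression())])

# learn vocabulary, idf weights and coefficients from the training file only
pipeline.fit(df['text'], df['label'])

# apply the already-fitted pipeline to the separate test file
predicted = pipeline.predict(df_test['text'])
print(classification_report(df_test['label'], predicted))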

python machine-learning python-3.x scikit-learn text-classification

Score: 0 · Answers: 1 · Views: 282

Ensuring the correct order of operations for random forest classification in scikit-learn

I want to make sure my order of operations for the machine learning is correct:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.grid_search import GridSearchCV

# 1. Initialize model
model = RandomForestClassifier(5000)

# 2. Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# 3. Remove unimportant features
model = SelectFromModel(model, threshold=0.5).estimator

# 4. cross validate model on the important features
k_fold = KFold(n=len(data), n_folds=10, shuffle=True)
for k, (train, test) in enumerate(k_fold):
    self.model.fit(data[train], target[train])

# 5. grid search for best parameters
param_grid = {
    'n_estimators': [1000, 2500, …
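
For what it's worth, the ordering usually recommended (a sketch, not the asker's final code) is to wrap feature selection and the classifier in a Pipeline and let GridSearchCV cross-validate the whole pipeline, so that selection is refit inside every fold instead of before the split:

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

X, y = load_iris(return_X_y=True)

pipe = Pipeline([
    ('select', SelectFromModel(RandomForestClassifier(n_estimators=100), threshold='median')),
    ('clf', RandomForestClassifier()),
])

param_grid = {'clf__n_estimators': [100, 250]}
search = GridSearchCV(pipe, param_grid, cv=10)
search.fit(X, y)
print(search.best_params_, search.best_score_)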

python pandas scikit-learn

Score: -1 · Answers: 1 · Views: 737