I'm trying to fit two Gaussians to bimodal data, but most optimizers consistently give wrong results depending on the initial guess, as shown below.

I also tried GMM from scikit-learn, which didn't help much. I'd like to know what I might be doing wrong, and what a better approach would be for testing and fitting bimodal data. One example of the code, using curve_fit and the data, is below.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def gauss(x, mu, sigma, A):
    return A * np.exp(-(x - mu)**2 / 2 / sigma**2)

def bimodal(x, mu1, sigma1, A1, mu2, sigma2, A2):
    return gauss(x, mu1, sigma1, A1) + gauss(x, mu2, sigma2, A2)

def rmse(p0):
    mu1, sigma1, A1, mu2, sigma2, A2 = p0
    y_sim = bimodal(x, mu1, sigma1, A1, mu2, sigma2, A2)
    return np.sqrt(np.sum((y - y_sim)**2) / len(y))

data = pd.read_csv('data.csv')
x, y = data.index, data['24hr'].values

expected = (400, 720, 500, 700, 774, 150)
params, cov = curve_fit(bimodal, x, y, expected, maxfev=100000)
sigma = np.sqrt(np.diag(cov))

plt.plot(x, bimodal(x, *params), color='red', lw=3, label='model')
plt.plot(x, y, label='data')
plt.legend()
print(params, '\n', sigma)
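A trick that often helps curve_fit converge is deriving the initial guess from the data instead of hard-coding it. Below is a minimal sketch reusing the x, y, and bimodal defined above; the find_peaks thresholds are illustrative assumptions that will need tuning for real data:

from scipy.signal import find_peaks

# locate candidate peaks in the raw curve; height/distance are rough filters
peaks, props = find_peaks(y, height=np.max(y) * 0.2, distance=50)
top2 = peaks[np.argsort(props['peak_heights'])[-2:]]

# seed each Gaussian at a detected peak, with a coarse width guess
rough_sigma = (x.max() - x.min()) / 10
p0 = (x[top2[0]], rough_sigma, y[top2[0]],
      x[top2[1]], rough_sigma, y[top2[1]])
params, cov = curve_fit(bimodal, x, y, p0=p0, maxfev=100000)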
I want to visualize a regression tree built with any of the ensemble methods in scikit-learn (GradientBoostingRegressor, RandomForestRegressor, BaggingRegressor). I have already looked at this question and this question, which deal with classification trees. But those answers rely on a "tree" method that is not available on sklearn's regression models.
But it doesn't seem to produce results. I'm stuck because the regression versions of these trees have no .tree method (it is only available on the classification versions). I want an output similar to this one, but based on a tree constructed by scikit-learn.
I have explored the methods attached to the object, but could not produce an answer.
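For reference, the individual trees inside scikit-learn's regression ensembles can usually be reached through the estimators_ attribute, and each one is a plain DecisionTreeRegressor that export_graphviz accepts. A minimal sketch with an illustrative dataset:

from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.tree import export_graphviz

X, y = load_diabetes(return_X_y=True)
forest = RandomForestRegressor(n_estimators=10).fit(X, y)

# each member of estimators_ has its own tree_ under the hood,
# so the usual export works on it directly
export_graphviz(forest.estimators_[0], out_file='tree0.dot',
                feature_names=load_diabetes().feature_names)
# render with: dot -Tpng tree0.dot -o tree0.png

(For GradientBoostingRegressor, estimators_ is a 2D array, so index it as estimators_[0, 0].)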
My code is below.

I know why the error occurs during the transform: the feature lists don't match between fit and transform. How can I fix this? How can I make all the remaining features 0?

After that, I want to use it for partial_fit with an SGD classifier.
Jupyter QtConsole 4.3.1
Python 3.6.2 |Anaconda custom (64-bit)| (default, Sep 21 2017, 18:29:43)
Type 'copyright', 'credits' or 'license' for more information
IPython 6.1.0 -- An enhanced Interactive Python. Type '?' for help.
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

input_df = pd.DataFrame(dict(fruit=['Apple', 'Orange', 'Pine'],
                             color=['Red', 'Orange', 'Green'],
                             is_sweet=[0, 0, 1],
                             country=['USA', 'India', 'Asia']))
input_df
Out[1]:
    color country   fruit  is_sweet
0     Red     USA   Apple         0
1  Orange   India  Orange         0
2   Green    Asia    Pine         1
filtered_df = input_df.apply(pd.to_numeric, errors='ignore')
filtered_df.info() …
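One common way to keep the fit-time and transform-time feature lists aligned is to fit the encoder once on the training categories and pass handle_unknown='ignore', so unseen categories come out as all-zero columns. A minimal sketch, reusing input_df from above and assuming scikit-learn 0.20+ (where OneHotEncoder accepts string columns directly):

from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder(handle_unknown='ignore')
enc.fit(input_df[['fruit', 'color', 'country']])

# a frame with a category the encoder never saw ('Mango')
new_df = pd.DataFrame(dict(fruit=['Mango'], color=['Red'], country=['USA']))
X_new = enc.transform(new_df)

# unseen values encode as all zeros, so the column count stays fixed
# and X_new can be fed to SGDClassifier.partial_fit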
In the sklearn documentation, "norm" can be
norm : ‘l1’, ‘l2’, or ‘max’, optional (‘l2’ by default)
The norm to use to normalize each non zero sample (or each non-zero feature if axis is 0).
Also, I read the user guide section on normalization carefully, but it is still not clear to me what 'l1', 'l2', or 'max' mean.

Can anyone clarify this?
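Each option rescales every sample (row) by a different quantity: 'l1' divides by the sum of absolute values, 'l2' by the Euclidean length, and 'max' by the largest absolute value. A quick sketch:

import numpy as np
from sklearn.preprocessing import normalize

x = np.array([[3.0, 4.0]])
print(normalize(x, norm='l1'))   # [[3/7, 4/7]]  -> absolute values sum to 1
print(normalize(x, norm='l2'))   # [[0.6, 0.8]]  -> Euclidean length is 1
print(normalize(x, norm='max'))  # [[0.75, 1.0]] -> largest absolute value is 1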
I implemented linear regression as follows:
from sklearn.linear_model import LinearRegression
x = [1,2,3,4,5,6,7]
y = [1,2,1,3,2.5,2,5]
# Create linear regression object
regr = LinearRegression()
# Train the model using the training sets
regr.fit([x], [y])
# print(x)
regr.predict([[1, 2000, 3, 4, 5, 26, 7]])
which produces:
array([[1. , 2. , 1. , 3. , 2.5, 2. , 5. ]])
When using the predict function, why can't I use a single x value to get a prediction?

Trying regr.predict([[2000]])

returns:
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
<ipython-input-3-3a8b477f5103> in <module>()
11
12 # print(x)
---> 13 regr.predict([[2000]])
/usr/local/lib/python3.6/dist-packages/sklearn/linear_model/base.py in predict(self, X)
254         Returns predicted …
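The root cause here is that fit expects X with shape (n_samples, n_features); wrapping the lists as [x] and [y] creates a single sample with seven features, so predict then demands seven features per row. A sketch of the reshaping that makes single-value predictions work, using the same toy data:

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1, 2, 3, 4, 5, 6, 7]).reshape(-1, 1)  # 7 samples, 1 feature each
y = np.array([1, 2, 1, 3, 2.5, 2, 5])

regr = LinearRegression().fit(x, y)
print(regr.predict([[2000]]))  # a single x value now works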
While working with Keras, I've found that using the wrapper gives me the scikit-learn API at the cost of parts of the Keras API, and vice versa. I'm interested in having both.

Variant 1: the scikit-learn wrapper
from keras.models import Sequential
from keras.layers import Dense
from keras.wrappers.scikit_learn import KerasClassifier

def model():
    model = Sequential()
    model.add(Dense(10, input_dim=4, activation='relu'))
    model.add(Dense(3, activation='softmax'))
    model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
    return model

estimator = KerasClassifier(build_fn=model, epochs=100, batch_size=5)
estimator.fit(X, y)
-> This lets me use scikit-learn functions such as precision_score() or classification_report(). However, model.summary() does not work:

AttributeError: 'KerasClassifier' object has no attribute 'summary'
Variant 2: no wrapper
model = Sequential()
model.add(Dense(10, input_dim=4, activation='relu'))
model.add(Dense(3, activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=100, batch_size=5)
-> This lets me use model.summary(), but not the scikit-learn functions:

ValueError: Mix type of y not allowed, got types {'multiclass', 'multilabel-indicator'}

Is there a way to use both at the same time?
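One possible route: after the wrapper has been fitted, it keeps the underlying Keras model on its model attribute, which gives access to both APIs. A sketch assuming the fitted estimator from variant 1 (worth verifying against your Keras version):

estimator.fit(X, y)             # scikit-learn style training
estimator.model.summary()       # reach through to the Keras model
y_pred = estimator.predict(X)   # predictions usable with sklearn metrics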
import numpy as np
from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import RandomizedSearchCV

x_tu = data_cls_tu.iloc[:, 1:].values
y_tu = data_cls_tu.iloc[:, 0].values

classifier = DecisionTreeClassifier()
parameters = [{"max_depth": [3, None],
               "min_samples_leaf": np.random.randint(1, 9),
               "criterion": ["gini", "entropy"]}]

randomcv = RandomizedSearchCV(estimator=classifier, param_distributions=parameters,
                              scoring='accuracy', cv=10, n_jobs=-1,
                              random_state=0)
randomcv.fit(x_tu, y_tu)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
<ipython-input-17-fa8376cb54b8> in <module>()
11 scoring='accuracy', cv=10, n_jobs=-1,
12 random_state=0)
---> 13 randomcv.fit(x_tu, y_tu)
~\Anaconda3\lib\site-packages\sklearn\model_selection\_search.py in fit(self, X, y, groups, **fit_params)
616 n_splits = cv.get_n_splits(X, y, groups)
617 # Regenerate parameter iterable for each fit
--> 618 candidate_params = list(self._get_param_iterator())
619 n_candidates = len(candidate_params)
620                 if self.verbose …
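The likely cause: in the scikit-learn version this traceback comes from, param_distributions must be a single dict rather than a list of dicts (the parameter sampler calls .items() on it), and each value should be a list or a distribution with an rvs method; np.random.randint(1, 9) collapses to one scalar when the dict is built. A sketch of a working setup, reusing the names above:

from scipy.stats import randint

parameters = {"max_depth": [3, None],
              "min_samples_leaf": randint(1, 9),  # a distribution, not a scalar
              "criterion": ["gini", "entropy"]}

randomcv = RandomizedSearchCV(estimator=classifier, param_distributions=parameters,
                              scoring='accuracy', cv=10, n_jobs=-1,
                              random_state=0)
randomcv.fit(x_tu, y_tu)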
I am running the multi-label classification [code] from 1. How do I fix the NameError saying 'X_train' is not defined? The Python code is given below.
import scipy
from scipy.io import arff

data, meta = scipy.io.arff.loadarff('./yeast/yeast-train.arff')

from sklearn.datasets import make_multilabel_classification

# this will generate a random multi-label dataset
X, y = make_multilabel_classification(sparse=True, n_labels=20,
                                      return_indicator='sparse',
                                      allow_unlabeled=False)

# using binary relevance
from skmultilearn.problem_transform import BinaryRelevance
from sklearn.naive_bayes import GaussianNB

# initialize binary relevance multi-label classifier
# with a gaussian naive bayes base classifier
classifier = BinaryRelevance(GaussianNB())

# train
classifier.fit(X_train, y_train)

# predict
predictions = classifier.predict(X_test)
from …
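The NameError occurs simply because X_train and y_train are never created; the generated X and y need to be split first. A sketch of the missing step:

from sklearn.model_selection import train_test_split

# define the names the rest of the script expects
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)
classifier.fit(X_train, y_train)
predictions = classifier.predict(X_test)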
I'm trying to do classification where one file is entirely the training set and another file is entirely the test set. Is this possible? I tried:
import pandas
import numpy as np
import pandas as pd
from sklearn import metrics
from sklearn import cross_validation
from sklearn.pipeline import Pipeline
from sklearn.metrics import confusion_matrix
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support as score
from sklearn.feature_extraction.text import TfidfVectorizer, HashingVectorizer, CountVectorizer, TfidfTransformer
from sklearn.metrics import precision_score, recall_score, confusion_matrix, classification_report, accuracy_score, f1_score
#csv file from train
df = pd.read_csv('data_train.csv', sep = ',')
#csv file from test
df_test = pd.read_csv('data_test.csv', sep = …
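Training from one file and testing on another works without any splitting: fit the vectorizer and model on the training frame, then only transform the test frame. A sketch, where the column names 'text' and 'label' are assumptions about the CSV layout:

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report

vectorizer = TfidfVectorizer()
X_train = vectorizer.fit_transform(df['text'])   # fit on the training file only
X_test = vectorizer.transform(df_test['text'])   # reuse the same vocabulary

clf = LogisticRegression().fit(X_train, df['label'])
print(classification_report(df_test['label'], clf.predict(X_test)))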
I want to make sure the order of operations for my machine learning workflow is correct:
from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.grid_search import GridSearchCV

# 1. Initialize model
model = RandomForestClassifier(5000)

# 2. Load dataset
iris = datasets.load_iris()
X, y = iris.data, iris.target

# 3. Remove unimportant features
model = SelectFromModel(model, threshold=0.5).estimator

# 4. cross validate model on the important features
k_fold = KFold(n=len(data), n_folds=10, shuffle=True)
for k, (train, test) in enumerate(k_fold):
    self.model.fit(data[train], target[train])

# 5. grid search for best parameters
param_grid = {
    'n_estimators': [1000, 2500, …
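One way to keep the order straight is to chain feature selection and the classifier in a Pipeline and let GridSearchCV drive the cross-validation, so selection is re-fit inside each fold instead of once up front. A sketch using the newer sklearn.model_selection API:

from sklearn import datasets
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel
from sklearn.model_selection import GridSearchCV
from sklearn.pipeline import Pipeline

iris = datasets.load_iris()
X, y = iris.data, iris.target

pipe = Pipeline([
    ('select', SelectFromModel(RandomForestClassifier(100))),
    ('clf', RandomForestClassifier()),
])
param_grid = {'clf__n_estimators': [100, 250]}

# 10-fold CV happens inside the search, with feature selection per fold
search = GridSearchCV(pipe, param_grid, cv=10).fit(X, y)
print(search.best_params_, search.best_score_)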