I'm working on Ubuntu 14. I installed python3 and pip3. When I try to use pip3, I get this error:
Traceback (most recent call last):
File "/usr/local/bin/pip3", line 6, in <module>
from pkg_resources import load_entry_point
File "/usr/local/lib/python3.5/dist-packages/pkg_resources/__init__.py", line 70, in <module>
import packaging.version
ImportError: No module named 'packaging'
Does anyone know what the problem is?
Thank you very much.
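The traceback shows `pkg_resources` failing on `import packaging.version`, so the interpreter behind pip3 apparently cannot find a `packaging` module. A minimal diagnostic sketch, assuming it is run with the same python3 interpreter that pip3 uses; the helper name `is_importable` is hypothetical and introduced only for illustration:

```python
import importlib.util


def is_importable(name):
    """Return True if a top-level module with this name can be located."""
    return importlib.util.find_spec(name) is not None


# Check the two modules named in the traceback.
for module_name in ("pkg_resources", "packaging"):
    status = "found" if is_importable(module_name) else "missing"
    print(module_name, status)
```

If `packaging` is reported as missing, that would be consistent with the ImportError above; how to restore it depends on how pip3 was installed on the machine.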
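I have a requirement to convert the rows of a DataFrame column into columns, but I run into problems after the GROUPBY. Below is a set of 3 users whose type can range from type1 to type6.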
user_id1 type4
user_id1 type6
user_id1 type1
user_id1 type2
user_id1 type1
user_id1 type6
user_id2 type1
user_id2 type2
user_id2 type2
user_id2 type1
user_id2 type3
user_id2 type4
user_id2 type5
user_id2 type6
user_id2 type2
user_id2 type6
user_id3 type1
user_id3 type2
user_id3 type3
user_id3 type2
My expected output is:
user_id type1 type2 type3 type4 type5 type6
user_id1 2 1 0 1 0 2
user_id2 2 3 1 1 1 2
user_id3 1 2 1 0 0 0
On the types, I tried …
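One possible way to get a per-user count of each type is `pandas.crosstab`. This is only a sketch: the column names `user_id` and `type` are assumptions, since the question does not show the DataFrame's actual column names.

```python
import pandas as pd

# Sample data from the question; the column names are assumed.
rows = [
    ("user_id1", "type4"), ("user_id1", "type6"), ("user_id1", "type1"),
    ("user_id1", "type2"), ("user_id1", "type1"), ("user_id1", "type6"),
    ("user_id2", "type1"), ("user_id2", "type2"), ("user_id2", "type2"),
    ("user_id2", "type1"), ("user_id2", "type3"), ("user_id2", "type4"),
    ("user_id2", "type5"), ("user_id2", "type6"), ("user_id2", "type2"),
    ("user_id2", "type6"),
    ("user_id3", "type1"), ("user_id3", "type2"), ("user_id3", "type3"),
    ("user_id3", "type2"),
]
df = pd.DataFrame(rows, columns=["user_id", "type"])

# Count how often each type occurs for each user.
counts = pd.crosstab(df["user_id"], df["type"])

# Make sure every type from type1 to type6 has a column, even if unseen.
all_types = ["type%d" % i for i in range(1, 7)]
counts = counts.reindex(columns=all_types, fill_value=0)

print(counts)
```

The `reindex` step only matters when some type never occurs in the data; it keeps the column set fixed at type1 through type6.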
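sklearn had been working on my computer for half a year. I stopped using it for a while, and now it no longer works. I have a problem with the import statement in program.py: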
from sklearn import tree
The error looks really messy:
Traceback (most recent call last):
File "E:/DecisionModel.py", line 1, in <module>
from sklearn import tree
File "C:\Users\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\__init__.py", line 57, in <module>
from .base import clone
File "C:\Users\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\base.py", line 12, in <module>
from .utils.fixes import signature
File "C:\Users\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\utils\__init__.py", line 11, in <module>
from .validation import (as_float_array,
File "C:\Users\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\utils\validation.py", line 18, in <module>
from ..utils.fixes import signature
File "C:\Users\AppData\Local\Programs\Python\Python35\lib\site-packages\sklearn\utils\fixes.py", line 403, in <module>
from scipy.stats import rankdata
File "C:\Users\AppData\Local\Programs\Python\Python35\lib\site-packages\scipy\stats\__init__.py", line 344, in <module>
from .stats import …
I am trying to build a stacked ensemble model to predict merchant churn using R (version 3.3.3) and deep learning in h2o (version 3.10.5.1). The response variable is binary. Currently, I am trying to run code that builds a stacked ensemble from the top 5 models found by a grid search. However, when the code runs I get a java.lang.NullPointerException, with the following output:
java.lang.NullPointerException
at hex.StackedEnsembleModel.checkAndInheritModelProperties(StackedEnsembleModel.java:265)
at hex.ensemble.StackedEnsemble$StackedEnsembleDriver.computeImpl(StackedEnsemble.java:115)
at hex.ModelBuilder$Driver.compute2(ModelBuilder.java:173)
at water.H2O$H2OCountedCompleter.compute(H2O.java:1349)
at jsr166y.CountedCompleter.exec(CountedCompleter.java:468)
at jsr166y.ForkJoinTask.doExec(ForkJoinTask.java:263)
at jsr166y.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:974)
at jsr166y.ForkJoinPool.runWorker(ForkJoinPool.java:1477)
at jsr166y.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:104)
Below is the code I use for the hyperparameter grid search and to build the ensemble model:
hyper_params <- list(
activation=c("Rectifier","Tanh","Maxout","RectifierWithDropout","TanhWithDropout","MaxoutWithDropout"),
hidden=list(c(50,50),c(30,30,30),c(32,32,32,32,32),c(64,64,64,64,64),c(100,100,100,100,100)),
input_dropout_ratio=seq(0,0.2,0.05),
l1=seq(0,1e-4,1e-6),
l2=seq(0,1e-4,1e-6),
rho = c(0.9,0.95,0.99,0.999),
epsilon=c(1e-10,1e-09,1e-08,1e-07,1e-06,1e-05,1e-04)
)
search_criteria <- list(
strategy = "RandomDiscrete",
max_runtime_secs = 3600,
max_models = 100,
seed=1234,
stopping_metric="misclassification",
stopping_tolerance=0.01,
stopping_rounds=5
)
dl_ensemble_grid <- h2o.grid(
hyper_params = hyper_params,
search_criteria = search_criteria,
algorithm="deeplearning",
grid_id = "final_grid_ensemble_dl",
x=predictors,
y=response,
training_frame = h2o.rbind(train, valid, test), …
I am trying to extract feature importances from my RandomForestRegressor, but I get:
AttributeError: 'GridSearchCV' object has no attribute 'feature_importances_'
Does anyone know why the attribute is missing? According to the documentation, it should exist.
Full code:
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV
#Running a RandomForestRegressor GridSearchCV to tune the model.
parameter_candidates = {
'n_estimators' : [650, 700, 750, 800],
'min_samples_leaf' : [1, 2, 3],
'max_depth' : [10, 11, 12],
'min_samples_split' : [2, 3, 4, 5, 6]
}
RFR_regr = RandomForestRegressor()
CV_RFR_regr = GridSearchCV(estimator=RFR_regr, param_grid=parameter_candidates, n_jobs=5, verbose=2)
CV_RFR_regr.fit(X_train, y_train)
#Predict with testing set
y_pred = CV_RFR_regr.predict(X_test)
#Extract feature importances
importances = CV_RFR_regr.feature_importances_
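`GridSearchCV` is a wrapper around the estimator, so fitted attributes of the underlying forest are not exposed on it directly. With the default `refit=True`, the refitted best model is available as `best_estimator_`. A minimal sketch, assuming small synthetic data from `make_regression` and a reduced, purely illustrative parameter grid rather than the grid in the question:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

# Synthetic stand-in for X_train / y_train (an assumption for this sketch).
X, y = make_regression(n_samples=80, n_features=5, random_state=0)

search = GridSearchCV(
    estimator=RandomForestRegressor(random_state=0),
    param_grid={"n_estimators": [10, 20], "max_depth": [3, 5]},
    cv=3,
)
search.fit(X, y)

# The fitted forest lives on best_estimator_, not on the search object.
importances = search.best_estimator_.feature_importances_
print(importances)
```

Applied to the code in the question, the same idea would read `CV_RFR_regr.best_estimator_.feature_importances_`.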
I'm applying a decision tree with K-fold using sklearn. Could someone help me display its mean score? Here is my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import KFold
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import confusion_matrix,classification_report
dta=pd.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/blood-transfusion/transfusion.data")
X=dta.drop("whether he/she donated blood in March 2007",axis=1)
X=X.values # convert dataframe to numpy array
y=dta["whether he/she donated blood in March 2007"]
y=y.values # convert dataframe to numpy array
kf = KFold(n_splits=10)
KFold(n_splits=10, random_state=None, shuffle=False)
clf_tree=DecisionTreeClassifier()
for train_index, test_index in kf.split(X):
X_train, X_test = X[train_index], X[test_index]
y_train, y_test = y[train_index], y[test_index] …
Using a custom scorer on the multiclass output of a Keras model returns the same error with both cross_val_score and GridSearchCV, as shown below (it uses Iris, so you can run it directly to test):
import numpy as np
from sklearn import datasets
from sklearn.model_selection import train_test_split, cross_val_score, GridSearchCV
from keras.models import Sequential
from keras.layers import Dense
from keras.utils import to_categorical
from keras.wrappers.scikit_learn import KerasClassifier
iris = datasets.load_iris()
X= iris.data
Y = to_categorical(iris.target)
X_train, X_test, Y_train, Y_test = train_test_split(X, Y, train_size=0.8, random_state=1000)
def create_model(optimizer='rmsprop'):
model = Sequential()
model.add(Dense(8,activation='relu',input_shape = (4,)))
model.add(Dense(3,activation='softmax'))
model.compile(optimizer = optimizer,
loss='categorical_crossentropy',
metrics=['accuracy'])
return model
model = KerasClassifier(build_fn=create_model,
epochs=10,
batch_size=5,
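verbose=0) …
Suppose we have a Pandas DataFrame and a scikit-learn model trained (fitted) on that DataFrame. Is there a way to make row-wise predictions? The use case is filling null values in the DataFrame with the predict function of an sklearn model.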
I expected this to be possible with the pandas apply function (axis=1), but I keep getting dimension errors.
Using Pandas version '0.22.0' and sklearn version '0.19.1'.
A simple example:
import pandas as pd
from sklearn.cluster import KMeans  # the estimator class is KMeans, not kmeans
data = [[x,y,x*y] for x in range(1,10) for y in range(10,15)]
df = pd.DataFrame(data,columns=['input1','input2','output'])
model = KMeans()
model.fit(df[['input1','input2']],df['output'])
df['predictions'] = df[['input1','input2']].apply(model.predict,axis=1)
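`apply(..., axis=1)` hands each row to `predict` as a 1-D Series, while `predict` expects a 2-D input of shape (n_samples, n_features). A hedged sketch of two ways around this, assuming `KMeans` with an illustrative `n_clusters=3` (the question uses the defaults): predict on the whole frame at once, or reshape each row into a one-row frame before predicting.

```python
import pandas as pd
from sklearn.cluster import KMeans

data = [[x, y, x * y] for x in range(1, 10) for y in range(10, 15)]
df = pd.DataFrame(data, columns=["input1", "input2", "output"])
features = df[["input1", "input2"]]

# n_clusters, n_init and random_state are assumptions for this sketch.
model = KMeans(n_clusters=3, n_init=10, random_state=0)
model.fit(features)

# Option 1: predict for all rows in a single call.
df["predictions"] = model.predict(features)

# Option 2: row-wise, turning each 1-D row into a 1 x n_features frame.
row_wise = features.apply(
    lambda row: model.predict(row.to_frame().T)[0], axis=1
)

print(df.head())
```

Both options give the same labels; the single call is usually simpler and faster, while the row-wise form is closer to what the question attempts.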
The resulting dimension error from the apply call:
ValueError: ('Expected 2D array, got 1D array instead:\narray=[ 1. 10.].\nReshape your data either using array.reshape(-1, 1) if your data has a single feature or array.reshape(1, -1) if it contains a single sample.', 'occurred …
Problem:
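During training, my model's performance looks quite good. However, sklearn's classification_report shows precision, recall, and f1 of zero almost everywhere. What am I doing wrong to cause such a mismatch between training performance and inference? (I use Keras with the TensorFlow backend.)
My code:
I use the validation_split parameter to create two generators (training, validation), like this: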
train_datagen = ImageDataGenerator(
rescale=1. / 255, validation_split=0.15)
train_generator = train_datagen.flow_from_directory(
train_data_dir,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical', subset="training")
validation_generator = train_datagen.flow_from_directory(
train_data_dir,
target_size=(img_height, img_width),
batch_size=batch_size,
class_mode='categorical', subset="validation", shuffle=False)
I set shuffle=False in validation_generator to make sure it does not mix up the relationship between images and labels for the later evaluation.
Next, I train my model like this:
history = model.fit_generator(
train_generator,
steps_per_epoch=nb_train_samples // batch_size,
epochs=epochs,
validation_data=validation_generator,
validation_steps=nb_validation_samples // batch_size,
verbose=1)
The performance is okay:
Epoch 1/5
187/187 [==============================] - 44s 233ms/step - loss: 0.7835 - acc: 0.6744 - val_loss: 1.2918 - val_acc: 0.6079
Epoch 2/5
187/187 …
I want to use sklearn's MLPClassifier:
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=10, alpha=1e-4,
solver='sgd', verbose=10, tol=1e-4, random_state=1,
learning_rate_init=.1)
I did not find any parameter for the loss function, and I want it to be mean_squared_error. Is it possible to set this for the model?
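As far as the scikit-learn API goes, `MLPClassifier` has no `loss` parameter: it optimizes log-loss (cross-entropy), while `MLPRegressor` optimizes squared error. A small sketch that checks the constructor parameters and reads the loss recorded during fitting. Using Iris with standardized inputs is an assumption made only so the example runs quickly; it is not the data from the question.

```python
import math
import warnings

from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X = StandardScaler().fit_transform(X)

mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=10, alpha=1e-4,
                    solver='sgd', tol=1e-4, random_state=1,
                    learning_rate_init=.1)

# No constructor parameter named "loss" exists on the classifier.
has_loss_param = "loss" in mlp.get_params()

with warnings.catch_warnings():
    # max_iter=10 is expected to stop before convergence.
    warnings.simplefilter("ignore")
    mlp.fit(X, y)

# loss_ holds the final log-loss value; loss_curve_ has one entry per epoch.
print(has_loss_param, mlp.loss_, len(mlp.loss_curve_))
```

Training a classifier against mean squared error would therefore need a different estimator or library; the sketch only shows what the fitted model reports.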