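In a classification problem, a random forest (RF) classifier gives its final answer by majority vote, e.g. yes or no for some event.
On the other hand, I can also see a vector with the final probability of the event, e.g. 0.83. If I have 1000 estimators, how is this probability computed? Is it the average of the 1000 probabilities from the individual trees?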
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

clf = RandomForestClassifier(max_depth=4, min_samples_split=2, n_estimators=200, random_state=1)
clf.fit(train[columns], train["churn"])
predictions = clf.predict(test[columns])
predicted_probs = clf.predict_proba(test[columns])
print(predicted_probs)
# Reuse test.index so the probability rows line up with the test rows
test = pd.concat([test, pd.DataFrame(predicted_probs, columns=['Col_0', 'Col_1'], index=test.index)], axis=1)
python classification probability random-forest scikit-learn
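A minimal sketch, assuming a synthetic dataset from make_classification stands in for the churn data: in scikit-learn, RandomForestClassifier.predict_proba returns the mean of the per-tree predict_proba outputs (each tree reports the class fractions in the leaf a sample falls into, not a hard 0/1 vote), and predict takes the argmax of that average. The check below compares the forest's output with a manual average over clf.estimators_.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic binary data (assumption: stands in for train[columns] / "churn")
X, y = make_classification(n_samples=300, n_features=8, random_state=0)
clf = RandomForestClassifier(n_estimators=50, max_depth=4, random_state=1).fit(X, y)

# Forest probability vs. the plain average of every tree's leaf class fractions
forest_probs = clf.predict_proba(X)
tree_probs = np.mean([tree.predict_proba(X) for tree in clf.estimators_], axis=0)
print(np.allclose(forest_probs, tree_probs))

# predict() picks the class with the highest averaged probability (soft vote)
print(np.array_equal(clf.predict(X), clf.classes_[forest_probs.argmax(axis=1)]))
```

Under this reading, with 1000 estimators a value such as 0.83 is the average over 1000 trees of each tree's class-1 leaf fraction.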
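Below is the code that trains a Naive Bayes Classifier on the movie_reviews dataset using a unigram model. I want to train it and analyze its performance when bigram and trigram models are considered as well. How can we do that?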
import nltk.classify.util
from nltk.classify import NaiveBayesClassifier
from nltk.corpus import movie_reviews
from nltk.corpus import stopwords
from nltk.tokenize import word_tokenize

# Build the stopword set once instead of re-reading the list for every word
STOPWORDS = set(stopwords.words("english"))

def create_word_features(words):
    # Unigram bag-of-words features: {word: True} for every non-stopword
    useful_words = [word for word in words if word not in STOPWORDS]
    my_dict = dict([(word, True) for word in useful_words])
    return my_dict

pos_data = []
for fileid in movie_reviews.fileids('pos'):
    words = movie_reviews.words(fileid)
    pos_data.append((create_word_features(words), "positive"))

neg_data = []
for fileid in movie_reviews.fileids('neg'):
    words = movie_reviews.words(fileid)
    neg_data.append((create_word_features(words), "negative"))
train_set = pos_data[:800] + neg_data[:800] …
I have two variables, X and Y.
Structure of X (an np.array):
[[26777 24918 26821 ... -1 -1 -1]
[26777 26831 26832 ... -1 -1 -1]
[26777 24918 26821 ... -1 -1 -1]
...
[26811 26832 26813 ... -1 -1 -1]
[26830 26831 26832 ... -1 -1 -1]
[26830 26831 26832 ... -1 -1 -1]]
Structure of Y:
[[1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [1252, 26777, 26831], [25197, 26777, 26781], [25197, 26777, 26781], [25197, 26777, 26781], [26764, 25803, …
So, I am trying to build an emotion classifier for 7 facial expressions. I know that to use integer labels instead of one-hot 0/1 vectors, I need sparse categorical cross-entropy and the output layer's activation must be set to softmax, but it is not working as expected.
I am using the dataset from https://www.kaggle.com/ashishpatel26/facial-express-recognitionferchallenge
import pandas as pd
import numpy as np
from PIL import Image
import random
import matplotlib.pyplot as plt
from keras.models import Sequential
from keras.optimizers import RMSprop
from keras.layers import Conv1D, MaxPooling1D
from keras.layers import Activation, Dropout, Flatten, Dense
emotion = {0: 'Angry', 1: 'Disgust', 2: 'Fear', 3: 'Happy',
           4: 'Sad', 5: 'Surprise', 6: 'Neutral'}
df=pd.read_csv('fer.csv')
faces=df.values[0:500,1]
faces=faces.tolist()
emos=df.values[0:500,0]
for i in range(len(faces)):
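    faces[i]=[int(x) for x in faces[i].split()] …
Below is example code that uses PyTorch to build a DNN for two regression tasks. The forward function returns two outputs (x1, x2). What about a network for a large number of regression/classification tasks, e.g. 100 or 1000 outputs? Hard-coding all the outputs (e.g. x1, x2, ..., x100) is definitely not a good idea. Is there a simple way to do this? Thanks.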
import torch
from torch import nn
import torch.nn.functional as F
class mynet(nn.Module):
    def __init__(self):
        super(mynet, self).__init__()
        self.lin1 = nn.Linear(5, 10)
        self.lin2 = nn.Linear(10, 3)
        self.lin3 = nn.Linear(10, 4)

    def forward(self, x):
        x = self.lin1(x)
        x1 = self.lin2(x)
        x2 = self.lin3(x)
        return x1, x2

if __name__ == '__main__':
    x = torch.randn(1000, 5)
    y1 = torch.randn(1000, 3)
    y2 = torch.randn(1000, 4)
    model = mynet()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-4)
    for …
I am analyzing a large number of images and extracting their dominant color codes.
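I want to group them into generic color-name ranges, such as green, dark green, light green, blue, dark blue, light blue, etc.
I am looking for a language-agnostic approach that I can implement myself; I would be very grateful for any examples I could study.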
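One language-agnostic approach is to convert each RGB code to a hue/lightness/saturation space, map hue ranges to base names, and use lightness to add "dark"/"light" modifiers. The sketch below does this in Python with the standard colorsys module; the hue buckets, thresholds, and the color_name helper are hypothetical choices to tune for your images, not a standard naming scheme.

```python
import colorsys

# Hypothetical hue buckets: (upper bound in degrees, exclusive, base name)
HUE_BUCKETS = [
    (15, "red"), (45, "orange"), (70, "yellow"), (170, "green"),
    (260, "blue"), (330, "purple"), (360, "red"),
]

def color_name(hex_code: str) -> str:
    """Map '#RRGGBB' to a coarse name such as 'dark green' (illustrative thresholds)."""
    hex_code = hex_code.lstrip("#")
    r, g, b = (int(hex_code[i:i + 2], 16) / 255.0 for i in (0, 2, 4))
    hue, lightness, saturation = colorsys.rgb_to_hls(r, g, b)

    # Nearly unsaturated, very dark, or very bright: use achromatic names
    if saturation < 0.15 or lightness < 0.08 or lightness > 0.95:
        if lightness < 0.2:
            return "black"
        if lightness > 0.85:
            return "white"
        return "grey"

    # Hue (0..1 from colorsys) picks the base name
    hue_deg = hue * 360.0
    base = next((name for limit, name in HUE_BUCKETS if hue_deg < limit), "red")

    # Lightness picks the dark/light modifier
    if lightness < 0.3:
        return f"dark {base}"
    if lightness > 0.7:
        return f"light {base}"
    return base

for code in ["#006400", "#90EE90", "#0000FF", "#808080"]:
    print(code, color_name(code))
```

The same steps (RGB to HLS, bucket the hue, bucket the lightness) port directly to other languages, since the RGB-to-HLS conversion is a short formula.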
The following code uses a random forest model to give me a chart showing feature importances:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectFromModel

clf = RandomForestClassifier()
clf = clf.fit(X_train,y_train)
clf.feature_importances_
model = SelectFromModel(clf, prefit=True)
test_X_new = model.transform(X_test)
matplotlib.rc('figure', figsize=[5,5])
plt.style.use('ggplot')
feat_importances = pd.Series(clf.feature_importances_, index=X_test.columns)
feat_importances.nlargest(20).plot(kind='barh',title = 'Feature Importance')
However, I need to do the same for a logistic regression model. The following code produces an error:
import matplotlib
import matplotlib.pyplot as plt
import pandas as pd
from sklearn.feature_selection import SelectFromModel
from sklearn.linear_model import LogisticRegression

clf = LogisticRegression()
clf = clf.fit(X_train,y_train)
clf.feature_importances_  # first access to the missing attribute: raises the AttributeError
model = SelectFromModel(clf, prefit=True)
test_X_new = model.transform(X_test)
matplotlib.rc('figure', figsize=[5,5])
plt.style.use('ggplot')
feat_importances = pd.Series(clf.feature_importances_, index=X_test.columns)
feat_importances.nlargest(20).plot(kind='barh',title = 'Feature Importance')
I get:
AttributeError: 'LogisticRegression' object has no attribute 'feature_importances_'
Can someone help me figure out where I am going wrong?
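A minimal sketch, assuming a synthetic binary dataset stands in for X_train/y_train: LogisticRegression has no feature_importances_ attribute; it exposes coef_ (one row per class, a single row for binary problems). The absolute coefficients, computed on standardized features so their scales are comparable, are a common proxy for importance and can feed the same pandas bar chart.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for X_train / y_train (assumption: binary target)
X, y = make_classification(n_samples=300, n_features=6, n_informative=3, random_state=0)
X_train = pd.DataFrame(X, columns=[f"feat_{i}" for i in range(6)])

# Standardize so coefficient magnitudes are comparable across features
X_scaled = StandardScaler().fit_transform(X_train)
clf = LogisticRegression(max_iter=1000).fit(X_scaled, y)

# Binary case: coef_ has shape (1, n_features); use |coefficient| as importance
feat_importances = pd.Series(np.abs(clf.coef_[0]), index=X_train.columns)
print(feat_importances.nlargest(20))
# feat_importances.nlargest(20).plot(kind='barh', title='Feature Importance')
```

SelectFromModel also accepts a fitted LogisticRegression, because its default importance_getter='auto' falls back to coef_ when feature_importances_ is absent. For a multiclass model you would aggregate over the rows of coef_, for example np.abs(clf.coef_).mean(axis=0).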
I tried:
from lazypredict.Supervised import LazyClassifier
but got the following traceback:
---------------------------------------------------------------------------
ImportError Traceback (most recent call last)
<ipython-input-1-f518cae57501> in <module>
10 from sklearn.linear_model import LogisticRegression
11 from sklearn.ensemble import RandomForestClassifier
---> 12 from lazypredict.Supervised import LazyClassifier
13 from sklearn.model_selection import GridSearchCV
14 from sklearn.metrics import accuracy_score
~\AppData\Roaming\Python\Python38\site-packages\lazypredict\Supervised.py in <module>
14 from sklearn.preprocessing import StandardScaler, OneHotEncoder, OrdinalEncoder
15 from sklearn.compose import ColumnTransformer
---> 16 from sklearn.utils.testing import all_estimators
17 from sklearn.base import RegressorMixin
18 from sklearn.base import ClassifierMixin
S:\anaconda\lib\site-packages\sklearn\utils\testing.py in <module>
5 from . import …
Then, while training a LightGBM model, I use the "is_unbalance" parameter and set it to True. The figure below shows how I use this parameter.
My questions are:
1. Am I using is_unbalance correctly?
2. Should I use scale_pos_weight instead of is_unbalance?
Thanks!
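A minimal sketch of the two alternatives, assuming a binary 0/1 target: LightGBM's documentation lists is_unbalance and scale_pos_weight as mutually exclusive, so a parameter set should contain only one of them. For scale_pos_weight, the negative/positive count ratio (the heuristic suggested in XGBoost's docs) is a common starting point, not a guaranteed optimum. The imbalance_params helper is hypothetical; it only builds the params dict, so it runs without LightGBM installed.

```python
import numpy as np

def imbalance_params(y, use_is_unbalance=False):
    """Hypothetical helper: LightGBM params for a binary, imbalanced 0/1 target."""
    y = np.asarray(y)
    n_pos = int(np.sum(y == 1))
    n_neg = int(np.sum(y == 0))
    params = {"objective": "binary"}
    if use_is_unbalance:
        # Let LightGBM reweight the classes automatically
        params["is_unbalance"] = True
    else:
        # Manual alternative: up-weight positives by the negative/positive ratio
        params["scale_pos_weight"] = n_neg / n_pos
    # Only one of the two keys is ever set
    return params

y = np.array([0] * 90 + [1] * 10)
print(imbalance_params(y))
print(imbalance_params(y, use_is_unbalance=True))
```

Either dict can then be passed as the params argument of lightgbm.train together with a lightgbm.Dataset; compare both on a validation metric rather than assuming one is better.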
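I need help choosing a suitable activation function. I am training my neural network to detect piano notes, so in this case I can have only one output: the note is present (1) or the note is not present (0). Suppose I introduce a threshold of 0.5 and say that the note is present if the output is greater than 0.5 and absent if it is less than 0.5. What type of activation function can I use? I think it should be a hard limit, but I would like to know whether a sigmoid can also be used.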
signal-processing classification machine-learning neural-network
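A minimal NumPy sketch, assuming a single hypothetical 1-D input feature and synthetic data: train a sigmoid output with binary cross-entropy, then apply the 0.5 threshold only when making the present/absent decision. A hard-limit (step) activation has zero derivative almost everywhere, so gradient-based training would get no signal through it; the sigmoid keeps the output differentiable.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
# Hypothetical toy feature: low values = note absent (0), high values = note present (1)
x = np.concatenate([rng.normal(-2.0, 1.0, 200), rng.normal(2.0, 1.0, 200)])
y = np.concatenate([np.zeros(200), np.ones(200)])

w, b, lr = 0.0, 0.0, 0.1
for _ in range(500):
    p = sigmoid(w * x + b)  # smooth output in (0, 1)
    # For sigmoid + binary cross-entropy, dLoss/dz = p - y
    w -= lr * np.mean((p - y) * x)
    b -= lr * np.mean(p - y)

probs = sigmoid(w * x + b)
present = probs > 0.5  # the hard 0.5 threshold is applied only at inference time
accuracy = np.mean(present == y.astype(bool))
print(f"w={w:.2f}, b={b:.2f}, accuracy={accuracy:.3f}")
```

The same pattern carries over to a full network: a sigmoid output unit trained with binary cross-entropy, followed by thresholding its output at 0.5.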